d153: Machine Learning & Data Science Skills You Need To Get Hired

Source: Machine Learning & Data Science Skills You Need To Get Hired In Fortune 500 Companies


  1. Proficient in querying and manipulating large data sets for analytical purposes using SQL-like languages (Hive / Impala)

  2. Apache Ecosystem – Hadoop, Hadoop Distributed File System (HDFS), MapReduce/YARN (Yet Another Resource Negotiator), Hive (Data warehouse infrastructure), HBase (Distributed Column-oriented NoSQL Database), Oozie Workflow, Sqoop Data Ingestion, ZooKeeper, Pig Scripting, Ambari (Hadoop Clusters Management Platform), Spark (Big Data Processing Engine), Flink (Streaming dataflow / analytics engine), Storm (Real-time data processing), Flume (Log data processing), Avro (Data serialization)

  3. Machine learning techniques such as Neural Networks, Hidden Markov Models (HMM), Maximum Entropy models and other popular algorithms

  4. Feature engineering and statistical modeling methods such as Conditional Random Field (CRF), HMM, Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), etc.

  5. Statistical methods such as Categorical Data Analysis, Multivariate Analysis, Regression Analysis, Survey Sampling Design, Survival/Reliability analysis, Design of experiments, Analysis of variance.

  6. Building machine learning systems for modern parallel-computing environments (GPU, Multicore Symmetric Multiprocessing (SMP), Distributed Clusters); CUDA kernels

  7. Machine learning frameworks such as Caffe, Theano, Torch, TensorFlow, MXNet, Apache Mahout, Spark MLlib, scikit-learn, SciPy, NumPy; Amazon Machine Learning

  8. Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Supervised and Unsupervised learning, and optimization techniques

  9. Traditional/Modern statistical techniques, including SVM, Regularization, Boosting, Random Forests, and other Ensemble Methods

  10. Natural language processing (NLP) problems, including predictive typing, input method conversion, tokenization, tagging, language modeling, language identification, sentiment analysis, named entity recognition, lemmatization, summarization

  11. Building solutions for spell corrections, related searches, synonym/acronym expansions, query rewrites, metrics accumulation, spam prevention, ranking, and recommendations

  12. Proficiency in predictive modeling and data mining tools such as SQL, R, SAS, JMP, Python, Watson, and Aster

  13. Experience with data visualization tools such as D3.js, Tableau, QlikView, etc.

  14. Familiarity with commercial ETL platforms like Informatica, SSIS, Talend, etc.
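
To make one of the items above concrete, here is a minimal sketch of the kind of spell-correction logic mentioned in item 11. It is only an illustration, not how any particular company does it: the vocabulary list and the function names `edit_distance` and `correct` are assumptions made up for this example, and a production system would likely also use word frequencies and query context.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def correct(word: str, vocabulary: list[str]) -> str:
    """Return the vocabulary entry closest to `word` (ties broken alphabetically)."""
    return min(vocabulary, key=lambda v: (edit_distance(word, v), v))


# Hypothetical vocabulary, chosen only for illustration.
VOCAB = ["hadoop", "spark", "hive", "python", "tableau"]

if __name__ == "__main__":
    print(correct("sprak", VOCAB))
```

Scanning the whole vocabulary is fine for a toy list like this one; for a real dictionary you would normally index candidates first so that only a small set of words is compared.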
