Machine Learning Tools or Concepts? Where to Start? Part 2

Mohamed Elrefaey
Jan 11, 2021

In the previous article, we covered the math you need for learning machine learning. In this article, we will cover the data science part, which leads to understanding regression and some other topics in machine learning.

Once you understand the math, you will have the fundamentals you need to understand machine learning more deeply.

As a good next step, here is what you need to learn about data science:

Data Science:

You need to cover the essential techniques for processing data within the context of an ML pipeline. These techniques include data collection, data analysis and exploration, preprocessing, and feature extraction or engineering. You then use the data to train, evaluate, and tune your machine learning models (regression, which was asked about at the beginning, is one of them).
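To make these pipeline steps concrete, here is a minimal sketch in Python using scikit-learn. It assumes scikit-learn and NumPy are installed and uses the library's built-in diabetes dataset as a stand-in for collected data; the dataset, the split ratio, and the choice of plain linear regression are illustrative assumptions, not a prescribed recipe.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data collection: a small built-in regression dataset stands in for real data.
X, y = load_diabetes(return_X_y=True)

# 2. Exploration: a quick look at shape, feature means, and the target range.
print("samples, features:", X.shape)
print("feature means:", np.round(X.mean(axis=0), 3))
print("target range:", y.min(), "to", y.max())

# 3. Hold out a test set before fitting anything, so evaluation stays honest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Preprocessing + model in one pipeline: the scaler is fitted on training data only.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# 5. Evaluation on the held-out data.
y_pred = model.predict(X_test)
rmse = mean_squared_error(y_test, y_pred) ** 0.5
r2 = r2_score(y_test, y_pred)
print(f"test RMSE: {rmse:.1f}")
print(f"test R^2:  {r2:.3f}")
```

Wrapping the scaler and the model in one pipeline means the scaling statistics are learned from the training split only. For plain linear regression, scaling does not change the predictions, but it starts to matter once you add regularization or switch to distance- or gradient-based models.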

You also want to learn how to visualize data patterns and use statistical analysis methods to detect and resolve model overfitting. At this point, Python becomes a very helpful language, especially because of its widely used machine learning and data processing libraries, such as SciPy and scikit-learn.
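One simple way to spot overfitting is to compare a model's score on the training data with its score on held-out data while increasing model complexity. The sketch below assumes scikit-learn is installed and uses a decision tree on the built-in diabetes dataset purely as an illustration; plotting the two score columns against tree depth (for example with matplotlib) gives the visual version of the same check.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Compare train vs. test R^2 as model complexity (tree depth) grows.
# A large gap between the two scores is a typical symptom of overfitting.
results = {}
for depth in [1, 2, 3, 5, 8, None]:  # None = grow the tree until its leaves are pure
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_r2 = tree.score(X_train, y_train)
    test_r2 = tree.score(X_test, y_test)
    results[depth] = (train_r2, test_r2)
    print(f"max_depth={str(depth):>4}  train R^2={train_r2:.3f}  test R^2={test_r2:.3f}")
```

Typically, the training score keeps rising as the tree gets deeper while the test score stalls or drops; a widening gap between the two is the signal to simplify or regularize the model.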

So, to summarize this part, you need to:

  • Understand the end-to-end ML process and the role data plays in each step.
  • Be able to effectively use techniques such as sampling, synthetic data creation, and human labeling to collect data for training and evaluating models.
  • Be able to apply appropriate visual and analytical techniques to explore data, analyze model quality, and debug issues such as overfitting.
  • Understand different metrics that can be used for evaluating models, including accuracy, precision, recall, F1, and AUC.
  • Be able to use cross-validation to evaluate and select among competing models.
  • Understand how to use a variety of techniques, such as regularization and grid search, to tune models while combating issues such as overfitting.
  • Be able to apply data preprocessing techniques such as imputing missing values, standardization, and text cleaning to improve the effectiveness of training.
  • Have a range of feature engineering techniques you can apply to improve generalization, including feature selection, feature extraction, and aggregation statistics for vectors and text.
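Several items in this list fit into one short workflow. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the breast cancer dataset, the artificially injected missing values, the median imputation strategy, and the grid of `C` values are illustrative choices for this example only. It combines imputation and standardization, L2-regularized logistic regression, grid search with 5-fold cross-validation, and a final evaluation with accuracy, precision, recall, F1, and AUC.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score, f1_score, precision_score, recall_score, roc_auc_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Simulate missing values (about 5% of entries) so the imputation step has work to do.
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing (imputation + standardization) and an L2-regularized model in one
# pipeline, so each cross-validation fold re-fits the preprocessing on its own training part.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search over the inverse regularization strength C with 5-fold cross-validation.
search = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)
print("best C:", search.best_params_["clf__C"])
print(f"mean CV F1 of best C: {search.best_score_:.3f}")

# Final check on the untouched test set with several metrics.
best = search.best_estimator_
y_pred = best.predict(X_test)
y_score = best.predict_proba(X_test)[:, 1]  # probability of the positive class, used for AUC
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
```

Keeping the preprocessing inside the pipeline matters: if the imputer and scaler were fitted on the whole dataset before cross-validation, information from the validation folds would leak into training and make the scores look better than they really are.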

How do you feel now? :)

At this point, if you want a book that covers the above topics, one of the best (very well organized and easy to read) is “Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2” by Sebastian Raschka. Note: the latest edition (the 3rd) has been improved compared to the previous ones.

Once you have finished this book, started building models, and gotten your hands dirty with Python and other machine learning tools, you will be ready to learn about a very interesting topic: linear and logistic regression.

Summary: in this part, we covered the most essential topics, as well as the tools that help you experiment with machine learning and start building some basic models. In the next article, we will talk about techniques widely used in industry and academia that you need to understand carefully from a mathematical point of view: linear and logistic regression. Stay tuned for the next article!


Mohamed Elrefaey

Pioneering tech visionary: 18+ years in software at Intel, Orange Labs, and Amazon, 5+ US patents, AI enthusiast, shaping the future of smart technology.