Differences
This shows you the differences between two versions of the page.
machine_learning_roadmap [2023/10/23 16:55] – demiurge | machine_learning_roadmap [2023/10/23 17:02] (current) – [Learn Advanced Python] demiurge | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | # Machine Learning Roadmap | + | ====== |
- | So you want to *learn* | + | So you want to *learn* Machine Learning? It will be a long journey - one that requires a solid grasp of the fundamentals. Try not to skip any of the stages, and move on to the next only once you fully understand the current one. Good luck! |
- | --- | + | ===== Mathematics and Calculus ===== |
- | [TOC3] | + | |
- | --- | + | |
- | ### Mathematics and Calculus | + | ==== Linear Algebra ==== |
- | #### 1. Linear Algebra | + | Linear algebra provides the mathematical framework for understanding and manipulating vectors and matrices, which are the building blocks of almost any ML algorithm. A full grasp of these concepts is **essential**. As always, [Khan Academy]([[https:// |
- | Linear algebra provides the mathematical framework for understanding and manipulating vectors and matrices, which are the building blocks of almost any ML algorithm. A full grasp of these concepts is **essential**. As always, [Khan Academy](https:// | + |
- | 1. [Vectors and Spaces](https:// | + | [Vectors and Spaces]([[https:// |
- | 2. [Matrices](https:// | + | |
- | #### 2. Calculus | + | [Matrices]([[https://www.khanacademy.org/math/precalculus/ |
- | Calculus, and particularly derivatives and gradients, play a key role in optimization algorithms used in ML. You will rely on Calculus for optimization techniques such as [gradient descent](https://en.wikipedia.org/wiki/Gradient_descent), | + | |
- | 1. [Integrals](https:// | + | [Matrix Transformations]([[https:// |
- | 2. [Differential Equations](https:// | + | |
- | 3. [Application of Integrals](https:// | + | ==== Calculus ==== |
- | 4. [Parametric equations, polar coordinates, | + | |
- | 5. [Series](https:// | + | Calculus, and particularly derivatives and gradients, play a key role in optimization algorithms used in ML. You will rely on Calculus for optimization techniques such as [gradient descent]([[https:// |
- | 6. [Gradients](https:// | + | |
+ | [Integrals]([[https:// | ||
+ | |||
+ | [Differential Equations]([[https:// | ||
+ | |||
+ | [Application of Integrals]([[https:// | ||
+ | |||
+ | [Parametric equations, polar coordinates, | ||
+ | |||
+ | [Series]([[https:// | ||
+ | |||
+ | [Gradients]([[https:// | ||
+ | |||
+ | ==== Probability and Statistics ==== | ||
- | #### 3. Probability and Statistics | ||
Another essential building block. Probability theory provides a mathematical framework for quantifying *uncertainty*. In ML, models often need to make predictions or decisions based on incomplete or noisy data. With probability, | Another essential building block. Probability theory provides a mathematical framework for quantifying *uncertainty*. In ML, models often need to make predictions or decisions based on incomplete or noisy data. With probability, | ||
- | 1. [The Entire Khan Academy Statistics and Probability course](https:// | + | 1. [The Entire Khan Academy Statistics and Probability course]([[https:// |
- | >You can take only the lessons you think might be important and then take the Course Challenge. | + | 2. Discrete and continuous probability distributions: |
- | 2. Discrete and continuous probability distributions: | + | |
- | 3. [Bayesian Statistics](https:// | + | |
That should probably be enough for Math. I might' | That should probably be enough for Math. I might' | ||
- | ### | + | ===== Programming |
The current programming language dominating the ML community is **Python**. Not surprising, since the ease of use allows you to focus on writing efficient code without needing to spend too much time learning the intricacies of the language' | The current programming language dominating the ML community is **Python**. Not surprising, since the ease of use allows you to focus on writing efficient code without needing to spend too much time learning the intricacies of the language' | ||
- | ##### Learn Python Basics | ||
- | The `roadmap.sh` [Python Developer roadmap](https:// | ||
- | 1. Learn the Basic Syntax and Data Types | + | ==== Learn Python |
- | You'll need to familiarize yourself with Python's syntax, variables, data types (integers, floats, strings, lists, dicts), and basic operations (arithmetic, | + | |
- | 2. Control Flow | + | |
- | Understand conditional statements (`if`, `elif`, `else`), loops (`for`, `while`), and logical operators (`and`, `or`, `not`). Very important for implementing decision-making and repetition in your code. | + | |
- | 3. Functions and modules | + | |
- | Learn how to define and use functions to encapsulate reusable blocks of code. Also, you'll need to understand how to import and utilize modules (libs). | + | |
- | 4. Data Structures and Manipulation | + | |
- | Get yourself acquainted with fundamental data structures like lists, tuples, sets, and dictionaries. Learn how to manipulate and transform data. | + | |
- | 5. NumPy | + | |
- | A fundamental library for scientific computing in Python. You will need to gain proficiency in using NumPy arrays for efficient numerical computations. | + | |
- | 6. Pandas | + | |
- | You will often need Pandas DataFrames to clean, transform, filter, aggregate, and analyze your datasets. | + | |
- | 7. Plotting and Data Visualization | + | |
- | Become familiar with libraries such as [Matplotlib](https:// | + | |
- | ##### Learn Advanced Python | + | `roadmap.sh` [Python Developer roadmap]([[https:// |
+ | |||
+ | 1. Learn the Basic Syntax and Data Types You'll need to familiarize yourself with Python' | ||
+ | |||
+ | |||
+ | ==== Learn Advanced Python | ||
At this stage, you'll be sufficiently familiar with Python and ready to tackle the ML aspects of Python. Very exciting. | At this stage, you'll be sufficiently familiar with Python and ready to tackle the ML aspects of Python. Very exciting. | ||
- | 1. Machine Learning Libraries | + | 1. Machine Learning Libraries Explore the popular ML libraries, such as [PyTorch]([[https:// |
- | Explore the popular ML libraries, such as [PyTorch](https:// | + | |
- | 2. Object-Oriented Programming (OOP) | + | 2. Object-Oriented Programming (OOP) Get yourself comfortable with the principles of OOP, including classes, objects, inheritance, |
- | Get yourself comfortable with the principles of OOP, including classes, objects, inheritance, | + | |
### Machine Learning Concepts | ### Machine Learning Concepts | ||
- | At this point, you can follow whatever ML course you're comfortable with. A popular recommendation is [fastai](https:// | + | At this point, you can follow whatever ML course you're comfortable with. A popular recommendation is [fastai]([[https:// |
+ | |||
+ | 1. **Supervised Learning** This [Coursera]([[https:// | ||
+ | |||
+ | - Classification: | ||
+ | - Regression: Predicting continuous values. | ||
+ | |||
+ | 2. **Unsupervised Learning** | ||
+ | |||
+ | - Clustering: Grouping similar data points together. | ||
+ | - Dimensionality Reduction: Reducing the number of input features while preserving important information. | ||
+ | - Anomaly Detection: Identifying rare or abnormal instances in the data. | ||
+ | |||
+ | 3. **Reinforcement Learning** | ||
+ | |||
+ | 4. **Linear Regression** | ||
+ | |||
+ | - Understanding linear regression models and assumptions. | ||
+ | - Cost functions, including mean squared error. | ||
+ | - Gradient descent for parameter optimization. | ||
+ | - Evaluation metrics for regression models. | ||
+ | |||
+ | 5. **Logistic Regression** | ||
+ | |||
+ | - Modeling binary classification problems with logistic regression. | ||
+ | - Sigmoid function and interpretation of probabilities. | ||
+ | - Maximum likelihood estimation and logistic loss. | ||
+ | - Regularization techniques for logistic regression. | ||
+ | |||
+ | 6. **Decision Trees and Random Forests** | ||
+ | |||
+ | - Basics of decision tree learning. | ||
+ | - Splitting criteria and handling categorical variables. | ||
+ | - Ensemble learning with random forests. | ||
+ | - Feature importance and tree visualization. | ||
+ | |||
+ | 7. **Support Vector Machines (SVM)** | ||
+ | |||
+ | - Formulation of SVMs for binary classification. | ||
+ | - Kernel trick and non-linear decision boundaries. | ||
+ | - Soft margin and regularization in SVMs. | ||
+ | - SVMs for multi-class classification. | ||
+ | |||
+ | 8. **Clustering** | ||
- | 1. **Supervised Learning** | + | - Overview of unsupervised learning and clustering. |
- | This [Coursera](https:// | + | - K-means clustering algorithm and initialization methods. |
- | - Classification: | + | - Hierarchical clustering and density-based clustering. |
- | - Regression: Predicting continuous values. | + | - Evaluating clustering performance. |
- | 2. **Unsupervised | + | 9. **Neural Networks and Deep Learning** |
- | This [course](https:// | + | |
- | - Clustering: Grouping similar data points together. | + | |
- | - Dimensionality Reduction: Reducing | + | |
- | - Anomaly Detection: Identifying rare or abnormal instances in the data. | + |
- | 3. **Reinforcement | + | - [Deep Learning |
- | Coursera provides this [course](https://www.coursera.org/specializations/reinforcement-learning) on Reinforcement Learning, which should be a good starting point. | + | |
+ | | ||
+ | - [Three Mechanisms of Weight Decay Regularization]([[https:// | ||
+ | - [Layer Normalization]([[https:// | ||
+ | - [Attention Is All You Need]([[https:// | ||
- | 4. **Linear Regression** | + | 10. **Evaluation and Validation** Read the following papers: |
- | This [resource](https:// | + | |
- | - Understanding linear regression models and assumptions. | + | |
- | - Cost functions, including mean squared error. | + | |
- | - Gradient descent for parameter optimization. | + | |
- | - Evaluation metrics for regression models. | + | |
- | 5. **Logistic Regression** | + | - [Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models]([[https://arxiv.org/abs/1806.07139|https://arxiv.org/ |
- | Read through | + | - [Leave-One-Out Cross-Validation |
- | - Modeling binary classification problems with logistic regression. | + | |
- | - Sigmoid function and interpretation of probabilities. | + | |
- | - Maximum likelihood estimation and logistic loss. | + | |
- | - Regularization techniques | + | |
- | 6. **Decision Trees and Random Forests** | + | And [this HuggingFace guide]([[https://huggingface.co/docs/evaluate/ |
- | This incredible resource by [Jake VanderPlas](https://jakevdp.github.io/PythonDataScienceHandbook/05.08-random-forests.html) should be extremely useful. The main takeaways are: | + |
- | - Basics of decision tree learning. | + | |
- | - Splitting criteria and handling categorical variables. | + | |
- | - Ensemble learning with random forests. | + | |
- | - Feature importance and tree visualization. | + | |
- | 7. **Support Vector Machines (SVM)** | + | 11. **Feature Engineering and Dimensionality Reduction** Take a look at [this article]([[https:// |
- | Read through | + | |
- | - Formulation of SVMs for binary classification. | + | |
- | - Kernel trick and non-linear decision boundaries. | + | |
- | - Soft margin and regularization in SVMs. | + | |
- | - SVMs for multi-class classification. | + | |
- | 8. **Clustering** | + | - [Beyond One-hot Encoding: lower dimensional target embedding]([[https://arxiv.org/abs/1806.10805|https://arxiv.org/ |
- | Read through this excellent | + | - [A Tutorial on Principal Component Analysis]([[https:// |
- | - Overview of unsupervised learning and clustering. | + | |
- | - K-means clustering algorithm and initialization methods. | + | |
- | - Hierarchical clustering and density-based clustering. | + | |
- | - Evaluating clustering performance. | + | |
- | 9. **Neural Networks | + | 12. **Model Selection |
- | The heart of the matter. Read through the papers for each: | + | |
- | - [Deep Learning | + | |
- | - [An Introduction | + | |
- | - [Recurrent Neural Networks (RNNs): A gentle Introduction and Overview](https:// | + | |
- | - [Three Mechanisms of Weight Decay Regularization](https:// | + | |
- | - [Layer Normalization](https:// | + | |
- | - [Attention Is All You Need](https:// | + | |
- | 10. **Evaluation and Validation** | + | - Grid search, random search, and Bayesian optimization for hyperparameter tuning. |
- | Read the following papers: | + | - Model selection techniques, including nested cross-validation. |
- | - [Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models](https:// | + | - Overfitting, |
- | - [Leave-One-Out Cross-Validation for Bayesian Model Comparison in Large Data](https:// | + | - Performance comparison of different |
- | And [this HuggingFace guide](https:// | + | |
- | 11. **Feature Engineering and Dimensionality Reduction** | + | ### Train your own model! You're now ready to pre-train your own model, or fine-tune an existing one! For this, you should |
- | Take a look at [this article](https://towardsdatascience.com/feature-selection-and-dimensionality-reduction-f488d1a035de) for a general overview. | + |
- | Also read these papers: | + | |
- | - [Beyond One-hot Encoding: lower dimensional target embedding](https://arxiv.org/abs/1806.10805) | + | |
- | - [A Tutorial on Principal Component Analysis](https://arxiv.org/abs/1404.1100) | + | |
- | 12. **Model Selection and Hyperparameter Tuning** | + | The [HuggingFace Transformers docs]([[https:// |
- | This is where you're finally dabbling in model training. Good job! You will need to learn: | + | |
- | - Grid search, random search, and Bayesian optimization for hyperparameter tuning. | + | |
- | - Model selection techniques, including nested cross-validation. | + | |
- | - Overfitting, | + | |
- | - Performance comparison of different models. | + | |
- | ### Train your own model! | + | ### Stay Updated and Engage in ML community At this point, you know all the essentials. ML is an ever-advancing field, with new innovations emerging every day. | ||
- | You're now ready to pre-train your own model, or fine-tune an existing one! For this, you should look into [Transformers](https://github.com/ | + | |
- | The [HuggingFace Transformers docs](https:// | ||
- | ### Stay Updated and Engage in ML community | ||
- | At this point, you know all the essentials. ML is an ever-advancing field, with new innovations emerging every day. You'll need to stay abreast of the latest developments, |
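
The Linear Regression stage of the roadmap lists a mean squared error cost function and gradient descent for parameter optimization. As a rough illustration of how those two pieces fit together, here is a minimal sketch in plain Python (no NumPy). The function names `mse` and `fit_linear_regression`, the learning rate, and the epoch count are all assumptions made for this example, not something the roadmap prescribes.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y = w*x + b on the data."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def fit_linear_regression(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by batch gradient descent on the MSE.

    lr and epochs are illustrative defaults (assumptions); they happen
    to work for small, unscaled toy data like the example below.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = (2.0 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2.0 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the cost.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 (hypothetical example data).
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2.0 * x + 1.0 for x in xs]
    w, b = fit_linear_regression(xs, ys)
    print(f"w={w:.3f}, b={b:.3f}, mse={mse(xs, ys, w, b):.6f}")
```

In practice you would likely reach for NumPy or one of the ML libraries mentioned earlier, but writing the update rule out by hand once makes the link between the calculus stage (gradients) and the model-training stage easier to see.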