My ML Classification and Regression Cheatsheet

Madhumita Menon
3 min read · Apr 15, 2021

I have been spending a good chunk of the past few weeks trying to grasp the concepts of machine learning. Halfway through the algorithms, right as I started with unsupervised learning, I began having doubts about supervised learning, about classification algorithms, about regression algorithms. (Is this procrastination?) Anyway, having missed half of the video on unsupervised learning, I realized I couldn’t really recall which of the algorithms were for classification. (I could barely even recall their names! Linear, logistic, KNN and… never mind, forgot it again :( ). Fast-forward to today, and I have decided to write a cheatsheet for you (for me?). So here it is.


First and foremost, classification and regression models are Supervised Learning models! Most beginners miss this point, which causes them a lot of confusion. (Confession: me too.)

Classification vs. Regression:
→ Classification involves predicting discrete categories or classes. It divides the dataset into different categories and labels them accordingly.
→ Regression involves predicting continuous, real-valued quantities. It investigates the relationship between dependent and independent variables. To pick the right predictive model, it is essential to understand the kind of situation at hand. Once you can decide on the type of problem you are dealing with, you are halfway there.

Often we tend to forget which models belong where. So here’s a list that you can remember…

Classification Models:

  1. Logistic Regression (ironic, isn’t it? Some ML engineers argue that this model is a regression model, but in my opinion it belongs right here. What do you think?):
    It utilizes the power of regression to perform classification. It is applied wherever a binary, yes or no (male/female) output is expected. Logistic Regression is used when your dependent variable can take only two values.
  2. Decision Trees:
    Tree models are used when there is a high non linearity and a complex relationship between dependent and independent variables.
  3. Random Forests:
    They are a collection of Decision Trees.
  4. Naive Bayes:
    This model is based on Bayes’ theorem (with a “naive” independence assumption). It can be used when the available dataset is small and the input variables are categorical rather than numerical.
  5. K Nearest Neighbors:
    KNN is used in visual pattern identification. Like scanning, detecting and billing an item in the shopping cart (automated billing practice) as in the case of Amazon Go Stores (Do google this if you don’t know because, man! this is so good for an introvert XD).
  6. Support Vector Machines:
    SVMs are used for two-group classification problems. They are used in text classification problems where keywords are to be searched (whether a certain word is present or not).
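To make the classification idea concrete, here is a minimal k-nearest-neighbors classifier sketched in plain Python. Everything here (the `knn_predict` helper, the toy 2-D points, and the "A"/"B" labels) is my own illustrative invention, not a library API — in practice you would reach for something like scikit-learn instead:

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort (distance, label) pairs by Euclidean distance to the query point.
    dists = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy 2-D dataset: two well-separated clusters labelled "A" and "B".
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (2, 2)))  # near the "A" cluster → A
print(knn_predict(points, labels, (9, 9)))  # near the "B" cluster → B
```

Notice that KNN has no real "training" step: it just stores the labelled points and votes at prediction time, which is why it works well for pattern-matching tasks like the ones mentioned above.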

Regression Models:

  1. Linear Regression (This is usually the only regression model that beginners learn, but I’m gonna throw in a few more) :
    Used in evaluating trends and sales estimates, analyzing the impact of price changes, and studying the financial services and insurance domains.
  2. Polynomial Regression
  3. Stepwise Regression
  4. Ridge Regression
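And to make the regression side concrete, here is ordinary least squares fit by hand for a single feature. The `fit_line` helper and the toy ad-spend numbers are purely illustrative (the data is constructed to lie exactly on y = 2x + 1), not a real dataset or library function:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form slope: covariance of x and y over variance of x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy sales data: ad spend (x) vs units sold (y), lying exactly on y = 2x + 1.
ad_spend = [1, 2, 3, 4, 5]
units = [3, 5, 7, 9, 11]

slope, intercept = fit_line(ad_spend, units)
print(slope, intercept)        # → 2.0 1.0

# Forecast units sold for a spend of 6 — the point of regression:
print(slope * 6 + intercept)   # → 13.0
```

The other models in the list build on this same idea: polynomial regression adds powers of x as extra features, and ridge regression adds a penalty on large coefficients to tame overfitting.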

So this is basically my compilation of the various Supervised Learning models a beginner might need to learn. Please note that many of these can also be used for the opposite task (Random Forests, SVMs, etc.); what I have mentioned is simply what each is more popularly used for. Also, consider this more of a revision sheet that you can refer to after doing an in-depth tutorial on Machine Learning, not a quick (and lazy) tutorial.

So here I am, signing off!
