Classification and Regression – Both techniques are part of supervised machine learning. Principally, both have one common goal: to make predictions or take decisions using past data as the underlying foundation. There is one major difference as well: classification's predictive output is a label, while regression's is a quantity.
Machine Learning – Basics
AILabPage defines machine learning as “A focal point where business, data, experience meets emerging technology and decides to work together”.
ML instructs an algorithm to learn for itself by analysing data. Algorithms here learn a mapping of input to output, detect patterns, or learn by reward. The more data an algorithm processes, the smarter it gets.
Thanks to statistics, machine learning became very popular in the 1990s. Machine learning is about the use and development of sophisticated learning algorithms. The intersection of computer science and statistics gave birth to probabilistic approaches in AI, which shifted the field further toward data-driven approaches. Data science, by contrast, is more about the extraction of knowledge from data (KDD) through algorithms to answer particular questions or solve particular problems.
In other words, machine learning algorithms “learn” from observations. When exposed to more observations, an algorithm improves its predictive performance. You can follow the post below for more details on machine learning.
Classification – Class Of An Object
In classification, predictions are made by sorting outputs into different categories. Classification outputs fall into discrete categories, hence the algorithms used here are designed to produce discrete labels as output.
Classification makes business life easy, as the output or prediction is drawn from a finite set of possible values, i.e. outcomes. Some useful examples in this category are:
- Determining whether an email is spam or not
- The outcome can be a binary classification (e.g. via logistic regression)
- Whether something is harmful or not
- Whether it will rain tomorrow or not
Segmenting business customers, audio and video classification, and text processing for sentiment analytics are a few more examples, where we get multiple labels as output. Besides logistic regression, the most famous algorithms for this method are k-nearest neighbours and decision trees.
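As an illustration, the k-nearest-neighbours idea mentioned above can be sketched in a few lines of plain Python. This is a minimal one-feature toy example with hypothetical data, not a production implementation:

```python
# Minimal k-nearest-neighbours classifier sketch (hypothetical toy data).
# Each training example is (feature_value, label).
train = [(1.0, "spam"), (1.2, "spam"), (3.8, "ham"), (4.1, "ham"), (3.9, "ham")]

def knn_predict(x, k=3):
    # Sort training points by distance to x and keep the k closest.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [label for _, label in nearest]
    # Majority vote among the k nearest labels.
    return max(set(labels), key=labels.count)

print(knn_predict(1.1))  # near the "spam" cluster
print(knn_predict(4.0))  # near the "ham" cluster
```

Note how the output is always one of the discrete labels seen in training, which is exactly what makes this a classification method.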
Regression – Numbers Game
The system attempts to predict a value for an input based on past data. Regression outputs are real-valued numbers that exist in a continuous space, which makes it a very useful method for predicting continuous outputs. In a regression predictive model, predictions come out as quantities, so the performance of the model is evaluated in terms of error margin, i.e. errors in predictions.
One of the most used algorithms here, underrated for its simplicity, is linear regression. As it is simple to understand and use, it has gained the highest popularity. Linear regression is an extremely versatile method that can be used for predicting:
- Temperature for the day or the next hour
- Likely housing prices in an area
- Likelihood of customers churning
- Revenue per customer
Another good example of a regression problem is when we have time-ordered inputs, so-called time-series data, for forecasting problems. In simple terms, regression is useful for predicting outputs that are continuous. The predictive model here is a task of approximating a mapping function from inputs to a continuous output variable.
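A simple linear regression of the kind described above can be fit with ordinary least squares in plain Python. The data here is made up for illustration; this is a sketch, not a library-grade implementation:

```python
# Ordinary least squares fit of y = a*x + b on toy (hypothetical) data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.1, 8.0, 9.9]  # roughly y = 2x

def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept keeps the means on the line.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # slope close to 2, intercept close to 0
```

Unlike the classification case, the output for a new input `a * x + b` can be any real number, which is what makes this regression.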
Function Approximation in Classification and Regression
Finding the best function to map input variables to output variables is the main task for any machine learning algorithm.
In the two cases, classification and regression, the function being approximated is different, hence they are two different tasks.
To learn why functions are important, you can read How Machine Learning Algorithms Work for more details.
After finding the best model, with the best algorithm as its underlying foundation, it becomes simple to get the best mapping function. System resources, time in hand, and the amount and quality of data are critical factors in the choice of algorithm.
The end goal is to build a predictive model on past data that can find answers from new or future data. This works as a basic mathematical problem.
Classification vs Regression
As is clear from the definitions of classification and regression above, these two methods are used in supervised learning depending on whether input variables are mapped to output values or labels.
Example – Imagine your company needs to launch a product, and as head of the product roll-out you want to do some research and analytics work to know whether the product will be successful or not. What is required here?
Data on similar products running in the market, and on similar products that failed and went out of the market, including:
- Price charged
- Marketing budget
- Competitor price
- Target customer segmentation, i.e. education level, earnings, age, home and work address, buying habits, etc.
- Market size
There could be several other variables to add to make our predictive model accurate and successful.
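The variables listed above can be framed as a classification dataset: each past product becomes a row of features, and "successful or not" becomes the label. The records below are entirely hypothetical, just to show the shape of the data:

```python
# Hypothetical feature records for the product-launch example.
# In practice these rows would come from real market-research data.
products = [
    {"price": 49.0, "marketing_budget": 10000, "competitor_price": 55.0,
     "market_size": 120000, "successful": True},
    {"price": 79.0, "marketing_budget": 2000, "competitor_price": 50.0,
     "market_size": 30000, "successful": False},
]

# Framed as supervised classification: feature vectors X and binary labels y.
feature_names = ["price", "marketing_budget", "competitor_price", "market_size"]
X = [[p[name] for name in feature_names] for p in products]
y = [p["successful"] for p in products]
print(X[0], y[0])
```

With the data in this shape, any of the classification algorithms mentioned earlier (logistic regression, k-nearest neighbours, decision trees) could be trained on `X` and `y`.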
Classification and Regression – Algorithms
Some of the algorithms used for regression and classification are mentioned below. We will not define them here; they were defined in our previous post.
In upcoming posts we will try to cover each of them in detail, including their definitions, use cases and flow. For now let's just keep our focus on the two supervised learning methods, where each training example is a pair consisting of an input object (typically a vector) and a desired output value.
Conversion Between Classification and Regression
In some cases, it is possible to convert a regression problem into a classification problem and vice versa.
- Regression to Classification – In short, this is the conversion of cardinal numbers into an ordinal range by giving class names to value ranges; continuous values get converted into discrete buckets. For example, a bucketing system classifying credit-card spending in the $0–$1000 range into classes as below:
- $0 to $200 assigned to Class-1
- $201 to $500 assigned to Class-2
- $501 to $1000 assigned to Class-3
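The bucketing above is a straightforward conditional mapping. A minimal sketch in Python, using the exact ranges from the list:

```python
# Regression-to-classification: bucket a continuous spend value into the
# classes listed above ($0-$200 -> Class-1, $201-$500 -> Class-2,
# $501-$1000 -> Class-3).
def spend_to_class(spend):
    if 0 <= spend <= 200:
        return "Class-1"
    if spend <= 500:
        return "Class-2"
    if spend <= 1000:
        return "Class-3"
    raise ValueError("spend outside the $0-$1000 range")

print(spend_to_class(150))  # Class-1
print(spend_to_class(750))  # Class-3
```

Once the continuous target is bucketed like this, any classification algorithm can be trained on the resulting labels.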
- Classification to Regression – Converting an ordinal range into cardinal values, i.e. discrete buckets back into continuous values. Reversing the example above by changing each class back to a continuous range, we get the results below:
- Class-1 assigned to value range $0 to $200
- Class-2 assigned to value range $201 to $500
- Class-3 assigned to value range $501 to $1000
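Going back from a class to a single number requires picking a representative value from each range; the midpoint used below is one common choice, though it is an assumption of this sketch rather than the only option:

```python
# Classification-to-regression: map each class back to its value range.
# Using the midpoint as the representative number is inherently lossy --
# the original exact value cannot be recovered from the class alone.
class_ranges = {
    "Class-1": (0, 200),
    "Class-2": (201, 500),
    "Class-3": (501, 1000),
}

def class_to_value(label):
    low, high = class_ranges[label]
    return (low + high) / 2  # midpoint of the class's range

print(class_to_value("Class-2"))  # 350.5
```

This lossiness is exactly the mapping error the word of caution below warns about.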
Word of caution – mapping errors across continuous ranges often occur, which results in poor model performance.
Books + Other readings Referred
- Research through Open Internet – NewsPortals, Economic development report papers and conferences.
- AILabPage (group of self-taught engineers) members hands-on lab work.
Feedback & Further Question
Do you have any questions about machine learning, data science or data analytics? Leave a comment or ask your question via email. I will try my best to answer it.
Points to Note:
All credits if any remains on the original contributor only.
Conclusion – We have elaborated on our earlier posts on machine learning algorithms for understanding classification and regression techniques under supervised learning. The main focus was on highlighting the difference between classification and regression problems. In short, in a regression problem the system attempts to predict a value for an input based on past data, while in classification, predictions are made by sorting inputs into different categories.
Machine learning algorithms are either supervised or unsupervised. Traditional machine learning focuses on feature engineering, while deep learning focuses on end-to-end learning from raw features.
============================ About the Author =======================
Read about Author at : About Me
Thank you all for spending your time reading this post. Please share your opinions, comments, critiques, agreements or disagreements. For more details about posts, subjects and relevance, please read the disclaimer.