L1 and L2 Regularization in ML

The emergence of AI has brought about a significant transformation in worldwide technology. The field has branched into several areas, such as robotics, machine learning, natural language processing, and others. These areas have largely shaped the production and advancement of smart devices such as computers, smartphones, and intelligent software.
Machine learning is a distinct branch within the wider realm of artificial intelligence that concentrates on designing and assessing computational algorithms. These algorithms form the blueprint for models capable of self-directed learning. Such a model empowers machines to handle real-world problems by using both “training data” and “testing data” to make forecasts and decisions that go beyond their explicit programming. The expanding field of data science benefits greatly from the usefulness of machine learning.
What are the limitations of ML?
The basic function of machine learning is to autonomously test and find the solution that best fits a problem. The process is not as smooth as it seems; machine learning has its limitations. Sometimes the algorithm applied complicates problem-solving rather than simplifying it, and there are situations where the accuracy of the model is considerably low. The main problems arise from a mismatch between the model and the dataset provided; this can occur in two ways, termed underfitting and overfitting.
Underfitting – If a model is too simple to learn the general pattern of a dataset, the phenomenon is termed underfitting. This occurs when an inappropriate machine learning algorithm is applied to the given dataset, one that cannot express the pattern in the data points. For instance, suppose the data follows a sine curve but a Linear Regression algorithm is applied. The model’s output will be a straight line, which cannot fit the curve, so the accuracy of the model will be very low. It is called underfitting because the model oversimplifies the sine curve into a straight line, which is not practically possible.
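The sine-versus-line example can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn and synthetic data (the article names no library or dataset): a straight line is fitted to two full periods of a noisy sine wave, and even on its own training data the R² score stays low.

```python
# Underfitting sketch: a linear model fitted to sine-shaped data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(0, 4 * np.pi, 100).reshape(-1, 1)   # two periods of a sine
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)  # R^2 is low even on the training data itself
```

A straight line simply has no way to follow the oscillation, so no choice of slope and intercept helps; the fix is a more expressive model, not more data.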
Overfitting – Such a model learns more than required from the ‘training data’ and fails to generalize to other datasets. Its accuracy is near perfect on the training data but decreases considerably when applied to any other dataset. The model becomes so specific to one set of data that it even targets the noise, the data points which are not essential. Learning the noise makes the model very flexible, but its performance drops on other datasets. For instance, applying a high-degree Polynomial Regression model (via a polynomial feature transform) to a small dataset lets the model learn far more than required, which overcomplicates it. This type of model gives output specific to the ‘training data’ and is not functional on unseen data.
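The polynomial example above can be demonstrated concretely. This is a sketch under assumptions (scikit-learn, synthetic sine data, degree 15 chosen for illustration): the model scores almost perfectly on its 20 training points but much worse on held-out points from the same curve.

```python
# Overfitting sketch: a degree-15 polynomial memorizes noisy training data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X_train = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y_train = np.sin(2 * np.pi * X_train).ravel() + rng.normal(scale=0.2, size=20)
X_test = np.linspace(0.05, 0.95, 50).reshape(-1, 1)   # unseen points, same curve
y_test = np.sin(2 * np.pi * X_test).ravel()

model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)  # near-perfect fit to the noise
test_r2 = model.score(X_test, y_test)     # noticeably worse on unseen data
```

The gap between the two scores is exactly the symptom the article describes: the extra flexibility was spent fitting noise rather than signal.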
In equational terms, overfitting is a result of the basis function coefficients growing large and canceling each other out. Regularization is a technique used to prevent overfitting by decreasing the magnitude of the coefficients.
What is Regularization?
It is used to resolve overfitting issues in machine learning by reducing the generalization error. It involves modifying the algorithm’s equation so that the function fits the training set appropriately, neither underfitting nor overfitting. Overfitting, as mentioned above, occurs due to large coefficient values, so regularization decreases the coefficients to make sure they don’t grow and cancel out. It helps achieve a balanced fit between the data and the algorithm, so that simple linear equations can still be used. The resulting model is more appropriate and robust. The change to the equation amounts to penalizing some parameters of the overfit equation to create a balanced fit.
Regularization, with its strength denoted by ‘lambda’, functions by penalizing all the parameters except the intercept, enabling the machine learning model to generalize the data rather than overfit the data points. It adds a penalty to the model as the complexity of the model increases.
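The penalized loss can be written out directly. This is a minimal sketch of the idea only (the function name and toy data are made up for illustration): the ordinary residual sum of squares gains a term that grows with the size of every coefficient except the intercept.

```python
# Sketch of a penalized loss: RSS plus lambda times the L2 penalty.
import numpy as np

def ridge_loss(w, X, y, lam):
    """RSS plus an L2 penalty on every coefficient except the intercept."""
    residuals = y - X @ w
    rss = residuals @ residuals
    penalty = lam * np.sum(w[1:] ** 2)  # w[0] is the intercept, not penalized
    return rss + penalty

# Toy data: first column of X is all ones, so w[0] acts as the intercept.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])
w = np.array([0.0, 1.0])
loss = ridge_loss(w, X, y, lam=0.5)  # RSS is 0 here, only the penalty remains
```

With lambda set to 0 the expression reduces to ordinary least squares; as lambda grows, large coefficients become increasingly expensive.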
The fitting procedure in linear regression involves a loss function, the residual sum of squares (RSS), and chooses the coefficients that minimize it. Several forms of regularization, the L1 and L2 norms discussed below, are useful for dealing with overfitting.
L2 Regularization

Also termed Ridge Regression, this variation adds a penalty to the equation, so the model is discouraged from taking large coefficient values. The penalty is equal to the sum of the squared values of the coefficients. This penalty is added to the MSE, which restricts the model from taking larger coefficient values.
This ultimately leads to a simpler equation and prevents the coefficients from cancelling out. The flexibility of a model grows with the magnitude of its coefficients; Ridge Regression shrinks the coefficients, resulting in smaller values and restraining the model’s flexibility. A tuning parameter, say lambda, decides how much restriction to impose on the machine learning model’s flexibility, so the value of the tuning parameter is crucial in the process. Every association between the response and the variables is shrunk, except the intercept, which measures the mean value of the response.
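The effect of the tuning parameter can be seen directly. This is a sketch assuming scikit-learn, whose `Ridge` estimator calls the parameter `alpha` rather than lambda; the data is synthetic: as alpha grows, the sum of squared coefficients shrinks, but no coefficient ever reaches exactly zero.

```python
# Ridge sketch: larger alpha (the tuning parameter) => smaller coefficients.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.0, 0.5]) + rng.normal(scale=0.5, size=100)

norms = []
for alpha in (0.1, 1.0, 100.0):
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.sum(model.coef_ ** 2))  # the quantity Ridge penalizes
# norms decreases as alpha increases, yet stays strictly positive
```

Picking alpha is the practical problem here; it is usually chosen by cross-validation rather than by hand.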
L1 Regularization

Also termed Lasso, this variation performs a similar function to Ridge Regression by shrinking the coefficients. It differs from L2 regularization in how it penalizes them: the L1 norm, or Lasso, adds the modulus, or absolute value, of the coefficients as the penalty to the MSE, which likewise restricts the model from becoming too flexible and taking up undesirable values from the data points. The tuning parameter, again lambda, is also crucial for this norm.
Let the constant be ‘a’ for each value of the tuning parameter. In the L2 regularization, or Ridge Regression, equation, the sum of the squared coefficients must be less than or equal to ‘a’. In the L1 regularization, or Lasso, equation, the sum of the moduli of the coefficients must be less than or equal to ‘a’. The L1 and L2 equations are also known as constraint functions, as they constrain the models from overfitting. One way to see the main difference between L1 and L2 is that the absolute-value penalty of L1 relates to the median of the estimates, while the squared penalty of L2 relates to the mean, as the basis for preventing overfitting of the machine learning model.
Limitations of L1 and L2 regularization
The Ridge Regression model shrinks the coefficients of the least significant predictors, but they can never become exactly zero. This leads to a final model that includes all the predictors even after shrinking their values. In the case of Lasso, the L1 penalty converts some coefficient estimates into exactly zero, but only when the value of the tuning parameter is large enough. Among correlated predictors it can select one somewhat arbitrarily, rendering a sparse final model. The two techniques therefore differ in their efficacy based on the requirement.
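The contrast described above can be checked side by side. This is a sketch assuming scikit-learn and synthetic data with two genuinely useless features: Ridge keeps every predictor with a small but nonzero coefficient, while Lasso with a sufficiently large alpha drops some entirely.

```python
# Ridge keeps all predictors; Lasso can drop some once alpha is large enough.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 4))
# Features 1 and 3 have no real effect on y.
y = X @ np.array([2.0, 0.0, -1.0, 0.0]) + rng.normal(scale=0.3, size=80)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

ridge_kept = np.count_nonzero(ridge.coef_)  # typically all 4 remain nonzero
lasso_kept = np.count_nonzero(lasso.coef_)  # the useless features are dropped
```

This is why Lasso is often preferred when an interpretable model with few predictors is wanted, and Ridge when every predictor is believed to carry some signal.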
Advantages of Regularization
Regularization is a super useful technique to boost the accuracy of regression models. As for their roles, L1 is useful for feature selection, as unwanted variables can be excluded from the final model. On the other hand, L2 is useful for collinear features, as it reduces the coefficient variance that codependent features tend to increase.
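The collinearity point can be made extreme by duplicating a feature outright. This is an illustrative sketch under assumptions (scikit-learn, a synthetic dataset with two identical columns): Ridge splits the weight evenly between the twins, keeping coefficient variance down, while Lasso tends to keep one and zero the other, performing feature selection.

```python
# Two identical features: Ridge shares the weight, Lasso picks one.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(5)
x = rng.normal(size=(100, 1))
X = np.hstack([x, x])                       # two copies of the same feature
y = 2.0 * x.ravel() + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
# ridge.coef_ is roughly [1, 1]: the weight is shared equally
# lasso.coef_ puts (almost) all the weight on a single coefficient
```

With perfectly collinear columns the ordinary least-squares solution is not even unique; both penalties restore a well-defined answer, just in characteristically different ways.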
L1 and L2 regularization is a great technique that every machine learning practitioner should understand in order to develop more competent and accurate models.
This is a guest post