Friday, December 10, 2021

Why you should avoid Regularization models

This is a technical subject that may warrant looking at the complete study (a 33-slide PowerPoint).  You can find it at the following two links. 

Regularization study at Slideshare.net

Regularization study at SlidesFinder.com 

If you have access to Slideshare.net, the study reads better there than at SlidesFinder. 

Here are a few highlights from the study.  

The two main Regularization models are LASSO and Ridge Regression, as defined below. 

[Slide image: the LASSO and Ridge Regression objective functions, with the OLS argument highlighted in yellow and the penalization term in orange.]

The above Regularization models are just extensions of OLS Regression (the yellow argument) plus a penalization term (in orange) that shrinks the coefficient levels.  
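
For reference, since the slide images do not carry over well to this page, these are the standard textbook formulations of the two objective functions (the OLS residual sum of squares plus the penalization term, with Lambda controlling the penalization level):

```latex
% Ridge Regression: the OLS argument plus an L2 penalty on the coefficient levels
\hat{\beta}^{\,ridge} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2}

% LASSO: the same OLS argument plus an L1 penalty, which can shrink coefficients all the way to zero
\hat{\beta}^{\,lasso} = \arg\min_{\beta} \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda \sum_{j=1}^{p} |\beta_j|
```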

Regularization models are deemed to have many benefits (left column of the table below).  But they often do not work as intended (right column of the table below).

[Slide table: the claimed benefits of Regularization models (left column) versus how they often fail in practice (right column).]

In terms of forecasting accuracy, the graphs below show the penalization or Lambda level on the X-axis.  As the Lambda level increases from left to right, penalization increases (regression coefficients are shrunk and, in the case of LASSO models, eventually zeroed out), and the number of variables left in the LASSO model decreases (top X-axis).  The Y-axis shows the Mean Squared Error of those LASSO models within a cross-validation framework. 

[Graphs: cross-validated MSE versus Lambda for two LASSO models, with the number of remaining variables shown on the top X-axis.]

The above graph on the left shows a very successful LASSO model.  It eventually keeps only 1 variable out of 46 in the model, and achieves the lowest MSE by doing so.  By contrast, the LASSO model on the right very much fails.  Its best model occurs when Lambda is close to zero, which corresponds to the original OLS Regression model before any Regularization (before any penalization shrinks the regression coefficients). 
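
For readers who want to reproduce this kind of diagnostic, here is a minimal sketch using scikit-learn's LassoCV on simulated stand-in data (scikit-learn calls Lambda "alpha"); the 46-variable setup simply mirrors the example above and is not the original data set.

```python
# Minimal sketch: cross-validated MSE as a function of the LASSO penalization level.
# X and y are simulated stand-ins; replace them with your own data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=46, n_informative=5,
                       noise=10.0, random_state=0)

# 10-fold cross-validated LASSO over an automatic grid of penalization levels
lasso_cv = LassoCV(cv=10, random_state=0).fit(X, y)

# Mean cross-validated MSE at each penalization level
mean_mse = lasso_cv.mse_path_.mean(axis=1)

plt.plot(np.log10(lasso_cv.alphas_), mean_mse)
plt.axvline(np.log10(lasso_cv.alpha_), linestyle="--",
            label="Lambda with lowest CV MSE")
plt.xlabel("log10(Lambda)")
plt.ylabel("Mean cross-validated MSE")
plt.legend()
plt.show()

# Number of variables the selected model keeps (its non-zero coefficients)
print("Variables kept:", np.sum(lasso_cv.coef_ != 0), "out of", X.shape[1])
```

The printed count of non-zero coefficients corresponds to the number of variables the selected LASSO model keeps (the top X-axis in the graphs above).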

Revisiting these two graphs and giving them a bit more meaning is insightful.  The LASSO model depicted in the left graph below was successful: as intended, it reduced model over-fitting as it increased penalization and reduced the number of variables in the model.  The LASSO model on the right failed: it increased model under-fitting the minute it started to shrink the original OLS regression coefficients and/or eliminate variables.

[Graphs: the same two LASSO models annotated - reduced over-fitting on the left, increased under-fitting on the right.]

Based on firsthand experience, the vast majority of the Ridge Regression and LASSO models I have developed resulted in increased model under-fitting (right graph) instead of reduced model over-fitting (left graph). 

Also, Regularization models often destroy the explanatory logic of the original OLS Regression model. 

The two graphs below capture the regression coefficient paths as Lambda increases, penalization increases, and the regression coefficients are progressively shrunk down to close to zero.  The graph on the left shows Lambda, or penalization, increasing from left to right.  The one on the right shows Lambda increasing from right to left.  Depending on what software you use, those graphs' respective directions can change; this is a common occurrence.  Yet the graphs remain easy to interpret and are very informative. 

[Graphs: regression coefficient paths for two Regularization models as Lambda increases.]

The above graph on the left depicts a successful Ridge Regression model (from an explanatory standpoint).  At every level of Lambda, the relative weight of each coefficient is maintained, and the explanatory logic of the original underlying OLS Regression model remains intact.  Meanwhile, the right graph shows the opposite situation.  The original explanatory logic of the model is completely dismantled.  The relative weights of the variables change dramatically as Lambda increases, and numerous variables' coefficients even flip sign (from + to - or vice versa).  That is not good. 
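
A coefficient-path plot like the ones described above can be generated with a short script.  Here is a minimal sketch for Ridge Regression on simulated stand-in data (again, scikit-learn calls Lambda "alpha"); a LASSO version can be produced the same way with sklearn.linear_model.lasso_path.

```python
# Minimal sketch: Ridge Regression coefficient paths as the penalization level increases.
# X and y are simulated stand-ins; replace them with your own data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)  # standardize so coefficient sizes are comparable

# Fit a Ridge model at each penalization level and record the coefficients
alphas = np.logspace(-2, 5, 100)
coef_paths = np.array([Ridge(alpha=a).fit(X, y).coef_ for a in alphas])

for j in range(coef_paths.shape[1]):
    plt.plot(np.log10(alphas), coef_paths[:, j], label=f"x{j}")
plt.xlabel("log10(Lambda)")
plt.ylabel("Coefficient value")
plt.title("Ridge Regression coefficient paths")
plt.legend(ncol=2, fontsize="small")
plt.show()

# Warning signs discussed above: relative coefficient weights changing drastically,
# or coefficients flipping sign, as Lambda increases.
```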

Based on firsthand experience, several of the Regularization models I have developed did dismantle the original explanatory logic of the underlying OLS Regression model.  However, this unintended consequence is a bit less frequent than the increased model under-fitting shown earlier. 

Nevertheless, for a Regularization model to be successful, it needs to fulfill both of the following conditions: 

a) Reduce model overfitting; and

b) Maintain the explanatory logic of the model.  


If a Regularization model does not fulfill both conditions, it has failed.  I suspect it is rather challenging to develop or uncover a Regularization model that meets both criteria.  I have yet to come across one. 

Another source of frustration with such models is that you can get drastically different results depending on which software package you use (there is much information on that subject within the linked PowerPoint). 

One of the main objectives of Regularization is to reduce or eliminate multicollinearity.  This is a simple problem to solve: just eliminate the variables that appear superfluous within the model and are multicollinear with each other (there is much information on that within the PowerPoint), as sketched below.  This is a far better solution than using Regularization models that are highly unstable (different results with different packages) and that more often than not fail for the reasons mentioned.
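
As a rough illustration of that simpler alternative, here is a minimal sketch that iteratively drops the variable with the highest variance inflation factor (VIF) until no serious multicollinearity remains.  The simulated data, the variable names, and the VIF threshold of 10 are stand-ins for illustration, not recommendations from the study.

```python
# Minimal sketch: drop the variable with the highest variance inflation factor (VIF),
# one at a time, until all remaining VIFs fall below a chosen threshold.
# The data, the variable names, and the threshold of 10 are illustrative stand-ins.
import pandas as pd
from sklearn.datasets import make_regression
from statsmodels.stats.outliers_influence import variance_inflation_factor

X, _ = make_regression(n_samples=200, n_features=10, effective_rank=5,
                       noise=1.0, random_state=0)
X = pd.DataFrame(X, columns=[f"x{j}" for j in range(X.shape[1])])

def drop_multicollinear(df, vif_threshold=10.0):
    """Remove variables one at a time until no VIF exceeds the threshold."""
    df = df.copy()
    while df.shape[1] > 1:
        vifs = pd.Series(
            [variance_inflation_factor(df.values, i) for i in range(df.shape[1])],
            index=df.columns,
        )
        if vifs.max() < vif_threshold:
            break
        df = df.drop(columns=[vifs.idxmax()])  # drop the most collinear variable
    return df

reduced_X = drop_multicollinear(X)
print("Variables kept:", list(reduced_X.columns))
```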
