
Monday, February 28, 2022

How overvalued is the Stock Market?

Caveat: this analysis was conducted before the Russian invasion of Ukraine. 

You can find the complete analysis at the following two URLs:

Stock Overvaluation at Slideshare.net

Stock Overvaluation at SlidesFinder 

As a first cut, one looks at PE ratios and quickly infers that the Stock Market is much overvalued. 

Whether you look at a regular PE ratio or the Shiller PE ratio (using 10 years of inflation adjusted earnings), PE ratios are pretty high right now.  But, on a standalone basis a PE ratio does not tell you much, if anything.  If the PE ratio of the S&P 500 is around 25, does it mean that stocks are expensive relative to bonds or other assets?  Does it mean that stocks are overvalued relative to the inflation rate or other economic indicators?  Frankly, you have no idea. 

To start our analysis, let's first look at whether stocks are overvalued or not relative to 10 Year Treasuries.  To render both investments comparable, we are going to flip the PE ratio upside down and instead look at the EP ratio (Earnings/Price), commonly referred to as the Market Earnings Yield.  And, we are going to compare this Market EP with the 10 Year Treasury yield.  
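As a minimal sketch of the two comparison metrics used throughout this post (the figures below are placeholders, not the actual market data in the deck):

```python
# Placeholder inputs for illustration only; the deck uses actual market data.
pe_sp500 = 25.0            # S&P 500 PE ratio (hypothetical)
ep_sp500 = 1 / pe_sp500    # earnings yield (EP) = 4.0%
treasury_10y = 0.018       # 10 Year Treasury yield (hypothetical)

ep_multiple = ep_sp500 / treasury_10y   # EP-to-Treasury-yield multiple (higher = stocks cheaper vs. bonds)
ep_spread = ep_sp500 - treasury_10y     # EP minus Treasury yield spread
print(round(ep_multiple, 2), f"{100 * ep_spread:.2f}%")
```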

As shown above, when we compare the S&P 500 earnings yield (EP) with the 10 Year Treasury yield (by dividing the former by the latter), we can observe that based on historical data the Stock Market appears relatively cheap or undervalued relative to 10 Year Treasuries. 

We can extend this analysis to all different types of bonds, and the result is the same.  Currently, stocks are actually a lot cheaper than bonds. 

The table above shows on the first row that the EP to 10 Year Treasury yield multiple is currently 2.2.  It is much higher than the long term average of 1.3.  Also, the EP minus 10 Year Treasury yield spread is 1.73%, which is 1.47% higher than the long term average of 0.27%.  Given that, stocks are currently a lot cheaper than 10 Year Treasuries.  And, the story is the same for 30 Year Treasuries, Moody's Baa corporate bonds, and S&P BB and B rated bonds.  Thus, despite the S&P 500 having a pretty high PE, stocks are actually really cheap relative to bonds.  But, does that mean that stocks are truly cheap or undervalued?  Or, that bonds are even more overvalued than stocks?  

The table below discloses that stocks and bonds are actually all rather extraordinarily expensive relative to the current inflation rate. 

Looking at the first column from the left, the S&P 500 Earnings Yield (EP) is 3.65 percentage points lower than the inflation rate over the past 12 months.  And, that is 3.64 standard deviations below the long term average for this spread, which stands at +2.17%.  Thus, from this standpoint the stock market is greatly overvalued.  Notice that it is the same story for all the bonds.

One can reasonably argue that the stocks and bonds overvaluation is very much due to a one-off abrupt spike in inflation that should abate somewhat over the next year or so.    

In January 2022, inflation, measured as the 12 month change in CPI, jumped to 7.5%.  This was the highest inflation rate since the early 1980s.  

Next, let's look at a more stable measure of inflation.  It is also forward looking, which makes it very relevant for the stock market.  That measure is the 10 Year Inflation Expectation derived by measuring the spread between regular 10 Year Treasuries and inflation indexed 10 Year Treasuries.  The current spread between the two is 2.45%, in line with the long term average.  Now, let's look at the valuation of stocks and bonds relative to this inflation expectation measure. 
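For reference, here is how that breakeven measure is derived; the yield decomposition below uses placeholder values chosen only to be consistent with the quoted 2.45% spread:

```python
# Placeholder yields; the actual decomposition comes from the Treasury market.
nominal_10y = 0.0193    # regular 10 Year Treasury yield (hypothetical)
tips_10y = -0.0052      # inflation indexed 10 Year Treasury yield (hypothetical)

breakeven_inflation = nominal_10y - tips_10y   # the market's 10 year inflation expectation
print(f"{100 * breakeven_inflation:.2f}%")     # 2.45%
```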

As shown, even using this more stable measure of inflation, stocks and bonds are still very much overvalued relative to inflation expectations. 

We can look at two linear regression models to explore in more detail how overvalued stocks are relative to inflation and inflation expectations. 

The linear regression on the left shows that given a current inflation rate of 7.5%, the estimated stock market EP is 9.0% vs. an actual figure of 3.83%.  For the mentioned reasons, we won't focus much on this model and this inflation measure.  Instead, we will focus more on the less volatile and more forward looking inflation expectation measure and the related model within the scatter plot on the right.  

The linear regression on the right shows that given a current 10 year inflation expectation of 2.45%, the estimated stock market EP is 5.33% vs. an actual figure of 3.83%.  Focusing on the 10 year inflation expectation measure, it would entail a potential market correction of: 3.83%/5.33% - 1 = - 28%.  Notice that this regression model is not that explanatory (R Square of 0.27).  So, there is much uncertainty around this potential market correction estimate.  Nevertheless, the current EP of 3.83% is 1.4 standard errors below the estimate of 5.33%, indicating that 92% of this regression model's residuals are higher than the one for this current observation.  That is pretty far out on the left tail.  
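The correction arithmetic and the residual percentile can be reproduced in a few lines (the 92% figure assumes roughly normal residuals, which is my assumption rather than something stated in the deck):

```python
from scipy.stats import norm

actual_ep = 0.0383      # current S&P 500 earnings yield (from the text)
estimated_ep = 0.0533   # EP implied by the regression at a 2.45% inflation expectation

potential_correction = actual_ep / estimated_ep - 1
print(f"{100 * potential_correction:.0f}%")        # about -28%

# Share of residuals expected above an observation 1.4 standard errors below the fit,
# assuming approximately normal residuals
print(f"{100 * (1 - norm.cdf(-1.4)):.0f}%")        # about 92%
```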


Wednesday, February 2, 2022

Will we soon live to a 100?

 We are talking here of life expectancy at birth.  It represents the average number of years one can expect to live when born in a given year.  This estimate is based on the current mortality rate observed at each age.  

We already have centenarians now.  As a % of the population, the proportion of centenarians is likely to increase somewhat due to continuing progress in health care.  However, health care improvements may be partly countered by deteriorating health trends (rising diabetes and obesity rates, and declining fitness levels). 

Arguing that in the near future we may reach a life expectancy of 100 is a far more challenging and unlikely proposition than having a rising minority of the population reach 100.  Here is why... for each person who dies at a more typical age of 70, you need 3 who make it to 110.  For each one who dies at birth, you need 10 who make it to 110. 

How about 90?  For each person who dies at 70, you either need 2 who make it to 100, or 1 who makes it to 110.  

You can see how the average life expectancy arithmetic is very forbidding. 
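The weighted averages behind these statements check out; a quick sketch:

```python
# The averaging arithmetic behind the statements above
print((70 + 3 * 110) / 4)    # one death at 70 offset by three at 110 -> 100.0
print((0 + 10 * 110) / 11)   # one death at birth offset by ten at 110 -> 100.0
print((70 + 2 * 100) / 3)    # one death at 70 offset by two at 100 -> 90.0
print((70 + 1 * 110) / 2)    # one death at 70 offset by one at 110 -> 90.0
```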

You can see my research on the subject at Slideshare.net and SlidesFinder.  

Live to a 100 at Slideshare    

Live to a 100 at SlidesFinder  

The above is a 35-slide presentation that is very visual and reads quickly.  Nevertheless, let me go over the main highlights. 

I looked at the life expectancy of just a few countries with very long life expectancy plus China and the US. 

I observed an amazing amount of convergence between numerous countries that are geographically and genetically very distant.  These countries also have very different cultures, lifestyles, and nutrition.  Yet, they all fare very well and have converging life expectancies above 80 years (several years higher than China and the US).  Also, several of those countries started from dramatically lower starting points.  This is especially true for South Korea, which had a life expectancy well under 40 back in 1950.  And, now Korea's life expectancy is nearly as long as Japan's, well above 80 years.


Next, I looked at the UN forecasts of such life expectancy out to 2099.  And, I found such forecasts incredibly optimistic. 

As shown, each country's respective life expectancy keeps on rising in a linear fashion by about 1.1 years per decade.  This seems highly unlikely.  The longer the life expectancy, the harder any further increase becomes.  The forecasts instead should probably be shaped as a logarithmic curve reflecting smaller improvements as life expectancy rises. 


I did attempt to generate forecasts for a few countries (Japan and the US) using linear-log regressions to follow the above shape, but without much success.  This was in part because the historical data from 1950 to 2020 is often pretty close to being linear... just like the first half of the logarithmic curve above is also very close to being linear.  Had I modeled Korea, I might have had more success using a linear-log model.  But, there was no way I could have successfully used this model structure for all countries covered, because the country-level historical data had often not yet entered its logarithmic phase (slower increase in life expectancy).  The UN forecasts implied that if the history was linear, the forecasts would be linear too... a rather questionable assumption.   
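For readers who want to try this at home, here is a minimal sketch of such a linear-log fit; the life expectancy series is made up, and only the model structure mirrors what is described above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up life expectancy "history" for illustration; only the linear-log structure matters here.
years = np.arange(1950, 2021)
life_exp = 48 + 8.5 * np.log(years - 1949) + np.random.default_rng(0).normal(0, 0.3, years.size)

X = np.log(years - 1949).reshape(-1, 1)       # regress life expectancy on the log of elapsed years
model = LinearRegression().fit(X, life_exp)

future = np.arange(2021, 2100)
forecast = model.predict(np.log(future - 1949).reshape(-1, 1))
print(round(forecast[-1], 1))   # the fitted curve flattens out instead of rising linearly forever
```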

Also, as mentioned, the current deterioration in health trends is not supportive of rising life expectancy... especially of life expectancy rising forever in a linear fashion.  I call this questionable forecasting method the danger of linear extrapolation. 

As shown below, the rate of diabetes is rising worldwide. 


Also, BMI is rising worldwide. 


This deterioration in health trends represents material headwinds against life expectancy keeping on rising into the distant future. 

The full presentation includes much more coverage on all the countries and more info on health trends; it also looks at healthy life expectancy, a very interesting and maybe even more relevant subject than life expectancy.  Who wants to live to a 100 if it entails 30 years of disability?  Healthy life expectancy is what we really want.  At a high level, healthy life expectancy is typically a decade shorter than life expectancy.  For more detailed information go to the full presentations.    


Wednesday, December 29, 2021

Standardization

 The attached study answers three questions: 

  1. Does it make a difference whether you standardize your variables before running your regression model or standardize the regression coefficients after you run your model? 
  2. Does the scale of the respective original non-standardized variables affect the resulting standardized coefficients? 
  3. Does using non-standardized variables vs. standardized variables have an impact when conducting regularization? 

The study uncovers the following answers to those three questions:

  1. It makes no difference whether you standardize your variables first or instead standardize your regression coefficients afterwards. 
  2. The scale of the original non-standardized variables does not make any difference.
  3. Using non-standardized variables when conducting regularization (Ridge Regression, LASSO) does not work at all.  In such a situation (regularization) you have to use standardized variables.  (A quick numerical check of the first two answers is sketched below.)
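Here is a minimal sketch verifying answers 1 and 2 on synthetic data (the study's own examples and software will differ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with wildly different variable scales
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 0.01])
y = X @ np.array([2.0, 0.3, -40.0]) + rng.normal(size=200)

# (a) standardize the variables first, then regress
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()
beta_a = LinearRegression().fit(Xs, ys).coef_

# (b) regress on the raw variables, then standardize the coefficients afterwards
beta_b = LinearRegression().fit(X, y).coef_ * X.std(axis=0) / y.std()

print(np.allclose(beta_a, beta_b))   # True: both routes agree, regardless of the original scales
```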

To check out the complete study (very short, just 7 slides) go to the following link.  

Standardization study at Slideshare.net

Thursday, December 23, 2021

Is Tom Brady the greatest quarterback?

 If you want to review the entire study, you can view it at the following links: 

Football study at Slideshare.net

Football study at SlidesFinder.com 

The above study includes extensive use of the binomial distribution, which allows differentiating how much of the quarterbacks' respective records is due to randomness vs. how much is due to skill.  This statistical analysis is not included within this blog post.  (The study at SlidesFinder may not include this complete section right now, but it should within a few days). 

The quarterbacks I looked at include the following: 

 

Performance during the Regular Season.

If we look at Brady's performance during the regular season at mid career (34 years old), he actually is far behind many of his peers.  

First, let's look at cumulative yards passed by 34 years old. 


Next, let's look at the number of touchdowns by 34 years old. 


As shown above, in both yards and touchdowns, at 34 years old Brady is way behind Manning, Marino, Brees, and Favre.  

At this stage of his career and on those specific counts, Brady does not yet look destined to become the legendary number 1.  

However, Brady's career longevity and productivity are second to none.  And, when you compare the respective records over an entire career, the picture changes dramatically. 

 

 Brady's ability to defy the traditional sports aging curve is remarkable.  He just has not shown any decline in performance with age.  At 44, he is just as good as at 34... unlike any of his peers, who have been out of the game for years.  They all retired by 41 or earlier. 
 
 
Track record during the Post-Season.  

During the Post-Season it is a very different story.  Brady has been dominant throughout, since early on in his career.  He leads in the number of playoff appearances. 


He is way ahead in number of Super Bowl games. 


And, way ahead in Super Bowl wins. 


The table below discloses the performance of the players during the Post-Season. 

Given the number of teams in the NFL (32) and the number of seasons played, the above players have a random proportional probability of winning one single Super Bowl ranging from 50% (for Montana) to 66% (for Brady).  That probability based on just randomness drops rapidly to close to 0% for winning 2 Super Bowls.  Notice that Marino's, Brees's, and Favre's actual records are in line with this random proportional probability.  This underlines how truly difficult it is to win more than one Super Bowl.  Manning and Elway do not perform much above this random probability.  Only Montana and Brady perform a heck of a lot better than random probabilities would suggest based on the number of seasons they played.  And, as shown, Brady with 7 is way ahead of Montana.  And, he is not done!
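As a rough sketch of this kind of baseline (the full study uses the binomial distribution; the exact definition of the "random proportional probability" above may differ from the simple equal-odds assumption used here):

```python
from math import comb

def prob_at_least_k_titles(seasons, k, p=1/32):
    """P(at least k championships in `seasons` seasons) if every team had an
    equal 1-in-32 chance each year -- a simple binomial baseline."""
    return sum(comb(seasons, j) * p**j * (1 - p)**(seasons - j)
               for j in range(k, seasons + 1))

# Illustrative season count (approximate), not the study's exact inputs
print(round(prob_at_least_k_titles(22, 1), 2))   # ~0.50: one title is quite plausible by luck alone
print(round(prob_at_least_k_titles(22, 2), 2))   # ~0.15: two titles already much less so
print(prob_at_least_k_titles(22, 7))             # ~3e-06: seven titles is essentially impossible by luck
```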

When looking at the Post-Season track record, there is no doubt that Brady is the greatest.  Under pressure, and when it counts, he scores.  Also, interestingly, even when he loses a Super Bowl game, it is a close game.  He does not get wiped out.  By contrast, some of the other quarterbacks (including Marino and Elway, among others) suffered truly humiliating lopsided defeats in the Super Bowl... not Brady.


Friday, December 10, 2021

Why you should avoid Regularization models

 This is a technical subject that may warrant looking at the complete study (a 33-slide PowerPoint presentation).  You can find it at the two following links. 

Regularization study at Slideshare.net

Regularization study at SlidesFinder.com 

If you have access to Slideshare.net, it reads better than at SlidesFinder. 

Just to share a few highlights on the above.  

The main two Regularization models are LASSO and Ridge Regression as defined below. 
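In standard notation (the slide deck may use a slightly different parameterization), the two objectives can be written as:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$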


 

  

 

 

The above regularization models are just extensions of OLS Regression: the ordinary least squares objective plus a penalization term that penalizes the size of the coefficients.  

Regularization models are deemed to have many benefits (left column of table below).  But, they often do not work as intended (right column of table below).

 

In terms of forecasting accuracy, the graphs below show the penalization or Lambda level on the X-axis.  As Lambda level increases from left to right, penalization increases (regression coefficients are shrunk and eventually even zeroed out in the case of LASSO models).  And, the number of variables left in the LASSO models decreases (top X-axis).  The Y-axis shows the Mean Squared Error of those LASSO models within a cross validation framework. 



 




 

The above graph on the left shows a very successful LASSO model.  It eventually keeps only 1 variable out of 46 in the model, and achieves the lowest MSE by doing so.  By contrast, the LASSO model on the right very much fails.  Its best model is when Lambda is close to zero, which corresponds to the original OLS Regression model before any Regularization (before any penalization resulting in shrinkage of the regression coefficients). 

Revisiting these two graphs and giving them a bit more meaning is insightful.  The LASSO model depicted in the left graph below was successful: it clearly reduced model over-fitting, as intended, as it increased penalization and reduced the number of variables in the model.  The LASSO model on the right failed: it increased model under-fitting the minute it started to shrink the original OLS regression coefficients and/or eliminate variables.
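For readers who want to reproduce this kind of cross-validated LASSO exercise, here is a minimal scikit-learn sketch on synthetic data (the study's data, software, and settings will of course differ; note that scikit-learn calls Lambda "alpha"):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic data: 46 candidate variables, only a handful truly informative
X, y = make_regression(n_samples=200, n_features=46, n_informative=5,
                       noise=10.0, random_state=0)

lasso = LassoCV(cv=10).fit(X, y)

mean_mse = lasso.mse_path_.mean(axis=1)    # cross-validated MSE at each penalty level
best_alpha = lasso.alpha_                  # penalty level with the lowest CV error
n_vars_kept = int(np.sum(lasso.coef_ != 0))

print(best_alpha, round(mean_mse.min(), 1), n_vars_kept)
```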







 

Based on firsthand experience the vast majority of the Ridge Regression and LASSO models I have developed resulted in increasing model under-fitting (right graph) instead of reducing model overfitting (left graph). 

Also, when you use Regularization models, they often destroy the original explanatory logic of the original OLS Regression model. 

The two graphs below capture the regression coefficient paths as Lambda increases, penalization increases, and regression coefficients are progressively shrunk down to close to zero.  The graph on the left shows Lambda or penalization increasing from left to right.  The one on the right shows Lambda increasing from right to left.  Depending on what software you use, those graphs' respective directions can change.  This is a common occurrence.  Yet, the graphs still remain easy to interpret and are very informative. 






 

The above graph on the left depicts a successful Ridge Regression model (from an explanatory standpoint).  At every level of Lambda, the relative weight of each coefficient is maintained.  And, the explanatory logic of the original underlying OLS Regression model remains intact.  Meanwhile, on the right graph we have the opposite situation.  The original explanatory logic of the model is completely dismantled.  The relative weights of the variables change dramatically as Lambda increases.  And, numerous variables' coefficients even flip sign (from + to - or vice versa).  That is not good. 
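A minimal sketch of how such coefficient paths can be generated and screened for sign flips, again on synthetic data and with a hypothetical variable count:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

# Synthetic data; only the mechanics of the coefficient path are illustrated here
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=1)

alphas, coefs, _ = lasso_path(X, y)        # coefs has shape (n_features, n_alphas)

# Flag any coefficient that flips sign somewhere along the path --
# the dismantling of the explanatory logic described above
signs = np.sign(coefs)
flips = np.any(signs[:, :-1] * signs[:, 1:] < 0, axis=1)
print(int(flips.sum()), "of", coefs.shape[0], "coefficients flip sign along the path")
```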

Based on firsthand experience, several of the Regularization models I have developed did dismantle the original explanatory logic of the underlying OLS Regression model.  However, this unintended consequence is a bit less frequent than the increase in model under-fitting shown earlier. 

Nevertheless for a Regularization model to be successful, it needs to fulfill both conditions: 

a) Reduce model overfitting; and

b) Maintain the explanatory logic of the model.  







 

If a Regularization model does not fulfill both conditions, it has failed.  I intuit it is rather challenging to develop or uncover a Regularization model that does meet both criteria.  I have yet to experience this occurrence. 

Another source of frustration with such models is that you can get drastically different results depending on what software package you use (much info on that subject within the linked PowerPoint). 

One of the main objectives of Regularization is to reduce or eliminate multicollinearity.  This is a simple problem to solve: just eliminate the variables that appear superfluous within the model and are multicollinear with each other (much info on that within the PowerPoint).  This is a far better solution than using Regularization models that are highly unstable (different results with different packages) and that more often than not fail for the reasons mentioned above.
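One common way to operationalize that simpler alternative is a variance inflation factor (VIF) screen; a minimal sketch, with a hypothetical data frame and the usual rule-of-thumb threshold (my assumption, not the post's):

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF for each column of a design matrix; high values flag multicollinear variables."""
    return pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
        index=X.columns,
    )

# Hypothetical usage: drop the most redundant variables until all VIFs look reasonable (e.g. < 10)
# X = df[["x1", "x2", "x3", "x4"]]
# print(vif_table(X).sort_values(ascending=False))
```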

Tuesday, November 23, 2021

Is the 3-point game taking over NBA basketball?

 The short answer is not yet.  The graph below shows that 2-point shots still make up over 50% of overall points.  Granted, 3-point scoring has steadily risen since the 1979-1980 NBA season, when the 3-point shot was first introduced in the NBA.  It took a while for the players to adapt their skills, and for coaches to evolve their strategies, to leverage the benefits of 3-point shots. 

 
The big difference over time is how much more aggressive players have become in attempting 3-point shots.  Until the 2011-2012 season, teams were attempting fewer than twenty 3-point shots per game.  The number has exploded to over 35 during the most recent two seasons. 
 

 Something to keep in mind is that the 3-point shooting skill of a team has only a rather moderate to weak relationship with the team's overall performance or ranking.  And, that is another way to see that 3-point shooting is not dominant in the NBA, or even determinant in an NBA team's success. 


The graph above (using the NBA 2020-2021 season data) indicates that the 3-point ranking of a team explains only 15% of the variance in the overall ranking of a team (R Square = 0.1485) and vice versa.  If the 3-point ranking explained 100% of the overall ranking, the red regression trend line would be perfectly diagonal across the squares on the grid.  And, the regression equation would be: y = 1(x) + 0.  Or in plain English: 3-point ranking = Overall ranking.  As shown, we are far from that situation.  
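A quick sketch of how such a rank-vs-rank regression and its R Square can be computed (the rankings below are placeholders, not the actual 2020-2021 standings):

```python
import numpy as np
from scipy import stats

# Placeholder rankings for the 30 teams; the post uses the actual 2020-2021 NBA rankings
rng = np.random.default_rng(7)
three_pt_rank = np.arange(1, 31)
overall_rank = rng.permutation(np.arange(1, 31))

slope, intercept, r, p_value, std_err = stats.linregress(three_pt_rank, overall_rank)
print(round(slope, 2), round(intercept, 2), round(r**2, 4))
# A slope near 1, intercept near 0, and R Square near 1 would mean the 3-point
# ranking essentially determines the overall ranking; the actual data is far from that.
```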
 
Here are the top 5 leaders in 3-points baskets.  

Notice that two of them are still active: Stephen Curry (33 years old), and James Harden (32).  One would expect Curry to soon become the top leader; and, James Harden to move into the third spot.  By the end of their respective career, Curry and Harden may very well occupy the top 2 spots. 

A closer look at the top 5 record on a per game basis. 

What tables A and B indicate is that the contemporary players (Curry and Harden) have been far more productive in scoring 3-pts shots.  And, the main reason behind their success is that they have been far more aggressive in attempting 3-pts shots (see table B). 

In terms of accuracy (table C for 3-point success rate), Kyle Korver, a player from another generation, pretty much towers over the field.  But, his higher success rate did not matter much given that he made far fewer 3-point attempts per game than Curry and Harden (see table B). 

Curry's 3-point talent is in good part not reflected in any of the above statistics.  Curry differentiates himself from the field with his unique ability to score 3-point baskets from "way downtown", often at or even past mid-court.  Unfortunately, this superlative achievement is not rewarded with any extra scoring points.  

Harden is a very different player.  While nearly as aggressive as Curry in attempting 3-point shots (table B), he is not nearly as accurate (lower success rate, as shown in table C).  In recent seasons, Harden has also somewhat lessened his focus on 3-point shot attempts (table B).  On the other hand, Harden is a very dynamic and diversified player.  And, his claim to fame may not be just his 3-point shooting skills, but his mesmerizing dribbling across his legs in a crouching tiger type position that has rendered him the most "unguardable" player in the NBA.  

Next question worth considering is how long can we expect Curry to perform at top level in 3-points shooting?  

Well, the short answer is for a pretty long time.  The graph below shows the record of Ray Allen, Reggie Miller, and Kyle Korver, who round out the top 5 in 3-point shooting.  We looked at their 3-point baskets per game and their related success rate.  The graph shows their respective performance as they aged.  We used the average of their respective performance over the 6 seasons when they were 28 to 33 years old as a baseline index = 100.  Next, we divided each year's performance by this 28-33 average and multiplied it by 100.  This allowed us to measure precisely how their respective performance declined as they aged beyond 33 years old. 




The left hand graph shows that Miller and Korver maintained their 3-points success per game very well as they aged.  At 38 years old, they were still performing at 80% of their average level at 28 to 33 years old. 

The right hand graph shows that all three players maintained their respective 3-points success rate remarkably well as they aged.  Shooting accuracy just does not seem to deteriorate with age.  
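A minimal sketch of the 28-33 baseline indexing described above, using made-up per-season numbers:

```python
import pandas as pd

# Made-up 3-pointers made per game by age; only the indexing method is illustrated
per_game = pd.Series(
    {28: 2.9, 29: 3.1, 30: 3.0, 31: 3.2, 32: 3.0, 33: 2.8,
     34: 2.9, 35: 2.7, 36: 2.6, 37: 2.4, 38: 2.3},
    name="threes_per_game",
)

baseline = per_game.loc[28:33].mean()     # average over the age 28-33 seasons
index = per_game / baseline * 100         # baseline = 100
print(index.round(1))                     # values near 100 at older ages indicate little decline
```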
 
Curry is now 33.  In view of the above, it is rather likely that he would be very close to or at top form over the next three years (34, 35, 36).  Beyond 36, he may experience a mild decline in 3-points success per game.  But, he may still be relatively formidable in that category compared to other players. 
 
We could say the same thing for Harden (32).  However, Harden has apparently been much less focused on 3-points shooting during the most recent two seasons. 
 
 I actually do not follow basketball.  Seeing pictures of Curry every day on the cover of the sports page of my daily newspaper, I eventually caught Curry fever.  In view of that, I welcome comments and corrections.  And, I would gladly edit and improve this blog entry over time.  
 
If you want to read my complete study on the subject, check the two links below. 
 





Friday, November 5, 2021

Will we likely keep temperature increase at or below + 1.5 degree Celsius by the end of the Century?

 The short answer is that it is most unlikely that we will be able to do so.  

As we speak, our global temperature is already about 1.1 degrees Celsius above the 1850-1900 average.  So, we have only 0.4 degrees Celsius to play with. 

Over the past 40 years, our temperature has risen by 0.7 degrees.  If the past is representative of the future, this suggests that over the next 23 years our temperature may very well increase by another 0.4 degrees over current levels.  And, going forward we would cross the +1.5 degree threshold.   
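The arithmetic behind that back-of-the-envelope estimate, spelled out:

```python
# Back-of-the-envelope extrapolation from the figures above
warming_rate = 0.7 / 40                    # degrees Celsius per year over the past 40 years
years_to_threshold = 0.4 / warming_rate    # years until the remaining 0.4 degrees is used up
print(round(years_to_threshold))           # ~23 years, i.e. during the mid-2040s
```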

This back of the envelope estimate is very much in line with the most recent scenarios generated by the IPCC, as shown below.


It is also in line with a forecast generated by a Vector Autoregression (VAR) model I had introduced in a recent post, as shown below. 

Thus, as described, using three completely different methods ranging from rudimentary to pretty complex, we are most likely to run into trouble during the 2040s, when we could well cross that +1.5 degree Celsius threshold. 

One can still advance the argument that going forward everything will change.  We are decarbonising our World economy, etc.  

Well, the most recent forecast from the U.S. Energy Information Administration (EIA) is really not encouraging on this front.  They foresee a continued rapid rise in CO2 emissions that will contribute to the ongoing temperature rise. 

 
 
 
Quoting the EIA: 

“If current policy and technology trends continue, global energy consumption and energy-related carbon dioxide emissions will increase through 2050 as a result of population and economic growth.

 

Oil and natural gas production will continue to grow, mainly to support increasing energy consumption in developing Asian economies.”

 

If you want more information on this topic, please view the link to my presentation on the subject. 


2100 Temperature Forecast

 


 


Saturday, October 30, 2021

Climate Change Models

 I am just sharing here some climate change models.  The main objectives included: 

1) being able to fit the historical World temperature data; 

2) being able to forecast World temperature using true out-of-sample or Hold Out testing; and 

3) being able to demonstrate causality between CO2 concentration and temperature level. 

The models disclosed within the following link:

Climate Change Models 

... were surprisingly successful in meeting objectives 1) and 2).  They did very well at fitting the historical temperature data and forecasting temperature (out-of-sample).  By just using CO2 concentration (in either nominal or log transformation) as the main independent variable, the models could reasonably accurately estimate or predict temperature level.  

The most surprising model was a Vector Autoregression (VAR) model using just one single lag (1-year lag given the yearly frequency of the data).  And, this same model using historical data up to 1981 was able to predict reasonably accurately yearly temperatures from 1982 to 2020!  In decades of modeling time series, I have never encountered a model that works so well (either developed by myself or anyone else).  The most surprising thing is that this same VAR model does not even use the known values of the independent variable (natural log of CO2 concentration) from 1982 to 2020.  Without feeding any information to the VAR model over the out-of-sample period, it still could predict temperature pretty well.  
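For readers who want to replicate this kind of exercise, here is a minimal sketch of a 1-lag VAR in statsmodels; the file name and column names are hypothetical, and the post's actual data preparation may differ:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Hypothetical yearly inputs: global temperature anomaly and CO2 concentration
df = pd.read_csv("climate_yearly.csv", index_col="year")       # placeholder file name
data = pd.DataFrame({"temp": df["temp_anomaly"], "log_co2": np.log(df["co2_ppm"])})

train = data.loc[:1981]              # fit on history up to 1981 only
fit = VAR(train).fit(1)              # a single 1-year lag, as described above

# Dynamic forecast of 1982-2020 that feeds the model's own predictions forward,
# i.e. no actual data from the hold-out period is used
steps = 2020 - 1981
forecast = fit.forecast(train.values[-fit.k_ar:], steps=steps)
pred_temp = pd.Series(forecast[:, 0], index=range(1982, 2021), name="temp_forecast")
print(pred_temp.tail())
```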

Notice how the VAR forecast over the 1982 to 2020 period is typically off by well under 0.2 degrees Celsius in either direction.  

Going back to the third objective of the climate change models regarding confirming statistical causality between CO2 concentration and temperature, the modeling results using Granger causality methodology were far more humble.  Establishing Granger causality was rather challenging.  This was probably due to the temperature level variable being so autocorrelated.  Notice that this was not a technical flaw within any of the developed OLS regressions or VAR models because the two variables were very much cointegrated (as tested using Cointegration regressions). 
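Both checks mentioned above are available in statsmodels; a minimal sketch, reusing the hypothetical `data` frame from the previous snippet (the lag choice is my assumption):

```python
from statsmodels.tsa.stattools import grangercausalitytests, coint

# Does log CO2 Granger-cause temperature? (tests whether the second column helps predict the first)
granger_results = grangercausalitytests(data[["temp", "log_co2"]], maxlag=3)

# Engle-Granger cointegration test between the two series
t_stat, p_value, crit_values = coint(data["temp"], data["log_co2"])
print(p_value)   # a small p-value supports cointegration between temperature and log CO2
```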



Friday, October 29, 2021

Medical Decision Making with Clinical Tests

The following analysis (Test Decision) provides an analytical framework on how to interpret clinical tests you may undertake.

Test Decision

At a high level, I can give you a summary of the whole concept. 
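The summary itself is in the linked slides; as an illustration of the kind of calculation such a framework rests on (the deck's exact framework may differ), here is how a test's positive predictive value follows from its sensitivity, its specificity, and the prevalence of the condition:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of having the condition given a positive test result."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Illustrative numbers only: a fairly accurate test for a rare condition
# still yields a surprisingly low chance that a positive result is a true positive.
print(round(positive_predictive_value(0.95, 0.95, 0.01), 2))   # ~0.16
```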


 

Compact Letter Display (CLD) to improve transparency of multiple hypothesis testing

Multiple hypothesis testing is most commonly undertaken using ANOVA.  But, ANOVA is an incomplete test because it only tells you ...