Sunday, February 20, 2022

The relationship between interest rates and the stock market.

Both the futures market and the Federal Reserve expect the Fed Funds rate to reach around 1.50% during the first half of 2023.  Given that, investors are concerned that the market is in for a serious correction.  

You can read my complete study of the relationship between rates and the stock market at: 

Stock Market Study at Slideshare.net 

Stock Market Study at SlidesFinder.com

Higher rates entail a higher discount rate on prospective earnings streams, thereby reducing the net present value of a company's stock.  On the other hand, higher rates are also often associated with economic growth, resulting in faster earnings growth.  Thus, a higher discount rate and potentially faster earnings growth are countervailing forces.  Consequently, the relationship between (rising) interest rates and the stock market may not be as one-sided as anticipated.  
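As a rough illustration of these two forces, here is a minimal sketch in R with made-up numbers (not a valuation model):

    # present value of a 10-year earnings stream starting at 100 per year
    pv <- function(growth, discount, years = 10) {
      sum(100 * (1 + growth)^(1:years) / (1 + discount)^(1:years))
    }
    pv(growth = 0.04, discount = 0.06)   # baseline
    pv(growth = 0.04, discount = 0.08)   # higher discount rate alone -> lower value
    pv(growth = 0.06, discount = 0.08)   # higher rate but faster earnings growth -> largely offset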

Using quarterly data going back to 1954, we can visually observe that the relationship between interest rates and the stock market is rather weak, approaching randomness.  Whether we focus on the Fed Funds rate (FF) or the 10-Year Treasury rate, a quarterly change in either rate is not that informative regarding the quarterly change in the S&P 500 level.  

The correlation between the change in the S&P 500 and the change in FF is only -0.19.  It is -0.24 with the 10-Year Treasury rate.  Both translate into R-squared values very close to zero, indicating that rates explain very little of the S&P 500's behavior.  That is pretty much what the graphs above convey.  The regression line has a very flat slope.  The confidence interval around the data points along that line is pretty wide.  The red ellipse shows an image of near randomness.  
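In R, that check boils down to something like the following sketch (assuming a hypothetical data frame d holding the quarterly changes; the column names are illustrative):

    cor(d$sp500_chg, d$ff_chg)                            # about -0.19 per the study
    cor(d$sp500_chg, d$t10_chg)                           # about -0.24
    summary(lm(sp500_chg ~ ff_chg, data = d))$r.squared   # roughly (-0.19)^2, i.e. close to zero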

Focusing on the 4-quarter change in the FF rate vs. the S&P 500, we compiled the following table based on data going back to 1954.  We focused on FF rate increase ranges that reflect the prospective ones we are facing over the next year. 

As shown in the table above, FF increase levels do not clearly differentiate S&P 500 changes over 4 quarters.  Based on investment theory and monetary policy, you would expect that the higher the rise in FF, the lower the rise in the S&P 500 over the reviewed period.  But the empirical data does not quite support these common assumptions.  
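A table like that can be compiled along these lines (a sketch assuming a hypothetical data frame d4 of 4-quarter changes; the bucket breakpoints are purely illustrative):

    d4$ff_bucket <- cut(d4$ff_chg_4q, breaks = c(-Inf, 0, 0.5, 1.0, 1.5, Inf))
    aggregate(sp500_chg_4q ~ ff_bucket, data = d4, FUN = mean)   # average S&P 500 change per FF range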

Next, I built an OLS regression that also factored in the influence of economic growth (rgdp), inflation (CPI), and quantitative easing (qe).  All variables were fully detrended on a quarterly basis. 
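The specification roughly amounts to this sketch (hypothetical column names for the detrended quarterly changes):

    ols_fit <- lm(sp500_chg ~ ff_chg + rgdp_chg + cpi_chg + qe_chg, data = d)
    summary(ols_fit)   # coefficient significance, Adjusted R-squared, residual standard error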


The regression shows a conundrum often encountered in such econometric models.  The independent variables are statistically significant, which suggests you have a pretty good model.  But not so fast... this model is actually pretty poor, with an Adjusted R-squared of only 0.19.  It also has a very high standard error, nearly as high as the standard deviation of the dependent variable, the change in the S&P 500 level. 

 

The facet graph above shows how mediocre this econometric model is.  The residuals (red) capture a lot more of the volatility and trend of the S&P 500 changes (black) than the model estimates (green) do. 

Next, I developed a couple of Vector Autoregression (VAR) models, one with 1 lag and the other with 3 lags.  These VAR models were pretty much disastrous.  Probably the most efficient way to demonstrate that is to show that the VAR models were hardly any better than a Naive model that uses the S&P 500's average quarterly change as its single estimate, with the standard deviation of that change as the standard error of the Naive estimates. 
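A sketch of that setup in R, using the vars package and the same hypothetical data frame d as above:

    library(vars)
    yvars <- d[, c("sp500_chg", "ff_chg", "rgdp_chg", "cpi_chg", "qe_chg")]
    var1 <- VAR(yvars, p = 1)            # 1-lag VAR
    var3 <- VAR(yvars, p = 3)            # 3-lag VAR
    # Naive benchmark: the average quarterly change as the single estimate,
    # with its standard deviation as the standard error
    naive_est <- mean(d$sp500_chg)
    naive_se  <- sd(d$sp500_chg)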


As shown above, the standard error of the two VAR models is hardly any lower than the standard deviation of the S&P 500.  The OLS regression model is clearly better than the VAR models.  Yet its performance in terms of true error reduction (-10.2%) is nothing to write home about. 

Additionally, the VAR models generated Impulse Response Functions that went in the wrong direction.  See below, within the VAR model with 3 lags, the change in the S&P 500 over 8 quarterly periods in response to an upward 1-percentage-point shock in FF.  It is positive.  That is clearly the wrong direction (as far as both investment theory and monetary policy are concerned).  
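For reference, an impulse response of that kind can be generated along these lines (continuing the sketch above):

    irf_ff <- irf(var3, impulse = "ff_chg", response = "sp500_chg", n.ahead = 8)
    plot(irf_ff)   # response of the S&P 500 change to an FF shock over 8 quarters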

All of the above suggests that interest rate rises are not that deterministic in anticipating stock market downturns.  That may be in part because prospective FF rises or declines are already priced into the stock market through the futures market.  Going forward, the market may very well encounter rough waters (as of this writing, it already has).  But that would be for many more reasons than interest rates, or even overall monetary policy, alone.

 

Wednesday, February 2, 2022

Will we soon live to a 100?

We are talking here of life expectancy at birth.  It represents the average (mean) number of years a person born in a given year can expect to live.  This estimate is based on the mortality rate currently observed at each age-year.  
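Conceptually, the computation looks like this rough sketch in R (the mortality rates below are made up, purely for illustration):

    qx <- c(rep(0.005, 60), rep(0.02, 30), rep(0.15, 20))  # made-up death rates for ages 0-109
    px <- cumprod(1 - qx)                                   # probability of surviving to each age
    sum(px)                                                 # approximate life expectancy at birth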

We already have centenarians now.  As a percentage of the population, the proportion of centenarians is likely to increase somewhat due to continuing progress in health care.  However, health care improvement may be partly countered by deteriorating health trends (rising diabetes and obesity rates, and declining fitness levels). 

Advancing the idea that in the near future we may reach a life expectancy of 100 is far more challenging, and far less likely, than having a rising minority of the population reach 100.  Here is why... for each person who dies at a more typical age of 70, you need 3 who make it to 110.  For each one who dies at birth, you need 10 who make it to 110. 

How about 90?  For each person who dies at 70, you either need 2 who make it to 100, or 1 who makes it to 110.  

You can see how the average life expectancy arithmetic is very forbidding. 
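The arithmetic above is easy to verify:

    mean(c(70, 110, 110, 110))   # = 100
    mean(c(0, rep(110, 10)))     # = 100
    mean(c(70, 100, 100))        # = 90
    mean(c(70, 110))             # = 90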

You can see my research on the subject at Slideshare.net and SlidesFinder.  

Live to a 100 at Slideshare    

Live to a 100 at SlidesFinder  

The above is a 35-slide presentation that is very visual and reads quickly.  Nevertheless, let me go over the main highlights. 

I looked at the life expectancy of just a few countries with very long life expectancies, plus China and the US. 

I observed an amazing amount of convergence between numerous countries that are geographically and genetically very distant.  These countries also have very different cultures, lifestyles, and nutrition.  Yet they all fare very well, with converging life expectancies above 80 years (several years higher than China and the US).  Also, several of those countries started from dramatically lower starting points.  This is especially true for South Korea, which had a life expectancy well under 40 back in 1950.  Now Korea's life expectancy is nearly as long as Japan's, much above 80 years.


Next, I looked at the UN forecasts of such life expectancy out to 2099.  And, I found such forecasts incredibly optimistic. 

As shown, each country's life expectancy keeps on rising in a linear fashion, by 1.1 years per decade.  This seems highly unlikely.  The longer the life expectancy, the harder any further increase becomes.  The forecasts should instead probably be shaped like a logarithmic curve, reflecting smaller improvements as life expectancy rises. 


I did attempt to generate forecasts for a few countries (Japan and the US) using linear-log regressions to follow the above shape, but without much success.  This was in part because the historical data from 1950 to 2020 is often pretty close to linear... just like the first half of the logarithmic curve above is also very close to linear.  Maybe if I had modeled Korea, I would have had more success with a linear-log model.  But there was no way I could have successfully used this model structure for all the countries covered, because the country-level historical data had often not yet entered its logarithmic phase (slower increase in life expectancy).  The UN forecasts entail that if the history was linear, the forecasts will be linear too... a rather questionable assumption.   
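A linear-log specification of that kind looks roughly like this sketch (hypothetical data frame le with columns year and life_exp for one country):

    le$t <- le$year - 1949                          # years elapsed since 1950
    fit_linlog <- lm(life_exp ~ log(t), data = le)
    pred <- predict(fit_linlog, newdata = data.frame(t = (2021:2099) - 1949))   # forecasts out to 2099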

Also, as mentioned, current deteriorations in health trends are not supportive of rising life expectancy... especially life expectancy continuing to rise forever in a linear fashion.  I call this questionable forecasting method the danger of linear extrapolation. 

As shown below, the rate of diabetes is rising worldwide. 


Also, BMI is rising worldwide. 


This deterioration in health trends represents material headwinds against life expectancy keeping on rising into the distant future. 

The full presentation includes much more coverage of all the countries and more info on health trends; it also looks at healthy life expectancy, a very interesting and maybe even more relevant subject than life expectancy.  Who wants to live to 100 if it entails 30 years of disability?  Healthy life expectancy is what we really want.  At a high level, healthy life expectancy is typically a decade shorter than life expectancy.  For more detailed information, go to the full presentations.    


Wednesday, January 19, 2022

Comparing R vs. Python graphing capabilities for time series data

I used a simple time series data set on the number of touchdowns achieved over the years by seven different quarterbacks.  The x-axes of the graphs are the quarterbacks' respective ages.  The y-axes are their respective cumulative numbers of touchdowns.  

You can see the complete presentation at the link below: 

R vs. Python comparison

And, I compare the two languages using different types of graphs, including:

1) Time series graph of a single variable (the number of touchdowns for one single quarterback);
2) Time series graph of multiple variables (including all 7 quarterbacks); and 
3) Facet graphs, where you generate a separate graph for each of the quarterbacks. 

For the first two types of graphs, the two languages were pretty competitive.  R was a bit more efficient, generating legends almost automatically, while constructing a legend in Python was a lot longer and more manual.  But otherwise the Python graphs were pretty competitive with the R ones in terms of look and feel.  And the coding difficulty (besides the legend bit) was fairly similar. 

When it came to facet graphs, there was no comparison.  R was far easier and better.  Python's facet graph capabilities appear structured more for scatter plots than for time series plots.  Doing the latter in Python was truly a miserable experience.  And the result was so poor relative to the R facet graphs that I don't even dare show them here; they are in the presentation linked above.  With superior Python coding skills, maybe faceted time series graphs are doable.  But be warned: there is a high hurdle there in terms of coding skills.  
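For reference, the R versions can be produced with a few lines of ggplot2 along these lines (a sketch with a hypothetical data frame qb holding columns player, age, and cum_td):

    library(ggplot2)
    # multi-quarterback time series: the colour aesthetic builds the legend automatically
    ggplot(qb, aes(x = age, y = cum_td, colour = player)) +
      geom_line() +
      labs(x = "Age", y = "Cumulative touchdowns")
    # facet version: one panel per quarterback
    ggplot(qb, aes(x = age, y = cum_td)) +
      geom_line() +
      facet_wrap(~ player) +
      labs(x = "Age", y = "Cumulative touchdowns")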

Here is a regular multi-variable Python graph that came out very well.


 

Here is the comparable R graph that came out equally well. 


 

Here is an R facet graph that came out very well. 



Thursday, January 13, 2022

Will stock markets survive in 200 years? Some won't make it till 2050


Within a related study “The next 200 years and beyond” (see URLs below), 

 

The next 200 years at Slideshare

 

The next 200 years at SlidesFinder

 

... we disclosed that population and economic growth can’t possibly continue beyond just a few centuries.

 

Just considering what seems like a benign scenario: 

 

 Zero population growth with a 1% real GDP per capita growth … 

 

… would result in the World economy growing more than 16-fold within 288 years and more than 30-fold within 360 years (at 1% growth, output doubles roughly every 70 years).  Thus, the mentioned scenario, as projected over the long term, is not feasible.  
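The compounding is easy to check:

    1.01 ^ 288   # about 17.6x after 288 years of 1% annual growth
    1.01 ^ 360   # about 35.9x after 360 years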

 

This study contemplates how stock markets will survive in the absence of any demographic and economic growth.  The whole body of finance supporting stock markets (CAPM, the Dividend Growth model, Internal Rate of Return, Net Present Value) evaporates in the absence of a growth input (market rate of return, dividend growth, etc.). 
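To make the role of the growth input concrete, here is a minimal sketch of the Dividend Growth (Gordon) model with made-up numbers (next year's dividend of 2 and a 7% required return):

    D1 <- 2
    r  <- 0.07
    price <- function(g) D1 / (r - g)
    price(0.03)   # 3% perpetual dividend growth -> price of 50
    price(0.00)   # no growth                    -> price of about 28.6

Even in this simplest formula, the growth input g does much of the work.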

 

And, trends over the past few decades confirm the World is already heading in that direction.  In our minds, this raises existential considerations for stock markets. 

 

This study uncovered several stock markets that already experience current and prospective growth constraints.  And, the survival of several of those markets till 2050 appears questionable. 

 

Place yourself in the shoes of college graduates entering the labor force and investing in their 401(k) for retirement.  The common wisdom is to invest the majority of such funds in the stock market to reap maximum growth over the long term.  Such a well-established strategy would most probably not work out for the majority of the 11 markets reviewed.  And it could be devastating if the college grad lives in Greece, Italy, or Ukraine. 

 

Similar considerations, within the same countries, would affect long-term investors such as pension funds, endowment funds, insurers, and retail index fund investors.

 

In the US, we may be spared these bearish considerations, but for how long?  A century or two from now, we in the US may be affected by the same considerations.  

 

You can see the complete study at the following link below: 

Stock market in 200 years at Slideshare

Wednesday, December 29, 2021

Standardization

 The attached study answers three questions: 

  1. Does it make a difference whether you standardize your variables before running your regression model or standardize the regression coefficients after you run your model? 
  2. Does the scale of the respective original non-standardized variables affect the resulting standardized coefficients? 
  3. Does using non-standardized variables vs. standardized variables have an impact when conducting regularization? 

The study uncovers the following answers to those three questions:

  1. It makes no difference whether you standardize your variables first or instead standardize your regression coefficients afterwards. 
  2. The scale of the original non-standardized variables does not make any difference (the quick sketch after this list illustrates points 1 and 2).
  3. Using non-standardized variables when conducting regularization (Ridge Regression, LASSO) does not work at all.  In such a situation (regularization) you have to use standardized variables. 
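Here is a minimal sketch of points 1 and 2 in R, using made-up data (the variable names and scales are purely illustrative):

    # (a) standardize the variables first, then regress
    set.seed(1)
    x1 <- rnorm(100, mean = 50, sd = 10)     # arbitrary large scale
    x2 <- rnorm(100, mean = 0.5, sd = 0.1)   # arbitrary small scale
    y  <- 2 * x1 + 30 * x2 + rnorm(100)
    fit_std <- lm(scale(y) ~ scale(x1) + scale(x2))

    # (b) regress on the raw variables, then standardize the coefficients afterwards
    fit_raw  <- lm(y ~ x1 + x2)
    beta_std <- coef(fit_raw)[-1] * c(sd(x1), sd(x2)) / sd(y)

    coef(fit_std)[-1]   # matches beta_std, regardless of the original scales
    beta_std

Either route yields the same standardized coefficients.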

To check out the complete study (very short just 7 slides) go to the following link.  

Standardization study at Slideshare.net

Thursday, December 23, 2021

Is Tom Brady the greatest quarterback?

 If you want to review the entire study, you can view it at the following links: 

Football study at Slideshare.net

Football study at SlidesFinder.com 

The above studies include extensive use of the binomial distribution, which allows differentiating how much of the quarterbacks' respective records is due to randomness vs. how much is due to skill.  This statistical analysis is not included within this blog post.  (The study at SlidesFinder may not include this complete section right now, but it should within a few days). 
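As a flavor of the kind of binomial calculation involved (a rough sketch only, under the simplifying and hypothetical assumption that each of the 32 teams has an equal 1/32 chance of winning the Super Bowl in any season; the number of seasons is also just an illustration):

    p <- 1/32      # hypothetical chance of winning the Super Bowl in a given season
    n <- 20        # hypothetical number of seasons played
    1 - pbinom(0, size = n, prob = p)   # P(at least 1 title) by pure chance
    1 - pbinom(1, size = n, prob = p)   # P(at least 2 titles) by pure chance
    dbinom(7, size = n, prob = p)       # P(exactly 7 titles) by pure chance -- essentially zero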

The quarterbacks I looked at include the following: 

 

Performance during the Regular Season.

If we look at Brady's performance during the regular season at mid career (34 years old), he actually is far behind many of his peers.  

First, let's look at cumulative yards passed by 34 years old. 


Next, let's look at the number of touchdowns by 34 years old. 


As shown above, in both yards and touchdowns, at 34 years old Brady is way behind Manning, Marino, Brees, and Favre.  

At this stage of his career and on those specific counts, Brady does not look yet earmarked to become a legendary number 1.  

However, Brady's career longevity and productivity are second to none.  And, when you compare the respective records over an entire career, the picture changes dramatically. 

 

Brady's ability to defy the traditional sports aging curve is remarkable.  He just has not shown any decline in performance with age.  At 44, he is just as good as he was at 34... unlike any of his peers, who have been out of the game for years.  They all retired by 41 or earlier. 
 
 
Track record during the Post-Season.  

During the Post-Season, it is a very different story.  Brady has been dominant throughout, and since early in his career.  He leads in the number of playoff appearances. 

He is way ahead in number of Super Bowl games. 


And, way ahead in Super Bowl wins. 


The table below discloses the performance of the players during the Post-Season. 

Given the number of teams in the NFL (32) and the number of seasons played, the above players have a random proportional probability of winning one single Super Bowl ranging from 50% (for Montana) to 66% (for Brady).  That probability, based on randomness alone, drops rapidly to close to 0% for winning 2 Super Bowls.  Notice that Marino's, Brees's, and Favre's actual records are in line with this random proportional probability.  This underscores how truly difficult it is to win more than one Super Bowl.  Manning and Elway do not perform much above this random probability.  Only Montana and Brady perform a heck of a lot better than random probabilities would suggest, based on the number of seasons they played.  And, as shown, Brady with 7 is way ahead of Montana.  And, he is not done!

When looking at the Post-Season track record, there is no doubt that Brady is the greatest.  Under pressure, and when it counts, he scores.  Also, interestingly, even when he loses a Super Bowl game, it is a close game.  He does not get wiped out.  By contrast, some of the other quarterbacks (including Marino and Elway, among others) suffered truly humiliating, lopsided defeats in the Super Bowl... not Brady.


Friday, December 10, 2021

Why you should avoid Regularization models

This is a technical subject that may warrant looking at the complete study (a 33-slide PowerPoint).  You can find it at the two following links. 

Regularization study at Slideshare.net

Regularization study at SlidesFinder.com 

If you have access to Slideshare.net, it reads better than at SlidesFinder. 

Just to share a few highlights on the above.  

The main two Regularization models are LASSO and Ridge Regression as defined below. 
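For reference, the two objective functions are typically written as follows (the standard textbook form):

    \hat{\beta}^{\,Ridge} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^{2}

    \hat{\beta}^{\,LASSO} = \arg\min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

In both, the first term is the usual OLS residual sum of squares, and the second term is the penalty on the coefficients, with Lambda controlling its strength.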


 

  

 

 

The regularization models above are just extensions of OLS Regression (the residual sum of squares term) plus a penalization term that penalizes the size of the coefficients.  

Regularization models are deemed to have many benefits (left column of table below).  But, they often do not work as intended (right column of table below).

 

In terms of forecasting accuracy, the graphs below show the penalization or Lambda level on the X-axis.  As the Lambda level increases from left to right, penalization increases (regression coefficients are shrunk, and eventually even zeroed out in the case of LASSO models), and the number of variables left in the LASSO models decreases (top X-axis).  The Y-axis shows the Mean Squared Error of those LASSO models within a cross-validation framework. 
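As a minimal sketch of how such curves are produced in R with the glmnet package (made-up data; the real study's data and settings may differ):

    library(glmnet)
    set.seed(1)
    x <- matrix(rnorm(200 * 46), nrow = 200, ncol = 46)   # made-up predictors
    y <- x[, 1] + rnorm(200)                              # made-up response
    cv_lasso <- cv.glmnet(x, y, alpha = 1)   # alpha = 1 -> LASSO; alpha = 0 -> Ridge
    plot(cv_lasso)                           # MSE vs. log(Lambda), variable count on the top axis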



 




 

The above graph on the left shows a very successful LASSO model.  It eventually keeps only 1 variable out of 46 in the model, and achieves the lowest MSE by doing so.  By contrast, the LASSO model on the right very much fails.  Its best model sits where Lambda is close to zero, which corresponds to the original OLS Regression model before any Regularization (before any penalization resulting in shrinkage of the regression coefficients). 

Revisiting these two graphs and giving them a bit more meaning is insightful.  The LASSO model depicted on the left graph below was successful: it clearly reduced model over-fitting, as intended, as it increased penalization and reduced the number of variables in the model.  The LASSO model on the right failed: it increased model under-fitting the minute it started to shrink the original OLS regression coefficients and/or eliminate variables.







 

Based on firsthand experience, the vast majority of the Ridge Regression and LASSO models I have developed resulted in increased model under-fitting (right graph) instead of reduced model over-fitting (left graph). 

Also, when you use Regularization models, they often destroy the explanatory logic of the original OLS Regression model. 

The two graphs below capture the regression coefficient paths as Lambda increases, penalization increases, and the regression coefficients are progressively shrunk down to close to zero.  The graph on the left shows Lambda, or penalization, increasing from left to right.  The one on the right shows Lambda increasing from right to left.  Depending on what software you use, those graphs' respective directions can change.  This is a common occurrence.  Yet the graphs remain easy to interpret and are very informative. 
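Coefficient-path plots of this kind can be produced along these lines in R with glmnet (again with made-up data):

    library(glmnet)
    set.seed(1)
    x <- matrix(rnorm(200 * 10), ncol = 10)   # made-up predictors
    y <- x[, 1] - x[, 2] + rnorm(200)         # made-up response
    fit_ridge <- glmnet(x, y, alpha = 0)      # alpha = 0 -> Ridge Regression
    plot(fit_ridge, xvar = "lambda")          # coefficient paths as log(Lambda) increases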






 

The above graph on the left depicts a successful Ridge Regression model (from an explanatory standpoint).  At every level of Lambda, the relative weight of each coefficient is maintained.  And the explanatory logic of the original underlying OLS Regression model remains intact.  Meanwhile, on the right graph we have the opposite situation.  The original explanatory logic of the model is completely dismantled.  The relative weights of the variables change dramatically as Lambda increases.  And numerous variables' coefficients even flip sign (from + to - or vice versa).  That is not good. 

Based on firsthand experience, several of the Regularization models I have developed did dismantle the original explanatory logic of the underlying OLS Regression model.  However, this unintended consequence is a bit less frequent than the increase in model under-fitting shown earlier. 

Nevertheless for a Regularization model to be successful, it needs to fulfill both conditions: 

a) Reduce model overfitting; and

b) Maintain the explanatory logic of the model.  







 

If a Regularization model does not fulfill both conditions, it has failed.  I intuit it is rather challenging to develop or uncover a Regularization model that does meet both criteria.  I have yet to experience this occurrence. 

Another source of frustration with such models is that you can get drastically different results depending on what software package you use (much info on that subject within the linked PowerPoint). 

One of the main objectives of Regularization is to reduce or eliminate multicollinearity.  This is a simple problem to solve: just eliminate the variables that appear superfluous within the model (much info on that within the PowerPoint) and are multicollinear with each other.  This is a far better solution than using Regularization models that are highly unstable (different results with different packages) and that, more often than not, fail for the mentioned reasons.
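As a minimal sketch of that simpler alternative (assuming a hypothetical data frame d with a response y and correlated predictors; the variable names are illustrative):

    library(car)                  # for vif()
    fit_full <- lm(y ~ ., data = d)
    vif(fit_full)                 # variance inflation factors flag the multicollinear variables
    # drop the superfluous variable(s) with very high VIFs and refit, e.g.:
    # fit_reduced <- lm(y ~ . - x_redundant, data = d)   # x_redundant is a placeholder name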

Compact Letter Display (CLD) to improve transparency of multiple hypothesis testing

Multiple hypothesis testing is most commonly undertaken using ANOVA.  But, ANOVA is an incomplete test because it only tells you ...