Monday, September 5, 2022

Compact Letter Display (CLD) to improve transparency of multiple hypothesis testing

Comparing the means of several groups is most commonly undertaken using ANOVA.  But ANOVA is an incomplete test because it only tells you whether at least one of several variables, or factor levels, has a different mean.  It does not tell you which specific ones are truly different.  Maybe out of 5 variables (A, B, C, D, E) only E is truly different, and this sole variable causes the ANOVA F test to be statistically significant.  The other 4 variables could have similar means.
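To make this concrete, here is a minimal sketch of the one-way ANOVA F statistic in plain Python.  The five samples are made-up numbers chosen for illustration (they are not from any real study), and the function name is hypothetical.  Only E differs from the rest, yet the F statistic is large.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of groups
    n = len(all_values)      # total number of observations
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up data: A to D are identical, only E is different.
samples = {
    "A": [9, 10, 11],
    "B": [9, 10, 11],
    "C": [9, 10, 11],
    "D": [9, 10, 11],
    "E": [19, 20, 21],
}
f_stat, df_b, df_w = one_way_anova_f(list(samples.values()))
print(f_stat, df_b, df_w)  # 60.0 4 10
```

The sketch stops at the F statistic.  Turning it into a p-value requires the F distribution, which a statistics library such as SciPy (scipy.stats.f_oneway) handles for you.  Either way, the result says nothing about which of the five groups is responsible.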

 

The Tukey Honestly Significant Difference test (Tukey HSD) remedies the above situation.  This is a post-ANOVA (post hoc) test that checks whether each variable is different from each of the other ones.  Tukey HSD works one pair at a time, much like an unpaired t test, but it adjusts for the number of comparisons being made.  So, Tukey HSD tests the difference in means for A vs. B, A vs. C, A vs. D, etc.  While the Tukey HSD test provides an abundance of supplementary information to ANOVA, its output is overwhelming for non-statisticians.
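A quick way to see why the output gets overwhelming is to list the comparisons.  The sketch below only enumerates the pairs for the five example groups A to E; it does not run the test itself.

```python
from itertools import combinations

groups = ["A", "B", "C", "D", "E"]

# Every unordered pair of groups gets its own row in the Tukey HSD output.
pairs = list(combinations(groups, 2))
for first, second in pairs:
    print(f"{first} vs. {second}")

print(len(pairs), "pairwise comparisons")  # 10 pairwise comparisons
```

With k groups there are k(k-1)/2 pairs, so 5 groups give 10 rows and 10 groups would already give 45.  For the actual test, SciPy (scipy.stats.tukey_hsd) and statsmodels (pairwise_tukeyhsd) are two Python options; which tool was used for this post is not stated.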

 

Compact Letter Display (CLD) dramatically improves the clarity of the ANOVA & Tukey HSD test output. 

 

CLD can be used to improve tabular data presentation and visual data presentation.  For instance, if we want to compare the average rainfall data of five West Coast cities, we could first represent the tabular data as shown below. 

 


The above table sorts the cities' rainfall data in alphabetical order.  As is, that table is not very informative.  You have no idea which cities' average rainfall levels are statistically different from those of the other cities.

 

If we replicate the above table, but improve it using the CLD methodology, it is now far more informative as shown below. 


Now the table using the CLD methodology ranks the cities by mean (or average) rainfall in descending order.  Additionally, it assigns each city a letter, so that cities sharing a letter form a cluster whose means are not statistically different.

 

For instance, Seattle and Portland are both classified as "b" because their means are not statistically different (using an alpha of 0.05).

 

Similarly, San Francisco and Spokane are classified as "c" because their respective means are also not statistically different. 
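For readers curious how the letters can be derived from the pairwise results, here is a minimal sketch of an insert-and-absorb style procedure.  It is one common way to build a CLD and not necessarily the one used for this post.  The function name is hypothetical, "Fifth city" is a placeholder for the city classified as "a", the order of Seattle and Portland is assumed, and the list of significant pairs is inferred from the letters described above (cities with no letter in common are treated as significantly different).

```python
from string import ascii_lowercase

def compact_letter_display(ordered_groups, significant_pairs):
    """Assign CLD letters; groups sharing a letter are not significantly different.

    ordered_groups: group names sorted by descending mean.
    significant_pairs: pairs of names whose means differ significantly.
    """
    # Start with one column (one future letter) that contains every group.
    columns = [frozenset(ordered_groups)]
    for first, second in significant_pairs:
        split = []
        for col in columns:
            if first in col and second in col:
                # Insert: a column may not hold both members of a significant pair.
                split.append(col - {first})
                split.append(col - {second})
            else:
                split.append(col)
        # Absorb: drop duplicates and columns fully contained in another column.
        unique = list(dict.fromkeys(split))
        columns = [c for c in unique if not any(c < other for other in unique)]
    # Letter "a" goes to the column holding the highest-ranked group, and so on.
    rank = {name: i for i, name in enumerate(ordered_groups)}
    columns.sort(key=lambda col: sorted(rank[name] for name in col))
    return {
        name: "".join(
            letter for letter, col in zip(ascii_lowercase, columns) if name in col
        )
        for name in ordered_groups
    }

# Placeholder name and assumed ordering, see the note above.
cities = ["Fifth city", "Seattle", "Portland", "San Francisco", "Spokane"]
significant = [
    ("Fifth city", "Seattle"), ("Fifth city", "Portland"),
    ("Fifth city", "San Francisco"), ("Fifth city", "Spokane"),
    ("Seattle", "San Francisco"), ("Seattle", "Spokane"),
    ("Portland", "San Francisco"), ("Portland", "Spokane"),
]
labels = compact_letter_display(cities, significant)
print(labels)
# {'Fifth city': 'a', 'Seattle': 'b', 'Portland': 'b', 'San Francisco': 'c', 'Spokane': 'c'}
```

A group can end up with more than one letter (for example "ab") when it is not significantly different from members of two otherwise distinct clusters.  That does not happen in this example.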

 

When it comes to visual data, the CLD enhancement is also quite spectacular.  Below is a starting box plot describing the rainfall levels of the five cities.  The cities are again sorted alphabetically from left to right on the X-axis.  As structured, this box plot is not that informative.  You can't readily identify the cities that have similar vs. dissimilar mean rainfall levels.



 Now, if we restructure this same box plot using the CLD methodology, it immediately becomes far more informative. 

 


As shown above, we can now readily identify the cities with the higher rainfall levels.  They are sorted in descending order from left to right on the X-axis.  We can also identify the cities that do not have statistically different means: Seattle and Portland are both classified as "b", and San Francisco and Spokane are both classified as "c".
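The restructuring boils down to two steps: sort the groups by descending mean, then attach each group's letter to its axis label.  The sketch below shows just that bookkeeping.  The rainfall values are invented purely to exercise the code and are not the post's data, "Fifth city" is a placeholder for the city classified as "a", and the function name is hypothetical.

```python
from statistics import mean

def cld_axis_labels(samples, letters):
    """Return axis labels sorted by descending mean, each tagged with its CLD letter."""
    ordered = sorted(samples, key=lambda name: mean(samples[name]), reverse=True)
    return [f"{name} ({letters[name]})" for name in ordered]

# Invented values, listed alphabetically as in the starting box plot.
samples = {
    "Fifth city": [58, 60, 62],
    "Portland": [36, 38, 40],
    "San Francisco": [20, 22, 24],
    "Seattle": [37, 39, 41],
    "Spokane": [15, 17, 19],
}
# Letters as described in the post.
letters = {
    "Fifth city": "a",
    "Seattle": "b",
    "Portland": "b",
    "San Francisco": "c",
    "Spokane": "c",
}

axis_labels = cld_axis_labels(samples, letters)
print(axis_labels)
# ['Fifth city (a)', 'Seattle (b)', 'Portland (b)', 'San Francisco (c)', 'Spokane (c)']
```

The resulting list can be handed to whatever plotting library draws the boxes, with the samples passed in the same order.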

 

You can read a more detailed explanation of this CLD methodology at the following URLs:

 

CLD at Slideshare.net

CLD at SlidesFinder 

CLD at ResearchGate 

 

 

 
