A story of poor statistical intuition
In continuing my futile quest to elevate the level of discussion within the quantitative investment community, I thought I'd have a go at another smart and very wealthy man, Cliff Asness, founder of giant fund AQR.
[Photo: Cliff Asness. There is nothing wrong with his statistical intuition. Or his suit. Nice suit Cliff. (www.bloomberg.com)]
www.aqr.com/~/media/documents/papers/aqr-alternative-thinking-3q15.pdf
And, to disappoint you further, I'm not actually going to 'have a go' at the authors of the paper.
I actually agree absolutely and wholeheartedly with the sentiments and the conclusions of the paper. I also realise that the authors have done and published other, excellent, research which supports the results.
What I have a tiny little problem with is the way that the analysis is presented. More specifically it's about the interaction of a particular human weakness with a particular way of presenting data.
I should also say upfront that this is a problem by no means confined to this paper. It is endemic within the investment industry; and much worse beyond it*. It just so happens that I was sent this paper by an ex-colleague of mine very recently, and it got me a bit riled.
* Here's a great book on the subject of misrepresenting scientific statistical evidence which is worth reading.
So this post is not a criticism of AQR specifically; and I should reiterate that I have a massive amount of respect for their research and what they do generally.
What does the paper say
If you're too idle to read the paper, it looks at the quarterly returns of a group of assets and style factors, conditioned on there being a horrible quarter in either the bond or equity markets (they also do some stuff with overlapping 12 month returns). For example if you skip to exhibit 1 it shows that in the ten worst quarters for equities since 1972, fixed income made money in eight out of ten.
The ex-colleague of mine who sent me the paper made the following remark:
"Hello Robert,
Have you seen the paper attached yet? It is interesting that international bonds had a first rate performance inside the 10 worst quarters for global stocks in 1990-2014, however no longer the opposite manner round... Trend following appears to have properly performance when either stocks or bonds suffer."*
* I've decided to leave my friend anonymous, although he has kindly given permission for me to use this quote.
My first thought was "hmm... yes that is interesting".
Then after a few minutes I had a second thought:
"Hang on. There are only 10 observations. Is this even statistically significant?"
This was a serious problem. More so because the authors of the paper had also highlighted a key finding, which relates to something I talked about in my last post, trend following. From the paper:
"Trend was on average profitable in all asset classes returns during these equity tail events... As noted, Trend has often performed well in the worst equity quarters... Trend has been a surprisingly good equity tail hedge for more than a century"
I stared at the numbers, but I still could not work out whether they were significant or not. The underlying problem here is that people are rubbish at intuitively judging statistical significance - even ones like my friend and I who clearly understand the concept.
A little bit of small sample statistics
Before proceeding let me briefly explain my outburst on statistical significance. Those who, unlike me, saw immediately whether the results were statistically significant or not can smugly skip ahead.
If we abstract away from the specifics we can reword my friend's statement as follows:
"Hypothesis 1: The common return for Bonds, conditional on bad returns for Equities, is fine."
"Hypothesis 2: The common go back for Equities, conditional on bad returns for Bonds, is negative"
"Hypothesis 3: The average go back for Trend, conditional on negative returns for Equities, is wonderful"
"Hypothesis four: The average go back for Trend, conditional on terrible returns for Bonds, is effective"
The third hypothesis is also one of the key points the authors flagged up.
(By the way, and this is only a small criticism, it would have been more intuitive if the graphs in the paper had been done with Sharpe Ratio units rather than mean returns on the y axis; although the authors do quote the Sharpe Ratios in the headings. Specifically it would probably have made sense to normalise the quoted returns by the volatility of the entire sample.
However I guess what's more intuitive for me won't be for many other people; so I can live with this. There's a sketch of what I mean just below.)
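To make that concrete, here's a minimal sketch of the normalisation I have in mind, using the conditional mean returns quoted later in this post (from exhibits 1 and 3) and the full-period volatilities from the later footnote. This is my illustration, not a calculation from the paper:

# Hypothetical sketch: convert a conditional mean quarterly return into
# (quarterly) Sharpe Ratio units by dividing by the asset's full-period
# quarterly volatility. Numbers are those quoted elsewhere in this post:
# bonds in the worst equity quarters, equities in the worst bond quarters,
# and Trend in the worst equity quarters.
full_period_vol = dict(BOND=2.6, EQUITY=6.95, TREND=5.0)   # % per quarter
conditional_mean = dict(BOND=3.9, EQUITY=-2.7, TREND=6.4)  # % per quarter

for asset, mean_return in conditional_mean.items():
    sr_units = mean_return / full_period_vol[asset]
    print("%s: %.2f quarterly SR units" % (asset, sr_units))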
Notice that in each of the four hypotheses we have an asset we are trying to predict returns for, and another asset that we are conditioning on. We can abstract further to avoid having to model the joint distribution in an explicit way (you could do this of course, but it would take longer to explain):
"Hypothesis 1: The average return for Bonds, in situation X, is high quality."
"Hypothesis 2: The average go back for Equities, in scenario Y, is bad"
"Hypothesis three: The common return for Trend, in state of affairs X, is tremendous"
"Hypothesis four: The average go back for Trend, in state of affairs Y, is fantastic"
Obviously scenario X is bad equity returns, and scenario Y is bad bond returns. The next thing we need to think about is what econometricians would call the data generating process (DGP). This isn't so much where the data is coming from, but where we pretend it's coming from.
We'll treat scenario X and scenario Y individually. Scenario X then consists of a sample of 10 returns drawn from a much larger population which we can't see. Scenario Y is another 10 returns drawn from a different population. The sample mean return for bonds in X is +3.9%; and for equities in Y is -2.7% (from exhibits 1 and 3 respectively). For Trend it's 6.4% for X, and 3% for Y.
I'm also going to assume that the underlying population is Gaussian, with some unknown mean; but with a standard deviation equal to the sample standard deviation*; which for bonds in X is about 3% a quarter; for equities in Y around 5.2% a quarter, and for Trend 7% (X) and 5.3% (Y). This is all a little unrealistic, but again it would be more complicated to do it another way, and it doesn't change the core message.
* Interestingly the full period standard deviation for bonds is 2.6%** a quarter, equities 6.95%, and trend 5%. Risk seems to be a little higher than normal in an equity crisis, but not so much when bonds are selling off.
** derived from annualised figures assuming no autocorrelation between quarterly returns
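(In case that conversion isn't obvious, here's a minimal sketch of it. The annualised number is illustrative, chosen so it recovers the 2.6% quarterly bond figure, rather than taken from the paper:)

# With no autocorrelation, variance scales linearly with time, so a
# quarterly standard deviation is the annualised one divided by sqrt(4).
from math import sqrt

annualised_stdev = 5.2                         # illustrative annualised vol, in %
quarterly_stdev = annualised_stdev / sqrt(4.0)
print(quarterly_stdev)                         # 2.6, the bond figure above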
Looking at my hypotheses, the null I'm trying to disprove in each case is that the true population mean return is zero (I could do it other ways, but this is easier). So let me generate, via repeated randomness, the distribution of the sample mean statistic for 10 observations, given the estimated standard deviation:
In python:
import numpy as np
import random as rnd
import matplotlib.pyplot as plt

# Sample standard deviations and sample mean returns, in % per quarter,
# taken from exhibits 1 and 3 of the paper. X is the worst equity
# quarters, Y the worst bond quarters.
stdev_dict=dict(BOND=3.0, EQUITY=5.2, TRENDX=7.0, TRENDY=5.3)
sample_mean_dict=dict(EQUITY=-2.7, BOND=3.9, TRENDX=6.4, TRENDY=3.0)

mean=0.0            # the null hypothesis: true population mean return is zero
monte_carlo=100000  # number of random samples to draw
sample_size=10      # each sample contains 10 quarterly returns

assetname="EQUITY"  # change this to test the other assets

stdev=stdev_dict[assetname]
estimate_mean=sample_mean_dict[assetname]

# distribution of the sample mean under the null
ans=[np.mean([rnd.gauss(mean, stdev) for x in range(sample_size)]) for unused_idx in range(monte_carlo)]
ans.sort()

# one-tailed p-value: how often a random sample mean is at least as
# extreme as the one we actually observed
if assetname in ["TRENDX", "TRENDY", "BOND"]:
    p_value=float(len([x for x in ans if x>estimate_mean]))/monte_carlo
elif assetname=="EQUITY":
    p_value=float(len([x for x in ans if x<estimate_mean]))/monte_carlo
else:
    raise Exception("unknown assetname %s" % assetname)

ax2=plt.gca()
hist_data=plt.hist(ans, 50)
ax2.get_yaxis().set_visible(False)
plt.title(assetname)
ax2.annotate("%.4f" % p_value, xy=(estimate_mean, 0), xytext=(estimate_mean, max(hist_data[0])), arrowprops=dict(facecolor='black', shrink=0.05))
plt.show()
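(As a cross-check, and this is my addition rather than anything in the paper: because we've assumed the population is Gaussian with a known standard deviation, the sample mean is itself exactly Gaussian with standard error stdev/sqrt(n), so a one-tailed z-test should reproduce the simulated p-values without any simulation. A minimal sketch:)

# Analytic cross-check of the Monte Carlo p-values above
from math import sqrt
from scipy.stats import norm

sample_size = 10
for asset, sample_mean, stdev in [("BOND", 3.9, 3.0), ("EQUITY", -2.7, 5.2),
                                  ("TRENDX", 6.4, 7.0), ("TRENDY", 3.0, 5.3)]:
    z_score = sample_mean / (stdev / sqrt(sample_size))
    p_value = norm.cdf(-abs(z_score))    # one-tailed p-value
    print("%s: z=%.2f, p=%.4f" % (asset, z_score, p_value))

For equities this gives a p-value fractionally above 5%, which agrees with the simulated histogram below.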
First for equities:
[Histogram of simulated sample means for EQUITY, with the p-value annotated]
This result is just shy of significance, using the usual 5% criterion. We cannot reject the null hypothesis.
Then for bonds:
[Histogram of simulated sample means for BOND, with the p-value annotated]
This is hugely significant.
Now for Trend, conditioned on bad equity returns:
[Histogram of simulated sample means for TRENDX, with the p-value annotated]
This is clearly significant too. And for Trend, conditioned on bad bond returns:
[Histogram of simulated sample means for TRENDY, with the p-value annotated]
Not quite as good, but it just creeps into being significant at the 5% level. In summary:
"Hypothesis 1: The common return for Bonds, conditional on bad returns for Equities, is fine." - we can say this is very likely to be true.
"Hypothesis 2: The common go back for Equities, conditional on bad returns for Bonds, is negative" - we cannot say if this is true or not.
"Hypothesis 3: The average go back for Trend, conditional on negative returns for Equities, is wonderful" - we can say this is quite likely to be true
"Hypothesis four: The average go back for Trend, conditional on terrible returns for Bonds, is effective" - we can say this is probably true
So my friend was mostly right; 3 out of 4 is pretty good. AQR were spot on; indeed the key findings they highlighted were hypotheses 1 and 3, the most highly significant ones. What's more my own private feelings about allocating to trend following are still justified. However it has taken a fair bit of work to prove this!
Why we have to tell stories to explain stuff
The crux of the problem is that it's really, really hard to judge what is significant or not in small samples. Most people don't carry around an intuition about these distributions in their heads. But using small samples is quite common in papers like these. The reason is a flaw in the human brain, a cognitive bias: the narrative fallacy. Or to put it another way, we like to hear stories.
If I show you a mass of data points you will probably be thinking 'yeah fascinating. Now what Rob?'. But if I show you a nice graph as in exhibit 1 of the AQR paper you may be thinking '4Q 1987. Black Monday! Ah I remember that / I've read about that (delete depending on age)...'.
[Photo: The 1987 crash. Yes children it's true. In the olden days traders wore suits and ties; monitors were really, really big; and the only colour they could show was green (usnews.com)]
The data becomes more interesting. Clever researchers recognise this, and so present data in a way which makes it easier to hang a narrative off.
Why this is bad
This is bad because a story can be both unrepresentative and also statistically meaningless. If I show you a story about a plane crashing you are more likely to avoid flying, even if I subsequently show you some dry statistics on the relative safety of different forms of transport.
[Photo: A sample of 1. (www.cnn.com)]
Stories, or if you prefer small samples, can lead us to the wrong judgement*.
* I'm aware that 'you can prove anything with statistics'. However it's true to say that a rigorous analysis of a large sample set, properly presented, is always going to be more meaningful than inferences drawn from a small sample.
Sometimes this is deliberate; as in most tabloid newspaper reporting on medical studies. Sometimes it's unintended.
Of course it might be that the small sample is statistically significant, in which case we can draw a conclusion about the overall population, as in the case of three out of the four hypotheses we've tested.
However if I see a paper with some small sample results in it, but no indication of significance, I don't know if:
- The authors have deliberately shown an unrepresentative and insignificant sample, and the results are wrong
- The authors have got an unrepresentative and insignificant sample by accident, haven't realised it and the conclusions are wrong
- The authors have got a representative sample, but not a significant one. We can't prove the conclusion either way.
- The authors have got a significant and representative sample (the authors may, or may not realise this. I expect the AQR authors did know. These guys aren't sloppy). The authors are correct, but I have no way of knowing this.
It's for this reason that academic papers are littered with p-values and other statistics (although that doesn't mean you can trust them entirely). I'm not saying that a 'popular finance' paper like this should be festooned with statistical confetti. But a footnote might have been nice.
Conclusion
Don't be scared of explaining the uncertainty in estimates. Talk about it. Explain it. Let people visualise it. And when you have significant results, shout about it.
If you're worried that this blog is going to carry on in this vein (criticising the research findings of hedge fund billionaires), don't worry. Next time I'll talk about something dull and worthy, like estimating transaction costs; or I'll give you some exciting python code to read.
But if you're only now following me in the expectation that I'll be writing a post next week about David Shaw's inability to do stochastic calculus, or Ray Dalio's insistence on assuming returns are Gaussian, then I'm sorry but you will be disappointed (and if their lawyers are reading, neither of those things is true).