Skew and Kurtosis as trading rules
This is part X of my series of blog posts on skew and kurtosis, where 2<X<5. Part X, because it depends on how you number them! If you were to read them in a logical order then the series looks something like this:
- A post on skew: measuring it, and its impact on future returns
- A post on kurtosis: measuring it, its impact on future returns, and its interaction with skew.
- A post on trend following and skew (which I actually wrote first, hence the confusion!)
- This post: on using skew and kurtosis as trading rules
This post will also show how we should test an ensemble of trading rules without committing the sins of implicit fitting (essentially dropping variations that don't work from our backtest). And it's going to use pysystemtrade. There will be some quite geeky stuff showing you how to implement novel trading rules in the aforementioned python library.
The trading rules
In the last couple of posts I explained that if we know what skew and kurtosis have been recently (which we do), we can use that as conditioning information on what returns will be in the future (which we don't normally know). The obvious thing to do with this is to turn it into a trading rule; in fact there will be 12 trading rules. This is because I have three kinds of rules:
- a pure skew rule ('skew')
- a skew conditioned on kurtosis rule ('skewK')
- a kurtosis conditioned on skew rule ('kurtS')
And each of these rules can be applied in 4 different ways (essentially 4 kinds of demeaning):
- Absolute: versus the average across all assets and time periods [an alternative for pure skew is to use zero as the average here, but let's be consistent] ('_abs')
- Relative to this particular asset's history (where history was the last 10 years) ('_ts' for time series)
- Relative to the current cross sectional average across all assets ('_cs')
- Relative to the current cross sectional average within the relevant asset class ('_rv' i.e. relative value)
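The four kinds of demeaning can be sketched in pandas as follows. This is a minimal sketch, not the pysystemtrade code: it assumes a DataFrame of factor values (e.g. skew) with dates as rows and instruments as columns, plus a hypothetical `asset_class` dict mapping each instrument to its asset class. The 'abs' version uses an expanding pooled mean (my assumption, chosen to avoid lookahead), and 'ts' uses a rolling 10 year calendar window.

```python
import pandas as pd


def demean_factor(factor, method, asset_class=None, history="3650D"):
    """Hypothetical helper: demean a (date x instrument) DataFrame of factor values.

    method: 'abs' - versus the pooled average over all instruments and past dates
            'ts'  - versus each instrument's own rolling history (~10 years)
            'cs'  - versus the current average across all instruments
            'rv'  - versus the current average within the instrument's asset class
    """
    if method == "abs":
        # Expanding pooled mean over every instrument and every date so far
        running_total = factor.sum(axis=1).cumsum()
        running_count = factor.count(axis=1).cumsum()
        return factor.sub(running_total / running_count, axis=0)
    if method == "ts":
        # Each instrument versus its own average over the history window
        return factor - factor.rolling(history, min_periods=1).mean()
    if method == "cs":
        # Each instrument versus today's average across all instruments
        return factor.sub(factor.mean(axis=1), axis=0)
    if method == "rv":
        if asset_class is None:
            raise ValueError("'rv' demeaning needs an instrument -> asset class map")
        demeaned = pd.DataFrame(index=factor.index, columns=factor.columns, dtype=float)
        for cls in set(asset_class[col] for col in factor.columns):
            cols = [col for col in factor.columns if asset_class[col] == cls]
            # Each instrument versus today's average within its own asset class
            demeaned[cols] = factor[cols].sub(factor[cols].mean(axis=1), axis=0)
        return demeaned
    raise ValueError(f"Unknown demean method: {method}")
```

Note that 'cs' and 'rv' need every instrument's factor value on the same date, which is exactly why these variants are awkward when a trading rule only sees one instrument at a time.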
Finally each of these regulations can have 6 versions, for the six intervals over which skew/kurtosis might be measured:
- 7 days ('_7D')
- 14 days ('_14D')
- 1 month ('_30D')
- 3 months ('_90D')
- 6 months ('_180D')
- 12 months ('_365D')
Thus an absolute skew with kurtosis conditioning over 3 months would be called 'skewK_abs_90'. Catchy. That's a total of 72 different possibilities to consider!
Some of these will probably be too expensive to trade on one or more instruments, but pysystemtrade will deal with that for us. Still, we will want to winnow this down to a more sensible figure.
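The 72 names are just the Cartesian product of rule type, demeaning method and lookback. Here is a small sketch that enumerates them, following the 'skewK_abs_90' naming pattern; the helper name `rule_names` is hypothetical.

```python
from itertools import product

RULE_TYPES = ["skew", "skewK", "kurtS"]
DEMEAN_METHODS = ["abs", "ts", "cs", "rv"]
LOOKBACKS = [7, 14, 30, 90, 180, 365]


def rule_names(rule_types=RULE_TYPES, demean_methods=DEMEAN_METHODS, lookbacks=LOOKBACKS):
    # One name per (rule type, demeaning method, lookback) combination
    return [f"{rule}_{demean}_{lookback}"
            for rule, demean, lookback in product(rule_types, demean_methods, lookbacks)]


names = rule_names()
print(len(names))  # 72
print(names[0])    # skew_abs_7
```

Passing shorter lists (for example two rule types, two demeaning methods and five lookbacks) gives the size of any pruned set directly.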
A brief geeky diversion
A precursor to using pysystemtrade to test new trading rules is to add any raw data they will access to the relevant python code (or, more specifically for futures, here). If you're doing your own experiments you should do this by inheriting from the base object in the relevant file and adding bells and whistles in the form of extra methods, but since it's 'my gaff (code), my rules' I've updated the actual code. So, for example, if we calculate the skew here then we can re-use it many times across the various rules.
However there's a weakness with this code, which is that we can't pass arguments into the raw data function. So we couldn't, for example, pass in the length of time used.
This isn't a problem in most cases since we can do the relevant work inside the actual trading rule, pulling in raw percentage returns as the input into our function. This is slower, but it works. It is however a problem for anything that needs access to the skew (or kurtosis) for other instruments (cs and rv rules), since trading rule functions work on the forecast for a single instrument at a time.
There are two options here; one is to modify the pysystemtrade code so it can deal with this, and the second is to fix the horizon length used in the cs and rv rules. Indeed this is the approach I use in the relative carry rule, discussed here.
I'm generally against making things more complex, but I think that changing the code is the right thing to do here. See here for how this was done.
Coding up the rules
Here then is the raw data code. As you can see I've coded up several methods to calculate skew and kurtosis, and then kept things generic with a whole bunch of 'factor' methods that can be used for any predictive factor (at some point I'll update the relative carry code so it uses this pattern).
Notice that we have a positive skew and a negative skew method. The latter will be used for the standalone skew rule, and the former as a conditioning method.
Here are the trading rules:
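The raw data code itself isn't reproduced here, so the following is only a minimal sketch of what such estimators might look like, assuming simple (non-exponentially weighted) rolling windows over percentage returns and using pandas' built-in rolling `skew()` and `kurt()`. The function names are hypothetical; note that pandas' `kurt()` returns excess kurtosis.

```python
import pandas as pd


def rolling_skew(returns, window):
    # window can be an int (observations) or an offset like "90D" on a DatetimeIndex
    return returns.rolling(window, min_periods=3).skew()


def rolling_neg_skew(returns, window):
    # Negative skew, so that a positive value means 'more negatively skewed'
    return -rolling_skew(returns, window)


def rolling_kurtosis(returns, window):
    # Excess kurtosis (a normal distribution scores zero)
    return returns.rolling(window, min_periods=4).kurt()
```

Because these are equally weighted windows, a single extreme return causes a jump when it enters the window and another when it drops out.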
```python
import numpy as np
from syscore.algos import robust_vol_calc  # module location at the time; may differ in newer versions


def factor_trading_rule(demean_factor_value, smooth=90):
    # Normalise the demeaned factor by its own volatility, then smooth it
    vol = robust_vol_calc(demean_factor_value)
    normalised_factor_value = demean_factor_value / vol
    smoothed_normalised_factor_value = normalised_factor_value.ewm(span=smooth).mean()
    return smoothed_normalised_factor_value


def conditioned_factor_trading_rule(demean_factor_value, condition_demean_factor_value, smooth=90):
    vol = robust_vol_calc(demean_factor_value)
    normalised_factor_value = demean_factor_value / vol
    # +1 / -1 depending on whether the conditioning factor is above or below its mean
    sign_condition = condition_demean_factor_value.apply(np.sign)
    sign_condition_resample = sign_condition.reindex(normalised_factor_value.index).ffill()
    conditioned_factor = normalised_factor_value * sign_condition_resample
    smoothed_conditioned_factor = conditioned_factor.ewm(span=smooth).mean()
    return smoothed_conditioned_factor
```
As you can see these are actually quite generic trading rules, which is a consequence of the way I've written the raw data methods. This also means we can do much of our work at the configuration stage, rather than by writing many specific rules.
Notice also that I've added a smoothing function that wasn't in the original code. When I examined the output originally it was quite jumpy; this is because the skew and kurtosis estimators aren't exponentially weighted, and when one exceptionally bad or good return drops in or out of the window it can cause a big change. This meant that even the very long windows had high turnover, something that is undesirable. I've set the smooth at a tenth of the length of the lookback (not tested, but seems sensible).
Here is a gist showing how to set up the 72 rules (in code, but this could easily be done as a configuration). A snippet is below:
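To see the logic of the conditioned rule without pysystemtrade, here is a self-contained sketch. It swaps in a plain EWM standard deviation as a hypothetical stand-in for `robust_vol_calc` (it is not the same estimator), and otherwise follows the structure above: the kurtosis rule is (K - K_mu) / vol multiplied by sign(S - S_mu), then smoothed.

```python
import numpy as np
import pandas as pd


def stand_in_vol(x, span=35):
    # Hypothetical stand-in for robust_vol_calc: a plain EWM standard deviation
    return x.ewm(span=span, min_periods=10).std()


def sketch_factor_rule(demean_factor_value, smooth=90):
    normalised = demean_factor_value / stand_in_vol(demean_factor_value)
    return normalised.ewm(span=smooth).mean()


def sketch_conditioned_factor_rule(demean_factor_value, condition_demean_factor_value, smooth=90):
    normalised = demean_factor_value / stand_in_vol(demean_factor_value)
    # +1 when the conditioning factor is above its mean, -1 when below
    sign_condition = np.sign(condition_demean_factor_value)
    # Align the condition to the main factor's dates, carrying the last value forward
    sign_condition = sign_condition.reindex(normalised.index).ffill()
    return (normalised * sign_condition).ewm(span=smooth).mean()
```

A useful sanity check: if the conditioning factor is always above its mean, the conditioned rule collapses to the plain factor rule; if it is always below, the forecast simply flips sign.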
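The "tenth of the lookback" rule works out to the spans below. As a crude illustration of why smoothing helps, the sketch also measures turnover as the total absolute change in a deliberately jumpy toy signal; that turnover proxy is my simplification, not how pysystemtrade measures turnover.

```python
import math

import pandas as pd

LOOKBACKS = [7, 14, 30, 90, 180, 365]


def smooth_span(lookback_days):
    # EWMA span of roughly a tenth of the lookback, rounded up
    return int(math.ceil(lookback_days / 10.0))


def turnover_proxy(series):
    # Crude turnover proxy: total absolute change in the signal
    return series.diff().abs().sum()


spans = {lookback: smooth_span(lookback) for lookback in LOOKBACKS}
print(spans)  # {7: 1, 14: 2, 30: 3, 90: 9, 180: 18, 365: 37}

jumpy = pd.Series([0.0, 1.0] * 50)  # toy signal that flips every day
print(turnover_proxy(jumpy))  # 99.0
print(turnover_proxy(jumpy.ewm(span=spans[90]).mean()))  # much smaller
```

Note that a span of 1 means an EWMA weight of 1 on the latest value, so the 7 day rules are effectively unsmoothed.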
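```python
import numpy as np
from systems.forecasting import TradingRule  # location at the time; may differ in newer versions

# lookback_days comes from the loop over lookbacks in the gist
smooth = int(np.ceil(lookback_days / 10.0))

kurtS_rv = TradingRule(
    conditioned_factor_trading_rule,
    data=["rawdata.get_demeanded_factor_value", "rawdata.get_demeanded_factor_value"],
    other_args=dict(
        smooth=smooth,
        # single underscore: passed to the first data method (main factor)
        _factor_name="kurtosis",
        _demean_method="average_factor_value_in_asset_class_for_instrument",
        _lookback_days=lookback_days,
        # double underscore: passed to the second data method (conditioning factor)
        __factor_name="skew",
        __demean_method="average_factor_value_in_asset_class_for_instrument",
        __lookback_days=lookback_days,
    ),
)
```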
You can really see here how the very generic functions are being configured. For the conditioned rule we pass two types of data; both are factors which have been demeaned, hence the identical names. In the other args the smooth is passed to the trading rule itself, the single underscore prefixes (_factor_name, _demean_method, _lookback_days) are passed to the first method in the data list ('rawdata.get_demeanded_factor_value'), and the double underscores are passed to the second method (which happens to be the same here). On the second call the lookback and demeaning method are identical, but the factor names are different: we use kurtosis as the main factor and skew as the conditioning factor.
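The prefix convention can be illustrated with a tiny routing function. This is a hypothetical sketch of the idea only, not pysystemtrade's actual implementation: keys with no leading underscore go to the trading rule, keys with one underscore go to the first data method, two underscores to the second, and so on, with the prefix stripped off.

```python
def split_other_args(other_args, n_data):
    """Hypothetical sketch: route prefixed keyword arguments to rule and data methods."""
    rule_args = {}
    data_args = [dict() for _ in range(n_data)]
    for key, value in other_args.items():
        n_underscores = len(key) - len(key.lstrip("_"))
        if n_underscores == 0:
            # No prefix: argument for the trading rule itself
            rule_args[key] = value
        elif n_underscores <= n_data:
            # N underscores: argument for the Nth data method, prefix removed
            data_args[n_underscores - 1][key[n_underscores:]] = value
        else:
            raise ValueError(f"{key} has more underscores than there are data methods")
    return rule_args, data_args


rule_args, data_args = split_other_args(
    dict(smooth=9, _factor_name="kurtosis", _lookback_days=90,
         __factor_name="skew", __lookback_days=90),
    n_data=2,
)
# rule_args    -> {'smooth': 9}
# data_args[0] -> {'factor_name': 'kurtosis', 'lookback_days': 90}
# data_args[1] -> {'factor_name': 'skew', 'lookback_days': 90}
```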
Checking behaviour, correlation and costs
Before piling in to see whether any of these 72 (!) putative strategies make money, let's check whether they make sense from a behaviour, cost and correlation perspective. Hopefully we can drop some of the numerous variations. Now, I've been very vocal in the past about using fake data for this part of fitting trading strategies.
However in this case we'd need to generate data that had interesting skew and kurtosis properties that were time varying. To avoid this I decided to use a single market, the S&P 500. I chose the S&P because it has a reasonable length of history, and it's the second cheapest market I trade (the NASDAQ is slightly cheaper but doesn't have the same history). So if the S&P can't trade a particular rule, we can definitely ignore it.
This is slightly cheating, but I won't use any performance data to make in sample decisions.
First let's set up the backtest (assuming we've already got the trading rules using the gist code above):
```python
from systems.provided.futures_chapter15.basesystem import futures_system

# all_trading_rules is the dict of TradingRule objects built with the gist code
ordered_rule_names = list(all_trading_rules.keys())

config = temp_config  # defined earlier (not shown)
config.use_forecast_div_mult_estimates = True
config.use_forecast_scale_estimates = True
config.use_instrument_div_mult_estimates = True
config.use_instrument_weight_estimates = False
config.use_forecast_weight_estimates = True
del config.instrument_weights

system = futures_system(trading_rules=all_trading_rules, config=config)
```
Now let's check the costs:
```python
# Cost of each rule on the S&P 500, in annualised Sharpe ratio units
SR_costs_for_rules = []
for rule in ordered_rule_names:
    SR_costs_for_rules.append(
        (rule, system.accounts.get_SR_cost_for_instrument_forecast("SP500", rule)))

# Sort from cheapest to most expensive
SR_costs_for_rules.sort(key=lambda x: x[1])
```
Looking at the last few observations, all the rules with a 7 day lookback have costs greater than my normal cutoff (0.13 SR units; see "Systematic Trading" to understand why). So we can drop these from our consideration.
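The cutoff test itself is trivial, but for completeness here is a minimal sketch that splits a list of (rule, SR cost) pairs, like `SR_costs_for_rules`, at the 0.13 threshold. The helper name is hypothetical.

```python
def drop_expensive_rules(sr_costs, max_sr_cost=0.13):
    """Split (rule, SR cost) pairs into rules to keep and rules to drop."""
    keep = [rule for rule, cost in sr_costs if cost <= max_sr_cost]
    dropped = [rule for rule, cost in sr_costs if cost > max_sr_cost]
    return keep, dropped


# Usage, with the list built above:
# keep, dropped = drop_expensive_rules(SR_costs_for_rules)
```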
Now for correlations:
```python
rule_returns = system.accounts.pandl_for_instrument_rules_unweighted("SP500").to_frame()
rule_returns = rule_returns[ordered_rule_names]
corr_matrix = rule_returns.corr()
```
First let's look at the 'internal' correlations within each rule. For example:
```python
select_rules = ['skew_abs_14', 'skew_abs_30', 'skew_abs_90', 'skew_abs_180', 'skew_abs_365']
corr_matrix.loc[select_rules, select_rules]
```
```
              skew_abs_14  skew_abs_30  skew_abs_90  skew_abs_180  skew_abs_365
skew_abs_14      1.000000     0.530610     0.158682      0.104764      0.022758
skew_abs_30      0.530610     1.000000     0.445712      0.218372      0.039874
skew_abs_90      0.158682     0.445712     1.000000      0.619104      0.305271
skew_abs_180     0.104764     0.218372     0.619104      1.000000      0.580179
skew_abs_365     0.022758     0.039874     0.305271      0.580179      1.000000
```
It looks like there are pleasingly low correlations between adjacent trading rules. I checked this for all the rules, with similar results.
Now let's check for variations of the skew rule, e.g.:
```
             skew_abs_14  skew_rv_14  skew_ts_14  skew_cs_14
skew_abs_14     1.000000    0.542259    0.996992    0.952949
skew_rv_14      0.542259    1.000000    0.543386    0.562397
skew_ts_14      0.996992    0.543386    1.000000    0.948784
skew_cs_14      0.952949    0.562397    0.948784    1.000000
```
Wow! Looks like the absolute, time series and cross sectional variations are basically doing the same thing. Checking the other rules I see similarly high correlations, although they tend to be a bit lower for longer lookbacks.
Whipping out Occam's razor, it seems to make most sense to drop the time series and cross sectional rules completely, since they are more complex implementations of the basic 'abs' rule but add little diversification. We'll keep the within asset class relative value ('rv') variation for now, since that does something quite different.
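Spotting near-duplicate variations by eye gets tedious across 72 rules, so here is a small sketch that lists every pair whose correlation exceeds a threshold. The 0.9 threshold is my own illustrative choice, and the matrix is the 14 day skew table shown above.

```python
import pandas as pd

names = ["skew_abs_14", "skew_rv_14", "skew_ts_14", "skew_cs_14"]
corr_matrix = pd.DataFrame(
    [[1.000000, 0.542259, 0.996992, 0.952949],
     [0.542259, 1.000000, 0.543386, 0.562397],
     [0.996992, 0.543386, 1.000000, 0.948784],
     [0.952949, 0.562397, 0.948784, 1.000000]],
    index=names, columns=names)


def highly_correlated_pairs(corr, threshold=0.9):
    # Walk the upper triangle only, so each pair is reported once
    pairs = []
    cols = list(corr.columns)
    for i, first in enumerate(cols):
        for second in cols[i + 1:]:
            if corr.loc[first, second] > threshold:
                pairs.append((first, second, float(corr.loc[first, second])))
    return pairs


for first, second, rho in highly_correlated_pairs(corr_matrix):
    print(first, second, rho)
# skew_abs_14 skew_ts_14 0.996992
# skew_abs_14 skew_cs_14 0.952949
# skew_ts_14 skew_cs_14 0.948784
```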
Now let's check across styles:
```
                 carry  ewmac4_16  skew_abs_14  skewK_abs_14  kurtS_abs_14
carry         1.000000   0.079025    -0.020398      0.018712      0.053978
ewmac4_16     0.079025   1.000000     0.129336      0.077702      0.080301
skew_abs_14  -0.020398   0.129336     1.000000      0.184635      0.120404
skewK_abs_14  0.018712   0.077702     0.184635      1.000000      0.821673
kurtS_abs_14  0.053978   0.080301     0.120404      0.821673      1.000000
```
Skew conditioned on kurtosis, and kurtosis conditioned on skew, seem to have a highish correlation. That's also true for the cross sectional variants:
```
                carry  ewmac4_16  skew_cs_30  skewK_cs_30  kurtS_cs_30
carry        1.000000   0.079025    0.039870     0.032401     0.053643
ewmac4_16    0.079025   1.000000    0.118919     0.012837     0.044516
skew_cs_30   0.039870   0.118919    1.000000     0.151807     0.000230
skewK_cs_30  0.032401   0.012837    0.151807     1.000000     0.843337
kurtS_cs_30  0.053643   0.044516    0.000230     0.843337     1.000000
```
That pattern holds true all the way up to the longest lookbacks. It probably doesn't make sense to have two skew rules, so let's drop the skew conditioned on kurtosis; again, this is the more complex rule.
This leaves us with the following rules:
- a pure skew rule ('skew')
- a kurtosis conditioned on skew rule ('kurtS')
And each of these rules can be applied in two different ways (essentially two kinds of demeaning):
- Absolute: versus the average across all assets and time periods [an alternative for pure skew is to use zero as the average here, but let's be consistent] ('_abs')
- Relative to the current cross sectional average within the relevant asset class ('_rv' i.e. relative value)
Finally each of these rules will have 5 variations, for the five periods over which skew/kurtosis will be measured:
- 14 days ('_14D')
- 1 month ('_30D')
- 3 months ('_90D')
- 6 months ('_180D')
- 12 months ('_365D')
Trading rule allocation
Proceeding with S&P 500 for now, let's see how my handcrafting method allocates weights:
```python
portfolio = system.combForecast.calculation_of_raw_estimated_forecast_weights("SP500").results[-1].diag['hc_portfolio']
portfolio.show_subportfolio_tree()
```
```
[' Contains 3 sub portfolios',
 ['[0] Contains 3 sub portfolios',                                          (Skew and RV kurtosis)
  ['[0][0] Contains 3 sub portfolios',                                      (Slower skew rules)
   ["[0][0][0] Contains ['skew_abs_180', 'skew_abs_365', 'skew_abs_90']"],
   ["[0][0][1] Contains ['skew_rv_180', 'skew_rv_90']"],
   ["[0][0][2] Contains ['skew_rv_365']"]],
  ['[0][1] Contains 2 sub portfolios',                                      (Faster skew rules)
   ["[0][1][0] Contains ['skew_abs_14', 'skew_rv_14']"],                    (very fast skew)
   ["[0][1][1] Contains ['skew_abs_30', 'skew_rv_30']"]],                   (fastish skew)
  ['[0][2] Contains 3 sub portfolios',                                      (Mostly RV kurtosis)
   ["[0][2][0] Contains ['kurtS_rv_180', 'kurtS_rv_365']"],
   ["[0][2][1] Contains ['kurtS_abs_14', 'kurtS_rv_14']"],
   ["[0][2][2] Contains ['kurtS_rv_30', 'kurtS_rv_90']"]]],
 ['[1] Contains 3 sub portfolios',                                          (Carry and most absolute kurtosis)
  ["[1][0] Contains ['carry', 'kurtS_abs_180', 'kurtS_abs_365']"],
  ["[1][1] Contains ['kurtS_abs_30']"],
  ["[1][2] Contains ['kurtS_abs_90']"]],
 ['[2] Contains 3 sub portfolios',                                          (Momentum)
  ["[2][0] Contains ['ewmac2_8', 'ewmac4_16']"],                            (Fast mom)
  ["[2][1] Contains ['ewmac32_128', 'ewmac64_256']"],                       (Slow mom)
  ["[2][2] Contains ['ewmac16_64', 'ewmac8_32']"]]]                         (medium mom)
```
I've added some notes manually; the algo doesn't do this labelling for us.
Summary of weights:
```python
[(rule, weight) for rule, weight in zip(list(portfolio.all_instruments), portfolio.cash_weights)]
```
Carry 9.1%
EWMAC 12.9%
skew_abs 19.9%
skew_rv 17.0%
kurtS_abs 10.8%
kurtS_rv 29.2%
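The per-rule weights come out of the list comprehension above; a summary by rule family can be produced by summing them by name prefix. This is a minimal sketch of one way to do that (the helper names are hypothetical), not necessarily how the figures above were calculated.

```python
from collections import defaultdict


def rule_family(rule_name):
    # 'skew_abs_90' -> 'skew_abs'; every EWMAC speed -> 'ewmac'; 'carry' -> 'carry'
    if rule_name.startswith("ewmac"):
        return "ewmac"
    parts = rule_name.split("_")
    return "_".join(parts[:2]) if len(parts) > 2 else rule_name


def weights_by_family(rule_weights):
    # rule_weights: iterable of (rule name, weight) pairs
    totals = defaultdict(float)
    for rule, weight in rule_weights:
        totals[rule_family(rule)] += weight
    return dict(totals)


# Usage, with the handcrafted portfolio from above:
# weights_by_family(zip(portfolio.all_instruments, portfolio.cash_weights))
```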
Performance
Okay, it's time for the moment of truth. How well do these trading rules actually perform?
First let's check out the skew rules:
```python
select_rules = ['skew_abs_14', 'skew_abs_30', 'skew_abs_90', 'skew_abs_180', 'skew_abs_365']
system.accounts.pandl_for_all_trading_rules_unweighted().to_frame()[select_rules].cumsum().plot()
```

The best performing 'vanilla' skew rule is the one with a 365 day lookback. A one year lookback is also what was used in the canonical paper on skew / futures (more on this later). It has a SR of 0.33. Not up there with the EWMAC and carry rules with SR of 0.9 plus (excluding the fastest EWMAC that comes in at just 0.5), but positive at least. Thereafter there is a very clear pattern with faster skew rules doing worse.
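The Sharpe ratios quoted here come from pysystemtrade's own accounting. As a rough cross-check on any column of daily returns, the usual annualised SR is the mean over the standard deviation scaled by the square root of the number of trading days. This sketch assumes daily returns and 256 trading days a year; both are assumptions, and the helper name is hypothetical.

```python
import numpy as np
import pandas as pd


def annualised_sharpe(daily_returns, days_per_year=256):
    # Mean over (sample) standard deviation, scaled up to an annual figure
    returns = daily_returns.dropna()
    return returns.mean() / returns.std() * np.sqrt(days_per_year)


# Usage, assuming the frame holds daily returns for each rule:
# rule_pandl = system.accounts.pandl_for_all_trading_rules_unweighted().to_frame()
# rule_pandl[select_rules].apply(annualised_sharpe)
```

Because the SR is a ratio, it does not matter whether the returns are in percentage or currency terms, as long as the units are consistent within a column.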
Incidentally the 'flat spot' on the blue line is because it can only be traded by the cheaper markets, none of which have data before the year 2000.
What about RV skew?

Now for kurtosis (conditioned on skew):


Hmmm. Nothing to shoot the lights out there either.
Rule selection part N
The holy grail is a trading rule that is negatively correlated to something we've already got, and has a positive Sharpe Ratio. In my original post on trend following and skew I noted that skew, for interesting reasons, was likely to be negatively correlated with momentum at certain speeds, and seemed to have positive performance.
In this post the negative correlation seems to have been borne out (or at least the correlation is basically zero), but the positive performance is patchy. Nevertheless, in my 'ideas first' paradigm (described here), I will sometimes use rules that don't have statistically significant performance if their original motivation is well founded. So it might be worth chucking some skew and kurtosis into the mix.
The slower skew rules (of both flavours) do a reasonable job, and they are logical and straightforward rules with a well motivated reason as to why they should work. Thanks to my prior work, I also have a good understanding of how they interact with momentum.
I'm a little less comfortable with the kurtosis rules; the conditioning makes them a little more complex than something I'd normally contemplate using. I think here I got a little carried away with demonstrating how clever I could be (okay, (K - K_mu) * sign(S - S_mu) isn't exactly the general theory of relativity, but it's much more complex than EWMA_f - EWMA_s). On balance I would prefer not to use the kurtosis rules, even though their cumulative SR is similar to skew.
Some thoughts on fitting
It's worth noting that the 365 day skew rule, which did the best here, is the same lookback used by this paper (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2671165). Here is an opportunity for me to quickly remind (or tell) you about my framework for three kinds of fitting.
Tacit fitting would have happened if I had used the 365 day rule having read it in that paper. We know that academic papers which don't have useful results are rarely published. Therefore there is a chance that the academics in question tried different formulations before deciding on 365 days. Of course this might not be true, and they could have just used 365 days; realised it worked, and moved on*. The fact this is a 365 day lookback, and not 275.4, makes this more plausible. Still the risk is there.
* And they could also have got the 365 days from another paper, whose authors tried different variations. Same problem.
Implicit fitting would be if I had run these backtests and chosen the best performing rule variations to use in my trading system (which, as it happens, were skew_abs_365 and skew_rv_180 if you're interested). Then when I ran my backtest again it would have looked pretty damn good.
Explicit fitting is what I've actually done; used a mechanical rule to decide which rules are good, and should get more capital; and which are poor. This is the best kind, as long as you do it in a robust way that understands signal:noise ratios, and in a backward looking rolling out of sample fashion.
Having stated that, going forward, I will only use the two skew rules, am I guilty of implicit fitting? After all, I have modified the configuration of my backtest after peeking at all the data. To a degree this is true. But I offer two defences. Firstly, I'm still using all the different variations of the rules from 14 day to 365 day lookbacks and allowing the system to weight them appropriately. Secondly, removing the kurtosis rules doesn't really affect the performance of the system one way or another. So it's not like I'm biasing my backtest SR upwards by 50 basis points.
Portfolio level results
Having done all this, what effect does adding the two types of skew rule to my standard backtest have?

Conclusion
This exercise has been a little disappointing, as I had hoped the skew rules would be a little more exciting performance-wise, but I've demonstrated some important practices. I've also had some fun adding extra flexibility to pysystemtrade.