Systems building - deciding positions
This is the third post in a series giving pointers on the nuts and bolts of building systematic trading systems.
A common myth is that the most important part of a systematic trading system is the 'algo': the process, or set of rules, that essentially says 'given this data, what position do I want to hold, or what trade do I want to do?'
To some extent this is true. Any 'secret sauce' that you have will be inside the 'algo'. All the rest of the code is 'just engineering'. There is no alpha from having a robust database system (although there might be from better futures rolling).
However the 'just engineering' is roughly 95% of my code base. And although there is no alpha from engineering, the consequences of screwing it up are significant - there is an awful lot of potential negative alpha from, say, wrongly thinking your position in BP.LSE is zero when you want it to be 100 shares, and then buying 100 shares over and over again because you still think you don't have any shares at all. And who knows, you might become famous.
Because things always go wrong we do need to have the 'engineering': checks and balances for when data is not perfect.
A typical algo
Because people's idea of what constitutes a trading 'algo' varies so widely, here's a generic function which shows some of the interesting inputs we could have in such a thing. You may want to put your sunglasses on now, as there's some hideous colour coding going on here.
optimal_position_instrument_x_t =
  f ( price_x_t, price_x_t-1, ....,
      bid_price_quant_L1_x_t-3, ..., offer_price_quant_L2_x_t-7, ...,
      some_data_x_t, some_data_x_t-1, ...,
      other_data_x_t-1, ....,
      price_y_t, price_y_t-1, ...., some_data_y_t, ...,
      price_z_t-1, ..., some_data_z_t-2, ...,
      position_x_t-1,
      a_parameter, another_parameter, yet_another_parameter, ...)
Translating this into English, the types of inputs into an 'algo' to work out the best position for instrument X could include any of the following things:
1. The current price of instrument X
2. A history of previous prices of X
3. The state of the order book of X
4. The last piece of 'some data' for X, and optionally a history of 'some data' for X
5. 'Other data' for X for which the last observation hasn't yet been received
6. The current price of instrument Y (and perhaps its history), and some data for Y (and again perhaps history)
7. Some prices and data for other instruments like Z, for which we don't yet have a current price, or even the one before that
8. The last position for X
9. Some parameters
The first thing to note here is that we can have a single data series (most commonly price), or multiple data series.
Things 1 and 2 would be used for technical systems. I'm ignoring the question of 'what is price' (mid of inside spread, last trade, last close, ...) which I've briefly addressed here.
Market depth data, beloved of high frequency traders and people using execution algos, is the stuff of thing 3.
Fundamental systems would have things of types 4 and 5 [See here to understand the difference].
Things of types 6 and 7 are obviously used for cross sectional trading systems. Think of a single country, intra industry, long/short equity strategy. You would never decide your position in GSK.L without also knowing all the data relating to AZN.L.
However there may also be cross sectional data in other systems. For example my positions depend on the risk of my overall portfolio (another post to come...). That in turn depends on correlations. Which means every position depends, to a degree, on the price history of everything that's in my portfolio.
Thing 8 would come up in two cases. The first is if you have a state based algo; for example something with separate entry and exit rules. It would behave differently depending on whether you currently have no position, or are long or short.
This is not my thing, by the way (if you'll excuse the pun). State dependence makes back testing difficult and unpredictable (small changes in the start date of your data series will determine your position history and therefore profitability), slow (you can't use efficient loops as easily, or farm calculations out to cluster nodes), and live trading erratic and unintuitive (at least to my mind).
The second case is if you have to pay trading costs. If you have to pay to trade then it's only worth trading if your optimal position is sufficiently different from your current one. 'Sufficiently different' is something whole PhD theses have been written on, but again it will be the subject of another post at some point.
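To make 'sufficiently different' a little more concrete, here is a minimal sketch in Python of one of the simplest versions: a 'no trade' buffer around the optimal position. The function name and the 10% buffer width are my own illustrative assumptions, not a rule from this post:

def position_with_buffer(current_position, optimal_position, buffer_fraction=0.1):
    # Only trade if we are outside a buffer around the optimal position;
    # then trade to the nearest edge of the buffer, not all the way in.
    buffer_width = abs(optimal_position) * buffer_fraction
    lower = optimal_position - buffer_width
    upper = optimal_position + buffer_width
    if current_position < lower:
        return lower    # buy, but only up to the bottom edge of the buffer
    if current_position > upper:
        return upper    # sell, but only down to the top edge of the buffer
    return current_position    # inside the buffer: don't trade at all

With a zero width buffer this collapses to always trading to the optimal position; widening the buffer trades costs against tracking error.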
Thing 9 would be common to all trading systems, and is hopefully straightforward.
What do they have in common?
Now a close perusal of things 1 to 7 reveals something they have in common. Unlike in an ideal world, we don't always know the current value of an input 'thing'; where a thing could be the price of X, part or all of the order book for X, some other data for X, the price of another instrument Z, or some other data for Z.
We can assume that we'll always know our position in X - if not we shouldn't be trading - and that, barring catastrophic disk failure, our parameters will always be available.
So for example, if I'm using the full L2 order book then it's unlikely that all the bids and offers in the system were entered at exactly the same time. It's unlikely I'll have up to date earnings forecasts in a fundamental equity system; such forecasts are updated sporadically over the course of several months, and certainly not minute by minute. Few things, except explicit spread trades, trade at exactly the same time. Similarly, fundamental data for different countries or instruments is rarely released in the same second.
Imagine the fun we'd have if US non farm payrolls, GDP, interest rate decisions, and the same for every other country in the world, were all released at the same time every month. Do you think that would be a good idea?
This is the general problem of non-current data. If you have more than one thing in your 'algo' inputs, it will also be a problem of non-concurrent data; for example I might have the current price of X, but not a concurrent price of Y.
By construction this also means that in the history of your things there will probably be gaps. These gaps could be natural (the gaps in traded price series when no trades happen) or caused by up-sampling (if I'm using daily data then there will appear to be big gaps between monthly inflation figures in a global macro system).
Dealing with non-current data, non-concurrent data, and gaps in data accounts for almost all of the engineering in position deciding code.
Defining non-current more precisely - the smell of staleness
Of course current, or non-current, is very much in the eye of the beholder.
If you're a high frequency technical trader then a quote that is a few seconds stale is ancient history. A slow moving asset allocating investor might not be bothered by the fact that, for example, UK inflation came out months ago.
Implicit in what is going on here is that you're going to take your data, and ultimately resample it to a frequency that makes sense for your trading system. Let's call that the 'natural frequency'. For a medium speed trend follower like myself, daily is fine. But that bucketing could be done on a microsecond level, ..., second, ..., minute, ...., hourly, ... and so on perhaps up to quarterly (that's four times a year, by the way, not four times a second/minute/hour/day).
Alternatively you sometimes hear traders talking in 'bars'; for example "I use five second bars", or one minute bars. Same idea; except of course they're looking at the Open/High/Low/Close within a bar, rather than just a final price or average.
Anything that falls within the last time bucket at the natural frequency is effectively current. So if you're a fast day trader using a natural frequency of 5 seconds, and you haven't got an updated quote in the last 5 seconds, you would consider that to be a non current price. But if you had a quote 1 second ago then that would be fine.
In my own system I use daily data. So before I do anything I down sample my prices to a daily frequency. I'd only consider myself to be missing a price if I hadn't got anything since the start of the current day.
Quick definitions:
Down sampling - going from a high frequency data series to a lower frequency series, e.g. minute by minute data down to daily.
Note that down sampling can be done in different ways; the basic methods are to take an average over the slower time period, or to use the last value that occurred.
Up sampling - the reverse, e.g. going from monthly to daily.
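As a concrete sketch, here is what both look like in Python with pandas (a library choice I'm assuming purely for illustration; the numbers are made up):

import pandas as pd

# A minute by minute price series (illustrative values)
minutely = pd.Series(
    [100.0, 100.5, 101.0, 100.8],
    index=pd.date_range("2015-05-01 08:00", periods=4, freq="1min"))

# Down sampling to daily, in the two basic ways:
daily_average = minutely.resample("1D").mean()   # average over the period
daily_last = minutely.resample("1D").last()      # last value in the period

# Up sampling monthly data to daily, forward filling the last known value:
monthly = pd.Series(
    [1.2, 1.4],
    index=pd.to_datetime(["2015-04-30", "2015-05-31"]))
daily_from_monthly = monthly.resample("1D").ffill()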
Non-concurrent
We need to be very careful about how we resample and align data when we have multiple sources of data.
To take a simple and trivial example, you might be using dividend yield as a data point (dividend in pennies per year, divided by price in the same units). Assume for simplicity annual dividends, and that no reliable forecasts of dividends are available. It might be eleven months since the last dividend declaration was made.
On that day the price was, say, $10 and the dividend was $1, giving a yield of 10%. Is 10% really the best and most up to date value of the dividend yield? No; because we have the current price. It's normal practice to say that the (historic) dividend yield will be the last available dividend figure, divided by the most up to date price. We'd up sample the dividend figure to our natural frequency for price, say daily, and then forward fill it. Then we'd work out what the yield was each day.
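A sketch of that calculation in pandas (dates and values invented for illustration):

import pandas as pd

prices = pd.Series(
    [10.0, 10.5, 11.0],
    index=pd.to_datetime(["2015-05-01", "2015-05-02", "2015-05-03"]))
dividends = pd.Series([1.0], index=pd.to_datetime(["2014-06-01"]))

# Up sample the dividend onto the daily price index and forward fill it
dividend_daily = dividends.reindex(
    prices.index.union(dividends.index)).ffill().reindex(prices.index)

# The yield uses the stale dividend, but the most up to date price
dividend_yield = dividend_daily / prices   # 0.100, 0.095, 0.091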
Here is a more complicated example. Suppose we are trying to work out the real yield on a bond, equal to the nominal yield to maturity in % minus the inflation rate in %. Further suppose we're fairly slow traders for whom monthly data is fine.
Nominal yield to maturity is available any time we have a price from quotes or trades. For simplicity let's assume we're trading slowly, and using daily closing prices from which we will derive the yield. Inflation on the other hand comes out monthly in most advanced economies (with the exception of Australia, where it's quarterly; and Greece where most national statistics are just made up). For the sake of argument suppose that the inflation figure for May 2015 came out on 5th June 2015, June 2015 will come out on 5th July 2015 and so on.
If we looked at these two data series in daily data space, we'd see yields every day and then an inflation figure on the fifth of each month. In this case the right thing to do would be to first down sample the yields to a monthly frequency, using an average over the month. So we'd have an average yield for May 2015, June 2015, July 2015 and so on. We'd then compare the average yield for May 2015 to the inflation figure for the same month.
Note that the average real yield for May 2015 isn't actually available until 5th June. The timestamp in your data should reflect this, or your back testing will be using forward looking data.
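Here's how that alignment might look in pandas; the five day publication lag and all the numbers are assumptions for illustration:

import pandas as pd

# Daily nominal yields (constant here just to keep the example short)
daily_yield = pd.Series(
    2.0, index=pd.date_range("2015-05-01", "2015-06-30", freq="B"))

# Monthly inflation, indexed by the month it refers to
inflation = pd.Series(
    [0.1, 0.3], index=pd.to_datetime(["2015-05-31", "2015-06-30"]))

# Down sample the yields to a monthly average, then subtract
# the same month's inflation figure
real_yield = daily_yield.resample("M").mean() - inflation

# The May figure is only known when May inflation is published on
# 5th June; stamp it with the publication date to avoid forward
# looking data in a back test
real_yield.index = real_yield.index + pd.DateOffset(days=5)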
Let's take another example. Suppose you want the spread between two prices, for two different instruments. Suppose also that the spread isn't traded itself. You could, as I do here, use closing prices. Alternatively you could be starting with an intraday irregular series of prices from each instrument. The right thing to do would be to up sample them to some relatively high frequency; perhaps somewhere between microseconds and hours, depending on the speed of your trading system and your tolerance for the accuracy of the spread.
You can then calculate the spread at the upsampled frequency, which should be quite a bit faster than you would really need. Depending on the nature of what you're doing this could be quite noisy (mismatched bid-ask bounce, different liquidity or delayed reaction to news for example). It's probably worth down sampling back to the frequency you actually want, and using an average when you do that to smooth out the noise.
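A sketch of that pipeline in pandas; the one second and one minute frequencies are arbitrary examples, and both inputs are assumed to be series with sorted datetime indices:

import pandas as pd

def spread_series(prices_a, prices_b, fast_freq="1s", slow_freq="1min"):
    # Up sample both irregular series onto a common fast grid,
    # forward filling the last known price of each leg
    leg_a = prices_a.resample(fast_freq).ffill()
    leg_b = prices_b.resample(fast_freq).ffill()

    # The raw spread will be noisy (bid-ask bounce, mismatched
    # liquidity); down sample back with an average to smooth it
    return (leg_a - leg_b).resample(slow_freq).mean()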
Here is another one. Imagine you want to compare the price of LIBOR futures with the underlying LIBOR spot rate. The former is available tick by tick throughout the whole of a trading day. The latter is 'fixed' (an unfortunate choice of terminology, given what's happened) and published daily.
In this case we only know the 'true' value of the futures - spot spread at a single moment. The correct approach is to resample the futures prices, getting for each day a price as close as possible to the 11am fixing time.
Then each day you compute the spread between that futures price and the 11am fix. In your database the fix should have a published time stamp of when it actually came out (normally around 11:45am).
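A minimal sketch of that resampling; the 11am fix time comes from the example above, everything else (names, data layout) is an assumption:

import pandas as pd

def futures_minus_fix(futures_ticks, daily_fix):
    # For each day's fix, take the last futures tick at or before 11am
    # and subtract the fix from it. Assumes the tick index is sorted.
    spread = {}
    for day, fix in daily_fix.items():
        fix_time = day.replace(hour=11, minute=0)
        ticks_so_far = futures_ticks.loc[:fix_time]
        if len(ticks_so_far) == 0:
            continue   # no tick yet: treat the spread as missing today
        # NB: this will pick up a tick from a previous day if there were
        # no trades this morning; a production version would also check
        # staleness before using it
        spread[day] = ticks_so_far.iloc[-1] - fix
    return pd.Series(spread)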
Dealing with non-current data
Okay, so we have one or more non-current data items. It might be that we're missing the last price, or that it's been a couple of weeks since we had a figure for real yields (for a system whose natural frequency is daily), or that another market which we need a price from is closed today.
No position
A very firm stance to take on the issue of non current data is to have no position at all if you're missing any data at all. If you're a high frequency trader who needs to see the whole order book to decide what position to have, and it's crammed full of stale prices, then it's probably reasonable to do nothing for a while.
However for the vast majority of people this is insanity, and would lead to extremely high trading costs if you closed your position whenever you were missing one little bit of data.
Forward filling
The simplest, and probably most common, approach is to forward fill - take the last known good value of your data item and use that. Philosophically we are saying that we can't observe data continuously, and the best forecast of the current unknown value of the data is the last value we saw.
It can sometimes be useful to have a notion of maximum staleness. Suppose you are trading using monthly inflation data at a natural frequency of daily. It would be normal to have to forward fill this for a few weeks. On the other hand, if you haven't had any inflation data when you last expected it, and you're instead having to forward fill for two months, then you might want to treat that differently and use one of the other approaches.
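Here is one way a maximum staleness rule might look in pandas; the 42 day limit is just an example, and daily_inflation is a hypothetical series:

import pandas as pd

def ffill_with_max_staleness(series, max_stale):
    # Forward fill, but mark values missing again once the last genuine
    # observation is older than max_stale
    filled = series.ffill()
    last_obs_time = pd.Series(
        series.index.where(series.notna().values),
        index=series.index).ffill()
    age = series.index - pd.DatetimeIndex(last_obs_time)
    return filled.where(age <= max_stale)

# e.g. monthly data at a daily natural frequency: tolerate ~six weeks
# clean = ffill_with_max_staleness(daily_inflation, pd.Timedelta(days=42))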
Be careful about premature or wrong forward filling, which can lead to incorrect calculations across multiple data series. Returning to the very simple example of a dividend yield calculation, forward filling the dividend yield rather than the dividend would have given us the wrong answer.
Extrapolation
We might be able to do better at forecasting the current value of an unobserved variable by extrapolating from the past data we have. Extrapolation could involve fitting a sophisticated time series model of the data and working out the next likely point, or something as simple as 'drawing a line' through the last couple of points and extending it.
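The 'drawing a line' version is trivial; a sketch:

def extrapolate_next(values):
    # Extend a straight line through the last two observations by one step
    if len(values) < 2:
        return values[-1]   # not enough history: fall back to forward filling
    return values[-1] + (values[-1] - values[-2])

# extrapolate_next([100.0, 102.0]) returns 104.0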
Temporary de-weighting
This is only applicable where you have multiple data items. I'll need to briefly explain this. Imagine you have a system where your optimal position is a function which depends on a linear weighted average of some other functions, each using a different set of predictive data.
So you might have something like:
position_t = f [ A x g (data_1_t) + B x h (data_2_t) ]
... where f, g and h are functions, A and B are weights, and the data are, boringly, data. Now what if one of those bits of data is missing?
One option is to temporarily set the relevant weight (A or B) to zero. With missing data you are effectively saying you have less confidence in your position, so it's reasonable to make it smaller to reflect that. The downside of this approach is the extra trading it generates.
Note this is not the same as 'if a piece of data is missing, set it to zero'. That would lead to very whacky results if, for example, you had a long series of prices of 100, 101, 102, ... and then a 0.
Temporary re-weighting
Again this is only applicable where you have multiple data items. Let's go back to the last equation again:
position_t = f [ A x g (data_1_t) + B x h (data_2_t) ]
Suppose that normally A and B are both 0.50.
Rather than de-weighting and getting a smaller position by setting B to zero when data_2 is missing, we could get data_1 to 'take up the slack'. In this example we would set B to zero, but then inflate A to 1.0.
This approach still produces extra trading, but not as much as full de-weighting.
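Both variants can live in one small function. A sketch, assuming missing data items are flagged as None; the renormalisation step is the re-weighting:

def combined_signal(signals, weights, reweight=True):
    # signals: dict of name -> value, or None if that data item is missing
    # weights: dict of name -> normal weight (e.g. A = B = 0.5)
    available = {k: v for k, v in signals.items() if v is not None}
    if not available:
        return 0.0   # nothing at all: no position

    total = sum(weights[k] * v for k, v in available.items())
    if reweight:
        # Re-weighting: surviving weights 'take up the slack'
        return total / sum(weights[k] for k in available)
    # De-weighting: missing items just shrink the position
    return total

With A = B = 0.5 and data_2 missing, de-weighting halves the position, while re-weighting effectively inflates A to 1.0, exactly as described above.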
Inference
It's sometimes possible to infer what a missing data value is from values of other data items that we do have. So for example you might have a series of interest rates of different maturities; and occasionally you find that the 7 year point isn't available.
Given some model of how interest rates are linked, you can probably do a pretty decent job of inferring what the 7 year rate is.
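The very crudest 'model' of how rates are linked is linear interpolation across maturities. A sketch, with invented numbers (a real system might fit something like Nelson-Siegel instead):

import numpy as np

# Today's known points on the curve: maturity in years -> yield in %
curve = {2: 0.8, 5: 1.5, 10: 2.3}   # the 7 year point is missing

maturities = sorted(curve)
yields = [curve[m] for m in maturities]
inferred_7y = np.interp(7.0, maturities, yields)   # about 1.82%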
Missing data
Non current data is effectively a special case of a broader problem: missing data. We can also have missing data which isn't current, i.e. missing historical data - for example a missing day in a series of closing prices. Generally you can adopt the same approaches - no position, de-weighting, re-weighting, forward filling and extrapolation. Depending on exactly what you're doing you could also use interpolation of course. So if for example you have a missing price one day in your history, you could do something simple like a linear interpolation, or use a brownian bridge if you were worried about understating volatility.
Just be careful you don't end up back testing using forward looking data that wasn't available when trades were executed.
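A sketch of both repairs on a toy series; the volatility fed to the bridge is an assumption, and the noise term shown is only correct for a single missing point halfway through the gap:

import numpy as np
import pandas as pd

prices = pd.Series(
    [100.0, np.nan, 104.0],
    index=pd.to_datetime(["2015-05-01", "2015-05-02", "2015-05-03"]))

# Simple linear interpolation of the missing day
linear = prices.interpolate(method="time")

# Brownian bridge: add noise around the interpolated point so the
# filled value doesn't understate volatility. For the midpoint of a
# two day gap the bridge standard deviation is sigma * sqrt(0.5).
sigma = 1.0   # assumed daily price volatility
rng = np.random.default_rng(0)
bridged = prices.copy()
gap = bridged.isna()
bridged[gap] = linear[gap] + rng.normal(0.0, sigma * np.sqrt(0.5), gap.sum())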
Event driven systems - any better?
A number of people will be laughing at this post so far: those using event driven systems (but then they usually laugh at everything I write).
With an event driven system you wait for your data to arrive and then you act on it. You might think therefore that you are immune from the missing data problem. No data - no action. This is quite different from the paradigm I'm generally working with: checking now and then to see what the latest data is, and then based on that deciding what to do.
The hilarity of the event driven people might be fair, but with big fat caveats. Firstly, they should still be concerned with non-current, stale data. If you're operating some whacky intra day system with an expected holding period of five minutes, in which you look for a particular pattern to enter and another to exit, what happens if you've had your position for ten minutes and you're for whatever reason not getting prices updated?
You may pooh pooh the chances of this happening, but automated system engineering is all about coping with the unexpected.
Secondly, if you're operating on multiple data items then it's perfectly possible to have concurrency issues; a price might arrive triggering an event, but if your position also relies on another piece of data you still need to worry about whether that is stale or not.
So - stop laughing at the back. You've got nothing to be smug about.
Outliers
There is one more set of engineering that needs to be done in position generation code, and that is dealing with outliers. Naturally you'll have screened your outliers to make sure they're genuine, and not bad data.
You can get genuine outliers both in single data series, and also in multiple series.
Let's take a single data series example. Suppose you are calculating volatility from a series of daily returns. The last daily return is... rather large.
Is it reasonable to use the full return, unadjusted, in your calculation, resulting in a big spike in volatility? Or should we limit the size of the return to avoid biasing the value we get?
In this case there is also the option of using a better calculation. So for example you could use an interquartile range rather than a standard deviation; you could use medians instead of means, and so on.
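Both ideas fit in a few lines of Python; the clip level of 5 standard deviations is an arbitrary policy choice:

import numpy as np

def clipped_vol(returns, clip_sd=5.0):
    # Limit the size of any single return before estimating volatility
    raw_sd = np.std(returns)
    clipped = np.clip(returns, -clip_sd * raw_sd, clip_sd * raw_sd)
    return np.std(clipped)

def iqr_vol(returns):
    # Interquartile range based estimate: for normally distributed
    # returns the IQR is about 1.349 standard deviations
    q75, q25 = np.percentile(returns, [75, 25])
    return (q75 - q25) / 1.349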
Here's an example with multiple series. This time we're calculating an index of industry returns from individual equity prices. One company has a huge return, pushing the whole index up. Again, do we live with this, or take action?
There is no right or wrong way to deal with these, and the definition of 'outlier' (the point at which a large value becomes too extreme) is situation specific and down to preference, but you need to consider them and have a policy for them.
Done
Congratulations. You can now build a slightly safer trading algo than you could when you started reading this. Now for the easy bit - designing one that will consistently make money :-)

This is the third post in a series on building systematic trading systems.
The first two posts are:
http://qoppac.blogspot.co.uk/2015/04/system-building-data-capture.html
http://qoppac.blogspot.co.uk/2015/05/systems-building-futures-rolling.html
The next post is here.