at historical data, but behavioral finance can help to identify real inefficiencies. For example, post-earnings announcement drift can be explained in terms of investor underreaction. Together with historical data, this gives me enough confidence to believe that the edge is real. The data suggest the trade, but the psychological reason gives a theoretical justification.
High-Level Approaches: Technical Analysis and Fundamental Analysis
Technical analysis is the study of price and volume to predict returns.
Technical Analysis
Aronson (2007) categorized technical analysis as either subjective or objective. It is a useful distinction.
Subjective technical analysis incorporates the trader's discretion and interpretation of the data. For example, "If the price is over the EWMA, I might get long. It depends on a lot of other things." These methods aren't wrong. They aren't even methods. Subjectivity isn't necessarily a problem in science. A researcher subjectively chooses what to study and then subjectively chooses the methods that make sense. But if subjectivity is applied as part of the trading approach, rather than the research, then there is no way to test what works and what doesn't. Do some traders succeed with subjective methods? Obviously. But until we also know how many fail, we can't tell if the approach works. Further, different traders using ostensibly the same method will make different decisions, often not even based on the same inputs. There is literally no way to test subjective analysis.
Some things that are intrinsically subjective are Japanese candlesticks, Elliott waves, Gann angles, trend lines, and patterns (flags, pennants, head and shoulders, etc.). These aren't methods. In the most charitable interpretation, they are a framework for (literally) looking at the market. It is possible that using these methods can help the trader implicitly learn to predict the market. But more realistically, subjective technical analysis is almost certainly garbage. I can't prove the ideas don't work. No one can. They are unfalsifiable because they aren't clearly defined. But plenty of circumstantial evidence exists that this analysis is worthless. None of the large trading firms or banks has desks devoted to this stuff. They have operations based on stat arb, risk arb, market making, spreading, yield curve trading, and volatility. No reputable, large firm has a Japanese candlestick group.
As an ex-boss of mine once said, “That isn't analysis. That is guessing.”
Any method can be applied subjectively, but only some can be applied objectively. Aronson (2007) defines objective technical analysis as “well-defined repeatable procedures that issue unambiguous signals.” These signals can then be tested against historical data and have their efficacy measured. This is essentially quantitative analysis.
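To make the distinction concrete, here is a minimal sketch in Python of the sort of rule Aronson means. The prices and the EWMA span are made up for illustration; the point is only that the rule is well defined and repeatable, issuing the same unambiguous signal whenever it sees the same data, so it can be back-tested.

```python
import pandas as pd

def ewma_signal(prices: pd.Series, span: int = 20) -> pd.Series:
    """Objective rule: long (1) when the price is above its EWMA, flat (0) otherwise.

    Given the same prices and span, the rule always issues the same signal,
    so its historical performance can be measured.
    """
    ewma = prices.ewm(span=span, adjust=False).mean()
    return (prices > ewma).astype(int)

# Illustrative, made-up prices.
prices = pd.Series([100, 101, 103, 102, 105, 104, 107, 106, 109, 108], dtype=float)
print(ewma_signal(prices, span=5).tolist())
```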
It seems likely that some of these approaches can be used to make money in stocks and futures. But each individual signal will be very weak, and to make any consistent money, various signals will need to be combined. This is the basis of statistical arbitrage, which is not within the scope of this book.
However, we do need to be aware of a classic mistake when doing quantitative analysis of price or return data: data mining.
This is where we sift through data using many methods, parameters, and timescales. This is almost certain to lead to some strategy that has in-sample profitability. When this issue is confined to choosing the parameters of a single, given strategy, it is usually called overfitting. If you add enough variables, you can get a polynomial to fit data arbitrarily well. Even if you choose a function or strategy in advance, by "optimizing" the variables you will get the best in-sample fit. It is unlikely to be the best out of sample. Enrico Fermi recalled that the mathematician John von Neumann said, "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk" (Dyson, 2004).
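A minimal simulation (all numbers here are made up) shows the effect: the high-degree polynomial fits the in-sample noise almost perfectly and then loses to the simple model out of sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# The true structure is linear (two parameters); everything else is noise.
x = np.sort(rng.uniform(0, 1, 20))
y = 2.0 * x + rng.normal(scale=0.3, size=x.size)

x_new = np.sort(rng.uniform(0, 1, 20))  # fresh out-of-sample data
y_new = 2.0 * x_new + rng.normal(scale=0.3, size=x_new.size)

for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)  # "optimize" the parameters in sample
    mse_in = np.mean((np.polyval(coeffs, x) - y) ** 2)
    mse_out = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    print(f"degree {degree}: in-sample MSE {mse_in:.3f}, out-of-sample MSE {mse_out:.3f}")
```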
This mistake isn't only made by traders. Academics also fall into the trap. A famous demonstration of the problem is Ioannidis (2005). Subsequently, Harvey et al. (2016) and Hou et al. (2017) discussed the impact of data mining on the study of financial anomalies.
There are a few ways to avoid this trap:
The best performer out of a sample of back-tested rules will be positively biased. Even if the underlying premise is correct, the future performance of the rule will be worse than the in-sample results.
The size of this bias decreases with larger in-sample data sets.
The larger the number of rules (including parameters), the higher the bias.
Test the best rule on out-of-sample data. This gives a better idea of its true performance.
The ideal situation is when there is a large data set and few tested rules.
Even after applying these rules, it is prudent to apply a bias-correcting method.
The simplest is the Bonferroni correction. This scales the significance level by dividing it by the number of rules tested. So, if your test for significance at the 95% confidence level (5% rejection) shows the best rule is significant, but the rule is the best performer of 100 rules, the adjusted rejection level would be 5%/100, or 0.05%. So, in this case, a t-score of 2 for the best rule doesn't indicate a 95% confidence level. We would need a t-score of about 3.3, corresponding to a 99.95% level for a single rule. This test is simple but not powerful. It will be overly conservative and skeptical of good rules. When used for developing trading strategies this is a strength.
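As a concrete check on those numbers, here is a minimal sketch using the one-sided normal approximation (exact t-values depend on the sample size):

```python
from scipy.stats import norm

n_rules = 100
alpha = 0.05                 # original rejection level (95% confidence)
alpha_adj = alpha / n_rules  # Bonferroni-adjusted level: 0.0005, i.e. 0.05%

# One-sided critical values under the normal approximation.
z_single = norm.ppf(1 - alpha)    # ~1.64: hurdle for one pre-specified rule
z_best = norm.ppf(1 - alpha_adj)  # ~3.29: hurdle for the best of 100 tested rules
print(f"single-rule hurdle: {z_single:.2f}, best-of-100 hurdle: {z_best:.2f}")
```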
A more advanced test is White's reality check (WRC). This is a bootstrapping method that produces the appropriate sampling distribution for testing the significance of the best strategy. The test has been patented, and commercial software packages that implement it can be bought. However, the basic algorithm can be illustrated with a simple example.
We have two strategies, A and B, which produce average daily returns of 2% and 1%, respectively. Each was developed by looking at 100 historical returns. We can use the WRC to determine if the apparent outperformance of strategy A is due to data mining:
Using sampling with replacement, generate a series of 100 returns from the historical data.
Apply the strategies (A and B) to this resampled data to get the pseudo-strategies A' and B'.
Subtract the mean return of A from A' and B from B'.
Calculate the average return of the return-adjusted strategies, A” and B”.
The larger of the returns of A” and B” is the first data point of our sample distribution.
Repeat the process N times to generate a complete distribution. This is the sampling distribution of the statistic: the maximum average return of the two rules when their expected returns are zero.
The p-value (the probability of seeing a best-rule return at least this large by chance, if neither rule has any real edge) is the proportion of the sampling distribution whose values exceed the return of A, that is, 2%.
A realistic situation would involve comparing many rules. It is probably worth paying for the software.
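For the two-strategy example above, though, the bootstrap is easy to sketch. The return histories below are simulated with means of 2% and 1% to match the example; the volatility is a made-up assumption, and a production implementation would use a block or stationary bootstrap to respect serial dependence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical daily return histories for strategies A and B (100 days each).
ret_a = rng.normal(loc=0.02, scale=0.05, size=100)
ret_b = rng.normal(loc=0.01, scale=0.05, size=100)

n_boot = 10_000
best_observed = max(ret_a.mean(), ret_b.mean())

null_best = np.empty(n_boot)
for i in range(n_boot):
    # Sample 100 days with replacement; use the same days for both strategies
    # so their cross-correlation is preserved.
    idx = rng.integers(0, len(ret_a), size=len(ret_a))
    # Center each pseudo-strategy so its expected return is zero under the null.
    a_adj = ret_a[idx].mean() - ret_a.mean()
    b_adj = ret_b[idx].mean() - ret_b.mean()
    null_best[i] = max(a_adj, b_adj)  # best of the two zero-mean pseudo-strategies

p_value = np.mean(null_best >= best_observed)
print(f"best observed mean return: {best_observed:.4f}, p-value: {p_value:.4f}")
```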
There is also a totally different and complementary way to avoid overfitting. Forget about the time series of the data and study the underlying phenomenon. A hunter doesn't much care about the biochemistry of ducks, but she will know a lot about their actual behavior. In this regard a trader is a hunter, rather than a scientist. Forget about whether volatility follows a GARCH(1,1) or a T-GARCH(1,2) process; the important observation is that it clusters in the short term and mean reverts in the long term. If the phenomenon is strong enough to trade, it shouldn't be crucial what exact model is used. Some model will always fit best in sample, but that is no guarantee that it will work best out of sample.
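In that spirit, here is a minimal sketch of studying the phenomenon directly: measure the autocorrelation of absolute returns at several lags, without committing to any particular GARCH specification. The toy persistent-volatility process used to generate the returns is an assumption, chosen only to produce clustering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy return series whose log-volatility follows a persistent AR(1).
n = 2000
log_vol = np.full(n, np.log(0.01))
for t in range(1, n):
    log_vol[t] = 0.98 * log_vol[t - 1] + 0.02 * np.log(0.01) + 0.1 * rng.normal()
returns = np.exp(log_vol) * rng.normal(size=n)

def autocorr(x: np.ndarray, lag: int) -> float:
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Clustering shows up as positive autocorrelation of |returns| at short lags,
# decaying toward zero at long lags (mean reversion); no model choice needed.
for lag in (1, 5, 20, 100):
    print(f"lag {lag:3d}: autocorr of |returns| = {autocorr(np.abs(returns), lag):.3f}")
```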
As an example, this is the correct way to find a trading strategy.
There is overwhelming evidence that stocks have momentum. Stocks that have outperformed tend to continue outperforming. This has been observed for as long as we