Originally published on techgyo.com
To have confidence in the validity of your systematic trading algorithms, you must test them. You need to ensure the resulting calculations meet your requirements before setting them loose in the market. In this article, Daniel Calugar, a data-driven investor, takes a look at the benefits of backtesting and the critical attributes for your historical testing data.
Comparing something that is known against an unknown is a time-tested method for proving the validity of calculations. If you know the result of a past set of circumstances and then can build a mechanism that will reliably predict that result, you can have high confidence that given a similar set of events in the future, the new calculation will again be able to predict the outcome.
Testing a trading algorithm over historical data is a low-risk way to measure reliability and accuracy. If you have the proper data set, you can gain a measure of confidence that your strategy should outperform in the future. But be very careful: with the number-crunching power of today’s personal computers, it is not difficult to find what appear to be reliable patterns in purely random data. Unless a pattern is exploiting an identifiable market edge, your out-of-sample returns are likely to vastly underperform the historical backtested results.
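To make the idea concrete, here is a minimal sketch of backtesting a strategy over historical closes. The moving-average crossover rule and the price series are illustrative placeholders, not a recommended strategy; in practice you would load real, vetted historical data.

```python
# Illustrative sketch: backtest a simple moving-average crossover rule
# over a list of historical closing prices. The prices are synthetic.

def sma(prices, window, i):
    """Simple moving average of the `window` prices ending at index i."""
    return sum(prices[i - window + 1 : i + 1]) / window

def backtest_sma_crossover(closes, fast=3, slow=5):
    """Cumulative return of a long-only fast/slow SMA crossover rule."""
    equity = 1.0
    position = 0  # 1 = long, 0 = flat
    for i in range(slow, len(closes)):
        if position:
            equity *= closes[i] / closes[i - 1]
        # Update the signal using only data available at bar i
        position = 1 if sma(closes, fast, i) > sma(closes, slow, i) else 0
    return equity - 1.0  # cumulative return over the test period

closes = [100, 101, 103, 102, 105, 107, 106, 109, 111, 110, 112]
print(f"Cumulative return: {backtest_sma_crossover(closes):.2%}")
```

Note that the signal is always computed from bars up to and including the current one; looking even one bar ahead would silently inflate the backtested results.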
A key to accurate backtesting is high-quality historical data. Accurate data will tell you how your strategy would have performed if implemented over the time period covered by the data. If the information used for backtesting is inaccurate or incomplete, the testing can be ineffective and lead to poor results in future trading.
Time and effort invested in ensuring your historical data sets are high-quality will pay big dividends. For historical data to be used with confidence for algorithm backtesting, it must have the following four attributes:
Accuracy: The GIGO (garbage in, garbage out) principle dictates that inaccurate data will lead to misleading backtesting results. When sourcing your data, the provider’s reputation for accurate information can be invaluable. Never assume that your data is good enough. Even a slight inaccuracy can lead to recommendations that will cost you money.
Normalization: Data normalization eliminates anomalies that complicate analysis. If your data provider has added, deleted, or updated records, anomalies may have been introduced. These structural errors can cause delays in backtesting or produce unreliable results.
Comprehensiveness: Just as the data must be accurate to be helpful, it must also be comprehensive. When selecting historical data for testing systematic trading algorithms, the most important consideration is how well every aspect you intend to measure is represented in the data.
Compatibility: Backtesting will be easier if the data source is compatible with the software you intend to use. Even if the data you acquire is accurate, normalized, and comprehensive, requiring a conversion process to make it compatible with your testing software adds complication and room for error. If you are building your own data testing program, you may have more flexibility.
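As a small illustration of the normalization checks described above, the sketch below de-duplicates daily bars and flags missing weekdays. The record format and rules are hypothetical; a real pipeline would also handle splits, dividends, and market holidays.

```python
# Illustrative normalization sketch, assuming daily bars stored as
# (ISO date string, close) tuples. Rules here are simplified.
from datetime import date, timedelta

def normalize_daily_bars(bars):
    """Sort bars by date, drop duplicate dates, and flag missing weekdays."""
    seen = set()
    clean = []
    for day, close in sorted(bars, key=lambda b: b[0]):
        if day in seen:
            continue  # drop duplicate records for the same date
        seen.add(day)
        clean.append((day, close))
    gaps = []
    for (d1, _), (d2, _) in zip(clean, clean[1:]):
        cur = date.fromisoformat(d1) + timedelta(days=1)
        end = date.fromisoformat(d2)
        while cur < end:
            if cur.weekday() < 5:  # Mon-Fri only; ignores holidays
                gaps.append(cur.isoformat())
            cur += timedelta(days=1)
    return clean, gaps

bars = [("2023-01-03", 100.0), ("2023-01-03", 100.0),
        ("2023-01-04", 101.0), ("2023-01-06", 102.0)]
clean, gaps = normalize_daily_bars(bars)
# `gaps` now flags 2023-01-05, a trading weekday missing from the feed
```

A gap or duplicate caught at this stage is far cheaper than one discovered after it has skewed a backtest.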
Historical backtesting is the best way to gain confidence in your trading algorithms, but it is only as valuable as the data is accurate, normalized, comprehensive, and compatible. Build your systematic trading models by testing over high-quality historical data, and you will be able to rely on the results.
In addition to having high-quality data, be very careful not to become “fooled by randomness.” For example, if you have a high-frequency day trading model, optimize your trading strategy rules by backtesting on only the even market days’ data, then test your optimized set of trading criteria on the odd market days’ data. If your strategy lacks a real market edge, the odd days’ returns will fall far short of what you expected.
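The even/odd split above can be sketched as follows. The return series is a synthetic placeholder, and the comparison of mean returns stands in for whatever performance metric your model uses.

```python
# Illustrative sketch of the even/odd market-day split: fit rules on
# even-indexed days, then evaluate them out-of-sample on odd-indexed days.

def split_even_odd(daily_returns):
    """Partition a daily return series by day index into train/test halves."""
    train = daily_returns[0::2]  # even market days: used to fit the rules
    test = daily_returns[1::2]   # odd market days: held out for validation
    return train, test

def mean_return(returns):
    return sum(returns) / len(returns)

daily_returns = [0.004, -0.002, 0.003, 0.001, -0.001, 0.005, 0.002, -0.003]
train, test = split_even_odd(daily_returns)
# A large drop from the train mean to the test mean is a warning sign
# that the fitted rules were exploiting noise rather than a real edge.
print(mean_return(train), mean_return(test))
```

The same hold-out idea generalizes: any split that keeps the optimization data strictly separate from the validation data serves the purpose.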