Sample Size Fallacy

When building trading systems, the general (and perhaps not entirely correct) rule of thumb is that we should use as much data as we have available. Anyone who has looked for edges in sub-daily data will appreciate how hard it is to find anything that has worked well for more than 5 years.

Needless to say, markets have changed greatly over the past few years with the rise of the machines. Most exchanges were not public or electronic until around 2005-06, so access has become much easier since then. This has also brought a wider variety of players, with retail participation in particular growing exponentially.

Certainly there are many traders who have done very well building a system from a small sample size. For example, Gil Blake in The New Market Wizards used only 2 years of data to build his mutual fund market timing system.

The deeper I dig into this topic, the more I think that, whilst trade sample size is important, the number of trades required for analysis can be an illusion. After all, it takes 16,641 flips of a coin to be 99% confident that our estimate of the chance of heads is within one percentage point of the true 50%. Let's run over the math used to determine this:

Here are the z-scores (the number of standard deviations of the normal distribution) for common two-sided confidence levels:

z = 3.09 = 99.8% confidence level
z = 2.58 = 99.0% confidence level
z = 1.96 = 95.0% confidence level
z = 1.645 = 90.0% confidence level
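Rather than relying on rounded values from a table, the two-sided z-score can be computed from the inverse normal CDF. A minimal sketch using Python's standard-library statistics.NormalDist; the helper name two_sided_z is my own, for illustration only:

```python
from statistics import NormalDist


def two_sided_z(confidence: float) -> float:
    """Return the z-score that leaves (1 - confidence) / 2 in each tail
    of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for level in (0.998, 0.99, 0.95, 0.90):
    print(f"{level:.1%} confidence: z = {two_sided_z(level):.3f}")
```

This prints roughly 3.090, 2.576, 1.960 and 1.645, matching the table above to the precision shown there.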

And here is the formula for computing how many samples n we need for a given confidence level and margin of error:

n = (z^2 * s^2) / E^2

n = number of samples (tests) we need to run
z = z-score of the normal distribution for the chosen confidence level (from the table above)
s = standard deviation of the sample we have seen
E = margin of error, i.e. how exact we want the estimate to be:
±10% = .1 for the formula
±5% = .05 for the formula
±1% = .01 for the formula
±0.2% = .002 for the formula

Note that the margin of error is separate from the confidence level, which enters the formula only through z.
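The formula translates directly into code. A minimal sketch, where the hypothetical helper required_samples takes the z-score, the sample standard deviation and the denominator value (how exact we want the estimate to be), and rounds up to a whole number of samples:

```python
import math


def required_samples(z: float, std_dev: float, margin_of_error: float) -> int:
    """Sample size n = (z^2 * s^2) / E^2, rounded up to a whole sample.

    z               -- z-score for the chosen confidence level
    std_dev         -- standard deviation of the sample (s)
    margin_of_error -- how exact we want the estimate to be (E)
    """
    if margin_of_error <= 0:
        raise ValueError("margin_of_error must be positive")
    n = (z ** 2) * (std_dev ** 2) / (margin_of_error ** 2)
    # Round away floating-point noise (e.g. 16641.000000000002)
    # before taking the ceiling, so exact results are not bumped up by one.
    return math.ceil(round(n, 9))


print(required_samples(2.58, 0.5, 0.01))  # 16641
print(required_samples(1.96, 0.5, 0.05))  # 385
```

Rounding up is the usual convention, since a fractional sample cannot be run and rounding down would fall slightly short of the target precision.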

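For the coin toss experiment above, we use z = 2.58 for 99% confidence, a standard deviation of 0.5 for a fair coin, and 0.01 (±1 percentage point) for how exact we want the estimate to be:

n = ((2.58^2) * (0.5^2)) / (0.01^2)

n = 16,641 flips needed to estimate the chance of heads as 50% ± 1 percentage point with 99% confidence.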
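We can sanity-check this by simulation: repeat the 16,641-flip experiment many times and count how often the observed share of heads lands within ±0.01 of 0.5. A minimal Monte Carlo sketch in Python; the function name coverage, the seed and the trial count are arbitrary choices for illustration. It uses the fact that each bit from random.getrandbits is an independent fair 0/1, so counting the 1 bits gives the number of heads:

```python
import random


def coverage(n_flips: int, margin: float, trials: int, seed: int = 42) -> float:
    """Fraction of simulated experiments whose estimated P(heads)
    lands within +/- margin of the true 0.5."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each of the n_flips random bits is one fair flip; 1 = heads.
        heads = bin(rng.getrandbits(n_flips)).count("1")
        if abs(heads / n_flips - 0.5) <= margin:
            hits += 1
    return hits / trials


print(coverage(16_641, 0.01, trials=10_000))
```

The printed value should be close to 0.99: roughly 99 out of every 100 experiments of 16,641 flips produce an estimate within one percentage point of 50%.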

From the above, we can see that the sample sizes required for high confidence are large. The problem is that by looking only for strategies with such large sample sizes we are almost forcing the market to fit our requirements, which will cause many good edges to be overlooked simply because they lack a large enough sample.
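Rearranging the same formula as E = z * s / sqrt(n) shows the other side of the trade-off: how precise an estimate a given number of trades can give. A minimal sketch, assuming each trade is an independent win/loss outcome (so s = sqrt(p * (1 - p))) and that the normal approximation holds; the helper win_rate_margin and the 55% win rate are hypothetical, for illustration only:

```python
import math
from statistics import NormalDist


def win_rate_margin(win_rate: float, n_trades: int, confidence: float = 0.95) -> float:
    """Margin of error E = z * s / sqrt(n) for an observed win rate,
    assuming independent win/loss (Bernoulli) trade outcomes."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    s = math.sqrt(win_rate * (1 - win_rate))
    return z * s / math.sqrt(n_trades)


# Hypothetical strategy: 55% winners, evaluated at different trade counts.
for n in (50, 100, 500, 2_000):
    e = win_rate_margin(0.55, n)
    print(f"{n:>5} trades: 55% +/- {e:.1%}")
```

Because E shrinks with the square root of n, halving the margin of error requires four times as many trades, which is why the required sample grows so quickly as we ask for tighter bounds.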

Rather than trade sample size, the more important issues to focus on are testing whether a system is curve fit and deciding when to start or stop trading a system. I will be digging into these issues in the near future.
