Testing for statistical significance
Let's say you flip a coin ten times and it comes up heads six times. Would you think the coin is biased? Probably not.
Now let's say you flip a coin a thousand times and it comes up heads six hundred times. You would probably think it is biased. But why?
In both cases the coin comes up heads 60% of the time. How can we quantify our intuition here? We will turn to the world of hypothesis testing to answer this question.
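Before formalizing anything, we can put numbers on this intuition by asking how often a fair coin would produce a result at least this lopsided. Here is a quick sketch using the Distributions.jl package (the same package the code at the end of this post uses); the binomial tail probabilities do all the work:

using Distributions

# Chance a fair coin gives at least 6 heads in 10 flips
p_small = ccdf(Binomial(10, 0.5), 5)      # P(X >= 6), about 0.38

# Chance a fair coin gives at least 600 heads in 1000 flips
p_large = ccdf(Binomial(1000, 0.5), 599)  # P(X >= 600), astronomically small

println("P(at least 6/10 heads):     ", p_small)
println("P(at least 600/1000 heads): ", p_large)

A fair coin lands on six or more heads out of ten about 38% of the time, but six hundred or more heads out of a thousand essentially never happens. That asymmetry is exactly what hypothesis testing formalizes.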
The null hypothesis is the statement we are trying to develop evidence against, and the alternative hypothesis is its complement.
Here our null hypothesis is that the coin is fair. The alternative hypothesis is that the coin is biased.
One useful question to ask is: what is the chance of seeing an outcome at least this extreme purely by random chance, assuming the coin were unbiased? The statistical term for this probability is the p-value.
If our p-value is small, it means it is unlikely that a fair coin would produce six hundred heads out of one thousand flips. So the smaller the p-value, the more confident we are in concluding that the coin is indeed biased.
But how small of a p-value is small enough for us to conclude the coin is biased? In statistics, this threshold is called the significance level, denoted α (alpha).
It is the chance that we conclude that the coin is biased when in reality it isn't. This kind of error is called a Type I error.
So if our p-value falls below α, we reject the null hypothesis and conclude that the coin is biased. A common choice, and the one used in the code below, is α = 0.05.
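To see that α really is the Type I error rate, we can simulate it: flip a genuinely fair coin many times over, run the test we are about to derive on each experiment, and count how often we wrongly reject. This is a minimal Monte Carlo sketch (the function name and parameters are illustrative, not from the original code):

using Distributions, Random

# Estimate how often the test wrongly rejects a fair coin (the Type I error rate)
function type_i_error_rate(; num_trials = 100_000, num_flips = 1000, alpha = 0.05)
    fair = Binomial(num_flips, 0.5)
    rejections = 0
    for _ in 1:num_trials
        heads = rand(fair)                                    # simulate a fair coin
        z = (heads - 0.5 * num_flips) / sqrt(0.25 * num_flips)
        p = 2 * (1 - cdf(Normal(), abs(z)))                   # two-sided p-value
        rejections += p < alpha
    end
    return rejections / num_trials
end

Random.seed!(1)
println("False rejection rate: ", type_i_error_rate())  # hovers around 0.05

The rejection fraction comes out close to 0.05: with α = 0.05, we wrongly accuse a fair coin about one time in twenty.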
Let's dig into the math of computing p-values. By definition, the p-value is the chance that we see six hundred or more heads out of a thousand flips, assuming the coin were unbiased. (Since a biased coin could favor either heads or tails, we will run a two-sided test and also count outcomes at least as far below five hundred.)
Let $X_i$ be a random variable that equals 1 if the $i$-th flip comes up heads and 0 otherwise. Under the null hypothesis each flip is fair, so $X_i$ is Bernoulli with $E[X_i] = 0.5$ and $\mathrm{Var}(X_i) = 0.25$.
The number of heads that we get is the sum of the $X_i$ from 1 to a thousand. We will define this new random variable as

$$H = \sum_{i=1}^{1000} X_i, \qquad E[H] = 500, \qquad \mathrm{Var}(H) = 250.$$

Since $H$ is the sum of a large number of independent and identically distributed random variables, we can use the central limit theorem to conclude that, in addition to having the expected value and variance written above, $H$ is approximately normally distributed. Standardizing our observation of six hundred heads gives the z-score

$$z = \frac{600 - E[H]}{\sqrt{\mathrm{Var}(H)}} = \frac{600 - 500}{\sqrt{250}} \approx 6.32.$$
To get the p-value from the z-score we use the standard Gaussian cumulative distribution function $\Phi$ as follows:

$$p = 2\left(1 - \Phi(|z|)\right) \approx 2.5 \times 10^{-10}.$$
This p-value is far smaller than 0.05, so we reject the null hypothesis and conclude that the coin is biased.
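As a quick sanity check, this whole calculation fits in a few lines of Julia; ccdf gives the upper-tail probability directly:

using Distributions

z = (600 - 500) / sqrt(250)     # ≈ 6.32
p = 2 * ccdf(Normal(), abs(z))  # two-sided p-value, ≈ 2.5e-10
println("p-value: ", p)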
Below is some Julia code that conducts a hypothesis test for a coin flip example. The input parameters are the number of heads and the number of coin flips. The source file can be found here. I encourage you to play around with the number of heads and total number of flips to get an intuition for what is statistically significant and what isn’t.
using Distributions
# Input Parameters
num_heads = 60
num_flips = 100
@assert num_heads <= num_flips "Number of heads must be less than or equal to the number of flips"
alpha = 0.05 # Significance level
# Mean and variance of number of heads if the coin were unbiased
mu = 0.5 * num_flips
var = 0.25 * num_flips
# Z-Score of our observation of num_heads
z = (num_heads - mu) / sqrt(var)
# Two-sided p-value: double the upper tail of the standard normal
p = 2 * (1 - cdf(Normal(), abs(z)))
println("p-value: ", p)
if p < alpha
    println("p < ", alpha, " so we can reject the null hypothesis and conclude the coin is biased")
else
    println("p >= ", alpha, " so we cannot reject the null hypothesis and cannot conclude that the coin is biased")
end
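One design choice worth calling out: the code computes a two-sided p-value (it doubles the upper-tail probability) because a biased coin could favor either heads or tails. If you want to check the normal approximation against an exact answer, you can compute a two-sided binomial p-value directly; this sketch uses the common convention of doubling the smaller tail:

using Distributions

# Exact two-sided binomial p-value for the 600-out-of-1000 example,
# doubling the smaller tail (one common convention among several)
d = Binomial(1000, 0.5)
exact = 2 * min(cdf(d, 600), ccdf(d, 599))  # compares P(X <= 600) and P(X >= 600)
println("Exact binomial p-value: ", exact)

For six hundred heads out of a thousand flips, the exact value agrees with the normal approximation to within its order of magnitude.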