High School: Statistics and Probability
Interpreting Categorical and Quantitative Data HSS-ID.C.9
9. Distinguish between correlation and causation.
One of the most common mistakes made when analyzing statistics is confusing correlation and causation, or, more specifically, assuming that correlation implies causation. What happens when we assume something? We don't know, but it's rarely good.
Students should already know that correlation is a measure of the strength of the association between two variables. In particular, they should know that the linear correlation coefficient r ranges from -1 to 1: values near -1 indicate a strong negative linear correlation, values near 1 indicate a strong positive one, and values near 0 indicate little or no linear relationship. At the very least, they should be aware that this coefficient exists in the world.
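For students who want to see where r comes from, here is a minimal Python sketch of the usual formula for the linear correlation coefficient. The function name pearson_r and the three little data lists are made up for illustration; they are not part of the standard.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: one list rises in lockstep with x, the other falls in lockstep
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson_r(x, rising)     # perfectly linear and increasing
r_down = pearson_r(x, falling)  # perfectly linear and decreasing
print(r_up, r_down)
```

Because both made-up lists fall exactly on a straight line, r lands at the two ends of its range. Real data is messier, so r usually ends up somewhere in between.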
As strong as that r guy might be (and his black belt speaks for itself), he's still a correlation coefficient. While r can show that two variables are strongly correlated, it can't prove that one causes the other.
For instance, let's say we find that there's a strong positive linear correlation between the age of a tree and how many apples it produces. In fact, this correlation is so strong that r = 0.99. Does that mean the age of the tree causes more apples to grow?
Maybe, right? After all, it makes sense that the older a tree gets, the more fruit it will bear, and that explains why r has the strength of the Incredible Hulk. So we can say that a tree's age will cause more apples to grow, right?
Wrong.
While the two variables are strongly correlated, the correlation alone cannot prove that one causes the other, because there may be a zillion other factors we haven't considered. What about rainfall? Did the farmer use fertilizer or prune the trees? What were the summer and winter temperatures? Any one of these factors may have influenced the number of apples.
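To make the point concrete, here is a small simulation of a pretend orchard, written as a rough sketch under some loud assumptions: the farmer happens to give older trees more fertilizer, and apples depend only on fertilizer, never on age. Every number, variable name, and the pearson_r helper is invented for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Linear correlation coefficient r for two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the pretend orchard is the same on every run

# 200 hypothetical trees, aged 5 to 30 years
ages = [random.uniform(5, 30) for _ in range(200)]

# ASSUMPTION: the farmer gives older trees more fertilizer (plus some noise)
fertilizer = [0.5 * age + random.gauss(0, 1) for age in ages]

# ASSUMPTION: apples depend ONLY on fertilizer; age never appears in this line
apples = [20 * f + random.gauss(0, 10) for f in fertilizer]

r_age_apples = pearson_r(ages, apples)
print(round(r_age_apples, 2))
```

In this toy orchard, age and apple count come out strongly correlated even though age has no direct effect on apples in the model. The fertilizer is the lurking factor doing all the work, which is exactly why a big r by itself can't settle the question of cause.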
Basically, students should have it drilled into their heads that it takes a lot more than a strong correlation to prove causation.
The key thing here is that correlation does not imply causation. We'll say it again, because it's that important: Correlation does not imply causation. It's crucial that students understand and remember that. In fact, they should have it tattooed on their foreheads so they never forget it. Correlation does not imply causation. Got it?