The central limit theorem (CLT) is the single most quoted result in intro statistics, and the single most misremembered. It does not say data are normally distributed. It does not even say large samples are normally distributed. It says something narrower and far more useful: the sampling distribution of the mean becomes approximately normal as n grows, whatever the shape of the original distribution, provided its variance is finite. Here is exactly what it claims, when it applies, and why it is the bedrock of inference.
What the Central Limit Theorem Actually Says
For any population with mean μ and finite standard deviation σ, if you take random samples of size n and compute the sample mean x̄, then as n becomes large, the sampling distribution of x̄ is approximately normal with:
- mean μ
- standard deviation σ/√n (the standard error of the mean)
That is the whole statement. Three things are worth pulling apart:
- The population can be anything with a finite variance — skewed, bimodal, discrete, weird. The CLT does not care about the population shape.
- The result is about the sample mean, not the data themselves. Individual observations from a skewed population stay skewed no matter how large n is.
- "Large n" is approximate, not magic. The CLT becomes a good approximation faster for nearly-symmetric populations and slower for very skewed ones.
The CLT is what lets you use a z-test, a t-test, or a normal-based confidence interval on data that are clearly not normal. The procedure works because it operates on x̄, whose sampling distribution is approximately normal, not on the raw data.
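The two parameters in the statement can be checked by simulation. The sketch below is illustrative only: the exponential population, the sample size of 40, the number of repetitions, and the helper name `sample_means` are all assumptions chosen for the example, not part of the theorem.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from n draws of draw(rng)."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
# It is strongly right-skewed, which makes it a fair test of the claim.
mu, sigma, n = 1.0, 1.0, 40
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, reps=20_000)

mean_of_means = statistics.fmean(means)   # should land close to mu
sd_of_means = statistics.stdev(means)     # should land close to sigma / sqrt(n)
standard_error = sigma / n ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {standard_error:.3f})")
```

The simulated mean and standard deviation of x̄ should sit close to μ and σ/√n even though every individual observation comes from a skewed population.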
A Demonstration With Dice
Roll one fair six-sided die. The distribution of outcomes is uniform — equal probability 1/6 for each of 1 through 6. It is flat, not bell-shaped, and its mean is 3.5 with σ ≈ 1.71.
Now roll two dice and average the results. The possible averages run from 1 to 6, but the middle values (around 3.5) are much more likely than the extremes — an average of 1 requires both dice to show 1, while an average of 3.5 happens many ways. The histogram of averages already starts to peak in the middle.
Roll thirty dice and average. The histogram of those averages is sharply bell-shaped, centered at 3.5, with most of the mass between about 3.0 and 4.0. The original distribution was completely flat, yet by n = 30 the sampling distribution of x̄ is very close to normal.
This is the CLT in action: averaging mixes outcomes together, the extreme combinations get rare, and the result piles up around the population mean in a normal shape.
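The dice example does not need simulation, because the distribution of an average of fair dice can be computed exactly by repeated convolution. The following is a minimal sketch; the function name `mean_distribution` is made up for illustration.

```python
from fractions import Fraction

def mean_distribution(n_dice):
    """Exact distribution of the average of n fair dice, as {average: probability}."""
    face = Fraction(1, 6)
    sums = {0: Fraction(1)}              # distribution of the running total
    for _ in range(n_dice):
        nxt = {}
        for total, p in sums.items():
            for pip in range(1, 7):      # add one more die to every running total
                nxt[total + pip] = nxt.get(total + pip, 0) + p * face
        sums = nxt
    return {Fraction(total, n_dice): p for total, p in sums.items()}

two = mean_distribution(2)
thirty = mean_distribution(30)

# With two dice, an average of 1 needs (1, 1); an average of 3.5 has six ways.
print(two[Fraction(1)], two[Fraction(7, 2)])   # 1/36 1/6

# Share of the n = 30 distribution that falls between 3.0 and 4.0 inclusive.
mass_3_to_4 = sum(p for avg, p in thirty.items() if 3 <= avg <= 4)
print(float(mass_3_to_4))
```

Plotting `thirty` as a bar chart would show the bell shape described above; the printed mass shows how tightly the averages cluster around 3.5.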
How Large Does n Have to Be?
There is no single threshold. The standard rule of thumb is n ≥ 30 for most populations, but the right answer depends on the original distribution's shape.
- Approximately symmetric, no extreme outliers (uniform, mildly skewed): n = 15 to 25 is usually plenty.
- Roughly bell-shaped (already close to normal): the sampling distribution is essentially normal for any n.
- Strongly skewed or heavy-tailed (income data, waiting times, certain financial returns): even n = 30 may not be enough. n = 50 or more is safer.
- Highly skewed with rare large values (insurance claims, lottery payouts): the normal approximation can take hundreds or thousands of observations to become accurate, so a transformation or a nonparametric method is often the better choice.
The "n ≥ 30" rule is fine for a textbook problem that says the population is "moderately skewed." Real data require judgment. A histogram of the sample is usually the easiest check: if it is roughly symmetric, you can trust the CLT at smaller n than 30; if it is heavily skewed, you need a larger n.
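One way to see why skewed populations need larger n: the skewness of the sample mean equals the population skewness divided by √n, so it shrinks slowly. The sketch below checks this by simulation. It assumes an exponential population (skewness 2), and the sample sizes, repetition count, and helper names are illustrative choices only.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: the mean cubed z-score (no small-sample bias correction)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def skew_of_sample_means(n, reps=10_000, seed=1):
    """Skewness of the simulated sampling distribution of the mean for size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(means)

# Assumed population: exponential with rate 1, whose skewness is 2.
results = {n: skew_of_sample_means(n) for n in (5, 30, 120)}
for n, g in results.items():
    print(f"n={n:>3}  simulated skewness={g:.2f}  theory 2/sqrt(n)={2 / n ** 0.5:.2f}")
```

Under these assumptions the sampling distribution at n = 30 still carries noticeable skew, which is consistent with treating "n ≥ 30" as a default rather than a guarantee.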
A Worked CLT Problem
A bus's arrival time is uniformly distributed between 0 and 10 minutes past the hour. So μ = 5 minutes and σ = 10/√12 ≈ 2.89 minutes (variance of a uniform on [a, b] is (b − a)²/12). You record arrival times for 50 buses. What is the probability the average arrival time exceeds 6 minutes?
Step 1 — Conditions. The population is uniform — not normal — but n = 50 is well above the rule of thumb, so the CLT applies and the sampling distribution of x̄ is approximately normal.
Step 2 — Parameters of the sampling distribution.
- Mean of x̄ = μ = 5 minutes
- SE = σ/√n = 2.89/√50 ≈ 2.89/7.07 ≈ 0.408 minutes
Step 3 — Standardize.
z = (6 − 5) / 0.408 ≈ 2.45
Step 4 — Probability. P(z > 2.45) from a standard normal table ≈ 0.0071, or about 0.7%.
So in a sample of 50 buses from this uniform process, the chance the mean arrival time exceeds 6 minutes — a full minute above the population mean — is under 1%. The single-bus probability of an arrival past 6 minutes is 0.4 (any value from 6 to 10 in a uniform on 0 to 10), but the average of 50 buses is so concentrated near 5 that exceeding 6 is rare.
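The four steps can be reproduced in a few lines, with a Monte Carlo run as a rough cross-check on the normal approximation. This is a sketch using only the Python standard library; the repetition count and seed are arbitrary choices, and the simulated value will differ slightly from the normal-table value.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Population: uniform on [a, b] minutes, sample of n buses.
a, b, n = 0.0, 10.0, 50
mu = (a + b) / 2                 # 5 minutes
sigma = (b - a) / sqrt(12)       # about 2.89 minutes
se = sigma / sqrt(n)             # standard error of the mean
z = (6 - mu) / se
p_normal = 1 - NormalDist().cdf(z)   # upper-tail area under the standard normal
print(f"SE = {se:.3f}, z = {z:.2f}, normal approximation P(mean > 6) = {p_normal:.4f}")

# Cross-check: simulate many samples of 50 arrivals and count how often
# the sample mean exceeds 6 minutes.
rng = random.Random(42)
reps = 20_000
hits = sum(fmean(rng.uniform(a, b) for _ in range(n)) > 6 for _ in range(reps))
p_sim = hits / reps
print(f"simulated P(mean > 6) = {p_sim:.4f}")
```

Because z is not rounded to two decimals here, the normal-approximation figure can differ from the table value in the last digit.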
Common Misunderstandings
The CLT is famous, which is why it is also famously misquoted. Three traps:
- "Large samples are normal." No. The data are not normal; they have whatever distribution the population has. It is the sampling distribution of the mean that is approximately normal.
- "Any statistic of a large sample is normal." No. The CLT in its standard form is about sums and means. Other statistics have their own sampling distributions, which may or may not be normal: the sample median is approximately normal in large samples under mild conditions, but the sample maximum generally is not.
- "n ≥ 30 always works." No. With a very skewed population, n ≥ 30 may not be enough. The rule is a default, not a guarantee.
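The first two traps can be illustrated with one simulation that compares the skewness of raw observations, sample means, and sample maxima. It is a sketch under assumed settings: an exponential population, n = 200, and 4,000 repetitions, with a hypothetical `skewness` helper.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: the mean cubed z-score (no small-sample bias correction)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(7)
n, reps = 200, 4_000

means, maxima = [], []
for _ in range(reps):
    sample = [rng.expovariate(1.0) for _ in range(n)]   # assumed skewed population
    means.append(statistics.fmean(sample))
    maxima.append(max(sample))

raw = [rng.expovariate(1.0) for _ in range(reps)]        # individual observations

skew_raw, skew_mean, skew_max = skewness(raw), skewness(means), skewness(maxima)
print(f"raw data:      skewness = {skew_raw:.2f}")   # stays strongly skewed
print(f"sample means:  skewness = {skew_mean:.2f}")  # near 0, as the CLT predicts
print(f"sample maxima: skewness = {skew_max:.2f}")   # remains clearly skewed
```

A skewness near zero is only a rough indicator of normality, but the contrast is enough to show that the CLT's conclusion attaches to the mean, not to the data or to every statistic.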
Getting Help
The CLT is one half of the foundation for inference. The other half is the sampling distribution, whose shape is what the CLT describes. To see the CLT put to work in an actual procedure, reading a normal distribution table covers how to convert a sampling-distribution z-score into a probability or critical value.
Conclusion
The central limit theorem says the sampling distribution of the sample mean becomes approximately normal as the sample size grows, for any population shape with finite variance, with mean μ and standard deviation σ/√n. It is what lets you use z- and t-procedures on data that are obviously not normal, and the σ/√n standard error is why so many introductory inference formulas have a √n in them. Keep three things straight: it is about the sample mean, the quality of the approximation depends on the population's shape, and "n ≥ 30" is a default rather than a law. With those in hand, the CLT will do much of the work in the inference problems you meet in an introductory course.