A p-value is one of the most used and most misread numbers in all of statistics. Most students can recite "if p is less than 0.05, reject the null" without being able to say what the p-value actually measures. By the end of this guide you will be able to define a p-value precisely, spot the three classic misinterpretations, and explain a result in plain language.

What a P-Value Actually Measures

A p-value answers one narrow question: if the null hypothesis were true, how surprising is the data you collected?

More precisely, the p-value is the probability of observing a result at least as extreme as your sample result, assuming the null hypothesis is true. The phrase "at least as extreme" matters — it includes your observed outcome and every outcome further from the null in the direction your test is checking.

Here is a concrete case. A coin is claimed to be fair (the null hypothesis: probability of heads = 0.5). You flip it 20 times and get 16 heads. For a one-sided test, the p-value is the probability of getting 16 or more heads in 20 flips of a genuinely fair coin, which is about 0.006. For a two-sided test you also count results that are just as lopsided in the other direction (4 or fewer heads), which doubles the probability to about 0.012. This guide uses the two-sided value. So if the coin really were fair, a result this lopsided in either direction would happen only about 1.2% of the time.
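
The arithmetic behind the coin example can be checked with a short script. This is a minimal sketch in Python using only the standard library; the function names are illustrative and not taken from any statistics package. It computes both the one-sided tail (16 or more heads) and the two-sided value (16 or more, or 4 or fewer).

```python
from math import comb

def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k): the one-sided p-value for 'too many heads'."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

n_flips, heads = 20, 16

one_sided = upper_tail(heads, n_flips)
# A fair coin's distribution is symmetric, so "4 or fewer heads" is exactly
# as likely as "16 or more heads"; the two-sided p-value is double one tail.
two_sided = 2 * one_sided

print(f"one-sided p = {one_sided:.4f}")  # 0.0059
print(f"two-sided p = {two_sided:.4f}")  # 0.0118
```

Doubling one tail is only valid here because the null value is 0.5; for a null such as 0.3 the two tails are not mirror images and need to be summed separately.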

A small p-value means your data would be unusual under the null. That is evidence against the null. A large p-value means your data is unremarkable under the null — no contradiction, so no reason to abandon it.
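
One way to see what "unusual under the null" means is to simulate the null directly. The sketch below is an illustration, not a standard routine: it flips a simulated fair coin 20 times, repeats that many times, and reports how often the result lands at least as far from 10 heads as the observed count. The function name, trial count, and seed are arbitrary choices.

```python
import random

def simulate_p_value(observed_heads: int, n_flips: int = 20,
                     trials: int = 100_000, seed: int = 0) -> float:
    """Estimate a two-sided p-value by simulating a fair coin."""
    rng = random.Random(seed)
    # How far the observed count sits from the n/2 expected under the null.
    observed_distance = abs(observed_heads - n_flips / 2)
    extreme = 0
    for _ in range(trials):
        # n_flips random bits stand in for n_flips fair flips; 1 = heads.
        heads = bin(rng.getrandbits(n_flips)).count("1")
        if abs(heads - n_flips / 2) >= observed_distance:
            extreme += 1
    return extreme / trials

estimate = simulate_p_value(16)
# Should land near the exact value of about 0.012; it varies slightly with the seed.
print(f"simulated two-sided p for 16 heads: {estimate:.4f}")
```

The fraction of simulated experiments that look at least as extreme as yours is the p-value, up to simulation noise. Nothing in the simulation refers to whether the coin is "really" fair; fairness is built in as the assumption.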

[Figure: a coin in mid-flip. Caption: A p-value asks how surprising your data would be if the null hypothesis were true.]

The Three Things a P-Value Is Not

The misinterpretations below are so common that exam questions are written specifically to catch them.

It is not the probability that the null hypothesis is true. A p-value of 0.012 does not mean there is a 1.2% chance the coin is fair. The p-value is computed assuming the null is true — it cannot then turn around and tell you the probability of that assumption. The null is either true or it is not; the p-value is a statement about data, not about the hypothesis.

It is not the probability your result happened by chance. Every result in a random sample involves chance. The p-value does not measure "how much luck was involved." It measures how well the data fits one specific model of the world — the null model.

It is not a measure of effect size. A tiny, trivial difference can produce a very small p-value if your sample is large enough. For any fixed nonzero effect, the p-value tends to shrink as the sample size grows, so "p < 0.001" tells you the effect is detectable, not that it is large or important. A statistically significant result can still be practically meaningless.
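
The sample-size point can be made concrete with a hypothetical coin that comes up heads 50.5% of the time in the observed data. The sketch below holds that observed share fixed and changes only the number of flips. It assumes a normal approximation to the binomial (reasonable at these sample sizes, and swapped in here for simplicity rather than an exact test); the function name and sample sizes are illustrative.

```python
from math import erfc, sqrt

def two_sided_p_normal(heads: int, n: int, p_null: float = 0.5) -> float:
    """Two-sided p-value for a proportion via the normal approximation."""
    p_hat = heads / n
    standard_error = sqrt(p_null * (1 - p_null) / n)
    z = (p_hat - p_null) / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

# Same observed share of heads (50.5%) at three sample sizes.
for n in (1_000, 100_000, 1_000_000):
    heads = n * 505 // 1000
    p = two_sided_p_normal(heads, n)
    print(f"n = {n:>9,}  heads = {heads:>7,}  p = {p:.3g}")
```

With 1,000 flips the p-value is roughly 0.75; with 100,000 flips it is roughly 0.002; with a million flips it is vanishingly small. The effect, half a percentage point, is identical in all three rows.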

How to Read a P-Value Against Your Significance Level

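Before collecting data you choose a significance level, written as alpha (α), usually 0.05. Alpha is the cutoff for how surprising the data must be before you reject the null. Equivalently, it is the rate of false rejections (Type I errors) you are willing to accept when the null is actually true. The decision rule is mechanical: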

  • If p ≤ α, reject the null hypothesis. The result is "statistically significant."
  • If p > α, fail to reject the null hypothesis.

Note the careful wording: you "fail to reject" the null — you never "accept" or "prove" it. A large p-value means your data did not contradict the null, not that the null is confirmed. Absence of evidence is not evidence of absence.

For the coin example, p = 0.012 is less than α = 0.05, so you reject the null and conclude the coin is not fair. If you had gotten 12 heads instead, the two-sided p-value would be about 0.50 — far above 0.05 — and you would fail to reject.
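
The decision rule and both coin outcomes can be reproduced in a few lines. This is a sketch under the same assumptions as the example (20 flips, a fair-coin null, a two-sided test, α = 0.05); the helper names are hypothetical.

```python
from math import comb

def two_sided_p_fair_coin(heads: int, n: int) -> float:
    """Exact two-sided p-value when the null is a fair coin."""
    k = max(heads, n - heads)  # fold the result onto the upper tail
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    # Doubling can exceed 1 when the result sits at the center, so cap it.
    return min(1.0, 2 * tail)

def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the mechanical rule: reject when p <= alpha."""
    return "reject the null" if p_value <= alpha else "fail to reject the null"

for heads in (16, 12):
    p = two_sided_p_fair_coin(heads, 20)
    print(f"{heads} heads: p = {p:.3f} -> {decide(p)}")
# 16 heads: p = 0.012 -> reject the null
# 12 heads: p = 0.503 -> fail to reject the null
```

Note that `decide` never returns "accept the null"; the only two outcomes are rejecting and failing to reject.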

The 0.05 cutoff is a convention, not a law of nature. A result with p = 0.049 and one with p = 0.051 are nearly identical pieces of evidence; the line between them is arbitrary. Treat p-values as a continuous measure of evidence rather than a hard pass/fail switch.

Writing the Conclusion in Plain Language

On an exam, a p-value question almost always ends with "state your conclusion in the context of the problem." A strong answer has three parts: the decision, the reason, and the real-world meaning.

Weak: "p = 0.012 < 0.05, so reject the null."

Strong: "Because p = 0.012 is below the 0.05 significance level, we reject the null hypothesis. There is statistically significant evidence that the coin is not fair — a sample this extreme would occur only about 1.2% of the time if the coin were truly fair."

The strong version names the decision, ties it to the threshold, and translates the statistics back into the language of the original question. That last step is what graders look for, and it is the step that proves you understand the p-value rather than just the cutoff.

Getting Help

P-values live inside the larger machinery of hypothesis testing, so they make the most sense once you have seen a full test built from scratch. Walk through one in the guide on setting up a hypothesis test, then read the guide on Type I vs. Type II error to see what can go wrong when you act on a p-value.

Conclusion

A p-value, explained simply, is the probability of data at least as extreme as yours when the null hypothesis is true — nothing more. It is not the chance the null is true, not the chance your result was luck, and not a measure of how big an effect is. Compare it to your significance level to make a decision, but report it as evidence on a sliding scale, and always translate the result back into the context of the question.