When two columns of numbers belong to the same subjects (a "before" and "after," twins, or matched controls), running an ordinary two-sample t-test throws away the most useful information in the data. The paired t-test fixes that by collapsing each pair into a single difference and then running a one-sample test on those differences. Below are the procedure, a worked example, and the conditions you need to check.

When the Paired T-Test Is the Right Tool

Use a paired t-test when each value in one sample has a natural partner in the other. The classic cases:

  • Before/after on the same subject. Blood pressure on each of 12 patients before and after a drug; reaction time on each of 30 subjects pre- and post-caffeine.
  • Matched pairs. Twins randomly assigned to two conditions; left vs. right eye on the same person; classrooms matched on demographics with one in each condition.
  • Repeated measurements of the same unit. Yield from each of 8 fields under two different fertilizers, where each field is its own control.

If the two groups are independent (different people, no matching), you want a two-sample t-test. Misclassifying the design is one of the most common errors on this material.

The Trick: Collapse Pairs Into a Difference Column

The paired t-test is mechanically a one-sample t-test in disguise. For each pair i, compute the difference d_i = X_i (after) − Y_i (before). That gives you one new column of n numbers. From here on you only work with that column.

  • d̄ = mean of the differences
  • s_d = sample standard deviation of the differences
  • n = number of pairs

The hypotheses are now about the mean of the differences, μ_d:

  • H₀: μ_d = 0 (no average change)
  • H₁: μ_d ≠ 0, μ_d > 0, or μ_d < 0, depending on the question

The test statistic is

t = (d̄ − 0) / (s_d / √n)

with df = n − 1 (the number of pairs minus 1, not the total observations).
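The recipe above fits in a few lines of standard-library Python. This is a minimal sketch, not a reference implementation: the function name `paired_t_statistic` is ours, and it assumes the two lists are aligned so that position i in each belongs to the same subject.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(after: list[float], before: list[float]) -> tuple[float, int]:
    """Return (t, df) for H0: mu_d = 0, with d_i = after_i - before_i."""
    if len(after) != len(before):
        raise ValueError("paired data needs the same number of values in each column")
    # Collapse each pair into a single difference.
    d = [a - b for a, b in zip(after, before)]
    n = len(d)  # number of pairs, not total observations
    d_bar = mean(d)
    s_d = stdev(d)  # sample SD of the differences (divides by n - 1)
    t = (d_bar - 0) / (s_d / sqrt(n))
    return t, n - 1
```

Called on the sleep data from the worked example that follows, it returns t ≈ 3.071 with df = 7.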

Figure: paired before and after columns, with the difference column highlighted.

The Worked Example

A sleep clinic measures hours of sleep for 8 patients before and after starting a new bedtime routine. The before/after pairs in hours are:

  • Patient 1: 5.5 → 7.0 (diff +1.5)
  • Patient 2: 6.0 → 6.5 (diff +0.5)
  • Patient 3: 6.5 → 6.0 (diff −0.5)
  • Patient 4: 5.0 → 7.0 (diff +2.0)
  • Patient 5: 7.0 → 7.5 (diff +0.5)
  • Patient 6: 5.5 → 6.5 (diff +1.0)
  • Patient 7: 6.0 → 8.0 (diff +2.0)
  • Patient 8: 6.5 → 7.0 (diff +0.5)

Test whether the routine increases average sleep at α = 0.05.

Step 1 — Hypotheses. Right-tailed, because "increases."

  • H₀: μ_d = 0
  • H₁: μ_d > 0

Step 2 — Mean and SD of the differences. Sum of differences: 1.5 + 0.5 − 0.5 + 2.0 + 0.5 + 1.0 + 2.0 + 0.5 = 7.5. So d̄ = 7.5 / 8 = 0.9375 hours.

To get s_d, compute each deviation from d̄, square it, sum the squares, divide by n − 1 = 7, and take the square root. The squared deviations are roughly 0.316, 0.191, 2.066, 1.129, 0.191, 0.004, 1.129, 0.191. Sum = 5.21875 (≈ 5.219). Variance = 5.21875 / 7 ≈ 0.7455. s_d ≈ 0.863 hours.
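The same computation as a formula, with a cross-check using the algebraically equivalent shortcut Σ(d_i − d̄)² = Σd_i² − n·d̄², which skips the individual deviations. It is only a way to check the arithmetic; either route gives the same s_d.

```latex
\begin{aligned}
s_d &= \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}},
\qquad \sum_{i=1}^{n} (d_i - \bar{d})^2 = \sum_{i=1}^{n} d_i^2 - n\bar{d}^2 \\
\sum_{i=1}^{8} d_i^2 &= 2.25 + 0.25 + 0.25 + 4 + 0.25 + 1 + 4 + 0.25 = 12.25 \\
\sum_{i=1}^{8} (d_i - \bar{d})^2 &= 12.25 - 8(0.9375)^2 = 12.25 - 7.03125 = 5.21875 \\
s_d &= \sqrt{5.21875 / 7} \approx \sqrt{0.7455} \approx 0.863
\end{aligned}
```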

Step 3 — Test statistic.

t = 0.9375 / (0.863 / √8) ≈ 0.9375 / (0.863 / 2.828) ≈ 0.9375 / 0.3052 ≈ 3.072

Step 4 — df and p-value. df = 8 − 1 = 7. For a right-tailed test with t = 3.072 and df = 7, the p-value is about 0.009.
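Software gives the p-value directly. A minimal sketch with SciPy, assuming version 1.6 or later (where `ttest_rel` accepts the `alternative` argument): `scipy.stats.t.sf` returns the right-tail area for the hand-computed statistic, and `scipy.stats.ttest_rel` runs the whole paired test from the raw columns.

```python
from scipy import stats

before = [5.5, 6.0, 6.5, 5.0, 7.0, 5.5, 6.0, 6.5]
after = [7.0, 6.5, 6.0, 7.0, 7.5, 6.5, 8.0, 7.0]

# Right-tail area of the t distribution with df = 7 beyond the hand-computed t.
p_from_table = stats.t.sf(3.072, df=7)

# The full paired test. ttest_rel(a, b) works with the differences a - b,
# so passing (after, before) matches d_i = after - before.
result = stats.ttest_rel(after, before, alternative="greater")

print(f"p from t = 3.072: {p_from_table:.4f}")
print(f"ttest_rel: t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
```

Both p-values come out just under 0.01. Without intermediate rounding, `ttest_rel` reports t ≈ 3.071; the hand calculation's 3.072 comes from rounding s_d to 0.863 first.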

Step 5 — Decision and conclusion. p ≈ 0.009 is less than α = 0.05, so reject H₀. There is statistically significant evidence at the 0.05 level that the routine increases average nightly sleep. The estimated increase is about 0.94 hours; if the routine had no effect (μ_d = 0), a mean difference at least this large would turn up in fewer than 1% of samples of 8 patients.

Why Pairing Helps (When the Design Allows It)

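A two-sample t-test on the same 16 numbers (8 "before" and 8 "after" values treated as independent samples) builds its standard error from the spread within each column, and that spread includes how much patients differ from one another. Before the routine, for example, Patient 4 slept 5.0 hours and Patient 5 slept 7.0; that gap reflects the people, not the routine.

Pairing strips that between-patient noise out, because each patient is compared with themselves. Since Var(d) = Var(X) + Var(Y) − 2 Cov(X, Y), the paired standard error √(Var(d)/n) is smaller than the unpaired √((Var(X) + Var(Y))/n) whenever before and after values are positively correlated, and the stronger the correlation, the bigger the reduction. A smaller standard error in the denominator means a larger t-statistic and more power to detect a real effect. The cost is roughly half the df (n − 1 = 7 instead of 2n − 2 = 14), which a strong within-pair correlation easily outweighs. In this small data set the correlation is weak (r ≈ 0.09), so the gain is modest: the paired standard error is about 0.305 hours versus about 0.320 for the unpaired one.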
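A minimal standard-library sketch that puts numbers on this comparison for the sleep data: it computes the paired standard error, an unpaired standard error for the same 16 values, and the correlation between the two columns. Using the unpooled (Welch-style) form for the unpaired SE is our choice for illustration; with equal group sizes the pooled form gives the same value.

```python
from math import sqrt
from statistics import mean, stdev, variance

before = [5.5, 6.0, 6.5, 5.0, 7.0, 5.5, 6.0, 6.5]
after = [7.0, 6.5, 6.0, 7.0, 7.5, 6.5, 8.0, 7.0]
n = len(before)

# Paired: SE built from the spread of the within-patient differences.
d = [a - b for a, b in zip(after, before)]
se_paired = stdev(d) / sqrt(n)

# Unpaired: SE built from each column's own spread,
# as if the 16 values came from 16 different people.
se_unpaired = sqrt(variance(after) / n + variance(before) / n)

# Sample covariance and correlation between the before and after columns.
ma, mb = mean(after), mean(before)
cov = sum((a - ma) * (b - mb) for a, b in zip(after, before)) / (n - 1)
r = cov / (stdev(after) * stdev(before))

print(f"SE paired   = {se_paired:.4f}")
print(f"SE unpaired = {se_unpaired:.4f}")
print(f"r(before, after) = {r:.3f}")
```

Comparing `se_paired` with `se_unpaired` shows directly how much the pairing buys for these eight patients.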

Conditions to Check

The paired t-test inherits the one-sample t-test's conditions, applied to the difference column:

  • Random and paired correctly. The differences must be a (roughly) simple random sample of the pairs in the population. Each pair must be a genuine match.
  • Normality of the differences. Either n is large (≥ 30 pairs) or the differences are approximately normal. With a small n and obvious skew or outliers in the difference column, consider a nonparametric alternative such as the Wilcoxon signed-rank test.
  • Independence between pairs. Pair 1 and Pair 2 should not be related to each other. The values within a pair are related on purpose; the pairs themselves should be independent.
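A minimal sketch of both checks with SciPy on the sleep-study differences. `scipy.stats.shapiro` is one common formal normality test, though with only 8 pairs it has little power, so a dot plot or normal probability plot of the differences is worth a look too; `scipy.stats.wilcoxon` runs the signed-rank test on the differences. Treat the output as a rough check, not a verdict.

```python
from scipy import stats

# Differences d_i = after - before from the sleep example.
d = [1.5, 0.5, -0.5, 2.0, 0.5, 1.0, 2.0, 0.5]

# Shapiro-Wilk: a small p-value is evidence against normality of the differences.
shapiro_result = stats.shapiro(d)
print(f"Shapiro-Wilk p = {shapiro_result.pvalue:.3f}")

# Wilcoxon signed-rank on the differences, right-tailed to match H1: mu_d > 0.
# Tied |d_i| values mean SciPy may fall back to a normal approximation.
wilcoxon_result = stats.wilcoxon(d, alternative="greater")
print(f"Wilcoxon signed-rank p = {wilcoxon_result.pvalue:.4f}")
```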

Common Mistakes

The first is using an independent-samples test on paired data. It costs power and can change the conclusion. If each row of your data lines up two measurements of the same unit, the test is paired.

The second is getting the direction of the difference wrong. "After − before" is the conventional choice, but if you switch it to "before − after" your t-statistic flips sign. Either direction is valid; just make sure your alternative hypothesis matches the direction you computed.
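A quick check of the sign flip on the sleep data: reversing the subtraction negates every d_i, so d̄ and t change sign while s_d stays the same. A minimal standard-library sketch; the helper name `t_stat` is ours.

```python
from math import sqrt
from statistics import mean, stdev

before = [5.5, 6.0, 6.5, 5.0, 7.0, 5.5, 6.0, 6.5]
after = [7.0, 6.5, 6.0, 7.0, 7.5, 6.5, 8.0, 7.0]


def t_stat(d: list[float]) -> float:
    """One-sample t statistic for H0: mu_d = 0."""
    return mean(d) / (stdev(d) / sqrt(len(d)))


t_after_minus_before = t_stat([a - b for a, b in zip(after, before)])
t_before_minus_after = t_stat([b - a for a, b in zip(after, before)])

# Same magnitude, opposite sign. With d = before - after, "the routine
# increases sleep" becomes H1: mu_d < 0 (a left-tailed test).
print(round(t_after_minus_before, 3), round(t_before_minus_after, 3))
```

It prints 3.071 and -3.071.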

The third is using n = 16 instead of n = 8 in the sleep example. n is the number of pairs, not the total number of observations, so df = 8 − 1 = 7, not 15.

Getting Help

If you are not yet sure whether your design is paired or independent, review the two-sample t-test for the contrast. For the broader framework these decisions fit inside, the material on setting up a hypothesis test covers null and alternative hypotheses, decision rules, and writing the conclusion.

Conclusion

The paired t-test answers the same kind of question as a two-sample t-test, but on data that comes naturally in matched pairs. Collapse the pairs into a difference column, run a one-sample t-test on it with n − 1 df, and translate the result back into the language of the original problem. On the sleep example the routine added an estimated 0.94 hours (roughly an hour) of sleep per night, with t ≈ 3.07 on 7 df and p ≈ 0.009, from an analysis that correctly treats each patient as their own control.