A two-sample t-test compares the means of two independent groups — and the only real decision before the formula is whether to pool the variances or not. Courses sometimes show only the pooled version, sometimes only the Welch (unpooled) version, and exam problems often leave the choice to you. This walkthrough covers both, when to use each, and a full worked example.

What the Two-Sample T-Test Actually Tests

The two-sample t-test for independent samples compares two population means, μ₁ and μ₂, using one independent sample from each population. The null hypothesis is that the two means are equal: H₀: μ₁ = μ₂, equivalently μ₁ − μ₂ = 0. The alternative follows the direction the problem implies: H₁: μ₁ ≠ μ₂ for a two-sided test, or H₁: μ₁ > μ₂ or H₁: μ₁ < μ₂ for a one-sided test.

Use this test when the two groups are independent — different people, different objects, no matching between observations. If the same subjects are measured twice or the data are matched pairs, you want a paired t-test instead. The two samples can be different sizes; that does not block the test.

The other conditions are the usual ones for a t-procedure: each sample is a roughly random sample from its population, and either the populations are approximately normal or the sample sizes are large enough (each n ≥ 30 is a common rule) for the Central Limit Theorem to apply.

The Pooled vs. Unpooled Decision

The two versions of the test differ in how they estimate the standard error of x̄₁ − x̄₂.

Pooled (equal variances assumed). If you believe the two populations have the same variance, you can combine the two sample variances into a single estimate, s²ₚ, which gives a more precise denominator:

  • s²ₚ = ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2)
  • SE = √(s²ₚ × (1/n₁ + 1/n₂))
  • df = n₁ + n₂ − 2
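
The three pooled formulas translate directly into a small helper. This is a minimal sketch that assumes you only have summary statistics (sample sizes and SDs). The function name `pooled_t_components` is illustrative and does not come from any library. The usage line plugs in n₁ = 20, s₁ = 8, n₂ = 25, s₂ = 9, the values from the worked example below.

```python
import math

def pooled_t_components(n1: int, s1: float, n2: int, s2: float) -> tuple[float, float, int]:
    """Return (pooled variance, SE of x̄1 − x̄2, df) for the pooled two-sample t-test.

    Hypothetical helper: works from summary statistics only.
    """
    # Pooled variance: average of the two sample variances, weighted by n − 1
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    # Standard error of the difference in sample means
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return sp2, se, df

sp2, se, df = pooled_t_components(20, 8, 25, 9)
print(f"s_p^2 = {sp2:.2f}, SE = {se:.3f}, df = {df}")  # s_p^2 = 73.49, SE = 2.572, df = 43
```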

Unpooled (Welch's t-test). If the variances might differ, you do not pool. You compute the standard error from each sample's own variance and use a corrected degrees of freedom:

  • SE = √(s₁²/n₁ + s₂²/n₂)
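  • df is given by the Welch–Satterthwaite formula, df = (s₁²/n₁ + s₂²/n₂)² / [(s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1)], which calculators and software compute for you; it is usually not a whole number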
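
The Welch standard error and the Welch–Satterthwaite df can be sketched the same way, again assuming only summary statistics are available. `welch_t_components` is a hypothetical helper name, and the sample values in the usage line are made up to show how far the df can fall below n₁ + n₂ − 2 when the spreads differ a lot.

```python
import math

def welch_t_components(n1: int, s1: float, n2: int, s2: float) -> tuple[float, float]:
    """Return (SE, Welch–Satterthwaite df) for the unpooled two-sample t-test.

    Hypothetical helper: works from summary statistics only.
    """
    v1 = s1**2 / n1  # estimated variance of x̄1
    v2 = s2**2 / n2  # estimated variance of x̄2
    se = math.sqrt(v1 + v2)
    # Welch–Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Hypothetical summary statistics with very unequal spreads
se, df = welch_t_components(10, 20.0, 30, 5.0)
print(f"SE = {se:.3f}, Welch df = {df:.1f} (pooled df would be {10 + 30 - 2})")
# SE = 6.390, Welch df = 9.4 (pooled df would be 38)
```

The Welch df always lands between min(n₁, n₂) − 1 and n₁ + n₂ − 2, and it equals n₁ + n₂ − 2 exactly when n₁ = n₂ and s₁ = s₂.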

Figure: Two side-by-side normal curves with different spreads, illustrating equal and unequal variance.

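Which to use. When the variances are clearly close (similar sample SDs, similar sample sizes), the pooled test is slightly more powerful. When they are not, the pooled standard error can be noticeably too small or too large, especially when the sample sizes also differ, and the resulting p-value and confidence interval become unreliable. A quick rule used in many textbooks: pool if the larger sample standard deviation is less than twice the smaller; otherwise use Welch. Many software packages default to Welch (R's t.test, for example) because it is robust to unequal variances and only mildly less powerful when they are equal. Not all do: SciPy's scipy.stats.ttest_ind pools by default and runs Welch only when you pass equal_var=False, so check the default before reading the output.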
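
The factor-of-two rule of thumb is easy to encode. This is only a sketch of the heuristic described above, a rough guide rather than a formal test of equal variances, and `choose_t_test` is a hypothetical name. The SDs 8 and 9 come from the worked example; 4 and 10 are made-up values.

```python
def choose_t_test(s1: float, s2: float) -> str:
    """Rule of thumb: pool only if the larger SD is less than twice the smaller."""
    larger, smaller = max(s1, s2), min(s1, s2)
    return "pooled" if larger < 2 * smaller else "welch"

print(choose_t_test(8, 9))   # pooled: 9 / 8 = 1.125
print(choose_t_test(4, 10))  # welch: 10 / 4 = 2.5
```

A ratio of exactly 2 is not "less than twice", so this sketch sends it to Welch.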

The Worked Example

A study compares exam scores for two sections of an intro stats class. Section A (n₁ = 20) has mean x̄₁ = 78 and SD s₁ = 8. Section B (n₂ = 25) has mean x̄₂ = 73 and SD s₂ = 9. Is there evidence that Section A scored higher on average? Test at α = 0.05.

Step 1 — Hypotheses. The instructor suspects A scored higher, so this is right-tailed:

  • H₀: μ_A − μ_B = 0
  • H₁: μ_A − μ_B > 0

Step 2 — Pool or not? The two SDs (8 and 9) are close: their ratio, 9/8 = 1.125, is well under the factor-of-two rule of thumb, so pooling is reasonable. We will use the pooled test.

Step 3 — Pooled variance.

s²ₚ = ((20 − 1)(8²) + (25 − 1)(9²)) / (20 + 25 − 2) = (19 × 64 + 24 × 81) / 43 = (1216 + 1944) / 43 = 3160 / 43 ≈ 73.49

So sₚ ≈ √73.49 ≈ 8.57.
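
To double-check the Step 3 arithmetic without intermediate rounding, Python's standard fractions module can keep the pooled variance exact until it is displayed.

```python
import math
from fractions import Fraction

n1, s1 = 20, 8  # Section A
n2, s2 = 25, 9  # Section B

part_a = (n1 - 1) * s1**2                     # 19 × 64
part_b = (n2 - 1) * s2**2                     # 24 × 81
sp2 = Fraction(part_a + part_b, n1 + n2 - 2)  # exact pooled variance
sp = math.sqrt(float(sp2))                    # pooled standard deviation
print(part_a, part_b, sp2, f"{float(sp2):.2f}", f"{sp:.2f}")
# 1216 1944 3160/43 73.49 8.57
```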

Step 4 — Standard error.

SE = √(73.49 × (1/20 + 1/25)) = √(73.49 × (0.05 + 0.04)) = √(73.49 × 0.09) = √6.614 ≈ 2.572

Step 5 — Test statistic.

t = (x̄_A − x̄_B − 0) / SE = (78 − 73) / 2.572 = 5 / 2.572 ≈ 1.944
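
Steps 4 and 5 in code, carrying the exact pooled variance 3160/43 from Step 3 instead of the rounded 73.49 so that rounding error does not accumulate (here both give the same answer to three decimals).

```python
import math

xbar_a, xbar_b = 78, 73  # sample means for Sections A and B
n1, n2 = 20, 25
sp2 = 3160 / 43          # pooled variance from Step 3

se = math.sqrt(sp2 * (1 / n1 + 1 / n2))  # Step 4: pooled standard error
t = (xbar_a - xbar_b - 0) / se           # Step 5: hypothesized difference is 0
print(f"SE = {se:.3f}, t = {t:.3f}")     # SE = 2.572, t = 1.944
```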

Step 6 — df and p-value. df = 20 + 25 − 2 = 43. For a right-tailed test with t = 1.944 and df = 43, the p-value from a t-table or calculator is approximately 0.029.
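
Python's standard library has no t distribution, so the sketch below integrates the t density numerically with Simpson's rule. It is a self-contained illustration, not a production-grade CDF, and `t_pdf` and `t_upper_tail` are hypothetical helper names. If SciPy is available, `scipy.stats.t.sf(1.944, 43)` returns the same upper-tail probability, and `scipy.stats.ttest_ind_from_stats(78, 8, 20, 73, 9, 25, equal_var=True, alternative='greater')` runs the whole test from the summary statistics (the `alternative` argument requires a reasonably recent SciPy).

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T > t): Simpson's rule on the density over [0, |t|], then symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

p_one_sided = t_upper_tail(1.944, 43)      # right-tailed test, Step 6 values
print(f"one-sided p = {p_one_sided:.3f}")  # one-sided p = 0.029
```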

Step 7 — Decision and conclusion. p = 0.029 is less than α = 0.05, so reject H₀. There is statistically significant evidence at the 0.05 level that Section A's mean exam score is higher than Section B's. Put another way: if the two sections truly had the same mean, a sample difference of five or more points in Section A's favor would occur only about 2.9% of the time with these sample sizes.

If you had run the Welch version instead, the SE would be √(64/20 + 81/25) = √(3.2 + 3.24) = √6.44 ≈ 2.538, giving t ≈ 1.970 and df ≈ 42.5, and the conclusion would not change. Pooled and unpooled rarely disagree when the variances are close and the sample sizes are not tiny.
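
Running both versions side by side on the Section A/B summary statistics reproduces this comparison. Only the standard error and the df change; the observed mean difference of 5 is the same in both.

```python
import math

n1, xbar1, s1 = 20, 78, 8  # Section A
n2, xbar2, s2 = 25, 73, 9  # Section B
diff = xbar1 - xbar2

# Pooled: one shared variance estimate, df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Welch: each sample's own variance, Welch–Satterthwaite df
v1, v2 = s1**2 / n1, s2**2 / n2
se_welch = math.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

for name, se, df in [("pooled", se_pooled, df_pooled), ("Welch", se_welch, df_welch)]:
    print(f"{name:>6}: SE = {se:.3f}, t = {diff / se:.3f}, df = {df:.1f}")
# pooled: SE = 2.572, t = 1.944, df = 43.0
#  Welch: SE = 2.538, t = 1.970, df = 42.5
```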

Confidence Intervals From the Same Machinery

You can also build a confidence interval for μ₁ − μ₂ using the same standard error:

(x̄₁ − x̄₂) ± t* × SE

with t* from the t-distribution at your chosen confidence level and df. For the pooled example with 95% confidence, t* with df = 43 is about 2.017. The interval is 5 ± 2.017 × 2.572 = 5 ± 5.19, or roughly (−0.19, 10.19). Because the interval just barely contains zero, you would fail to reject a two-sided test at α = 0.05 — which is consistent: the one-sided p-value was 0.029, so the two-sided p-value would be about 0.058.
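
The interval can be reproduced with the same numerical approach: find t* by bisection on the upper-tail probability of the t distribution, then form (x̄₁ − x̄₂) ± t* × SE. This is a stdlib-only sketch in which `t_pdf`, `t_upper_tail`, and `t_critical` are hypothetical helpers; with SciPy, `scipy.stats.t.ppf(0.975, 43)` gives t* directly.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T > t) via Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

def t_critical(conf: float, df: float) -> float:
    """t* with P(T > t*) = (1 - conf) / 2, found by bisection on [0, 50]."""
    target = (1 - conf) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > target:
            lo = mid  # tail still too large: t* lies further right
        else:
            hi = mid
    return (lo + hi) / 2

diff = 78 - 73
se = math.sqrt(3160 / 43 * (1 / 20 + 1 / 25))  # pooled SE from the example
t_star = t_critical(0.95, 43)
margin = t_star * se
print(f"t* = {t_star:.3f}, 95% CI = ({diff - margin:.2f}, {diff + margin:.2f})")
# t* = 2.017, 95% CI = (-0.19, 10.19)
```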

This pattern — a one-sided test rejecting while the corresponding two-sided test does not — is a frequent stumbling block. Make sure your tail matches the alternative the problem actually states.

Common Mistakes

The first is running it on paired data. If each row in the data set has a "before" and "after" for the same subject, the observations are not independent. Use a paired t-test; ignoring the pairing typically inflates the standard error (paired measurements are usually positively correlated) and hurts power.

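The second is pooling when the variances are not equal. The pooled variance is weighted toward the larger sample, so with very different SDs the pooled standard error of x̄₁ − x̄₂ is off: too small when the smaller sample has the larger variance (too many false rejections), and too large when the larger sample has the larger variance (lost power). When in doubt, use Welch; it costs little power when the variances are equal and is safer when they are not.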
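
A quick numerical check of this point, using hypothetical summary statistics (n = 10 versus n = 30, SDs of 20 versus 5) chosen only to exaggerate the effect. Swapping which sample has the larger SD flips whether the pooled SE comes out smaller or larger than the Welch SE. The helper names are illustrative.

```python
import math

def pooled_se(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled standard error of x̄1 − x̄2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

def welch_se(n1: int, s1: float, n2: int, s2: float) -> float:
    """Unpooled (Welch) standard error of x̄1 − x̄2."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical summary statistics: (n1, s1, n2, s2)
cases = {
    "larger SD in smaller sample": (10, 20, 30, 5),
    "larger SD in larger sample": (10, 5, 30, 20),
}
for label, (n1, s1, n2, s2) in cases.items():
    print(f"{label}: pooled SE = {pooled_se(n1, s1, n2, s2):.2f}, "
          f"Welch SE = {welch_se(n1, s1, n2, s2):.2f}")
# larger SD in smaller sample: pooled SE = 3.90, Welch SE = 6.39
# larger SD in larger sample: pooled SE = 6.44, Welch SE = 3.98
```

With equal sample sizes the two standard errors are algebraically identical, so only the df differs between the versions.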

The third is confusing the test statistic's df with n − 1. With two samples the pooled df is n₁ + n₂ − 2, not n₁ − 1 or n₂ − 1. Welch's df is a different formula altogether and is rarely a whole number.

Getting Help

If the deeper question is which test to use at all — between a t-test and a z-test, between one sample and two — work through t-test vs. z-test. For the matched-pairs case, the paired t-test walkthrough shows how the procedure changes when the two columns of data go together row by row.

Conclusion

A two-sample t-test compares two independent means using either a pooled or an unpooled standard error. Read the sample SDs to decide which to use, compute the test statistic with the matching standard error and df, and translate the result into a sentence about the original groups. The exam version of this test almost always reduces to plugging six summary numbers (two means, two SDs, and two sample sizes) into one of the two formulas above.