A hypothesis test word problem can feel like a wall of text with a number buried in it. The skill is turning that paragraph into a fixed five-step procedure. This walkthrough takes one problem from the first sentence to a written conclusion, so setting up a hypothesis test becomes a routine you can repeat on any exam.
The Problem We Will Solve
A coffee chain claims its large cup contains 16 ounces on average. A skeptical customer suspects the cups are underfilled. She measures 40 large cups and finds a sample mean of 15.8 ounces with a sample standard deviation of 0.6 ounces. At a 5% significance level, is there evidence the cups are underfilled?
Every hypothesis test follows the same five steps. We will apply each to this problem.
Step 1: Write the Hypotheses
Every test has two competing claims about a population parameter — here the true mean fill, μ.
The null hypothesis (H₀) is the "no effect" or status-quo claim. It always contains an equality. The company's claim is the status quo: H₀: μ = 16.
The alternative hypothesis (Hₐ) is what the test is trying to find evidence for. It comes from the question's intent. The customer "suspects underfilling," so she is looking for evidence the mean is less than 16: Hₐ: μ < 16.
The direction of Hₐ sets whether the test is one-tailed or two-tailed. "Less than" or "greater than" gives a one-tailed test; "different from" or "not equal to" gives a two-tailed test. Here, "underfilled" points one direction, so this is a one-tailed (left-tailed) test. Read the wording carefully: this choice changes both the critical value and the p-value.
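The wording-to-tail rule above can be sketched as a small lookup. This is only an illustration: the helper name `tail_type` and the operator strings it accepts are hypothetical, chosen for this example.

```python
def tail_type(alternative: str) -> str:
    """Map the comparison used in Ha to the kind of test (hypothetical helper)."""
    tails = {
        "<": "left-tailed",    # "less than", e.g. "underfilled"
        ">": "right-tailed",   # "greater than"
        "!=": "two-tailed",    # "different from", "not equal to"
    }
    return tails[alternative]


# Coffee-cup problem: Ha is mu < 16.
print(tail_type("<"))  # left-tailed
```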
Step 2: Choose the Test and Significance Level
The significance level, alpha (α), is the threshold for the evidence. The problem hands it to you: α = 0.05.
Choosing the test type comes down to one question: is the population standard deviation, σ, known? Here it is not. We only have the sample standard deviation, s = 0.6, computed from the 40 cups. Because the standard deviation is estimated from the sample, use a one-sample t-test. The degrees of freedom are n − 1 = 40 − 1 = 39.
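The choice can be written as a short sketch. It assumes the only question is whether the population standard deviation is known, as in this walkthrough; `choose_test` is a hypothetical helper, not a library function.

```python
from typing import Optional, Tuple


def choose_test(sigma_known: bool, n: int) -> Tuple[str, Optional[int]]:
    """Return the test name and its degrees of freedom (None for a z-test)."""
    if sigma_known:
        return ("z-test", None)
    # Standard deviation estimated from the sample: t-test with n - 1 df.
    return ("t-test", n - 1)


# Coffee-cup problem: only s = 0.6 is available, from n = 40 cups.
test_name, df = choose_test(sigma_known=False, n=40)
print(test_name, df)  # t-test 39
```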
A t-test about a mean assumes that the observations are independent and that the data are roughly normal or the sample is reasonably large. With n = 40, above the common n ≥ 30 rule of thumb, the central limit theorem makes the sampling distribution of the mean approximately normal, provided the data are not heavily skewed. Measuring 40 separate cups supports independence, assuming the cups were chosen at random. Always state these conditions on an exam: many rubrics award a point for checking them before any arithmetic.
Step 3: Compute the Test Statistic
The t-statistic measures how many standard errors the sample mean sits from the value claimed in H₀:
t = (x̄ − μ₀) ÷ (s ÷ √n)
Plug in the numbers:
- Standard error = s ÷ √n = 0.6 ÷ √40 ≈ 0.6 ÷ 6.325 ≈ 0.0949.
- t = (15.8 − 16) ÷ 0.0949 = −0.2 ÷ 0.0949 ≈ −2.11.
The sample mean sits 2.11 standard errors below the claimed mean. The negative sign matches the left-tailed alternative — the data leans the direction the customer suspected.
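The arithmetic in this step can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `t_statistic` is an assumption made for this example.

```python
import math


def t_statistic(sample_mean: float, claimed_mean: float,
                sample_sd: float, n: int) -> float:
    """One-sample t-statistic: (x_bar - mu_0) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - claimed_mean) / standard_error


# Coffee-cup problem: x_bar = 15.8, mu_0 = 16, s = 0.6, n = 40.
t = t_statistic(15.8, 16.0, 0.6, 40)
print(round(t, 2))  # -2.11
```

Computing the statistic without rounding the standard error first avoids carrying a rounding error into the final answer.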
Step 4: Find the Critical Value or P-Value
There are two equivalent ways to make the decision. Pick one.
The critical-value method. Look up the t-value that cuts off 5% in the left tail at df = 39. That critical value is about −1.685. The rejection region is everything to the left of −1.685.
The p-value method. Find the probability, assuming H₀ is true, of a t-statistic at or below −2.11 (that is, at least this far into the left tail) at df = 39. That p-value is about 0.021.
Both describe the same picture. The test statistic of −2.11 is more extreme than the critical value of −1.685, and equivalently the p-value of 0.021 is below α = 0.05.
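On an exam these numbers come from a table or calculator, but both can be approximated in code. The sketch below uses only Python's standard library: it integrates the Student t density numerically with Simpson's rule, then finds the critical value by bisection. The helper names (`t_pdf`, `t_cdf`, `left_critical_value`), the step count, and the search bracket are assumptions made for this illustration; a statistics library would normally do this work.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus the signed Simpson integral of the density on [0, x].

    steps must be even for Simpson's rule.
    """
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def left_critical_value(alpha: float, df: int) -> float:
    """Bisection for the t-value with area alpha to its left (alpha < 0.5)."""
    lo, hi = -50.0, 0.0  # assumed bracket, wide enough for typical alphas
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < alpha:
            lo = mid  # too far left: not enough area yet
        else:
            hi = mid
    return (lo + hi) / 2


p_value = t_cdf(-2.11, 39)
critical = left_critical_value(0.05, 39)
print(round(p_value, 3))   # about 0.021
print(round(critical, 3))  # about -1.685
```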
Step 5: Make the Decision and State the Conclusion
The decision rule:
- Critical-value method: −2.11 falls inside the rejection region (it is less than −1.685), so reject H₀.
- P-value method: 0.021 < 0.05, so reject H₀.
Now write the conclusion in the context of the problem — the step graders weigh most. Do not stop at "reject the null."
Because the test statistic of −2.11 falls in the rejection region (p ≈ 0.021 < 0.05), we reject the null hypothesis. There is statistically significant evidence at the 5% level that the large cups are underfilled — the true mean fill is less than the claimed 16 ounces.
If the statistic had instead landed to the right of −1.685, outside the rejection region (equivalently, if the p-value had been above 0.05), the conclusion would read: "we fail to reject the null hypothesis; there is not enough evidence to conclude the cups are underfilled." Note the wording: fail to reject, never accept or prove the null.
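Both decision rules for a left-tailed test can be sketched as follows. The function names are hypothetical, and the value −1.20 in the last call is an invented statistic used only to show the "fail to reject" branch.

```python
def decide_by_critical_value(t_stat: float, critical_value: float) -> str:
    """Left-tailed test: the rejection region lies below the critical value."""
    return "reject H0" if t_stat < critical_value else "fail to reject H0"


def decide_by_p_value(p_value: float, alpha: float) -> str:
    """Reject when the p-value is below the significance level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Coffee-cup problem: both methods agree.
print(decide_by_critical_value(-2.11, -1.685))  # reject H0
print(decide_by_p_value(0.021, 0.05))           # reject H0

# Hypothetical statistic that does not reach the rejection region.
print(decide_by_critical_value(-1.20, -1.685))  # fail to reject H0
```

The code returns only the decision. The written conclusion in the context of the problem still has to be composed by hand.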
Getting Help
The choice in Step 2 is covered in depth in t-test vs. z-test, and Step 4's p-value is unpacked in p-values explained. Together they fill in the two steps students most often rush.
Conclusion
Setting up a hypothesis test is a five-step routine: write H₀ and Hₐ from the wording, choose the test and alpha, compute the test statistic, compare it to a critical value or p-value, and state the decision in plain context. The coffee-cup problem ran straight through that template, and so will the next one. Once the five steps are automatic, the only real work left in any problem is careful arithmetic and reading the question for direction.