What if some of my expected counts are less than 5?

The chi-square approximation becomes unreliable. Two fixes are common: combine related categories so that every expected count is at least 5, or use Fisher's exact test if the table is 2-by-2. Some texts relax the rule, allowing a few cells (a common guideline is no more than 20% of them) with E between 1 and 5 as long as no cell has E less than 1, but the safest move is to combine categories.

Why does the test only tell me the variables are related, not how?

Because χ² is a single number that adds up evidence from every cell. The test cannot distinguish between "Freshmen love Library" and "Juniors love Café" — both inflate the same statistic. To describe the association, look at each cell's contribution (O − E)² / E and the standardized residual (O − E) / √E to see which cells deviate most from independence.

Can I use chi-square with percentages instead of counts?

No. The chi-square test requires actual counts of subjects because its sampling distribution depends on the sample size. If you only have percentages, convert them back to counts using the sample size before computing expected counts and χ².
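As a rough illustration of that conversion, the sketch below turns percentages back into counts. The sample size and the percentages are hypothetical, and the code assumes each percentage is a share of the grand total.

```python
# Hypothetical survey: n = 200 respondents, each cell reported as a
# percentage of the grand total (an assumption, not data from this article).
n = 200
percent_of_total = [
    [30.0, 20.0],
    [15.0, 35.0],
]

# Convert each percentage back to a whole-number count before testing.
counts = [[round(p / 100 * n) for p in row] for row in percent_of_total]

print(counts)  # [[60, 40], [30, 70]]
```

If the percentages are row percentages instead, multiply each one by its own row total rather than by n.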

Chi-Square Test of Independence: Expected Counts, Step by Step

What is the difference between chi-square independence and goodness-of-fit?

Independence compares two categorical variables in a contingency table and tests whether they are related (df = (r − 1)(c − 1)). Goodness-of-fit compares one categorical variable to a hypothesized distribution and tests whether the proportions match (df = k − 1). Both use the same χ² formula, but the setup and degrees of freedom are different.

A chi-square test of independence asks one question of a contingency table: are the two categorical variables related, or independent? The procedure is short (build a table of expected counts, sum one ratio over every cell, look up a p-value), but it relies on getting the expected counts right. This walkthrough builds the entire test from a 2-by-3 table.

What the Test Actually Tests

The chi-square test of independence works with two categorical variables measured on the same subjects. Examples: major and political affiliation; brand preference and age group; treatment and recovery outcome. The data sits in a contingency table where each cell counts how many subjects fall into that row-and-column combination.

The hypotheses are about the relationship between the two variables:

H₀: the two variables are independent in the population.
H₁: the two variables are associated (not independent).

"Independent" here has its usual probability meaning: knowing the row category gives you no information about the column category. If that is true in the population, the proportions inside each row should be the same as the overall column proportions, up to random sampling variation.

Expected Counts: The Heart of the Test

Every cell in the contingency table gets an expected count — what you would expect to see in that cell if the null hypothesis (independence) were true. The formula is:

E = (row total × column total) / grand total

That formula comes straight from the independence definition. If row category and column category are independent, P(row r AND column c) = P(row r) × P(column c). Multiply by the grand total n to convert the joint probability into an expected count: E = n × (row r total / n) × (column c total / n) = (row total × column total) / n.
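The formula can be sketched in a few lines of Python. The function name expected_counts and the 2-by-2 toy table are illustrative assumptions, not part of the worked example.

```python
def expected_counts(observed):
    """Return E = (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2-by-2 table, used only to show the mechanics.
toy_table = [[10, 20],
             [30, 40]]
toy_expected = expected_counts(toy_table)
print(toy_expected)  # [[12.0, 18.0], [28.0, 42.0]]
```

Only the margins enter the calculation: any observed table with the same row and column totals gives the same expected counts.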

Figure: A 2-by-3 contingency table with observed counts on the left and expected counts on the right, beside the chi-square formula.

Once you have observed counts O and expected counts E for every cell, the test statistic is

χ² = Σ (O − E)² / E

with degrees of freedom df = (rows − 1) × (columns − 1). Large χ² means observed counts diverge from what independence predicts; small χ² means they match closely.
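A minimal sketch of the statistic and its degrees of freedom follows. The helper names and the 2-by-2 toy numbers are hypothetical, with the expected counts filled in by hand from E = (row total × column total) / grand total.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """df = (rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2-by-2 table with its expected counts worked out by hand.
toy_observed = [[10, 20], [30, 40]]
toy_expected = [[12.0, 18.0], [28.0, 42.0]]

toy_chi2 = chi_square_statistic(toy_observed, toy_expected)
toy_df = degrees_of_freedom(toy_observed)
print(round(toy_chi2, 3), toy_df)  # 0.794 1
```

Here the observed counts sit close to the expected ones, so χ² is small, which is what independence predicts.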

The Worked Example

A survey of 300 students asks for their preferred study spot (Library, Café, Home) and their year (Freshman, Junior). The observed table:

Freshmen — Library: 60, Café: 30, Home: 30. Row total = 120.
Juniors — Library: 50, Café: 90, Home: 40. Row total = 180.
Column totals — Library: 110, Café: 120, Home: 70. Grand total = 300.

Is there an association between year and preferred study spot? Test at α = 0.05.

Step 1 — Expected counts. For each cell, E = (row total × column total) / 300.

Freshmen × Library: (120 × 110) / 300 = 44
Freshmen × Café: (120 × 120) / 300 = 48
Freshmen × Home: (120 × 70) / 300 = 28
Juniors × Library: (180 × 110) / 300 = 66
Juniors × Café: (180 × 120) / 300 = 72
Juniors × Home: (180 × 70) / 300 = 42

A quick sanity check: each row of expected counts should sum to the original row total (44 + 48 + 28 = 120; 66 + 72 + 42 = 180), and each column to the original column total. They do.
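Step 1 and the sanity check can be reproduced with a short script. This is one possible sketch using plain lists; the variable names are illustrative.

```python
rows = ["Freshmen", "Juniors"]
cols = ["Library", "Café", "Home"]
observed = [[60, 30, 30],
            [50, 90, 40]]

row_totals = [sum(r) for r in observed]        # [120, 180]
col_totals = [sum(c) for c in zip(*observed)]  # [110, 120, 70]
n = sum(row_totals)                            # 300

# E = (row total * column total) / grand total for each cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for name, exp_row in zip(rows, expected):
    print(name, dict(zip(cols, exp_row)))

# Sanity check: the expected counts reproduce the observed margins.
assert [sum(r) for r in expected] == row_totals
assert [sum(c) for c in zip(*expected)] == col_totals
```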

Step 2 — Check the condition. A standard requirement is that every expected count is at least 5. The smallest here is 28, well above 5, so chi-square is valid.

Step 3 — Compute (O − E)² / E for each cell.

(60 − 44)² / 44 = 256 / 44 ≈ 5.82
(30 − 48)² / 48 = 324 / 48 = 6.75
(30 − 28)² / 28 = 4 / 28 ≈ 0.14
(50 − 66)² / 66 = 256 / 66 ≈ 3.88
(90 − 72)² / 72 = 324 / 72 = 4.50
(40 − 42)² / 42 = 4 / 42 ≈ 0.10

Step 4 — Sum the contributions.

χ² ≈ 5.82 + 6.75 + 0.14 + 3.88 + 4.50 + 0.10 = 21.19
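Steps 3 and 4 can be checked the same way. The sketch below lists the six cells in the order used above (Freshmen first, then Juniors).

```python
# Cells in reading order: Freshmen (Library, Café, Home), then Juniors.
observed = [60, 30, 30, 50, 90, 40]
expected = [44, 48, 28, 66, 72, 42]

# Per-cell contribution (O - E)^2 / E, then the total.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(contributions)

print([round(c, 2) for c in contributions])  # [5.82, 6.75, 0.14, 3.88, 4.5, 0.1]
print(round(chi2, 2))                        # 21.19
```

Summing the unrounded contributions gives about 21.185, which also rounds to 21.19, so rounding each cell first does not change the reported statistic here.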

Step 5 — Degrees of freedom and p-value. df = (2 − 1)(3 − 1) = 2. For χ² = 21.19 with 2 df, the p-value is far below 0.001 (the 0.05 critical value for df = 2 is 5.99).
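For df = 2 specifically, the chi-square upper-tail probability has a closed form, P(X ≥ x) = exp(−x / 2), so the p-value can be checked without a table. The sketch below relies on that special case; other df values need a chi-square table or a statistics library.

```python
import math

chi2 = 21.19
df = 2
alpha = 0.05

# Valid for df = 2 only: the upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

# Inverting the same formula gives the critical value at level alpha.
critical_value = -2 * math.log(alpha)

print(f"p-value = {p_value:.1e}")                # about 2.5e-05
print(f"critical value = {critical_value:.2f}")  # 5.99
print("reject H0" if p_value < alpha else "fail to reject H0")
```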

Step 6 — Decision and conclusion. Reject H₀. There is statistically significant evidence at the 0.05 level that year and preferred study spot are associated. Looking at the largest contributions, Freshmen are over-represented at the Library and under-represented at the Café; Juniors are the mirror image. That is where the association comes from.
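The direction of each deviation is easier to read from signed residuals, (O − E) / √E, than from the squared contributions. The sketch below is one way to tabulate them; the variable names are illustrative.

```python
import math

# (year, spot, observed, expected) for each cell of the study-spot table.
cells = [
    ("Freshmen", "Library", 60, 44),
    ("Freshmen", "Café", 30, 48),
    ("Freshmen", "Home", 30, 28),
    ("Juniors", "Library", 50, 66),
    ("Juniors", "Café", 90, 72),
    ("Juniors", "Home", 40, 42),
]

# Signed residual: positive means more subjects than independence predicts.
residuals = {(year, spot): (o - e) / math.sqrt(e) for year, spot, o, e in cells}

for (year, spot), r in residuals.items():
    direction = "more than expected" if r > 0 else "fewer than expected"
    print(f"{year:8s} {spot:7s} {r:+.2f}  ({direction})")
```

The Library and Café cells carry the largest residuals in both rows, with opposite signs for Freshmen and Juniors, while the Home cells stay close to zero.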

Conditions and Common Mistakes

The chi-square test of independence is valid when:

The data are counts of subjects, not percentages or averages.
The sample is random and the subjects are independent of each other.
Every expected count is at least 5. This is the rule of thumb that keeps the chi-square distribution a good approximation. If any expected count is small, combine categories or use Fisher's exact test for a 2-by-2 table.
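The expected-count condition is easy to check mechanically. The sketch below flags any cell whose expected count falls below a threshold; the function name and the 2-by-2 table are hypothetical.

```python
def small_expected_cells(observed, threshold=5):
    """Return (row index, column index, E) for each cell with E < threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < threshold:
                flagged.append((i, j, e))
    return flagged


# Hypothetical table where one expected count is too small.
sparse_table = [[3, 17],
                [7, 73]]
flagged = small_expected_cells(sparse_table)
print(flagged)  # [(0, 0, 2.0)]
```

The check uses expected counts, not observed ones: the cell with an observed count of 3 is flagged because its expected count is 2, while the study-spot table from the worked example produces no flags.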

The most common mistakes are computational. Students subtract on the wrong cells, divide by O instead of E, or forget to compute the expected counts at all. Write expected counts in a separate table next to the observed table — keeping them visually separate cuts down on errors.

The other frequent slip is interpretation. A significant chi-square says the two variables are associated, not which categories drive it. To say "Freshmen prefer the Library," you have to inspect the cells with the largest (O − E)² / E contributions, as we did above.

Goodness-of-Fit Is a Different Test

The chi-square goodness-of-fit test compares one categorical variable to a hypothesized distribution (e.g., are M&M colors equally common?). It uses the same χ² formula but has df = k − 1, where k is the number of categories, and there is only one row of data. The test of independence always has at least two rows and two columns, and its df is (rows − 1)(columns − 1). Same statistic, different setups — make sure you know which one the problem is asking for.
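For contrast, a goodness-of-fit calculation has a single row of counts and takes its expected counts from the hypothesized distribution instead of from row and column totals. The color counts below are hypothetical.

```python
# Hypothetical counts for six candy colors; H0: all colors equally common.
observed = [25, 18, 22, 15, 20, 20]

k = len(observed)         # number of categories
n = sum(observed)         # 120 candies in total
expected = [n / k] * k    # 20 per color under H0

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

print(round(chi2, 2), df)  # 2.9 5
```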

Getting Help

Chi-square tests sit inside the same hypothesis-testing framework as every other significance test, so the guide on setting up a hypothesis test is the right place to refresh the structure of H₀, H₁, decision rule, and conclusion. For an example of a test on means rather than counts, the one-way ANOVA walkthrough covers a comparable multi-group setup.

Conclusion

A chi-square test of independence checks whether two categorical variables are related. Build the expected-counts table from row total × column total / grand total, sum (O − E)² / E across all cells, and compare the result to a chi-square distribution with (rows − 1)(columns − 1) df. In the study-spot example χ² = 21.19 on 2 df, far past the 0.05 critical value of 5.99, so we reject independence and conclude that year and preferred study spot are associated. The whole test is mechanical once you can write the expected-counts table cleanly.
