Chapter 9: Analysis of Two-Way Tables
Lecture Presentation Slides
Macmillan Learning 2017

Chapter 9 Analysis of Two-Way Tables
9.1 Inference for Two-Way Tables
9.2 Goodness of Fit

9.1 Inference for Two-Way Tables
Two-way tables
Expected cell counts
The chi-square statistic
The chi-square distributions
The chi-square test
Computations
Computing conditional distributions
Models for two-way tables

Two-Way Tables
When the data are obtained from random sampling, two-way tables of counts can be used to formally test the hypothesis that the two categorical variables are independent in the population from which the data were obtained.

The Chi-Square Test
Null Hypothesis: The rows of a two-way table are the values of one categorical variable, and the columns are the values of the other variable. The count in any particular cell of the table equals the number of subjects who fall into that cell. We want to test the hypothesis (H0) that there is no relationship between the two categorical variables.
Alternative Hypothesis: The alternative hypothesis (Ha) is that there is an association between the variables. The alternative hypothesis does NOT specify any particular direction for the association. For two-way tables in general, the alternative includes many different possibilities. Because of that, we cannot describe Ha as either one-sided or two-sided.

Expected Cell Counts
To test the hypothesis, we compare the actual counts from the sample data with the expected counts, that is, the counts we would expect if there were no relationship between the two variables. The expected count in any cell of a two-way table when H0 is true is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$
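As a minimal sketch of the expected-count formula above, the following Python computes the expected count for every cell of a two-way table. The function name `expected_counts` and the small 2 x 2 table are hypothetical and serve only as an illustration; they are not data from the chapter.

```python
def expected_counts(table):
    """Return the expected counts under H0 for a two-way table.

    `table` is a list of rows of observed counts. Each expected count is
    (row total * column total) / table total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (made up for illustration).
observed = [[20, 30],
            [30, 20]]
expected = expected_counts(observed)
print(expected)
```

Because every row total and column total in this made-up table is 50 out of 100, each expected count is 50 x 50 / 100 = 25.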

The Chi-Square Statistic
To see if the data give convincing evidence against the null hypothesis, we compare the observed counts from our sample with the expected counts assuming H0 is true. Assume there are r rows and c columns in the two-way table, which means there are $r \times c$ cells. The test statistic that makes the comparison is the chi-square statistic.
The chi-square statistic is a measure of how far the observed counts are from the expected counts. The formula for the statistic is

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

where "observed" represents an observed cell count, "expected" represents the expected count for the same cell, and the sum is over all $r \times c$ cells in the table.

The Chi-Square Distributions (1)
When the observed counts are very different from the expected counts, a large value of $\chi^2$ will result, providing evidence against the null hypothesis. When the observed and expected counts are in close agreement, a small value of $\chi^2$ will result.
The chi-square distributions are a family of distributions that take only positive values and are skewed to the right. A particular $\chi^2$ distribution is specified by giving its degrees of freedom. For a two-way table with r rows and c columns, the chi-square test uses the $\chi^2$ distribution with $(r - 1)(c - 1)$ degrees of freedom.

The Chi-Square Distributions (2)
The P-value is the area under the density curve of this $\chi^2$ distribution to the right of the value of the test statistic. The $\chi^2$ distribution Critical Values table (Table F) gives the upper critical values for the $\chi^2$ distribution.

Cell Counts Required for the Chi-Square Test

The chi-square test is an approximate method that becomes more accurate as the counts in the cells of the table get larger. We must therefore check that the counts are large enough to allow us to trust the P-value. Fortunately, the chi-square approximation is accurate for quite modest counts.
You can safely use the chi-square test with critical values from the chi-square distribution when the average of the expected counts is 5 or more and all individual expected counts are 1 or greater. In the special case of a $2 \times 2$ table, all four expected counts should be 5 or more.

Steps in Chi-Square Test
The calculations required to analyze a two-way table are straightforward but tedious. In practice, it is recommended to use software. However, it is possible to do the work with a calculator. When analyzing relationships between two categorical variables, follow this procedure:
1) Calculate descriptive statistics that convey the important information in the table. Usually, these will be column or row percents.
2) Find the expected counts and use them to compute the $\chi^2$ statistic.
3) Compare your $\chi^2$ statistic to the chi-square critical values from Table F to find the approximate P-value for your test.
4) Draw a conclusion about the association between the row and column variables.
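The cell-count guideline stated above can be turned into a quick check to run before trusting the P-value. This is only a sketch: the function name `counts_large_enough` and both tables of expected counts are hypothetical.

```python
def counts_large_enough(expected):
    """Check the cell-count guideline for the chi-square test.

    `expected` is a list of rows of expected counts.
    """
    cells = [e for row in expected for e in row]
    is_2x2 = len(expected) == 2 and all(len(row) == 2 for row in expected)
    if is_2x2:
        # Special case: all four expected counts should be 5 or more.
        return all(e >= 5 for e in cells)
    # General case: average at least 5 and every expected count at least 1.
    average = sum(cells) / len(cells)
    return average >= 5 and all(e >= 1 for e in cells)


# Hypothetical expected counts (made up for illustration).
small_2x2 = [[6.0, 4.0], [9.0, 6.0]]             # one count is below 5
ok_2x3 = [[8.0, 2.0, 10.0], [4.0, 1.0, 5.0]]     # average is 5, all counts >= 1
print(counts_large_enough(small_2x2))
print(counts_large_enough(ok_2x3))
```

The first table fails because a 2 x 2 table needs all four expected counts to be at least 5; the second passes the general rule.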

Two Different Models for the $\chi^2$ Test
In addition to the conditions on expected cell counts, there is another important assumption required for validity of the chi-square test. It must be true that we have a simple random sample of subjects from a population of subjects, and each subject in the population falls into one and only one cell of the two-way table.
There is a distinctly different situation in which the chi-square test may also be used. Suppose we have c independent random samples from c different populations, and we classify the individuals within each sample according to a categorical variable that takes on r values. We now have an $r \times c$ table that looks the same as before but is obtained via a different sampling scheme. In this second situation, we may use the chi-square test to test the hypothesis:
H0: the distribution of the categorical variable is the same in each of the c populations

The Chi-Square Test: Summary
In summary, the chi-square test can be used when sampling is done in either of the following two ways:
- Independent SRSs from two or more populations, with each individual classified according to one categorical variable
- A single SRS, with each individual classified according to both of two categorical variables
In either scenario, in order for the chi-square test to be valid, the following should be true:
- The average of the expected cell counts should be at least 5.
- All individual expected cell counts should be at least 1.
- In a $2 \times 2$ table, all four expected cell counts should be at least 5.

Computing Conditional Distributions
The calculated percents within a two-way table represent the conditional distributions describing the relationship between both variables. For every two-way table, there are two sets of possible conditional distributions (column percents or row percents).
For column percents, divide each cell count by the column total. The sum of the percents in each column should be 100, except for possible small roundoff errors.
When one variable is clearly explanatory, it makes sense to describe the relationship by comparing the conditional distributions of the response variable for each value (level) of the explanatory variable.
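The column-percent rule described above can be sketched in a few lines of Python. The function name `column_percents` and the small table are hypothetical and chosen so that the arithmetic is easy to follow by hand.

```python
def column_percents(table):
    """Divide each cell count by its column total and express it as a percent."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * count / total for count, total in zip(row, col_totals)]
            for row in table]


# Hypothetical observed counts (made up for illustration).
observed = [[10, 30],
            [30, 30]]
percents = column_percents(observed)
for row in percents:
    print(row)

# Each column of percents should add to 100, up to roundoff.
column_sums = [sum(col) for col in zip(*percents)]
print(column_sums)
```

Row percents work the same way with the roles of rows and columns swapped.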

Comparing Conditional Distributions 1
Market researchers suspect that background music may affect the mood and buying behavior of customers. One study in a supermarket compared three randomly assigned treatments: no music, French accordion music, and Italian string music. Under each condition, the researchers recorded the numbers of bottles of French, Italian, and other wine purchased. Here is a table that summarizes the data:

                    Music
Wine       None   French   Italian   Total
French       30       39        30      99
Italian      11        1        19      31
Other        43       35        35     113
Total        84       75        84     243

PROBLEM
a) Calculate the conditional distribution (in proportions) of the type of wine sold for each treatment.
b) Make an appropriate graph for comparing the conditional distributions in part (a).
c) Are the distributions of wine purchases under the three music treatments similar or different? Give appropriate evidence from parts (a) and (b) to support your answer.

Comparing Conditional Distributions 2
(a) When no music was playing, the distribution of wine purchases was
French: 30/84 = 0.357   Italian: 11/84 = 0.131   Other: 43/84 = 0.512
When French accordion music was playing, the distribution of wine purchases was
French: 39/75 = 0.520   Italian: 1/75 = 0.013   Other: 35/75 = 0.467
When Italian string music was playing, the distribution of wine purchases was
French: 30/84 = 0.357   Italian: 19/84 = 0.226   Other: 35/84 = 0.417
The type of wine that customers buy seems to differ considerably across the three music treatments. Sales of Italian wine are very low (1.3%) when French music is playing but are higher when Italian music (22.6%) or no music (13.1%) is playing. French wine appears popular in this market, selling well under all music conditions but notably better when French music is playing. For all three music treatments, the percent of Other wine purchases was similar.

Calculating Expected Cell Counts 1
Finding the expected counts is not that difficult, as the following example illustrates.
The null hypothesis in the wine and music experiment is that there's no difference in the distribution of wine purchases in the store when no music, French accordion music, or Italian string music is played.
To find the expected counts, we start by assuming that H0 is true. We can see from the two-way table that 99 of the 243 bottles of wine bought during the study were French wines. If the specific type of music that's playing has no effect on wine purchases, the proportion of French wine sold under each music condition should be 99/243 = 0.407.
The overall proportion of French wine bought during the study was 99/243 = 0.407. So the expected counts of French wine bought under each treatment are
No music: (99/243)(84) = 34.22   French music: (99/243)(75) = 30.56   Italian music: (99/243)(84) = 34.22
The overall proportion of Italian wine bought during the study was 31/243 = 0.128. So the expected counts of Italian wine bought under each treatment are
No music: (31/243)(84) = 10.72   French music: (31/243)(75) = 9.57   Italian music: (31/243)(84) = 10.72
The overall proportion of Other wine bought during the study was 113/243 = 0.465. So the expected counts of Other wine bought under each treatment are
No music: (113/243)(84) = 39.06   French music: (113/243)(75) = 34.88   Italian music: (113/243)(84) = 39.06

Calculating Expected Cell Counts 2
Consider the expected count of French wine bought when no music was playing:

$$\frac{99}{243} \times 84 = 34.22$$

Observed Counts
                    Music
Wine       None   French   Italian   Total
French       30       39        30      99
Italian      11        1        19      31
Other        43       35        35     113
Total        84       75        84     243

The values in the calculation are the row total for French wine, the column total for no music, and the table total. We can rewrite the original calculation as

$$\frac{99 \times 84}{243} = 34.22$$

Expected Counts
The expected count in any cell of a two-way table when H0 is true is

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

The Chi-Square Calculation
Observed Counts
                    Music
Wine       None   French   Italian   Total
French       30       39        30      99
Italian      11        1        19      31
Other        43       35        35     113
Total        84       75        84     243

Expected Counts
                    Music
Wine       None   French   Italian   Total
French    34.22    30.56     34.22      99
Italian   10.72     9.57     10.72      31
Other     39.06    34.88     39.06     113
Total        84       75        84     243

For the French wine with no music, the observed count is 30 bottles and the expected count is 34.22. The contribution to the $\chi^2$ statistic for this cell is

$$\frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(30 - 34.22)^2}{34.22} = 0.52$$

The $\chi^2$ statistic is the sum of nine such terms:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(30 - 34.22)^2}{34.22} + \frac{(39 - 30.56)^2}{30.56} + \cdots + \frac{(35 - 39.06)^2}{39.06} = 0.52 + 2.33 + \cdots + 0.42 = 18.28$$

The $\chi^2$ Statistic and its P-Value
H0: There is no difference in the distributions of wine purchases at this store when no music, French accordion music, or Italian string music is played.
Ha: There is a difference in the distributions of wine purchases at this store when no music, French accordion music, or Italian string music is played.
Our calculated test statistic is $\chi^2 = 18.28$. To find the P-value using a chi-square table, look for df = (3 - 1)(3 - 1) = 4.

          P
df    .0025    .001
4     16.42   18.47

The small P-value (between 0.001 and 0.0025) gives us convincing evidence to reject H0 and conclude that there is a difference in the distributions of wine purchases at this store when no music, French accordion music, or Italian string music is played.

Models for Two-Way Tables
The chi-square test is a technique that may be used to compare the distributions of a categorical variable in several populations or to test for evidence of a relationship between two categorical variables. We can either:
- Compare several populations: Randomly select several SRSs, each from a different population (or from a population subjected to different treatments): experimental study.
- Test for independence: Take one SRS and classify the individuals in the sample according to two categorical variables (attribute or condition): observational study, historical design.
Both models use the $\chi^2$ test to test the hypothesis of no relationship.
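Both models rely on the same arithmetic, so the wine and music calculation can serve as a concrete sketch of it. The code below recomputes the statistic and the P-value from the observed counts using only the Python standard library. The function names are hypothetical, and the P-value helper uses a closed form for the right-tail area that holds only for 4 degrees of freedom; in practice a statistics package would normally be used instead.

```python
import math


def chi_square_statistic(observed):
    """Return (statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / table_total
            statistic += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def right_tail_df4(statistic):
    """Right-tail area of the chi-square distribution with exactly 4 df."""
    z = statistic / 2
    return math.exp(-z) * (1 + z)


# Observed counts from the wine and music table (rows: French, Italian, Other).
wine = [[30, 39, 30],
        [11, 1, 19],
        [43, 35, 35]]
statistic, df = chi_square_statistic(wine)
p_value = right_tail_df4(statistic)
print(round(statistic, 2), df, round(p_value, 4))
```

The computed P-value falls between 0.001 and 0.0025, in agreement with the Table F lookup.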

Comparing Several Populations 1
Select independent SRSs from each of c populations, of sizes $n_1, n_2, \ldots, n_c$. Classify each individual in a sample according to a categorical response variable with r possible values. There are c different probability distributions, one for each population.
The null hypothesis is that the distributions of the response variable are the same in all c populations. The alternative hypothesis says that these c distributions are not all the same.

Comparing Several Populations 2
Random digit dialing telephone surveys used to exclude cell phone numbers. If the opinions of people who have only cell phones differ from those of people who have landline service, the poll results may not represent the entire adult population. The Pew Research Center interviewed separate random samples of cell-only and landline telephone users who were less than 30 years old. Here's what the Pew survey found about how these people describe their political party affiliation.

                                 Cell-only sample   Landline sample
Democrat or lean Democrat                      49                47
Refuse to lean either way                      15                27
Republican or lean Republican                  32                30
Total                                          96               104

We want to perform a test of:
H0: There is no difference in the distribution of party affiliation in the cell-only and landline populations.
Ha: There is a difference in the distribution of party affiliation in the cell-only and landline populations.
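One way to see what H0 implies for this table: if both populations share a single distribution of party affiliation, the natural estimate of that distribution pools the two samples, and each expected count is the pooled proportion times the sample size. The sketch below applies that reasoning to the Pew counts; the variable names are hypothetical.

```python
# Observed counts from the Pew table
# (Democrat or lean, refuse to lean, Republican or lean).
cell_only = [49, 15, 32]
landline = [47, 27, 30]

n_cell = sum(cell_only)    # 96
n_land = sum(landline)     # 104

# Pooled estimate of the common distribution under H0.
pooled = [(a + b) / (n_cell + n_land) for a, b in zip(cell_only, landline)]

# Expected counts: pooled proportion times each sample size.
expected_cell = [p * n_cell for p in pooled]
expected_land = [p * n_land for p in pooled]

print([round(e, 2) for e in expected_cell])
print([round(e, 2) for e in expected_land])
```

This gives the same numbers as the row total times column total divided by table total rule, because the pooled proportion is the row total over the table total and the sample size is the column total.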

Comparing Several Populations 3
If the conditions are met, we can conduct a chi-square test.
Random: The data came from separate random samples of 96 cell-only and 104 landline users.
Large Sample Size: We used a calculator to determine the expected counts for each cell. The calculator screenshot confirms that all expected counts are at least 5.
Independent: Researchers took independent samples of cell-only and landline phone users. Sampling without replacement was used, so there need to be at least 10(96) = 960 cell-only users under age 30 and at least 10(104) = 1040 landline users under age 30. This is safe to assume.

Comparing Several Populations 4
Because the conditions are satisfied, we can perform a chi-square test. We begin by calculating the test statistic.
Test statistic:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(49 - 46.08)^2}{46.08} + \frac{(47 - 49.92)^2}{49.92} + \cdots + \frac{(30 - 32.24)^2}{32.24} = 3.22$$

P-Value: Using df = (3 - 1)(2 - 1) = 2, the P-value is 0.20.
Because the P-value, 0.20, is greater than $\alpha = 0.05$, we fail to reject H0. There is not enough evidence to conclude that the distribution of party affiliation differs in the cell-only and landline user populations.

Testing for Independence 1
Suppose we have a single sample from a single population. For each individual in this SRS of size n, we measure two categorical variables. The results are then summarized in a two-way table.
The null hypothesis is that the row and column variables are independent. The alternative hypothesis is that the row and column variables are dependent.

Testing for Independence 2
We're interested in whether angrier people tend to get heart disease more often. We can compare the percents of people who did and did not get heart disease in each of the three anger categories:

          Low anger   Moderate anger   High anger   Total
CHD              53              110           27     190
No CHD         3057             4621          606    8284
Total          3110             4731          633    8474

There is a clear trend: As the anger score increases, so does the percent who suffer heart disease. A much higher percent of people in the high-anger category developed CHD (4.27%) than in the moderate (2.33%) and low (1.70%) anger categories.

Testing for Independence 3
Here is the complete table of observed and expected counts for the CHD and anger study side by side. Do the data provide convincing evidence of an association between anger level and heart disease in the population of interest?

Observed
              Low   Moderate     High
CHD            53        110       27
No CHD       3057       4621      606

Expected
              Low   Moderate     High
CHD         69.73     106.08    14.19
No CHD    3040.27    4624.92   618.81

We want to perform a test of
H0: There is no association between anger level and heart disease in the population of people with normal blood pressure.
Ha: There is an association between anger level and heart disease in the population of people with normal blood pressure.

Testing for Independence 4
If the conditions are met, we should conduct a chi-square test for association/independence.
Random: The data came from a random sample of 8474 people with normal blood pressure.
Large Sample Size: All the expected counts are at least 5, so this condition is met.
Independent: Because we are sampling without replacement, we need to check that the total number of people in the population with normal blood pressure is at least 10(8474) = 84,740. This seems reasonable to assume.

Testing for Independence 5
Because the conditions are satisfied, we can perform a chi-square test for association/independence. We begin by calculating the test statistic.
Test statistic:

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}} = \frac{(53 - 69.73)^2}{69.73} + \frac{(110 - 106.08)^2}{106.08} + \cdots + \frac{(606 - 618.81)^2}{618.81} = 4.014 + 0.145 + \cdots + 0.265 = 16.077$$

P-Value: The two-way table of anger level versus heart disease has two rows and three columns. We will use the chi-square distribution with df = (2 - 1)(3 - 1) = 2 to find the P-value.
Table: Look at the df = 2 line in Table F. The observed statistic $\chi^2 = 16.077$ is larger than the critical value 15.20 for $\alpha = 0.0005$. So the P-value is less than 0.0005.
Technology: The calculator command χ²cdf(16.077,1000,2) gives 0.00032.
Because the P-value is clearly less than $\alpha = 0.05$, we reject H0 and conclude that anger level and heart disease are associated in the population of people with normal blood pressure.
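The anger and CHD calculation can be reproduced with a short script. This is a sketch with hypothetical variable names; it lists the six per-cell contributions and uses the fact that, for exactly 2 degrees of freedom, the right-tail area of the chi-square distribution is exp(-statistic / 2), which plays the role of the calculator command above.

```python
import math

# Observed counts (rows: CHD, No CHD; columns: low, moderate, high anger).
observed = [[53, 110, 27],
            [3057, 4621, 606]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# Per-cell contributions (observed - expected)^2 / expected.
contributions = []
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / table_total
        contributions.append((obs - exp) ** 2 / exp)

statistic = sum(contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed form for the right-tail area; valid only when df == 2.
p_value = math.exp(-statistic / 2)

print([round(c, 3) for c in contributions])
print(round(statistic, 3), df, round(p_value, 5))
```

The largest contribution comes from the high-anger CHD cell, where the observed count of 27 is well above the expected count of 14.19.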

9.2 Goodness of Fit
The chi-square goodness-of-fit test

The Chi-Square Test for Goodness of Fit 1
Mars, Inc. makes milk chocolate candies. Here's what the company's Consumer Affairs Department says about the color distribution of its M&M's candies:
On average, the new mix of colors of M&M's milk chocolate candies will contain 13% of each of browns and reds, 14% yellows, 16% greens, 20% oranges, and 24% blues.
The one-way table below summarizes the data from a sample bag of M&M's. In general, one-way tables display the distribution of a categorical variable for the individuals in a sample.

Color   Blue   Orange   Green   Yellow   Red   Brown   Total
Count      9        8      12       15    10       6      60

The sample proportion of blue M&M's is $\hat{p} = 9/60 = 0.15$.

The Chi-Square Test for Goodness of Fit 2
Because the company claims that 24% of all M&M's are blue, we might believe that something fishy is going on. We could use the z test for a proportion to test the hypotheses:
H0: $p = 0.24$
Ha: $p \neq 0.24$
where p is the true population proportion of blue M&M's. We could then perform additional significance tests for each of the remaining colors.
However, performing a one-sample z test for each proportion would be inefficient and lead to the problem of multiple comparisons, in which we have to adjust for the fact that several tests are done at the same time.
More important, performing one-sample z tests for each color would not tell us how likely it is to get a random sample of 60 candies with a color distribution that differs as much from the one claimed by the company as this bag does (taking all the colors into consideration at one time).
For that, we need a new kind of significance test, which is called a chi-square test for goodness of fit.

The Chi-Square Test for Goodness of Fit 3
We can write the hypotheses in symbols as
H0: $p_{\text{blue}} = 0.24$, $p_{\text{orange}} = 0.20$, $p_{\text{green}} = 0.16$, $p_{\text{yellow}} = 0.14$, $p_{\text{red}} = 0.13$, $p_{\text{brown}} = 0.13$
Ha: At least one of the proportions is different from the claimed value
where $p_{\text{color}}$ = the true population proportion of M&M's of that color.

The idea of the chi-square test for goodness of fit is this: We compare the observed counts from our sample with the counts that would be expected if H0 is true. The more the observed counts differ from the expected counts, the more evidence we have against the null hypothesis.
In general, the expected counts can be obtained by multiplying the proportion of the population distribution in each category by the sample size.

The Chi-Square Test for Goodness of Fit 4
Assuming that the color distribution stated by Mars, Inc. is true, 24% of all M&M's produced are blue. For random samples of 60 candies, the average number of blue M&M's should be (0.24)(60) = 14.40. This is our expected count of blue M&M's. Using this same method, we can find the expected counts for the other color categories:
Orange: (0.20)(60) = 12.00
Green: (0.16)(60) = 9.60
Yellow: (0.14)(60) = 8.40
Red: (0.13)(60) = 7.80
Brown: (0.13)(60) = 7.80

Color    Observed   Expected
Blue            9      14.40
Orange          8      12.00
Green          12       9.60
Yellow         15       8.40
Red            10       7.80
Brown           6       7.80

The Chi-Square Test for Goodness of Fit 5
To calculate the chi-square statistic, use the same formula as you did earlier in the chapter.

$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

$$\chi^2 = \frac{(9 - 14.40)^2}{14.40} + \frac{(8 - 12.00)^2}{12.00} + \frac{(12 - 9.60)^2}{9.60} + \frac{(15 - 8.40)^2}{8.40} + \frac{(10 - 7.80)^2}{7.80} + \frac{(6 - 7.80)^2}{7.80}$$

$$\chi^2 = 2.025 + 1.333 + 0.600 + 5.186 + 0.621 + 0.415 = 10.180$$

The Chi-Square Test for Goodness of Fit 6
The Chi-Square Test for Goodness of Fit
A categorical variable has k possible outcomes, with probabilities $p_1, p_2, \ldots, p_k$. That is, $p_i$ is the probability of the ith outcome. We have n independent observations from this categorical variable.
To test the null hypothesis that the probabilities have specified values

$$H_0: p_1 = p_{10},\ p_2 = p_{20},\ \ldots,\ p_k = p_{k0}$$

find the expected count for each category assuming that H0 is true. Then calculate the chi-square statistic:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

where the sum is over k different categories. The P-value is the area to the right of $\chi^2$ under the density of the chi-square distribution with $k - 1$ degrees of freedom.
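The goodness-of-fit recipe above can be sketched for the M&M's bag as follows. The names are hypothetical. The helper `chi_square_right_tail` is an assumption of this sketch rather than anything from the slides: it evaluates the right-tail area with a standard recurrence for the regularized upper incomplete gamma function, so that only the Python standard library is needed.

```python
import math


def chi_square_right_tail(statistic, df):
    """Right-tail area of the chi-square distribution with `df` degrees of freedom.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1) with
    z = statistic / 2, starting from Q(1, z) = exp(-z) for even df
    or Q(1/2, z) = erfc(sqrt(z)) for odd df, up to a = df / 2.
    """
    z = statistic / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


# Claimed color distribution and observed counts from the sample bag.
claimed = {"Blue": 0.24, "Orange": 0.20, "Green": 0.16,
           "Yellow": 0.14, "Red": 0.13, "Brown": 0.13}
observed = {"Blue": 9, "Orange": 8, "Green": 12,
            "Yellow": 15, "Red": 10, "Brown": 6}

n = sum(observed.values())
expected = {color: p * n for color, p in claimed.items()}
statistic = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in claimed)
df = len(claimed) - 1
p_value = chi_square_right_tail(statistic, df)

print(round(statistic, 3), df, round(p_value, 3))
```

With k = 6 colors the test uses 5 degrees of freedom, and the statistic matches the hand calculation of 10.180.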

The Chi-Square Test for Goodness of Fit 7
We computed the chi-square statistic for our sample of 60 M&M's to be $\chi^2 = 10.180$. Because all of the expected counts are at least 5, the $\chi^2$ statistic will follow a chi-square distribution with df = 6 - 1 = 5 reasonably well when H0 is true.

            P
df     .15     .10     .05
4     6.74    7.78    9.49
5     8.12    9.24   11.07
6     9.45   10.64   12.59

The value $\chi^2 = 10.180$ falls between the critical values 9.24 and 11.07. The corresponding areas in the right tail of the chi-square distribution with df = 5 are 0.10 and 0.05. So, the P-value for a test based on our sample data is between 0.05 and 0.10.
Because our P-value is between 0.05 and 0.10, it is greater than $\alpha = 0.05$. Therefore, we fail to reject H0. We do not have sufficient evidence to conclude that the company's claimed color distribution is incorrect.
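The table lookup used above can be written as a small routine that brackets the P-value between two tabulated tail areas. This is a sketch: the function name `bracket_p_value` is hypothetical, and the dictionary holds only the three df = 5 entries shown in the excerpt of Table F.

```python
def bracket_p_value(statistic, critical_values):
    """Bracket the P-value using one row of a chi-square critical value table.

    `critical_values` maps a right-tail area to its critical value.
    Returns (lower bound, upper bound) for the P-value.
    """
    # Walk from the largest tail area (smallest critical value) downward.
    pairs = sorted(critical_values.items(), reverse=True)
    lower_bound, upper_bound = 0.0, 1.0
    for tail_area, critical in pairs:
        if statistic >= critical:
            upper_bound = tail_area    # P-value is no larger than this area
        else:
            lower_bound = tail_area    # P-value is larger than this area
            break
    return lower_bound, upper_bound


# df = 5 row from the Table F excerpt above.
row_df5 = {0.15: 8.12, 0.10: 9.24, 0.05: 11.07}
bounds = bracket_p_value(10.180, row_df5)
print(bounds)
```

For the M&M's statistic the routine returns the same bracket found by reading the table: between 0.05 and 0.10.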