Categorical data: two proportions




The Problems



  • We have two independent groups of individuals (e.g. homosexual men with and without a history of gonorrhoea). We want to know whether the proportions of individuals with a characteristic (e.g. infected with human herpesvirus-8, HHV-8) are the same in the two groups.
  • We have two related groups, e.g. individuals may be matched, or measured twice in different circumstances (say, before and after treatment). We want to know whether the proportions with a characteristic (e.g. raised test result) are the same in the two groups.

Independent Groups: the Chi-Squared Test


Terminology


The data are obtained, initially, as frequencies, i.e. the numbers with and without the characteristic in each sample. A table in which the entries are frequencies is called a contingency table; when this table has two rows and two columns it is called a 2 × 2 table. Table 24.1 shows the observed frequencies in the four cells corresponding to each row/column combination, the four marginal totals (the frequency in a specific row or column, e.g. a + b), and the overall total, n. We can calculate (see Rationale below) the frequency that we would expect in each of the four cells of the table if H0 were true (the expected frequencies).


Table 24.1 Observed frequencies.


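                          Group 1       Group 2       Total
Characteristic present    a             b             a + b
Characteristic absent     c             d             c + d
Total                     n1 = a + c    n2 = b + d    n = a + b + c + d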


Assumptions


We have samples of sizes n1 and n2 from two independent groups of individuals. We are interested in whether the proportions of individuals who possess the characteristic are the same in the two groups. Each individual is represented only once in the study. The rows (and columns) of the table are mutually exclusive, implying that each individual can belong in only one row and only one column. The usual, albeit conservative, approach requires that the expected frequency in each of the four cells is at least five.


Rationale


If the proportions with the characteristic in the two groups are equal, we can estimate the overall proportion of individuals with the characteristic by p = (a + b)/n; we expect n1 × p of them to be in Group 1 and n2 × p to be in Group 2. We evaluate expected numbers without the characteristic similarly. Therefore, each expected frequency is the product of the two relevant marginal totals divided by the overall total. A large discrepancy between the observed (O) and the corresponding expected (E) frequencies is an indication that the proportions in the two groups differ. The test statistic is based on this discrepancy.
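The calculation of the expected frequencies can be sketched in a few lines of Python. This is a minimal illustration, not part of the original text: the function name is hypothetical, the counts are invented purely for demonstration, and the layout assumed is that a and b are the numbers with the characteristic in Groups 1 and 2, with c and d the numbers without it.

```python
def expected_frequencies(a, b, c, d):
    """Expected counts for a 2 x 2 table if H0 were true.

    Assumed layout (hypothetical helper, not from the source):
    a, b = numbers WITH the characteristic in Group 1, Group 2
    c, d = numbers WITHOUT the characteristic in Group 1, Group 2
    """
    n1, n2 = a + c, b + d                    # group sizes (column totals)
    n = n1 + n2                              # overall total
    with_char, without_char = a + b, c + d   # row totals
    # Each expected frequency = (row total x column total) / overall total
    return [
        [with_char * n1 / n, with_char * n2 / n],
        [without_char * n1 / n, without_char * n2 / n],
    ]


# Invented counts, for illustration only
expected = expected_frequencies(20, 10, 30, 40)

# Check the usual requirement that every expected frequency is at least five
all_at_least_five = all(e >= 5 for row in expected for e in row)
```

With these invented counts the overall proportion is p = 30/100, so we would expect 15 individuals with the characteristic in each group of 50, and 35 without.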









1 Define the null and alternative hypotheses under study
H0: the proportions of individuals with the characteristic are equal in the two groups in the population

H1: these population proportions are not equal.

2 Collect relevant data from samples of individuals

3 Calculate the value of the test statistic specific to H0

$$\chi^2 = \sum \frac{\left(|O - E| - \frac{1}{2}\right)^2}{E}$$

where O and E are the observed and expected frequencies, respectively, in each of the four cells of the table. The vertical lines around O − E indicate that we ignore its sign. The 1/2 in the numerator is the continuity correction (Chapter 19). The test statistic follows the Chi-squared distribution with 1 degree of freedom.

4 Compare the value of the test statistic to values from a known probability distribution
Refer χ² to Appendix A3.
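
Steps 3 and 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than a definitive implementation: the function name and the counts are hypothetical, the table layout is assumed to be a, b (with the characteristic in Groups 1 and 2) and c, d (without it), and the P-value is computed directly instead of being read from a table, using the fact that a Chi-squared variable with 1 degree of freedom is the square of a standard Normal variable.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Continuity-corrected Chi-squared test for a 2 x 2 table.

    Hypothetical helper; assumed layout:
    a, b = numbers WITH the characteristic in Group 1, Group 2
    c, d = numbers WITHOUT the characteristic in Group 1, Group 2
    Returns (test statistic, two-sided P-value).
    """
    n1, n2 = a + c, b + d
    n = n1 + n2
    observed = [a, b, c, d]
    # Expected frequency = (row total x column total) / overall total
    expected = [
        (a + b) * n1 / n, (a + b) * n2 / n,
        (c + d) * n1 / n, (c + d) * n2 / n,
    ]
    # Sum over the four cells of (|O - E| - 1/2)^2 / E
    statistic = sum(
        (abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected)
    )
    # Upper-tail probability of Chi-squared with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts, for illustration only
statistic, p_value = chi_squared_2x2(20, 10, 30, 40)
```

For these invented counts every cell has |O − E| = 5, so the statistic is 2 × 20.25/15 + 2 × 20.25/35, which is approximately 3.86.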


Posted May 9, 2017 in General & Family Medicine.
