6 Assessment of Risk and Benefit in Epidemiologic Studies

Causal research in epidemiology requires that two fundamental distinctions be made. The first distinction is between people who have and people who do not have exposure to the risk factor (or protective factor) under study (the **independent variable**). The second distinction is between people who have and people who do not have the disease (or other outcome) under study (the **dependent variable**). These distinctions are seldom simple, and their measurements are subject to random errors and biases.

In addition, epidemiologic research may be complicated by other requirements. It may be necessary to analyze several independent (possibly causal) variables at the same time, including how they interact. For example, the frequency of hypertension is related to *age* and *gender,* and these variables interact in the following manner: before about age 50, men are more likely to be hypertensive; but after age 50, women are more likely to be hypertensive. Another complication involves the need to measure different degrees of *strength of exposure* to the risk factor, the *duration of exposure* to the risk factor, or both. Investigators study strength and duration in combination, for example, when they measure exposure to cigarettes in terms of *pack-years,* which is the average number of packs smoked per day times the number of years of smoking. Depending on the risk factor, it may be difficult to determine the time of onset of exposure. This is true for risk factors such as sedentary lifestyle and excess intake of dietary sodium. Another complication of analysis is the need to measure different levels of *disease severity.* Exposure and outcome may vary across a range of values, rather than simply be *present* or *absent*.
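The pack-years measure described above is a simple product, and a minimal Python sketch makes the arithmetic explicit. The function name `pack_years` and the two example smokers are hypothetical and serve only as illustration.

```python
def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Cumulative exposure: average packs smoked per day times years of smoking."""
    if packs_per_day < 0 or years_smoked < 0:
        raise ValueError("packs_per_day and years_smoked must be non-negative")
    return packs_per_day * years_smoked


# Two hypothetical smokers with different habits but equal cumulative exposure.
heavy_short = pack_years(2.0, 10)  # 2 packs/day for 10 years
light_long = pack_years(1.0, 20)   # 1 pack/day for 20 years
print(heavy_short, light_long)
```

Both smokers have the same number of pack-years, which shows that the measure combines strength and duration of exposure into one number and cannot by itself distinguish between them.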

Despite these complexities, much epidemiologic research still relies on the dichotomies of exposed/unexposed and diseased/nondiseased, which are often presented in the form of a **standard 2 × 2 table** (Table 6-1).

# I Definition of Study Groups

Causal research depends on the measurement of differences. In cohort studies the difference is between the frequency of disease in **persons exposed** to a risk factor and the frequency of disease in **persons not exposed** to the same risk factor. In case-control studies the difference is between the frequency of the risk factor in **case participants** (persons with the disease) and the frequency of the risk factor in **control participants** (persons without the disease).

The exposure may be to a *nutritional* factor (e.g., high–saturated fat diet), an *environmental* factor (e.g., radiation after the Chernobyl disaster), a *behavioral* factor (e.g., cigarette smoking), a *physiologic* characteristic (e.g., high serum total cholesterol level), a *medical* intervention (e.g., antibiotic), or a *public health* intervention (e.g., vaccine). Other factors also play a role, and the categorization may vary (e.g., nutritional choices are often regarded as behavioral factors).

# II Comparison of Risks in Different Study Groups

Although differences in risk can be measured in absolute terms or in relative terms, the method used depends on the type of study performed. For reasons discussed in Chapter 5, case-control studies allow investigators to obtain only a relative measure of risk, whereas cohort studies and randomized controlled trials allow investigators to obtain absolute and relative measures of risk. Whenever possible, it is important to examine absolute and relative risks because they provide different information.

After the differences in risk are calculated by the methods outlined in detail subsequently, the level of statistical significance must be determined to ensure that any observed difference is probably real (i.e., not caused by chance). (Significance testing is discussed in detail in Chapter 10.) When the difference is statistically significant, but not clinically important, it is real but trivial. When the difference appears to be clinically important, but is not statistically significant, it may be a false-negative (beta) error if the sample size is small (see Chapter 12), or it may be a chance finding.

## A Absolute Differences in Risk

Disease frequency usually is measured as a risk in cohort studies and clinical trials and as a rate when the disease and death data come from population-based reporting systems. An absolute difference in risks or rates can be expressed as a risk difference or as a rate difference. The **risk difference** is the risk in the exposed group minus the risk in the unexposed group. The **rate difference** is the rate in the exposed group minus the rate in the unexposed group (rates are defined in Chapter 2). The discussion in this chapter focuses on risks, which are used more often than rates in cohort studies.

When the level of risk in the exposed group is the same as the level of risk in the unexposed group, the risk difference is 0, and the conclusion is that the exposure makes no difference to the disease risk being studied. If an exposure is harmful (as in the case of cigarette smoking), the risk difference is expected to be greater than 0. If an exposure is protective (as in the case of a vaccine), the risk difference is expected to be less than 0 (i.e., a negative number, which in this case indicates a reduction in disease risk in the group exposed to the vaccine). The risk difference also is known as the **attributable risk** because it is an estimate of the amount of risk that *can be attributed to,* or *is attributable to* (is caused by), the risk factor.

In Table 6-1 the risk of disease in the exposed individuals is *a*/(*a* + *b*), and the risk of disease in the unexposed individuals is *c*/(*c* + *d*). When these symbols are used, the attributable risk (AR) can be expressed as the difference between the two:

$$\mathrm{AR} = \frac{a}{a+b} - \frac{c}{c+d}$$

Figure 6-1 provides data on age-adjusted death rates for lung cancer among adult male smokers and nonsmokers in the U.S. population in 1986 and in the United Kingdom (UK) population.^{1,2} For the United States in 1986, the lung cancer death rate in smokers was 191 per 100,000 population per year, whereas the rate in nonsmokers was 8.7 per 100,000 per year. Because the death rates for lung cancer in the population were low (<1% per year) in the year for which data are shown, the rate and the risk for lung cancer death would be essentially the same. The risk difference (attributable risk) in the United States can be calculated as follows:

$$\mathrm{AR} = \frac{191}{100{,}000} - \frac{8.7}{100{,}000} = \frac{182.3}{100{,}000}$$
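This subtraction can be sketched in Python. The function `risk_difference` and its cell layout are assumptions based on how the text uses the symbols of Table 6-1 (*a* and *b* are the exposed with and without disease, *c* and *d* the unexposed with and without disease). Only the U.S. rates quoted above are taken from the text.

```python
def risk_difference(a: float, b: float, c: float, d: float) -> float:
    """Attributable risk from 2 x 2 cell counts (assumed layout):
    a = exposed, diseased      b = exposed, not diseased
    c = unexposed, diseased    d = unexposed, not diseased
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed - risk_unexposed


# U.S. lung cancer death rates per 100,000 adult men per year, as quoted in the text.
rate_smokers = 191.0
rate_nonsmokers = 8.7

# Both rates share the same denominator, so they can be subtracted directly.
us_attributable_risk = rate_smokers - rate_nonsmokers
print(f"AR (USA) = {us_attributable_risk:.1f} per 100,000 per year")
```

A positive result indicates a harmful exposure. A protective exposure, such as a vaccine, would yield a negative value from the same function.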

Figure 6-1 Risk of death from lung cancer.

Comparison of the risks of death from lung cancer per 100,000 adult male population per year for smokers and nonsmokers in the United States (USA) and United Kingdom (UK).

(Data from US Centers for Disease Control: *MMWR* 38:501–505, 1989; and Doll R, Hill AB: *BMJ* 2:1071–1081, 1956.)

Similarly, the attributable risk in the UK can be calculated as follows:

## B Relative Differences in Risk

Relative risk (RR) can be expressed in terms of a risk ratio (also abbreviated as RR) or estimated by an odds ratio (OR).

### 1 Relative Risk (Risk Ratio)

The **relative risk**, which is also known as the risk ratio (both abbreviated as RR), is the ratio of the risk in the exposed group to the risk in the unexposed group. If the risks in the exposed group and unexposed group are the same, RR = 1. If the risks in the two groups are not the same, calculating RR provides a straightforward way of showing in relative terms how much greater or smaller the risk in the exposed group is compared with the risk in the unexposed group. The risk for the disease in the exposed group usually is greater if an exposure is harmful (as with cigarette smoking) or smaller if an exposure is protective (as with a vaccine). In terms of the groups and symbols defined in Table 6-1, relative risk (RR) would be calculated as follows:

$$\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}$$

The data on lung cancer deaths in Figure 6-1 were used earlier to determine the **attributable risk** (AR). The same data can be used to calculate the RR. For men in the United States, 191/100,000 divided by 8.7/100,000 yields an RR of 22. Figure 6-2 shows the conversion from absolute to relative risks. Absolute risk is shown on the left axis and relative risk on the right axis. In relative risk terms the value of the risk for lung cancer death in the unexposed group is 1. Compared with that, the risk for lung cancer death in the exposed group is 22 times as great, and the attributable risk is the difference, which is 182.3/100,000 in absolute risk terms and 21 in relative risk terms.
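The conversion from absolute to relative terms can be checked with a short Python sketch. The function name `risk_ratio` is hypothetical, and the two rates are the U.S. figures quoted in the text.

```python
def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of the risk in the exposed group to the risk in the unexposed group."""
    if risk_unexposed == 0:
        raise ZeroDivisionError("risk ratio is undefined when the unexposed risk is 0")
    return risk_exposed / risk_unexposed


# U.S. lung cancer death risks per year for adult men (values from the text).
risk_smokers = 191 / 100_000
risk_nonsmokers = 8.7 / 100_000

rr = risk_ratio(risk_smokers, risk_nonsmokers)

# On the relative scale the unexposed group sits at 1, so the part of the
# ratio attributable to the exposure is RR - 1.
excess_relative_risk = rr - 1

print(f"RR = {rr:.1f}, attributable portion in relative terms = {excess_relative_risk:.1f}")
```

Rounded to whole numbers, these are the values of 22 and 21 cited above.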

Figure 6-2 Risk of death from lung cancer.

Diagram shows the risks of death from lung cancer per 100,000 adult male population per year for smokers and nonsmokers in the United States, expressed in absolute terms *(left axis)* and in relative terms *(right axis)*.

(Data from US Centers for Disease Control: *MMWR* 38:501–505, 1989.)

It also is important to consider the number of people to whom the relative risk applies. A large relative risk that applies to a small number of people may produce few excess deaths or cases of disease, whereas a small relative risk that applies to a large number of people may produce many excess deaths or cases of disease.
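A small numerical sketch illustrates this point. All figures below are hypothetical and chosen only for illustration. The sketch assumes that the excess cases among the exposed equal the number exposed times the risk difference, which can be written as baseline risk × (RR − 1).

```python
def excess_cases(n_exposed: int, baseline_risk: float, relative_risk: float) -> float:
    """Expected excess cases among the exposed.

    Risk in the exposed is baseline_risk * relative_risk, so the risk
    difference is baseline_risk * (relative_risk - 1).
    """
    return n_exposed * baseline_risk * (relative_risk - 1)


# Hypothetical scenario A: large RR, rare disease, few people exposed.
rare_exposure = excess_cases(n_exposed=10_000, baseline_risk=1 / 100_000, relative_risk=20)

# Hypothetical scenario B: small RR, common disease, many people exposed.
common_exposure = excess_cases(n_exposed=1_000_000, baseline_risk=1_000 / 100_000, relative_risk=1.2)

print(f"Scenario A: about {rare_exposure:.1f} excess cases")
print(f"Scenario B: about {common_exposure:.0f} excess cases")
```

Under these assumed numbers the exposure with the much smaller relative risk accounts for far more excess cases, because it acts on a larger population with a higher baseline risk.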

### 2 Odds Ratio

People may be unfamiliar with the concept of odds and the difference between “risk” and “odds.” Based on the symbols used in Table 6-1, the **risk** of disease in the exposed group is *a*/(*a* + *b*), whereas the **odds** of disease in the exposed group are simply *a*/*b*. If *a* is small compared with *b*, the odds would be similar to the risk. If a particular disease occurs in 1 person among a group of 100 persons in a given year, the risk of that disease is 1 in 100 (0.0100), and the odds of that disease are 1 to 99 (0.0101). If the risk of the disease is relatively large (>5%), the odds ratio is not a good estimate of the risk ratio. The odds ratio can be calculated by dividing the odds of exposure in the diseased group by the odds of exposure in the nondiseased group. In the terms used in Table 6-1, the formula for the OR is as follows:

$$\mathrm{OR} = \frac{a/c}{b/d} = \frac{ad}{bc}$$

In mathematical terms, it would make no difference whether the odds ratio was calculated as (*a*/*c*)/(*b*/*d*) or as (*a*/*b*)/(*c*/*d*) because cross-multiplication in either case would yield *ad*/*bc*. In a case-control study, it makes no sense to use (*a*/*b*)/(*c*/*d*) because cells *a* and *b* come from different study groups. The fact that the odds ratio is the same whether it is developed from a horizontal analysis of the table or from a vertical analysis proves to be valuable, however, for analyzing data from case-control studies. Although a risk or a risk ratio cannot be calculated from a case-control study, an odds ratio can be calculated. Under most real-world circumstances, the odds ratio from a carefully performed case-control study is a good estimate of the risk ratio that would have been obtained from a more costly and time-consuming prospective cohort study. The odds ratio may be used as an estimate of the risk ratio if the risk of disease in the population is low. (It can be used if the risk of disease is <1%, and probably if <5%.) The odds ratio also is used in logistic methods of statistical analysis (logistic regression, log-linear models, Cox regression analyses), discussed briefly in Chapter 13.
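The relationships among risk, odds, the odds ratio, and the risk ratio can be sketched in Python. The function names and both 2 × 2 tables are hypothetical. The cell layout is assumed from the way the text uses the symbols (*a* and *b* are the exposed with and without disease, *c* and *d* the unexposed with and without disease).

```python
import math


def odds_from_risk(risk: float) -> float:
    """Convert a risk (probability) to odds."""
    return risk / (1 - risk)


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Cross-product form of the odds ratio, ad/bc."""
    return (a * d) / (b * c)


def risk_ratio(a: float, b: float, c: float, d: float) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / (a + b)) / (c / (c + d))


# Risk of 1 in 100 versus odds of 1 to 99, as in the text.
print(f"risk = {1 / 100:.4f}, odds = {odds_from_risk(1 / 100):.4f}")

# Hypothetical tables: disease rare (risks 0.2% and 0.1%) and common (40% and 20%).
rare = (20, 9_980, 10, 9_990)
common = (400, 600, 200, 800)

for label, (a, b, c, d) in (("rare", rare), ("common", common)):
    vertical = (a / c) / (b / d)    # odds of exposure: diseased vs nondiseased
    horizontal = (a / b) / (c / d)  # odds of disease: exposed vs unexposed
    assert math.isclose(vertical, horizontal)  # both reduce to ad/bc
    print(f"{label}: RR = {risk_ratio(a, b, c, d):.2f}, OR = {odds_ratio(a, b, c, d):.2f}")
```

In the rare-disease table the odds ratio is nearly identical to the risk ratio. In the common-disease table the odds ratio overstates it, which is consistent with the caution about diseases with risks above roughly 5%.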

### 3 Which Side Is Up in the Risk Ratio and Odds Ratio?

If the risk for a disease is the same in the group exposed to a particular risk factor or protective factor as it is in the group not exposed to the factor, the risk ratio is expressed simply as 1.0. Hypothetically, the risk ratio could be 0 (i.e., if the individuals exposed to a protective factor have no risk, and the unexposed individuals have some risk), or it may be infinity (i.e., if the individuals exposed to a risk factor have some risk, and the unexposed individuals have no risk). In practical terms, however, because there usually is some disease in every large group, these extremes of the risk ratio are rare.

When risk factors are discussed, placing the exposed group in the numerator is a convention that makes intuitive sense (because the number becomes larger as the risk factor has a greater impact), and this convention is followed in the literature. However, the risk ratio also can be expressed with the exposed group in the denominator. Consider the case of cigarette smoking and myocardial infarction (MI), in which the risk of MI for smokers is greater than for nonsmokers. On the one hand, it is acceptable to put the smokers in the numerator and express the risk ratio as 2/1 (i.e., 2), meaning that the risk of MI is about twice as high for smokers as for nonsmokers of otherwise similar age, gender, and health status. On the other hand, it also is acceptable to put the smokers in the denominator and express the risk ratio as 1/2 (i.e., 0.5), meaning that nonsmokers have half the risk of smokers. Clarity simply requires that the nature of the comparison be explicit.

Another risk factor might produce 4 times the risk of a disease, in which case the ratio could be expressed as 4 or as 1/4, depending on how the risks are being compared. When the risk ratio is plotted on a logarithmic scale (Fig. 6-3), it is easy to see that, regardless of which way the ratio is expressed, the distance to the risk ratio of 1 is the same. Mathematically, it does not matter whether the risk for the exposed group or the unexposed group is in the numerator. Either way the risk ratio is easily interpretable. Almost always the risk of the exposed group is expressed in the numerator, however, so that the numbers make intuitive sense.
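The symmetry on a logarithmic scale follows from the identity log(1/*x*) = −log(*x*). A brief Python sketch confirms it for the ratios mentioned above. The helper name `log_distance_from_null` is hypothetical.

```python
import math


def log_distance_from_null(ratio: float) -> float:
    """Distance of a risk ratio from the neutral value of 1 on a log scale."""
    return abs(math.log(ratio))


# Each ratio and its reciprocal lie equally far from 1 on the log scale.
for ratio in (2, 4):
    up = log_distance_from_null(ratio)
    down = log_distance_from_null(1 / ratio)
    print(f"ratio {ratio}: {up:.3f}   ratio 1/{ratio}: {down:.3f}")
```

On a linear scale the same ratios would not be symmetric: 4 lies 3 units from 1, whereas 1/4 lies only 0.75 units from 1.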

Figure 6-3 Possible risk ratios plotted on logarithmic scale.

Scale shows that reciprocal risks are equidistant from the neutral point, where the risk ratio is equal to 1.0.