The Study of Risk Factors and Causation

4 The Study of Risk Factors and Causation

Epidemiologists are frequently involved in studies to determine causation—that is, to find the specific cause or causes of a disease. This is a more difficult and elusive task than might be supposed, and it leaves considerable room for obfuscation, as shown in a newspaper article on cigarette smoking.1 The article quoted a spokesman for the Tobacco Institute (a trade association for cigarette manufacturers) as saying that “smoking was a risk factor, though not a cause, of a variety of diseases.”

Is a risk factor a cause, or is it not? To answer this question, we begin with a review of the basic concepts concerning causation. Studies can yield statistical associations between a disease and an exposure; epidemiologists need to interpret the meaning of these relationships and decide if the associations are artifactual, noncausal, or causal.

I Types of Causal Relationships

Most scientific research seeks to identify causal relationships. The three fundamental types of causes, as discussed next in order of decreasing strength, are (A) sufficient cause, (B) necessary cause, and (C) risk factor (Box 4-1).

D Causal and Noncausal Associations

The first and most basic requirement for a causal relationship to exist is an association between the outcome of interest (e.g., a disease or death) and the presumed cause. The outcome must occur either significantly more often or significantly less often in individuals who are exposed to the presumed cause than in individuals who are not exposed. In other words, exposure to the presumed cause must make a difference, or it is not a cause. Because some differences would probably occur as a result of random variation, an association must be statistically significant, meaning that the difference must be large enough to be unlikely if the exposure really had no effect. As discussed in Chapter 10, “unlikely” is usually defined as likely to occur no more than 1 time in 20 opportunities (i.e., 5% of the time, or 0.05) by chance alone.
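The 1-in-20 threshold can be made concrete with a small calculation. The sketch below is one common way to compare the frequency of an outcome in exposed and unexposed groups, a two-proportion z-test with a normal approximation; it is not necessarily the method the later chapters use. The function name and all counts are hypothetical and chosen only for illustration.

```python
import math


def two_proportion_z_test(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return (z, two_sided_p) for the difference between two proportions.

    Uses the pooled-proportion normal approximation, which is reasonable
    only when each group has a fair number of cases and non-cases.
    """
    p_exposed = cases_exposed / n_exposed
    p_unexposed = cases_unexposed / n_unexposed
    pooled = (cases_exposed + cases_unexposed) / (n_exposed + n_unexposed)
    # Standard error of the difference if the exposure had no effect.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_exposed + 1 / n_unexposed))
    z = (p_exposed - p_unexposed) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 30 of 200 exposed and 15 of 200 unexposed people fall ill.
z, p_value = two_proportion_z_test(30, 200, 15, 200)
print(f"z = {z:.2f}, p = {p_value:.3f}, significant at 0.05: {p_value < 0.05}")
```

A p value below 0.05 here says only that the difference would be unlikely if the exposure had no effect; it does not by itself show that the association is causal.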

If an association is causal, the causal pathway may be direct or indirect. The classification depends on the absence or presence of intermediary factors, which are often called intervening variables, mediating variables, or mediators.

A directly causal association occurs when the factor under consideration exerts its effect without intermediary factors. A severe blow to the head would cause brain damage and death without other external causes being required.

An indirectly causal association occurs when a factor influences the outcome through one or more intermediary variables. Poverty itself may not cause disease and death, but by preventing adequate nutrition, housing, and medical care, poverty may lead to poor health and premature death. In this case, the nutrition, housing, and medical care would be called intervening variables. Education seems to lead to better health indirectly, presumably because it increases the amount of knowledge about health, the level of motivation to maintain health, and the ability to earn an adequate income.

A statistical association may be strong but may not be causal. In such a case, it would be a noncausal association. An important principle of data analysis is that association does not prove causation. If a statistically significant association is found between two variables, but the presumed cause occurs after the effect (rather than before it), the association is not causal.

For example, studies indicated that estrogen treatments for postmenopausal women were associated with endometrial cancer, so that these treatments were widely considered to be a cause of the cancer. Then it was realized that estrogens often were given to control early symptoms of undiagnosed endometrial cancer, such as bleeding. In cases where estrogens were prescribed after the cancer had started, the presumed cause (estrogens) was actually caused by the cancer. Nevertheless, estrogens are sometimes prescribed long before symptoms of endometrial cancer appear, and some evidence indicates that estrogens may contribute to endometrial cancer.

As another example, quitting smoking is associated with an increased incidence of lung cancer. However, it is unlikely that quitting causes lung cancer or that continuing to smoke would be protective. What is much more likely is that smokers having early, undetectable or undiagnosed lung cancer start to feel sick because of their growing malignant disease. This sick feeling prompts them to stop smoking and thus, temporarily, they feel a little better. When cancer is diagnosed shortly thereafter, it appears that there is a causal association, but this is false. The cancer started before the quitting was even considered. The temporality of the association precludes causation.

Likewise, if a statistically significant association is found between two variables, but some other factor is responsible for both the presumed cause and the presumed effect, the association is not causal. For example, baldness may be associated with the risk of coronary artery disease (CAD), but baldness itself probably does not cause CAD. Both baldness and CAD are probably functions of age, gender, and dihydrotestosterone level.
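How a third factor can produce a noncausal association is easier to see with numbers. The sketch below uses invented counts in which age alone drives both baldness and CAD; the source also names gender and dihydrotestosterone level, which are left out to keep the example small. It compares the crude risk ratio with the risk ratio inside each age stratum and with a Mantel-Haenszel pooled risk ratio, a standard stratified estimator that the source does not itself mention. All names and figures are hypothetical.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# Hypothetical counts, invented for illustration only.
# Each stratum: (cases_exposed, n_exposed, cases_unexposed, n_unexposed),
# where "exposed" stands for bald and "cases" for people with CAD.
strata = {
    "younger": (2, 100, 18, 900),
    "older": (180, 900, 20, 100),
}


def crude_risk_ratio(strata):
    """Risk ratio after ignoring the strata and adding all counts together."""
    totals = [sum(column) for column in zip(*strata.values())]
    return risk_ratio(*totals)


def mantel_haenszel_risk_ratio(strata):
    """Pooled risk ratio that compares exposed and unexposed within strata."""
    numerator = 0.0
    denominator = 0.0
    for a, n1, c, n0 in strata.values():
        total = n1 + n0
        numerator += a * n0 / total
        denominator += c * n1 / total
    return numerator / denominator


crude = crude_risk_ratio(strata)
by_stratum = {name: risk_ratio(*counts) for name, counts in strata.items()}
adjusted = mantel_haenszel_risk_ratio(strata)

print(f"crude risk ratio: {crude:.2f}")
for name, value in by_stratum.items():
    print(f"risk ratio among {name}: {value:.2f}")
print(f"age-adjusted (Mantel-Haenszel) risk ratio: {adjusted:.2f}")
```

In these made-up data the crude comparison suggests that bald people have several times the risk of CAD, yet within each age group the risk is identical for bald and non-bald people. The apparent association comes entirely from older people being both more often bald and more often ill.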

Finally, there is always the possibility of bidirectional causation. In other words, each of two variables may reciprocally influence the other. For example, there is an association between the density of fast-food outlets in neighborhoods and people’s purchase and consumption of fast foods. It is possible that people living in neighborhoods dense with sources of fast food consume more of it because fast food is so accessible and available. It is also possible that fast-food outlets choose to locate in neighborhoods where people’s purchasing and consumption patterns reflect high demand. In fact, the association is probably true to some extent in both directions. This bidirectionality creates somewhat of a feedback loop, reinforcing the placement of new outlets (and potentially the movement of new consumers) into neighborhoods already dense with fast food.

II Steps in Determination of Cause and Effect

Investigators must have a model of causation to guide their thinking. The scientific method for determining causation can be summarized as having three steps, which should be considered in the following order3:

These steps in epidemiologic investigation are similar in many ways to the steps followed in an investigation of murder, as discussed next.

A Investigation of Statistical Association

Investigations may test hypotheses about risk factors or protective factors. For causation to be identified, the presumed risk factor must be present significantly more often in persons with the disease of interest than in persons without the disease. To eliminate chance associations, this difference must be large enough to be considered statistically significant. Conversely, the presumed protective factor (e.g., a vaccine) must be present significantly less often in persons with the disease than in persons without it. When the presumed factor (either a risk factor or a protective factor) is not associated with a statistically significant difference in the frequency of disease, the factor cannot be considered causal. It might be argued that an additional, unidentified factor, a “negative” confounder (see later), could be obscuring a real association between the factor and the disease. Even in that case, however, the principle is not violated, because proper research design and statistical analysis would show the real association.

The first step in an epidemiologic study is to show a statistical association between the presumed risk or protective factor and the disease. The equivalent early step in a murder investigation is to show a geographic and temporal association between the murderer and the victim—that is, to show that both were in the same place at the same time, or that the murderer was in a place from which he or she could have caused the murder.

The relationship between smoking and lung cancer provides an example of how an association can lead to an understanding of causation. The earliest epidemiologic studies showed that smokers had an average overall death rate approximately two times that of nonsmokers; the same studies also indicated that the death rate for lung cancer among all smokers was approximately 10 times that of nonsmokers.4 These studies led to further research efforts, which clarified the role of cigarette smoking as a risk factor for lung cancer and for many other diseases as well.
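The phrase "approximately 10 times that of nonsmokers" describes a rate ratio. As a minimal sketch of the arithmetic, the functions below compute a rate ratio and a rate difference from two death rates. The rates are placeholders chosen only so that the ratio comes out at 10; they are not figures from the studies cited, and the function names are hypothetical.

```python
def rate_ratio(rate_exposed, rate_unexposed):
    """How many times higher the rate is in the exposed group."""
    return rate_exposed / rate_unexposed


def rate_difference(rate_exposed, rate_unexposed):
    """Excess rate in the exposed group, in the same units as the inputs."""
    return rate_exposed - rate_unexposed


# Placeholder lung cancer death rates per 100,000 persons per year
# (illustrative values only, not data from the cited studies).
smokers = 130
nonsmokers = 13

print("rate ratio:", rate_ratio(smokers, nonsmokers))
print("rate difference per 100,000 per year:", rate_difference(smokers, nonsmokers))
```

The ratio expresses the strength of the association, whereas the difference expresses how many extra deaths occur in the exposed group; the two measures answer different questions.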

In epidemiologic studies the research design must allow a statistical association to be shown, if it exists. This usually means comparing the rate of disease before and after exposure to an intervention that is designed to reduce the disease of interest, or comparing groups with and without exposure to risk factors for the disease, or comparing groups with and without treatment for the disease of interest. Statistical analysis is needed to show that the difference associated with the intervention or exposure is greater than would be expected by chance alone, and to estimate how large this difference is. Research design and statistical analysis work closely together (see Chapter 5).
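Estimating "how large this difference is" usually means reporting an effect measure together with an interval around it. The sketch below compares groups with and without an exposure and returns the risk difference, the risk ratio, and an approximate 95% confidence interval for the risk ratio based on the standard error of its logarithm. This is one widely used large-sample approximation, offered as an assumption rather than as the method of Chapter 5; the function name and counts are hypothetical.

```python
import math


def compare_risks(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Return risk difference, risk ratio, and an approximate CI for the ratio.

    The interval uses the large-sample standard error of log(risk ratio),
    so it needs at least one case in each group and is unreliable when
    counts are small. z=1.96 corresponds to a 95% interval.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    difference = risk_exposed - risk_unexposed
    ratio = risk_exposed / risk_unexposed
    se_log_ratio = math.sqrt(
        1 / cases_exposed - 1 / n_exposed + 1 / cases_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(ratio) - z * se_log_ratio)
    upper = math.exp(math.log(ratio) + z * se_log_ratio)
    return difference, ratio, (lower, upper)


# Hypothetical cohort: 30 of 200 exposed and 15 of 200 unexposed develop disease.
difference, ratio, (lower, upper) = compare_risks(30, 200, 15, 200)
print(f"risk difference: {difference:.3f}")
print(f"risk ratio: {ratio:.2f} (approx. 95% CI {lower:.2f} to {upper:.2f})")
```

An interval for the risk ratio that excludes 1 is consistent with a difference greater than expected by chance alone, and the width of the interval shows how precisely the size of the difference has been estimated.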

If a statistically significant difference in risk of disease is observed, the investigator must first consider the direction and extent of the difference. Did therapy make patients better or worse, on average? Was the difference large enough to be etiologically or clinically important? Even if the observed difference is real and large, statistical association does not prove causation. It may seem initially that an association is causal, when in fact it is not. For example, in the era before antibiotics were developed, syphilis was treated with arsenical compounds (e.g., salvarsan), despite their toxicity. An outbreak of fever and jaundice occurred in many of the patients treated with arsenicals.5 At the time, it seemed obvious that the outbreak was caused by the arsenic. Many years later, however, medical experts realized that such outbreaks were most likely caused by an infectious agent, probably hepatitis B or C virus, spread by inadequately sterilized needles during administration of the arsenical compounds. Any statistically significant association can arise from only one of four possibilities: true causal association, chance (see Chapter 12), random error, or systematic error (bias or its special case, confounding, as addressed later).

Several criteria, if met, increase the probability that a statistical association is true and causal6 (Box 4-2). (These criteria can often be attributed to the 19th-century philosopher John Stuart Mill.) In general, a statistical association is more likely to be causal if the criteria in Box 4-2 are met.

Posted Aug 27, 2016, in PUBLIC HEALTH AND EPIDEMIOLOGY
