6.1 Introduction
Epidemiologic experiments are also called trials. The word trial comes from the Anglo-French root trier, meaning “to try” or “to put something to a test.” Epidemiologic trials put either preventive or therapeutic measures to the test. The three main types of trials in epidemiology are:
- Field trials, which are used to address the efficacy of preventive interventions applied to individuals (e.g., a vaccine trial).
- Community trials, which are used to address the efficacy of preventive interventions applied at the group level (e.g., a health education campaign).
- Clinical trials, which are used to address the efficacy of therapeutic interventions in ill individuals (e.g., a chemotherapy trial in the treatment of cancer).
Let us start by considering a couple of examples.
The Women’s Health Initiative (WHI) included a large randomized controlled primary prevention trial that addressed whether women in the decades of life following menopause should take estrogen plus progesterone in order to prevent chronic diseases (WHI, 2002). The study included 16 608 postmenopausal women aged 50–79 years who were randomized to form two groups. Group 1 received conjugated estrogens plus progesterone in tablet form (n = 8506). Group 2 received an identical-looking placebo tablet (n = 8102). The two main disease outcomes in this trial were coronary heart disease and invasive breast cancer. After an average of 5.2 years of follow-up, the trial was stopped because the risks of continued use of the active treatment exceeded its benefits. For example, the incidence proportion of fatal and nonfatal coronary events in the estrogen plus progesterone group was 164/8506 = 0.019 28 or 19.3 per 1000. In the placebo group, the incidence proportion was 122/8102 = 0.015 06 or 15.1 per 1000. Figure 6.1 plots the occurrence of coronary events in the groups over time. This plot demonstrates that differences between the groups became evident soon after randomization and persisted through the follow-up period.
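The incidence proportions in this example can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the event counts (164 and 122) are those implied by the reported proportions and group sizes, the function name `incidence_proportion` is hypothetical, and the risk difference and risk ratio are added here as common ways of comparing the two groups.

```python
def incidence_proportion(events: int, at_risk: int) -> float:
    """Number of new cases divided by the number of people at risk."""
    if at_risk <= 0:
        raise ValueError("at_risk must be positive")
    return events / at_risk


# Group sizes come from the WHI example; the event counts are assumed
# (back-calculated from the reported proportions 0.01928 and 0.01506).
treated = incidence_proportion(164, 8506)
placebo = incidence_proportion(122, 8102)

risk_difference = treated - placebo  # excess risk in the treated group
risk_ratio = treated / placebo       # relative risk, treated vs. placebo

print(f"Treated: {treated * 1000:.1f} per 1000")
print(f"Placebo: {placebo * 1000:.1f} per 1000")
print(f"Risk difference: {risk_difference * 1000:.1f} per 1000")
print(f"Risk ratio: {risk_ratio:.2f}")
```

A risk ratio above 1 and a positive risk difference both point the same way: coronary events were more frequent in the active treatment group than in the placebo group.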
Whereas the prior example illustrated a field trial in which individuals were randomized into a treatment or control group, other trials randomize the study intervention on a group-by-group basis, as illustrated in this example.
This study sought to determine if childhood mortality due to high rates of respiratory disease and diarrhea could be reduced by vitamin A supplementation; 450 villages in Sumatra were randomly assigned to either participate in a vitamin A supplementation program (n = 229) or to serve as a control village (n = 221) (Sommer et al., 1986). Vitamin A capsules were distributed to preschool children 1–3 months after enrollment in the treated villages and again 6 months later. The one-year mortality rate in children (1–5 years) in the villages that received vitamin A supplementation was 4.9 per 1000. The mortality rate in the control villages was 7.3 per 1000. Thus, the mortality rate difference was (4.9 per 1000) – (7.3 per 1000) = −2.4 per 1000, indicating that vitamin A had reduced childhood mortality substantially.
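The comparison in this example can be written out as a short calculation. This is a minimal sketch using only the two rates reported above; the variable names are hypothetical, and the rate ratio and relative reduction are added here as complementary measures to the rate difference.

```python
# One-year mortality rates per 1000 children, as reported in the example
rate_vitamin_a = 4.9  # villages receiving vitamin A supplementation
rate_control = 7.3    # control villages

rate_difference = rate_vitamin_a - rate_control  # absolute effect, per 1000
rate_ratio = rate_vitamin_a / rate_control       # relative effect
relative_reduction = 1 - rate_ratio              # proportion of deaths averted

print(f"Rate difference: {rate_difference:.1f} per 1000")
print(f"Rate ratio: {rate_ratio:.2f}")
print(f"Relative reduction: {relative_reduction:.0%}")
```

On the relative scale, the rate in the supplemented villages was about two-thirds of the rate in the control villages, which is one way to read "substantially."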
Before delving further into modern epidemiologic trials, let us gain some historical context.
6.2 Historical perspective
The idea of an experiment in human health is really quite old. The earliest recorded account of a trial appears in the first chapter of the Book of Daniel in the Old Testament (Lilienfeld, 1982). In this story, Daniel requests a 10-day trial comparing a diet of the “king’s food” with a standard diet of leguminous plants. Daniel predicts superior results on the standard diet of leguminous plants, which he requests for his people. He proposes that, after 10 days on the respective diets, the two groups be compared:
Then let our countenances be looked upon before thee, and the countenances of the youths that eat of the king’s food.
Apparently, the proposal was accepted, since the story goes on to tell:
So, he hearkened unto them and tried them in this matter, and tried them for ten days…
The results favored the diet of leguminous plants:
… at the end of the ten days [the group eating the diet of leguminous plants had] countenances [which] appeared fairer and they were fatter in the flesh, than all the youths that did eat the king’s food.
Biblical quotes cited in Lilienfeld, 1982, p. 4
One of the earliest descriptions of a randomized trial was provided by the Belgian medicinal chemist van Helmont in 1662. Wishing to replace the theory-based approach of his peers with a more empirical approach, van Helmont wrote:
Let us take out of the hospitals, out of the Camps, or from elsewhere, 200, or 500 poor People, that have Fevers, Pleurisies, &c. Let us divide them into halfes, let us cast lots, that one half of them may fall to my share, and the others to yours; … we shall see how many funerals both of us shall have: But let the reward of the contention or wager, be 300 florens, deposited on both sides.
Armitage, 1983, p. 328
This passage describes a randomized controlled trial (RCT). It was randomized because the treatment was assigned to study subjects by a mechanism based on chance (“let us cast lots”). It was controlled because there was a treatment group and a control group (“that one half of them may fall to my share, and the others to yours”). It was a clinical trial because it tested the efficacy of a treatment in caring for the ill (“People, that have Fevers, Pleurisies, &c.”). Thus, the general idea of an RCT dates back many centuries.
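In modern terms, "casting lots" amounts to simple random allocation. The following is a minimal sketch of one way to divide subjects "into halfes" by chance; the function name `cast_lots`, the subject identifiers, and the seed are all hypothetical, and real trials typically use more elaborate schemes (for example, blocked or stratified randomization).

```python
import random


def cast_lots(subject_ids, seed=None):
    """Randomly split subjects into two groups of (nearly) equal size."""
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 200 hypothetical patients, numbered 1 to 200
subjects = range(1, 201)
treatment, control = cast_lots(subjects, seed=1662)

print(len(treatment), len(control))
```

Because chance alone decides who falls into which group, neither the investigator nor the patient can steer sicker or healthier subjects toward a particular treatment.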
A well-known historical example of a nonrandomized trial is James Lind’s 1747 trial of treatments for scurvy, published in 1753. Lind assembled six elixirs and concoctions that he thought to be the most likely cures for scurvy. He then assigned a pair of scurvy-ridden sailors to each of the six treatments. Lind (1753) remarks, “the most sudden and visible good effects were perceived from the use of the oranges and lemons.” This finding led to the practice of supplying British sailors with citrus at sea, and hence to their nickname “limeys.” Box 6.1 presents Lind’s experiment in his own words.
On the 20th May, 1747, I took twelve patients in the scurvy on board the Salisbury at sea. Their cases were as similar as I could have them. They all in general had putrid gums, the spots and lassitude, with weakness of their knees. They lay together in one place, being a proper apartment for the sick in the forehold; and had one diet in common to all, viz., water gruel sweetened with sugar in the morning; fresh mutton broth often times for dinner; at other times puddings, boiled biscuit with sugar etc.; and for supper barley, raisins, rice and currants, sago and wine, or the like. Two of these were ordered each a quart of cyder a day. Two others took twenty five gutts of elixir vitriol three times a day upon an empty stomach, using a gargle strongly acidulated with it for their mouths. Two others took two spoonfuls of vinegar three times a day upon an empty stomach, having their gruels and their other food well acidulated with it, as also the gargle for the mouth. Two of the worst patients, with the tendons in the ham rigid (a symptom none of the rest had) were put under a course of sea water. Of this they drank half a pint every day and sometimes more or less as it operated by way of gentle physic. Two others had each two oranges and one lemon given them every day. These they eat with greediness at different times upon an empty stomach. They continued but six days under this course, having consumed the quantity that could be spared. The two remaining patients took the bigness of a nutmeg three times a day of an electuary recommended by an hospital surgeon made of garlic, mustard seed, rad. raphan., balsam of Peru and gum myrrh, using for common drink barley water well acidulated with tamarinds, by a decoction of which, with the addition of cremor tartar, they were gently purged three or four times during the course.
The consequence was that the most sudden and visible good effects were perceived from the use of the oranges and lemons; one of those who had taken them being at the end of six days fit for duty. The spots were not indeed at that time quite off his body, nor his gums sound; but without any other medicine than a gargarism of elixir vitriol he became quite healthy before we came into Plymouth, which was on the 16th June. The other was the best recovered of any in his condition, and being now deemed pretty well was appointed nurse to the rest of the sick…
As I shall have occasion elsewhere to take notice of the effects of other medicines in this disease, I shall here only observe that the result of all my experiments was that oranges and lemons were the most effectual remedies for this distemper at sea.
Source: Lind (1753).
Comment regarding use of the term “natural experiment”
The term natural experiment has historically been used to refer to a study with a natural but fortuitous distribution of treatments that mimics an experiment. An early example of one such study was described by the barber-surgeon Ambroise Paré (circa 1510–1590). During the battle for the castle of Villaine in 1537, Paré ran out of the standard treatment for battle wounds, which at that time was to douse the wound with boiling oil. He resorted to treating the remaining wounds with a much less noxious “digestive medicament.” After the battle, Paré noted superior results with the innocuous alternative, stating:
I raised myself very early to visit them, when beyond my hope I found those to whom I had applied the digestive medicament, feeling but little pain, their wounds neither swollen nor inflamed, and having slept through the night. The others whom I had applied the boiling oil were feverish with much pain and swelling about their wounds. Then I determined never again to burn thus so cruelly the poor wounded.
Armitage and Colton, 1998, pp. 1–2
Although this type of observation has been historically referred to as a natural experiment, it is actually nonexperimental (observational) in nature, since use of the alternative treatment was not allocated as part of the study protocol. Nevertheless, one may still hear use of the term natural experiment applied to this type of serendipitous observation.
6.3 General concepts
The control group
Section 3 of Chapter 5 addressed the importance of using a referent group when judging the effects of an exposure. Referent groups in experimental studies are properly called control groups. Without the referent rate provided by the control group, it would often be impossible to determine the extent to which the rate in the treatment group reflected the effect of the treatment or the natural history of the disease.
In addition, when analyzing the results of a trial, the investigator must be aware of the tendency of study participants to show improvements that are unrelated to the treatment being studied, at least temporarily. Several explanations have been advanced for this phenomenon. Two such explanations are the placebo effect and the Hawthorne effect.
The placebo effect refers to perceived improvements following treatment with a pharmacologically inert substance (“placebo”) such as a sugar pill or saline injection. This effect has been ascribed to a positive belief in the treatment and the perception of being cared for.
The Hawthorne effect refers to the tendency of subjects to alter their behavior in a way that is favorable to the results of the study. This effect was first described in a series of worker productivity studies conducted in the 1920s at the Hawthorne Works of the Western Electric Company in Chicago, IL, USA (Mayo, 1933). Continual improvements in worker performance were observed over the course of the study no matter the nature of the intervention. For example, worker output improved whether lighting was intensified or diminished. The Hawthorne and related effects have been attributed to the awareness of being observed and to the improved social conditions associated with observation. An attention bias analogous to the Hawthorne effect has been observed in subjects in health studies. A countervailing phenomenon, the John Henry effect, may occur when members of a control group getting no intervention compare themselves to the treatment group and respond by working harder to overcome the “disadvantage” of being in the control group (Sackett, 1979).
Because of factors such as the Hawthorne effect, it is important to compare the experience of the treatment group with that of a control group. It is also important to care for and observe the treatment and control groups identically, to the extent that this is possible. Blinding (masking) the study participants and investigators about the treatment being received offers just such protection.
There are many different ways to incorporate control groups into an experiment. The simplest way is to use a parallel design, in which the experiences of the treatment group and the control group are compared concurrently.
The importance of using a concurrent control group is demonstrated in this example.
The Multiple Risk Factor Intervention Trial (MRFIT, 1982) was conducted in the 1970s and early 1980s to test interventions intended to decrease the incidence of cardiovascular disease. Participants were randomly assigned either to a special intervention group that received counseling or to a control group that received its usual sources of health care. After approximately 7 years of follow-up, the incidence of coronary disease mortality dropped precipitously in the special intervention group. However, it dropped equally in the control group. This is because the trial took place at a time when the entire country was learning about the benefits of reducing cardiovascular disease risk by quitting smoking, decreasing dietary fat, and increasing exercise. Had the study included no control group, or had historical controls been used, it is likely that the intervention would unjustifiably have been declared a success, resulting in the loss of millions of dollars on ineffective programs.
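The logic of this example can be made concrete with a small calculation. The numbers below are entirely hypothetical (they are not MRFIT results) and are chosen only to show how a before-and-after comparison can suggest a benefit that disappears once a concurrent control group is taken into account.

```python
# Hypothetical coronary mortality rates per 1000 (illustrative, NOT MRFIT data)
baseline_rate = 25.0          # rate before the trial (a "historical control")
followup_intervention = 18.0  # rate in the special intervention group
followup_control = 18.0       # rate in the concurrent control group

# Uncontrolled comparison: follow-up vs. baseline in the intervention group
before_after_change = followup_intervention - baseline_rate

# Controlled comparison: intervention vs. concurrent control at follow-up
concurrent_difference = followup_intervention - followup_control

print(f"Before-after change: {before_after_change:+.1f} per 1000")
print(f"Difference vs. concurrent control: {concurrent_difference:+.1f} per 1000")
```

In this sketch the entire before-and-after drop reflects a secular trend shared by both groups, so the comparison against the concurrent control shows no effect attributable to the intervention.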
There are alternatives to the simple parallel design for incorporating control groups into experiments. One such alternative is called a cross-over design. In a cross-over design, subjects are first randomized to one of the treatments. After a period of observation and measurement, a “washout” period follows, during which the effects of the treatment subside. Study subjects then cross over to the alternative treatment.
This creates a matched design where study subjects serve as their own “control.”
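A cross-over design can be sketched in two steps: randomize the order in which each subject receives the treatments, then compare the two treatments within each subject. The sketch below is a simplified illustration with hypothetical names (`assign_sequences`, `within_subject_differences`) and made-up measurements; it assumes the washout period has removed any carry-over and ignores period effects, which a full analysis would need to address.

```python
import random
from statistics import mean


def assign_sequences(subject_ids, seed=None):
    """Randomly assign each subject to treatment order 'AB' or 'BA'."""
    rng = random.Random(seed)
    return {sid: rng.choice(["AB", "BA"]) for sid in subject_ids}


def within_subject_differences(outcomes):
    """Map each subject to (outcome on A) minus (outcome on B)."""
    return {sid: a - b for sid, (a, b) in outcomes.items()}


# Hypothetical outcome measurements: subject -> (value on A, value on B)
outcomes = {1: (120, 128), 2: (115, 117), 3: (130, 139), 4: (124, 126)}

sequences = assign_sequences(outcomes, seed=7)
differences = within_subject_differences(outcomes)
mean_difference = mean(differences.values())

print(sequences)
print(differences)
print(mean_difference)
```

Because each difference is computed within one person, stable characteristics of that person (age, sex, baseline severity) cancel out of the comparison.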
More complex study designs are used to simultaneously assess the effects of two or more treatments. These are called factorial designs. For example, a factorial design may randomize multiple treatments sequentially, as follows: