17.1 Introduction
Survival analysis encompasses a wide variety of techniques that focus on how long given states “persist” over time. This type of analysis has wide application whenever time to onset is important, as is the case in cohort studies and clinical trials.
Survival analysis is particularly important when analyzing data in which risks vary over time. Figure 17.1 displays a survival model for the 1997 US population (solid line). Because mortality increases greatly with age, human survivorship drops off sharply at older ages, creating a survival curve that is “rectangularized” in shape. If mortality risk were constant with age, survivorship would follow an exponential decay curve, as shown, for instance, by the dashed line in Figure 17.1. Clearly, constant risk (exponential decay; dashed line) does not apply to human survival. In fact, most health risks are not constant over time.
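The constant-risk case can be stated compactly. As a sketch, let $\lambda$ denote a mortality rate that does not change with age and let $S(t)$ denote the proportion surviving to age $t$; both symbols are introduced here for illustration and are not taken from Figure 17.1.

```latex
S(t) = e^{-\lambda t}
```

Under this assumption the same fraction of the remaining survivors is lost in every unit of time, which produces the smooth exponential decay of the dashed line and not the rectangular shape of actual human survival.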
Let us consider the survival experience of 10 patients treated for a life-threatening disease. Figure 17.2 displays the experience of each patient graphically. The study started in 1990 and ended in 1999. Subjects were enrolled throughout the course of the study. For each study subject, one of the following outcomes was possible:
Notice that survival data are complete only for study subjects in category 1. Study subjects in category 2 and category 3 have incomplete survival data in that we do not know the date of their death. Thus, data are truncated or right-censored for these subjects. We combine these right-censored study subjects into a category called withdrawal.
The next step in the analysis is to back up each study subject’s follow-up time to “time zero” (t0), the time when they entered the study and, presumably, when they began their treatment (Figure 17.3). The time from a subject’s time zero to either withdrawal or death is called the “person-time of observation” or simply “person-time.” Table 17.1 lists the person-time and outcome for each study subject.
Subject | Person-months | Outcome |
1 | 2 | Death |
2 | 6 | Death |
3 | 18 | Withdrawal (discontinued study) |
4 | 20 | Death |
5 | 42 | Death |
6 | 75 | Withdrawal (discontinued study) |
7 | 95 | Withdrawal (study ended) |
8 | 110 | Withdrawal (discontinued study) |
9 | 120 | Withdrawal (study ended) |
10 | 120 | Withdrawal (study ended) |
In describing the survival experience of the group, we might initially be tempted to determine the average survival time of the study subjects during the period of study. However, this would substantially underestimate survival by ignoring survival times of study subjects after they withdrew from the study. A more scientific approach calculates the death rate in the cohort using the person-time method introduced in Section 3.1. Thus, the observed rate is
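$$\text{rate} = \frac{A}{T} \qquad (17.1)$$
where $A$ is the number of deaths in the cohort and $T$ is the sum of person-time.
For Illustrative Example 17.1 (Table 17.1), A = 4, T = 608 months, and rate = 4/608 = 0.00658 per month or, equivalently, 6.58 per 1000 person-months.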
The inverse of this rate is the expected survival time in the cohort. In this instance, the expected survival time = 1 / (0.00658 per month) ≅ 152 months.
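The person-time calculation above can be checked with a few lines of Python. This is a minimal sketch: the two lists are transcribed from Table 17.1, and the function and variable names are chosen here for illustration only.

```python
# Person-months of observation and outcomes, transcribed from Table 17.1
person_months = [2, 6, 18, 20, 42, 75, 95, 110, 120, 120]
died = [True, True, False, True, True, False, False, False, False, False]


def crude_rate(times, events):
    """Return (deaths, total person-time, deaths per unit of person-time)."""
    deaths = sum(events)      # A: True counts as 1, so this counts deaths
    total_time = sum(times)   # T: person-time summed over all subjects
    return deaths, total_time, deaths / total_time


deaths, total_time, rate = crude_rate(person_months, died)

# The inverse of the rate is an expected survival time only if the
# rate is assumed to be constant over follow-up.
expected_survival = 1 / rate

print(deaths, total_time)        # 4 608
print(round(rate * 1000, 2))     # 6.58 deaths per 1000 person-months
print(round(expected_survival))  # 152 months
```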
Although reporting the mortality rate and/or expected survival time is superior to, say, reporting the average survival time during the period of observation, this assumes that the rate of death is constant over time. However, if we carefully examine the illustrative data, we notice that two of the four deaths occurred within 6 months of treatment and all four deaths occurred within 42 months (3.5 years) of treatment. Therefore, hazards are concentrated near the beginning of follow-up. This non-constant hazard needs to be addressed.
17.2 Stratifying rates by follow-up time
One straightforward method for dealing with a non-constant hazard is to stratify rates according to sequential follow-up periods. This is accomplished by grouping the person-time into intervals 1 through K. Rates are then calculated within each interval. Let $\text{rate}_k$ denote the death rate in interval k:
$$\text{rate}_k = \frac{A_k}{T_k} \qquad (17.2)$$
where $A_k$ is the number of deaths in interval k and $T_k$ is the sum of person-time in that interval.
Let us tally person-time and events within each sequential time-interval. During the first year of follow-up, person 1 contributes 2 person-months of observation time, person 2 contributes 6 person-months, and the remaining 8 people contribute 12 person-months each. Therefore, the sum of person-time during the first year of follow-up, T1 = 2 + 6 + (8 × 12) = 104 person-months. During this interval, there were 2 deaths (A1). Consequently, the death rate in the first interval is A1 / T1 = 2 / 104 = 0.0192 per person-month (19.2 per 1000 person-months).
During the second year of follow-up, person 1 (now dead) contributes 0 person-months, person 2 (also dead) contributes 0 person-months, person 3 contributes 6 person-months, person 4 contributes 8 person-months, and the remaining 6 people contribute 6 × 12 person-months. Thus, the person-time during year 2 of follow-up is T2 = 0 + 0 + 6 + 8 + (6 × 12) = 86 person-months, during which there was one death. Therefore, the death rate in the second interval is A2 / T2 = 1 / 86 = 0.0116 per person-month (11.6 per 1000 person-months).
Table 17.2 lists all of the follow-up interval specific rates. Historically, this type of analysis has been called a modified life table.
Notice that in Table 17.2, all of the mortality occurred during the first 4 years of follow-up. The crude mortality rate of 6.6 per 1000 person-months is a weighted average of interval-specific rates. This weighted average fails to capture the non-constant hazard over time. Stratifying the rates into follow-up intervals is a simple way to address this non-constant hazard. It also gives rise to methods for estimating the survival function. Two such methods are the actuarial method and the Kaplan−Meier method.
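The interval-by-interval tally described in this section can be sketched in Python. The data are transcribed from Table 17.1. The function name, the 12-month interval width, and the convention that an event falling exactly on an interval boundary belongs to the earlier interval are assumptions made for this illustration.

```python
# Person-months of observation and outcomes, transcribed from Table 17.1
person_months = [2, 6, 18, 20, 42, 75, 95, 110, 120, 120]
died = [True, True, False, True, True, False, False, False, False, False]


def stratified_rates(times, events, width=12):
    """Tally deaths and person-time in consecutive follow-up intervals.

    Returns a list of (interval number, deaths, person-time, rate) tuples.
    """
    n_intervals = -(-max(times) // width)  # ceiling division
    rows = []
    for k in range(n_intervals):
        start, end = k * width, (k + 1) * width
        # Each subject still under observation after `start` contributes
        # the part of their follow-up that falls inside this interval.
        person_time = sum(min(t, end) - start for t in times if t > start)
        # A death is counted in the interval that contains the time of death.
        deaths = sum(1 for t, d in zip(times, events) if d and start < t <= end)
        rows.append((k + 1, deaths, person_time, deaths / person_time))
    return rows


rows = stratified_rates(person_months, died)
for k, deaths, person_time, rate in rows:
    print(f"interval {k}: {deaths} / {person_time} = "
          f"{rate * 1000:.1f} per 1000 person-months")
```

The first two rows reproduce the hand tallies of 2 deaths in 104 person-months and 1 death in 86 person-months, and the interval person-times sum to the 608 person-months used for the crude rate.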
17.3 Actuarial method of survival analysis
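The actuarial method of survival analysis is used to estimate probabilities of death over successive follow-up intervals. Let:
$N_k$ denote the number of people entering interval k
$W_k$ denote the number of withdrawals during interval k
$A_k$ denote the number of deaths during interval k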
Table 17.3 lists these data elements for the illustrative data in columns (2), (3), and (4), respectively. Notice that the number of people entering interval k + 1 is equal to the number entering interval k minus the number of withdrawals and deaths in the interval:
$$N_{k+1} = N_k - W_k - A_k \qquad (17.3)$$
For the illustrative data, N2 = N1 − W1 − A1 = 10 − 0 − 2 = 8 [column (2), row 2].
The most fundamental information needed to complete an actuarial table is the proportion of people dying within each interval. Before calculating this proportion, we need to compensate for the withdrawals of person-time that occurred during each interval. Therefore, we calculate the number effectively “exposed” to risk during the interval (denoted $N'_k$). Several methods may be considered for calculating $N'_k$. The actuarial method assumes withdrawals occur at mid-interval. This reduces the number of people effectively exposed to risk by half the number of withdrawals:
$$N'_k = N_k - \frac{W_k}{2} \qquad (17.4)$$
For example, in the illustrative data, eight people entered the second interval and one withdrew during this interval. Therefore, $N'_2 = 8 - 1/2 = 7.5$. The number of people effectively exposed to risk in the illustrative example is listed in column (5) of Table 17.3.
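Let $p_k$ denote the incidence proportion of the outcome during interval k:
$$p_k = \frac{A_k}{N'_k} \qquad (17.5)$$
where $N'_k$ is the number effectively exposed to risk during interval k. For the illustrative data, $p_2$ = 1/7.5 = 0.1333. Values for the other interval-specific incidence proportions are shown in column (6) of Table 17.3.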
The survival proportion in interval k, conditional on having survived to that point, denoted $q_k$, is the complement of the incidence proportion:
$$q_k = 1 - p_k \qquad (17.6)$$
For the illustrative data, $q_2$ = 1 − 0.1333 = 0.8667. Survival proportions are shown in column (7) of Table 17.3.
Comment on notation: We have used p to represent the incidence proportion of the outcome and q to represent its complement (survival proportion) to be consistent with the notation presented earlier in the book. Some sources reverse this convention, using q to represent the proportion dying and p to represent the proportion surviving. Still other sources use R to represent the proportion dying (R stands for risk) and S to represent the proportion surviving.
We are now ready to calculate the survival function. Let $S_k$ denote the cumulative proportion surviving through interval k. This is equal to the product of the survival proportions up to and including the current interval:
$$S_k = q_1 \times q_2 \times \cdots \times q_k = \prod_{i=1}^{k} q_i \qquad (17.7)$$
For the illustrative data, the cumulative proportion surviving the second year is $S_2 = q_1 \times q_2 = 0.8000 \times 0.8667 = 0.6933$. This quantifies the likelihood of surviving the current interval and the prior interval. In contrast, the interval-specific survival proportion ($q_k$) is conditional on having survived all prior intervals. Cumulative survival proportions are listed in column (8) of Table 17.3. This column comprises the survival function for the data and is plotted as such in Figure 17.4.
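The steps of the actuarial method (equations 17.3 through 17.7) can be strung together in a short Python sketch. The data are transcribed from Table 17.1. The function name, the dictionary keys, the 12-month interval width, and the convention that an event falling exactly on an interval boundary belongs to the earlier interval are assumptions made for this illustration, so rows other than those worked in the text may be tabulated differently in Table 17.3.

```python
# Person-months of observation and outcomes, transcribed from Table 17.1
person_months = [2, 6, 18, 20, 42, 75, 95, 110, 120, 120]
died = [True, True, False, True, True, False, False, False, False, False]


def actuarial_table(times, events, width=12):
    """Build an actuarial life table, one dict per follow-up interval."""
    n_intervals = -(-max(times) // width)  # ceiling division
    entering = len(times)                  # number entering the first interval
    cumulative = 1.0                       # running product of the q values
    table = []
    for k in range(n_intervals):
        start, end = k * width, (k + 1) * width
        # Outcomes of subjects whose follow-up ends inside this interval
        ended_here = [d for t, d in zip(times, events) if start < t <= end]
        deaths = sum(ended_here)
        withdrawals = len(ended_here) - deaths
        # Eq. 17.4: withdrawals are assumed to occur at mid-interval
        exposed = entering - withdrawals / 2
        p = deaths / exposed if exposed else 0.0  # Eq. 17.5
        q = 1 - p                                 # Eq. 17.6
        cumulative *= q                           # Eq. 17.7
        table.append({"k": k + 1, "N": entering, "W": withdrawals,
                      "A": deaths, "exposed": exposed,
                      "p": p, "q": q, "S": cumulative})
        entering -= withdrawals + deaths          # Eq. 17.3
    return table


table = actuarial_table(person_months, died)
for row in table:
    print(f"{row['k']:>2} N={row['N']:>2} W={row['W']} A={row['A']} "
          f"N'={row['exposed']:>4} p={row['p']:.4f} "
          f"q={row['q']:.4f} S={row['S']:.4f}")
```

The second row reproduces the values worked in the text: 8 people entering, 7.5 effectively exposed, an incidence proportion of 0.1333, a survival proportion of 0.8667, and a cumulative survival of 0.6933.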