INTRODUCTION
“Oh, and one more thing: it is now time for you to get another colonoscopy,” Dr Brown said to his patient Mrs Mary Moore, as she was packing up her purse and getting ready to leave the exam room. “We recommend that everyone gets a colonoscopy every 5 to 10 years, so I will have my nurse call you to set up an appointment for you with the gastroenterologist here in town. Okay?”
“Umm, okay, sure,” Mrs Moore, an 86-year-old widowed woman with severe congestive heart failure, hesitantly answered.
“Great! It is always a pleasure seeing you, Mrs Moore. Thanks for coming today. See you at your next appointment,” Dr Brown smiled as he left.
Dr Brown has known Mrs Moore for more than 30 years. He has helped her through difficult times over the years such as the death of her spouse, and has taken care of her chronic hypothyroidism and more recently her worsening congestive heart failure. Mrs Moore truly respects Dr Brown and is thankful for their therapeutic relationship.
A colonoscopy involves inserting a small camera on a telescope-like instrument through the rectum in order to examine the interior of a person’s large bowel. In general, the goal of a screening colonoscopy is to discover and remove polyps that can be precursors to cancer. But since progression from a polyp in the colon to a cancer is a slow process, it takes many years for patients to realize benefits from a colonoscopy procedure.1 Also, the procedure is expensive, preparation can be uncomfortable because of the need to take powerful laxatives in order to remove the fecal matter from the bowel, and there are risks of damaging the intestines during the procedure. Therefore, the US Preventive Services Task Force (USPSTF) advises against routine colon cancer screening in patients over 75 years and any screening in patients over 85.2 Nevertheless, Medicare pays for routine screening colonoscopies regardless of age, paying doctors more than $100 million for nearly 550,000 screening colonoscopies in 2009, with around 40% of those for patients over 75.3 But the cost is not only monetary: life-threatening complications such as perforating the bowel significantly increase with advancing age.4 Therefore, an 86-year-old patient with a life-limiting disease like congestive heart failure is more likely to suffer from the risks of a colonoscopy than to ever see the potential downstream benefits.
Overscreening patients for disease is a massive and growing problem in the United States. Elderly patients, often with multiple life-limiting diseases and/or advanced dementia, frequently undergo screening testing that will never benefit them.5 Approximately 50% of women older than 80 years continue to receive breast cancer screening, and more than half of patients older than 75 years report that their physicians continue to recommend routine screening.6
For younger patients, colonoscopies may also present a personal monetary hurdle. Although insurance plans cover the costs of colonoscopy as a preventive test, if an abnormality is found (such as a polyp or growth) then some insurers no longer consider it a “screening test,” and patients may be responsible for paying their entire deductible or copay.7 For patients on a high-deductible health plan (see Chapter 2), this could mean that the patient is responsible for the entire cost of the procedure, which may be up to thousands of dollars. The American Cancer Society (ACS) recommends: “If you are getting a screening colonoscopy, be sure to find out how much you will have to pay if something is found (and biopsied or removed) during the exam. This can help you avoid any surprise costs.”7 This clearly illustrates the tension between patient affordability and societal stewardship; screening is cost-effective at the societal level and improves population health, but has the potential to be expensive for the individual patient and very rarely improves individual health.
Many colonoscopies are performed unnecessarily, and not only in elderly patients. Following a negative screening colonoscopy, the recommendation is to not repeat the test within 7 to 10 years. However, in a large study, nearly half of patients underwent a repeated examination in fewer than 7 years, suggesting an epidemic of unhelpful colonoscopies.8 In the right setting, meaning a patient in the appropriate target group who is screened at recommended intervals, colonoscopies can be incredibly valuable, life-saving procedures, but too often these procedures are performed in situations without likely benefit. With no realized benefit, the only things left are unpleasant bowel preps, discomfort, risks, and costs.
DEFINING AN EFFECTIVE SCREENING TOOL
Effective screening tests should meet a number of conditions9:
Designed to detect asymptomatic and early stage disease
Highly sensitive and highly specific (see below) to pick up most cases of true disease and avoid false-positives
Targeted toward populations with a higher disease prevalence (high positive predictive value)
Relatively safe and cost-effective to society (as defined by “willingness to pay” thresholds that vary by country or health system)
Screen for diseases in which early identification and treatment have been demonstrated to improve clinical outcomes
In 1968, the World Health Organization published formal guidelines for the principles of screening, often referred to as “Wilson’s Criteria” after the lead author, Dr James M.G. Wilson (Table 14-1).10 These tenets form the foundation for screening tests.
World Health Organization (WHO) principles of screening guidelines
There are different types of screening strategies that are commonly used in society, including:
Mass screening: the screening of an entire population or a subgroup, offered to all, irrespective of the risk status of the individual.
Examples include Pap screening at age 21 regardless of sexual history; or HIV screening for all adolescents and adults aged 15 to 65 years (as recommended by the USPSTF).
High-risk or selective screening: screening conducted only among populations at a particular risk for the disease.
For instance, many genetically linked disorders are part of prenatal testing for specific patient groups (eg, cystic fibrosis in Northern European patients; Tay-Sachs in Ashkenazi Jewish patients). Another example is early breast and ovarian cancer screening in women with a genetic disorder that puts them at high risk for these diseases (BRCA+ mutation).
Multiphasic screening: a technique of screening populations that combines the application of two or more screening tests (a “battery of screening tests”) to a large population at one time instead of carrying out separate screening tests for single diseases.
This strategy is often employed by Health Maintenance Organizations (HMOs) and corporate wellness programs to screen for “metabolic syndrome” disorders, and is also used frequently for prenatal immunity screening (single blood collection in first trimester to test for rubella, hepatitis B, HIV, and other diseases).
It is vital that clinicians have a firm grasp on interpreting the basic test characteristics of all diagnostic tests. Sensitivity and specificity both measure a test’s validity. They describe the test’s ability to correctly detect people with or without the disease in question.11 Therefore, they are critical to the evaluation of screening tests and thresholds. Let us briefly visit these concepts now. The equations for each of these biostatistical terms are presented in Table 14-2.
Biostatistical terms and equations
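Term | Definition | Equation |
---|---|---|
Sensitivity | The ability to detect people who do have the disease. | $\dfrac{\text{true-positives}}{\text{true-positives} + \text{false-negatives}}$ |
Specificity | The ability to detect people who do not have the disease. | $\dfrac{\text{true-negatives}}{\text{true-negatives} + \text{false-positives}}$ |
Positive Predictive Value (PPV) | The likelihood that a person with a positive test result actually has the disease. | $\dfrac{\text{true-positives}}{\text{true-positives} + \text{false-positives}}$ |
Negative Predictive Value (NPV) | The likelihood that a person with a negative test result truly does not have the disease. | $\dfrac{\text{true-negatives}}{\text{true-negatives} + \text{false-negatives}}$ |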
The sensitivity of a test is its ability to detect people who actually have the disease. A test that is 100% sensitive would always identify every person with that disease. Very sensitive tests are required in situations where the consequences of a false-negative result (a negative test result obtained in a case where the person does actually have the disease) would be extremely serious.11 One example is screening donated blood for HIV, in which a highly sensitive test is required to ensure that there are not any false-negative cases leading to accidental transmission of HIV.
Highly sensitive tests are good for ruling out disease. In other words, if a highly sensitive test is negative, one can rest assured that it is exceedingly unlikely that the patient has the disease in question.
Of course, finding everyone who has the disease is only half of the issue. A test that always returned positive may indeed be a true-positive for every person with the disease (therefore, “highly sensitive”), but it would also be positive for everyone without the disease (false-positive); thus, it would remain a worthless test.
The specificity of a test is its ability to detect people who do not have the disease. Tests that are highly specific are needed particularly in situations where a false-positive test would cause significant harm. For instance, prior to starting a toxic chemotherapy regimen for cancer, the patient and physician need to be quite certain that the test result is true and the patient does indeed have cancer.
Highly specific tests help rule in disease. That is, if a highly specific test is positive for a certain disease, then there is substantial confidence that the patient does in fact have that disease.
Ideally, screening tests will maximize both sensitivity and specificity, but in practice there is almost always a trade-off between the two; sensitivity is increased at the cost of specificity, and vice versa. This is usually because patients with a disease and without a disease are on a continuum and the two groups often overlap each other.11 The test must have a “cut-off” point that is chosen based on the specific test characteristics. Where that cut-off point is chosen may lead to either more missed cases (false-negatives) or more incorrect diagnoses (false-positives). A commonly used strategy is to perform a highly sensitive and rather cheap test first to identify all patients that may have a disease, and then follow up with a more specific (and usually more expensive) test to eliminate false-positive results. This approach is used for HIV testing.
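The cut-off trade-off described above can be sketched numerically. The following is a minimal illustration, not a model of any real test: it assumes a hypothetical biomarker whose values are normally distributed in both groups, with invented means and standard deviations, and it calls the test positive whenever the value is at or above the cut-off.

```python
from statistics import NormalDist

# Hypothetical biomarker distributions (invented values, for illustration only).
healthy = NormalDist(mu=10.0, sigma=2.0)
diseased = NormalDist(mu=14.0, sigma=2.0)


def characteristics(cutoff: float) -> tuple[float, float]:
    """Return (sensitivity, specificity) when 'value >= cutoff' counts as positive."""
    sensitivity = 1.0 - diseased.cdf(cutoff)  # diseased people above the cut-off
    specificity = healthy.cdf(cutoff)         # healthy people below the cut-off
    return sensitivity, specificity


for cutoff in (10.0, 12.0, 14.0):
    sens, spec = characteristics(cutoff)
    print(f"cut-off {cutoff:4.1f}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Under these assumptions, lowering the cut-off catches more true cases but labels more healthy people as positive, while raising it does the reverse. Because the two distributions overlap, no cut-off makes both values 100%.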
Sensitivity and specificity are very important concepts when designing or evaluating diagnostic tests, and when determining whether a test should be ordered in the first place. But once a test is performed, the individual patient and clinician want to answer a different question altogether. The patient wants to know: “Based on this test result, how likely is it that I do or do not have the disease?” Answering this question requires knowing the predictive values of the test.
The Positive Predictive Value (PPV) is the proportion of positive results that are true-positives. If a patient tests positive for HIV, the question is: “Given the positive test, how likely is it that I actually have HIV?” The PPV answers this question.
The Negative Predictive Value (NPV) is the proportion of negative results that are true-negatives. If the HIV test returns negative, the question is: “Given the negative test, how likely is it that I really do not have HIV?” The NPV answers this question.
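The four measures defined above can all be computed from the same four counts. The sketch below uses a hypothetical `ConfusionCounts` class and invented counts, chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a test result against true disease status."""
    true_pos: int   # has disease, test positive
    false_neg: int  # has disease, test negative
    false_pos: int  # no disease, test positive
    true_neg: int   # no disease, test negative

    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    def ppv(self) -> float:
        return self.true_pos / (self.true_pos + self.false_pos)

    def npv(self) -> float:
        return self.true_neg / (self.true_neg + self.false_neg)


# Invented counts for 1,000 people, 100 of whom have the disease.
example = ConfusionCounts(true_pos=90, false_neg=10, false_pos=50, true_neg=850)
print(f"sensitivity {example.sensitivity():.1%}, specificity {example.specificity():.1%}")
print(f"PPV {example.ppv():.1%}, NPV {example.npv():.1%}")
```

Sensitivity and specificity are each calculated within one disease-status group (people with or without the disease), whereas PPV and NPV are each calculated within one test-result group (positive or negative results). This is why only the predictive values change when the mix of diseased and healthy people changes.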
The predictive values vary according to the prevalence of the disease in a specific population group. Using the blood test CA-125 to screen for ovarian cancer illustrates both predictive values and the trade-off between sensitivity and specificity. CA-125 may be up to 99% specific, but it is less than 80% sensitive. Because the population prevalence is so low, this results in an abysmal 0.4% PPV, rendering it unhelpful as a mass screening tool.12 As Figure 14-1 shows, with these test characteristics if 100,000 women were screened, there would be approximately 10,040 positive results. The problem is that the result would be a false-positive for 10,000 of these women, potentially subjecting them to further testing, undue worry, and invasive procedures.
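The dependence of PPV on prevalence can be checked with a short calculation. The inputs below are assumptions back-calculated from the counts quoted for Figure 14-1, not figures stated in the text: 40 true-positives at 80% sensitivity imply about 50 cases per 100,000 women, and 10,000 false-positives imply a specificity of roughly 90%, lower than the “up to 99%” upper bound. The function name `ppv_from_prevalence` is likewise hypothetical.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value for a population with the given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Assumed inputs, back-calculated from the Figure 14-1 counts (see lead-in).
low_prevalence = 50 / 100_000

ppv_figure = ppv_from_prevalence(0.80, 0.90, low_prevalence)
ppv_best_case = ppv_from_prevalence(0.80, 0.99, low_prevalence)
# Same test, hypothetical high-risk group in which 10% have the disease.
ppv_high_risk = ppv_from_prevalence(0.80, 0.90, 0.10)

print(f"PPV at 90% specificity, low prevalence: {ppv_figure:.1%}")
print(f"PPV at 99% specificity, low prevalence: {ppv_best_case:.1%}")
print(f"PPV at 90% specificity, 10% prevalence: {ppv_high_risk:.1%}")
```

Under these assumptions, the PPV stays in the low single digits at this prevalence even with the best-case specificity. The same test applied to a group with a much higher prevalence yields a far higher PPV, which is the rationale for targeting screening at higher-prevalence populations.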
LIMITATIONS AND BIASES IN SCREENING
There are various limitations that are inherent to screening tests and are important for clinicians to consider when interpreting these studies. The apparent effects from screening will always be more favorable than real effects seen in a population.13 This is because screening-detected cases include cases that were diagnosed earlier, progress more slowly, and may never actually cause a clinical problem. There are a number of biases that we review below that inflate the survival of screen-detected cases. The key is to understand the important difference between survival rates and mortality. Whereas mortality rates define the number of people that die of a certain cause in a given year, survival rates calculate the percentage of people with a disease who are still alive a set amount of time after diagnosis. Preventing death, curing the disease, or making the diagnosis earlier can all increase survival rates.14
Consider this illustrative story by Indiana University School of Medicine professor Dr Aaron Carroll from the Incidental Economist blog14:
Let’s say there’s a new cancer of the thumb killing people. From the time the first cancer cell appears, you have 9 years to live, with chemo. From the time you can feel a lump, you have 4 years to live, with chemo. Let’s say we have no way to detect the disease until you feel a lump. The 5-year survival rate for this cancer is about 0, because within 5 years of detection, everyone dies, even on therapy.
Now I invent a new scanner that can detect thumb cancer when only one cell is there. Because it’s the United States, we invest heavily in those scanners. Early detection is everything, right? We have protests and lawsuits and now everyone is getting scanned like crazy. Not only that, but people are getting chemo earlier and earlier for the cancer. Sure, the side effects are terrible, but we want to live.
We made no improvements to the treatment. Everyone is still dying 4 years after they feel the lump. But since we are making the diagnosis 5 years earlier, our 5-year survival rate is now approaching 100%! Everyone is living 9 years with the disease. Meanwhile, in England, they say that the scanner doesn’t extend life and won’t pay for it. Rationing! That’s why their 5-year survival rate is still 0%.
The mortality rate is unchanged. The same number of people are dying every year. We have just moved the time of diagnosis up and subjected people to 5 more years of side-effects and reduced quality of life. We haven’t done any good at all. We haven’t extended life, we’ve just lengthened the time you have a diagnosis.
Dr Carroll’s “thumb cancer” example highlights the problem of lead-time bias from screening tests. Since we determine the length of survival from the time of diagnosis, and screen-detected cases are diagnosed earlier than those detected by signs and symptoms, the measured survival length often increases with screening, even if the patient dies at exactly the same time as they would have without the screening test. The patient seems to have lived longer with cancer, but in reality they simply knew about the cancer longer. In Figure 14-2, the patient develops cancer in March 2013 and dies in January 2016. The only difference is that with a screening test, the diagnosis was made in January 2014; without the test, the diagnosis was made based on symptoms in July 2015. With the screening test, the patient is said to have survived for 2 years with cancer. Without the screening test, the patient is considered to have survived for only 6 months with cancer. But in reality the patient died on the same date regardless and was simply aware of the disease for longer.
Lead-time bias helps explain why the commonly used metric of 5-year survival rates to judge the value of cancer screening can potentially be misleading.15
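The “thumb cancer” numbers can be worked through directly. This is a minimal sketch of the quoted example: every patient in a hypothetical cohort follows the same course (first cell at year 0, palpable lump at year 5, death at year 9), and the only thing that changes is the moment of diagnosis. The `Patient` class and the cohort size are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Patient:
    first_cell_year: float  # earliest point a perfect scanner could detect disease
    lump_year: float        # point of diagnosis without screening
    death_year: float       # identical with or without screening in this example


def five_year_survival(patients: list[Patient],
                       diagnosis_year: Callable[[Patient], float]) -> float:
    """Fraction of patients still alive 5 years after their diagnosis."""
    alive = sum(1 for p in patients if p.death_year - diagnosis_year(p) >= 5)
    return alive / len(patients)


cohort = [Patient(first_cell_year=0.0, lump_year=5.0, death_year=9.0) for _ in range(1000)]

without_screening = five_year_survival(cohort, lambda p: p.lump_year)
with_screening = five_year_survival(cohort, lambda p: p.first_cell_year)

print(f"5-year survival without screening: {without_screening:.0%}")  # 0%
print(f"5-year survival with screening:    {with_screening:.0%}")     # 100%
print(f"deaths by year 9 in either case:   {len(cohort)}")
```

The survival statistic moves from 0% to 100% even though `death_year` is never changed, so the number of deaths, and therefore the mortality rate, is identical in both scenarios.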
Another issue is that screening overestimates survival duration due to the relative excess identification of slowly progressing disease, a phenomenon known as length-time bias. Screening tests disproportionately find slow-growing cancers compared to rapidly advancing cancers due to a much larger window of opportunity to identify these diseases in an asymptomatic state.13,15 In other words, “the probability of detection is directly proportional to the length of time during which they are detectable (and thereby inversely proportional to the rate of progression).”13 The most important consequence is that the very nature of screening tests selects for cancers that inherently have a better prognosis.15
Figure 14-3 demonstrates this concept, showing how slower growing tumors that are much less likely to kill someone within the given time-span are much more likely to be picked up by screening tests, compared to rapidly progressive disease that both develops and causes death within the screening interval.
Figure 14-3
Length-time bias. Notes: A total of seven patients (lines) are depicted. The length of the line represents the length of the clinical course. Even though the prevalence of disease is nearly the same in this hypothetical cohort, patients that survive during the period of observation (blue lines) are more likely to be detected during a “screening snapshot” than patients that die during the period of observation (black lines). This can skew perceived survival rates.
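Length-time bias can also be illustrated with a small simulation. The sketch below rests on invented assumptions: half of all tumors are “slow” (detectable but asymptomatic for 10 years) and half are “fast” (detectable for 1 year), onsets are spread evenly over 50 years, and a single screening snapshot is taken at year 25.

```python
import random


def screening_snapshot(n_cases: int = 100_000, window_years: float = 50.0,
                       screen_year: float = 25.0, seed: int = 0):
    """Count all tumors and those detectable at one screening snapshot, by type."""
    rng = random.Random(seed)
    total = {"slow": 0, "fast": 0}
    detected = {"slow": 0, "fast": 0}
    for _ in range(n_cases):
        kind = rng.choice(("slow", "fast"))
        detectable_years = 10.0 if kind == "slow" else 1.0  # assumed durations
        onset = rng.uniform(0.0, window_years)
        total[kind] += 1
        # The snapshot finds the tumor only if it falls inside the detectable window.
        if onset <= screen_year < onset + detectable_years:
            detected[kind] += 1
    return total, detected


total, detected = screening_snapshot()
slow_share_all = total["slow"] / (total["slow"] + total["fast"])
slow_share_detected = detected["slow"] / (detected["slow"] + detected["fast"])
print(f"slow tumors among all cases:           {slow_share_all:.0%}")
print(f"slow tumors among screen-detected:     {slow_share_detected:.0%}")
```

Although slow and fast tumors are about equally common in this hypothetical population, the screen-detected group is dominated by slow tumors, roughly in proportion to how long each type remains detectable. Survival measured in that group would therefore look better than survival in the population as a whole.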
The extreme of this is “overdiagnosis,” which is discussed in detail further below. Overdiagnosis refers to finding a lesion that is so indolent that if it were untreated it would never go on to cause problems for the individual. In 2012, the USPSTF recommended against routine prostate cancer screening using the prostate-specific antigen (PSA) test for exactly this reason. Before this guideline, PSA testing was widespread. In an effort to curb overuse of PSA, the USPSTF pointed out the multiple harms of overdiagnosis in their recommendations, including the small risk of death from unnecessary surgery.16
There is often something fundamentally different about people who agree to participate in prevention and early detection programs from those who do not. In general, people who opt to participate in screening or prevention programs tend to be more health-conscious. On average, they are more likely to exercise, they smoke and drink less, and they come from higher socioeconomic classes compared to those who do not participate in these programs.15 Therefore, at baseline this self-selected group of people seems to live longer in general than those who do not participate in these types of programs. In fact, this is such a common finding that epidemiologists have labeled it the “healthy volunteer effect.”15
In the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial, a large, multicenter program that randomized more than 150,000 men and women between 55 and 74 years to a screening or a control arm over many years, the standardized all-cause mortality ratio was 43 in participants, whereas the expected standard would be 100.17 Why did the men and women who chose to enroll in this study die so much less frequently (regardless of randomization to the screening arm) than would be expected in the general population? Well, it turns out that these subjects were better educated, more physically active, more likely to be married, and less likely to be current smokers than the general public.17
ENTHUSIASM FOR CANCER SCREENING IN THE UNITED STATES
Americans are generally “enthusiastic” about cancer screening.18 This is true for people in other countries as well. In the United Kingdom, people expressed a clear preference to undergo diagnostic testing for cancer at all risk levels—88% of people surveyed said that they would want to undergo diagnostic testing for cancer even if the risk that their symptom actually represented cancer was only 1%.19 When it comes to cancer, often emotion prevails over cold hard science.20,21 However, interestingly, this fervor for cancer screening does not extend to other healthcare issues, such as heart disease, which actually kills many more women than breast cancer.20
This is not necessarily surprising. There exist many persuasive stories and messages of people touting that a screening test “saved my life.” There is no doubt that this is sometimes true and that certain screening tests have made a substantial contribution to lives being saved. But, over the course of 30 years, screening mammography has had a limited effect on breast cancer mortality in the United States.22 Although the introduction of mammography has been associated with a doubling of early-stage breast cancer cases detected each year (from 112 to 234 cases per 100,000 women), the rate at which women present with late-stage cancer has only decreased by 8% (from 102 to 94 cases per 100,000 women). Therefore, it seems that only 8 of the additional 122 early-stage cancers diagnosed by screening mammography would have been expected to progress to advanced disease.22
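The arithmetic behind these mammography figures is simple enough to lay out explicitly. The sketch below uses only the rates quoted above (cases per 100,000 women). The variable names are illustrative, and the final ratio follows the passage's own reasoning that each late-stage case avoided corresponds to one early-stage cancer that would otherwise have progressed.

```python
# Rates per 100,000 women, as quoted in the text.
early_before, early_after = 112, 234
late_before, late_after = 102, 94

additional_early = early_after - early_before    # extra early-stage diagnoses
fewer_late = late_before - late_after            # late-stage presentations avoided
relative_late_drop = fewer_late / late_before    # relative decrease in late-stage rate
share_progressing = fewer_late / additional_early

print(f"additional early-stage cases per 100,000: {additional_early}")
print(f"fewer late-stage cases per 100,000:       {fewer_late}")
print(f"relative drop in late-stage rate:         {relative_late_drop:.0%}")
print(f"share of extra early cases that match an avoided late case: {share_progressing:.1%}")
```

On these numbers, fewer than 1 in 10 of the additional early-stage diagnoses is matched by a late-stage case avoided.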