Chapter Contents
What Is a Surrogate Endpoint?
Advantages of Surrogate Endpoints
Disadvantages of Surrogate Endpoints
Validation of Surrogate Endpoints
Terminological Tangles
Levels of Evidence
The Way Forward
Composite Outcomes
Advantages of Composite Outcomes
Disadvantages of Composite Outcomes
Composite Outcomes in Contemporary Research
Conclusion
Clinical research should focus on outcomes that matter. For many reasons, it often does not. As the health of the population has improved, serious morbidity such as stroke has declined in frequency, and longevity has increased. Although this is great news for public health, it poses a growing challenge for researchers, who now have fewer events to study. Prospective studies with sufficient power to find clinically important differences must now be larger, longer, or both. Moreover, the pressure on regulatory bodies, such as the US Food and Drug Administration, to approve new products is relentless, given the huge economic implications for pharmaceutical companies.
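The arithmetic behind this squeeze can be sketched with the standard normal-approximation formula for comparing two proportions. The control-arm event risks, the 25% relative risk reduction, and the function name below are illustrative assumptions, not figures from any trial discussed in this chapter.

```python
import math
from statistics import NormalDist


def per_group_sample_size(control_risk, relative_risk, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a given relative risk.

    Uses the usual normal-approximation formula for two independent
    proportions with equal allocation and a two-sided test.
    """
    treated_risk = control_risk * relative_risk
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = (control_risk * (1 - control_risk)
                + treated_risk * (1 - treated_risk))
    n = (z_alpha + z_beta) ** 2 * variance / (control_risk - treated_risk) ** 2
    return math.ceil(n)


# Hypothetical control-arm event risks: the same 25% relative reduction
# becomes harder to detect as the clinical event becomes rarer.
sizes = {risk: per_group_sample_size(risk, relative_risk=0.75)
         for risk in (0.10, 0.05, 0.02)}

for risk, n in sizes.items():
    print(f"control-arm risk {risk:.0%}: about {n} participants per arm")
```

Under these assumed inputs, the required enrolment climbs steeply as the control-arm risk falls. That is the pressure that makes shortcuts tempting.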
In response to these problems, two workarounds have become common in clinical research: surrogate endpoints and composite endpoints. In this chapter we will review these two alternatives, discuss their advantages and disadvantages, describe some notorious examples, and generally advise against their use.
What Is a Surrogate Endpoint?
As its name implies, a surrogate endpoint is a proxy or substitute for a clinical outcome of importance. Some synonyms include ‘intermediate measures’ and ‘surrogate markers’. These endpoints are usually biological processes, measured with laboratory tests of blood or other body fluids or with imaging studies, that are thought to lie along the causal pathway to illness. Regrettably, the process measured often runs on a parallel track that is not causally involved. Examples of common surrogate endpoints include intraocular pressure as a surrogate outcome for vision loss in glaucoma or blood pressure as a surrogate for myocardial infarction or stroke. In nephrology, declining glomerular filtration rate or rising creatinine or proteinuria serve as surrogates for kidney failure.
Advantages of Surrogate Endpoints
Surrogate endpoints have enormous appeal to researchers conducting randomised controlled trials because of their efficiency. Rather than waiting for years to accrue sufficient numbers of clinical events (e.g., fracture, stroke, or death), the researcher can quickly and cheaply get an answer by focusing instead on a laboratory test or imaging study. Keeping trials short helps to minimise losses to follow-up and deviations from the assigned treatment. Another rationale for using surrogate endpoints is that the clinical endpoint of interest may be so expensive, invasive, or painful that a surrogate is deemed more acceptable to participants. Clinicians tend to be comfortable with surrogate endpoints, such as haemoglobin A1c, which are part of routine care; short-term effects of treatment are readily observable, as opposed to late complications of diabetes. Another seductive appeal is that surrogate endpoints can give clinicians a sense of mastery over how a treatment should influence the disease process. As discussed later, this reassurance is often unwarranted.
Disadvantages of Surrogate Endpoints
Surrogate endpoints may not measure what is intended. Some surrogate endpoints provide ambiguous results, waste resources, and lead clinicians astray. More importantly, changes in surrogate endpoints may not translate into clinical benefit. Indeed, sometimes use of surrogate endpoints for clinical decisions inadvertently harms patients.
The danger of using cardiovascular drugs approved on the basis of benefit on surrogate endpoints has been well documented (Panel 18.1). In each of these examples, the drug was approved using surrogate endpoints; later studies with true clinical endpoints found that the drug had a paradoxical effect on survival. The most notorious example was the wide use of antiarrhythmic drugs to suppress ventricular irritability after an acute myocardial infarction. This was anticipated to reduce the risk of sudden cardiac death. Indeed, drugs such as encainide and flecainide nicely suppressed premature ventricular contractions, but they tripled the risk of death through unknown mechanisms. More than 200,000 patients in the United States received these drugs in clinical practice, and many thousands died needlessly as a result of poor science. Poor-quality research is unethical, because it misleads clinicians and hurts patients.
Drug | Indication | Surrogate Endpoint |
---|---|---|
Aprotinin | High-risk cardiac surgery | Decreased need for blood transfusion |
Clofibrate | Hypercholesterolaemia in healthy men | Decreased serum cholesterol |
Encainide, flecainide | Ventricular irritability after myocardial infarction | Fewer premature ventricular contractions |
Erythropoietin | Anaemia from chronic renal failure | Increased haemoglobin level |
Flosequinan | Chronic congestive heart failure | Improved ventricular function |
Ibopamine | Severe congestive heart failure | Increased exercise tolerance, decreased vascular resistance |
Milrinone | Severe congestive heart failure | Increased cardiac contractility |
Metoprolol | Noncardiac surgery in patients at cardiovascular risk | Decreased postoperative myocardial ischaemia |
Moxonidine | Congestive heart failure | Decreased plasma norepinephrine |
Another infamous example is the use of fluorides to treat osteoporosis. A randomised trial compared fluoride with placebo and observed the anticipated 35% increase in bone mineral density. Those given fluoride, however, had a paradoxical increase in both vertebral and nonvertebral fractures. Bones became denser but apparently more brittle. Bone mineral density measures only one characteristic of bone health: bone quantity, not bone quality (the living biomatrix).
In 2004, the US Food and Drug Administration (FDA) placed a black box warning on the labelling of depo-medroxyprogesterone acetate (DMPA) injections for contraception, cautioning against more than 2 years of use (Fig. 18.1). A ‘black box’ is the most serious warning from the FDA, usually reserved for potentially life-threatening adverse effects of a drug. To our knowledge, DMPA is the only modern contraceptive never linked to a death. The warning was based on the transient effect of DMPA on bone mineral density, similar to that with breastfeeding. As in the previous fluoride example, bone mineral density was known not to be a valid predictor of fracture. In response to the FDA’s alarmist labelling, some gynaecologists started ordering bone mineral density tests for teenagers and began prescribing them supplemental oestrogen and bisphosphonates. In contrast, the World Health Organization recommends no restrictions on DMPA for contraception (Fig. 18.2). The FDA’s black box warning based on an invalid surrogate endpoint has been widely criticised as not evidence-based.
Rosiglitazone reached the market based on its lowering of haemoglobin A1c in patients with type 2 diabetes; subsequent studies revealed the drug was associated with a modest increase in the risk of myocardial infarction and death from cardiovascular disease. This discovery (and a subsequent black box warning from the FDA) had a chilling effect on its use. A trial of the effect of tighter control of type 2 diabetes found that reducing this surrogate endpoint led to a paradoxical increase in deaths through other mechanisms.
The FDA continues to use surrogate endpoints inappropriately. Multidrug-resistant tuberculosis is a deadly infection, and the need for novel treatments is acute. In 2012 the FDA approved a new type of drug, bedaquiline, based on its effect on a surrogate endpoint of unknown validity: conversion of the patient’s sputum culture from positive to negative for Mycobacterium tuberculosis. Instead of requiring large trials of clinical efficacy, the drug approval was based on two modest-sized trials of 47 and 161 patients, and the effect on sputum culture was not dramatic. What was dramatic was the paradoxical fivefold increase in death among those who received the new drug. For unknown reasons, the surrogate endpoint (sputum culture) trumped the true endpoint (death): the drug was approved despite being more lethal.
Validation of Surrogate Endpoints
As cautioned by Fleming and DeMets, ‘A correlate does not a surrogate make’. Decades after this warning, clinicians, researchers, and drug regulators remain largely unaware of this distinction. Two criteria must be met to validate a surrogate endpoint. First, the surrogate must correlate with the true clinical endpoint. This criterion is generally met. Second, the surrogate must fully capture the effect of the treatment on the true clinical endpoint of interest. Sadly, this criterion is almost never satisfied. Meeting both criteria requires at least one prospective study using both the surrogate and true endpoints. Avoiding such a large, expensive, and time-consuming study is the rationale for using surrogates in the first place. Hence such studies are rarely done.
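Why a correlate can fail as a surrogate is easier to see with a toy simulation. The sketch below invents a marker that predicts death in untreated patients and a hypothetical drug that suppresses the marker while adding harm through a pathway the marker does not reflect. Every probability in it is an assumption chosen for illustration, not data from any study.

```python
import random


def simulate_arm(n, p_marker_high, off_pathway_harm, rng):
    """Return a list of (marker_high, died) pairs for one trial arm."""
    patients = []
    for _ in range(n):
        marker_high = rng.random() < p_marker_high
        # Risk of death tracks the marker, which makes the marker a correlate.
        risk = 0.15 if marker_high else 0.05
        # Harm caused by the drug through a pathway the marker does not see.
        risk += off_pathway_harm
        patients.append((marker_high, rng.random() < risk))
    return patients


def death_rate(patients, marker_high=None):
    """Proportion who died, optionally restricted to one marker level."""
    deaths = [died for high, died in patients
              if marker_high is None or high == marker_high]
    return sum(deaths) / len(deaths)


rng = random.Random(18)  # fixed seed so the sketch is reproducible
control = simulate_arm(20_000, p_marker_high=0.5, off_pathway_harm=0.0, rng=rng)
treated = simulate_arm(20_000, p_marker_high=0.1, off_pathway_harm=0.10, rng=rng)

marker_control = sum(high for high, _ in control) / len(control)
marker_treated = sum(high for high, _ in treated) / len(treated)

# Criterion 1: in the control arm, the marker correlates with death.
print("control deaths, marker high vs low:",
      death_rate(control, True), death_rate(control, False))
# The drug 'works' on the surrogate...
print("marker high, control vs treated:", marker_control, marker_treated)
# ...yet overall mortality moves the wrong way.
print("deaths, control vs treated:", death_rate(control), death_rate(treated))
# Criterion 2 fails: at the same marker level, treatment still changes mortality.
print("deaths with low marker, control vs treated:",
      death_rate(control, False), death_rate(treated, False))
```

In this invented scenario the first criterion holds and the second does not: judged by the marker alone the drug looks beneficial, and only the comparison of deaths reveals the harm. A prospective study that records both endpoints is what makes the last two comparisons possible.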
This two-step approach to validation has been challenged in recent years. As an alternative, more sophisticated approaches, including meta-analysis of the literature, are now being considered. However, aggregation of incomplete information cannot plug the gaps. Empirical evidence of validity is still needed.
A small number of surrogate endpoints have been established as valid. For patients infected with HIV, viral RNA load has been shown through several confirmatory trials with different drugs to capture the effect of treatment on disease progression and death. Other examples include haemoglobin A1c as a surrogate for microvascular complications of diabetes and lowering of low-density lipoprotein cholesterol with statins as a surrogate for cardiovascular disease.
Terminological Tangles
Terms used to describe surrogate endpoints, risk markers, and biomarkers are inconsistent and confusing. In an attempt to provide greater clarity, the National Institutes of Health (NIH) promulgated suggested terminology in 2001. Next, the Institute of Medicine (now the National Academy of Medicine) weighed in with a scholarly tome of more than 300 pages. More recently, the FDA and the NIH took another stab at nomenclature with a regulatory focus; it had an obligatory acronym: BEST (Biomarkers, EndpointS, and other Tools) Resource.
In the glossary of this document, surrogate endpoints are ranked by their credibility (Panel 18.2).
Endpoint
A precisely defined variable intended to reflect an outcome of interest that is statistically analyzed to address a particular research question. A precise definition of an endpoint typically specifies the type of assessments made, the timing of those assessments, the assessment tools used, and possibly other details, as applicable, such as how multiple assessments within an individual are to be combined.
Clinical Endpoint
A characteristic or variable that reflects how a patient [or consumer] feels, functions, or survives. Example: death.
Intermediate Clinical Endpoint
In a regulatory context, an endpoint measuring a clinical outcome that can be measured earlier than an effect on irreversible morbidity or mortality (IMM) and that is considered reasonably likely to predict the medical product’s effect on IMM or other clinical benefit. The intermediate clinical endpoint may be a basis for full approval if the effect on the endpoint is considered clinically meaningful. It may also be a basis for accelerated approval if the IMM effect is considered critical for use of the drug or for expedited access for medical devices intended for unmet medical need for life threatening or irreversibly debilitating diseases or conditions.
Example: Exercise tolerance has been used as an intermediate clinical endpoint in trials of device treatments for heart failure.
Example: A treatment for preterm labor was approved based on a demonstration of delay in delivery. Under accelerated approval, the sponsor was required to conduct postmarketing studies to demonstrate improved long-term postnatal outcomes.
Surrogate Endpoint
An endpoint that is used in clinical trials as a substitute for a direct measure of how a patient feels, functions, or survives. A surrogate endpoint does not measure the clinical benefit of primary interest in and of itself, but rather is expected to predict that clinical benefit or harm based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence.
From a U.S. regulatory standpoint, surrogate endpoints and potential surrogate endpoints can be characterized by the level of clinical validation:
Validated Surrogate Endpoint
An endpoint supported by a clear mechanistic rationale and clinical data providing strong evidence that an effect on the surrogate endpoint predicts a clinical benefit. Therefore, it can be used to support traditional approval without the need for additional efficacy information.
Example: Hemoglobin A1c (HbA1c) reduction is a validated surrogate endpoint for reduction of microvascular complications associated with diabetes mellitus and has been used for the basis for approval of drugs intended to treat diabetes mellitus.
Example: HIV-RNA reduction is a validated surrogate endpoint for human immunodeficiency virus (HIV) clinical disease control and has been used for the basis for approval of drugs intended to treat HIV.
Example: Low-density lipoprotein (LDL) cholesterol reduction is a validated surrogate endpoint for reduction of cardiovascular events and has been used for the basis for approval of statins.
Example: Blood pressure reduction is a validated surrogate endpoint for reduction in rates of stroke, myocardial infarction, and mortality and has been used for the basis for the approval of drugs intended to treat hypertension.
Reasonably Likely Surrogate Endpoint
An endpoint supported by clear mechanistic and/or epidemiologic rationale but insufficient clinical data to show that it is a validated surrogate endpoint. Such endpoints can be used for accelerated approval for drugs or expedited access for medical devices. In the case of accelerated approval for drugs, additional trial data, assessing the effect of the intervention on the clinical benefit endpoint of interest will be collected in the post-marketing setting to verify whether an effect on the reasonably likely surrogate actually predicts clinical benefit in the specific context under study.
Example: Outcomes of 6-month follow-up treatment (i.e., sputum culture status and infection relapse rate) have been considered reasonably likely to predict the resolution of pulmonary tuberculosis and have supported accelerated approval of drugs to treat tuberculosis.
Candidate Surrogate Endpoint
An endpoint still under evaluation for its ability to predict clinical benefit.