If the extrapolation of nonclinical safety data could perfectly predict human responses, then most issues concerning extrapolation would not exist. Likewise, if there were absolutely no utility in extrapolating animal data to humans, there would be little need for the detailed nonclinical investigations required by regulatory agencies. The actual situation lies in the gray area between these two extremes. The major discussion of extrapolating safety data is organized around the following four questions or issues. Based on the information presented in addressing these questions, and on data in the literature, a number of principles are then described.
How Does One Evaluate Whether Nonclinical Safety Data Can Be Extrapolated to Humans?
The most direct approach for determining the extrapolatability of nonclinical results for humans is to measure retrospectively the correlation between results obtained in animals and humans. Even the most accurate data, however, do not enable one to predict whether extrapolation for the next compound tested will yield false-negative, false-positive, or correct conclusions about the effects that will be observed in humans.
One of the biggest reasons why it is difficult to retrospectively investigate the accuracy of extrapolation from nonclinical safety studies to humans is that most drugs that are highly toxic in nonclinical studies are never tested in humans. Thus, the only drugs for which retrospective analysis is possible are those with relatively favorable nonclinical safety profiles. Consequently, retrospective analysis can address the question of how well favorable nonclinical safety results can be extrapolated to humans, but cannot say anything about how well unfavorable nonclinical safety results might predict human safety. This problem makes it extremely difficult to study accurately the predictive value of nonclinical safety studies.
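The censoring problem described above can be sketched in code. In the hypothetical cohort below (all numbers invented for illustration), a retrospective analysis can only examine drugs that passed the animal screen; the human outcomes of drugs with unfavorable animal findings are simply never generated, so no statement about their predictive value is possible:

```python
import random

random.seed(0)

# Hypothetical cohort of 1,000 drug candidates: for each, whether it showed
# toxicity in animals and whether it would have shown toxicity in humans.
# The probabilities are invented purely to illustrate the censoring.
drugs = [(random.random() < 0.3, random.random() < 0.25) for _ in range(1000)]

# Retrospective analyses only see drugs with favorable animal profiles,
# because the others were never dosed in humans.
observed_human_outcomes = [human for animal, human in drugs if not animal]

# We can estimate how often a clean animal profile is followed by human
# toxicity, but the would-be human outcomes of animal-toxic drugs remain
# unobservable, so their predictive value cannot be measured.
clean_profile_human_tox_rate = (
    sum(observed_human_outcomes) / len(observed_human_outcomes)
)
```

The point of the sketch is structural: the animal-toxic subset is censored out of the data before any human observation is made, which is exactly why retrospective studies can only speak to favorable nonclinical profiles.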
How Reliable Are the Animal Toxicity Data Collected?
This question does not focus on the quality of the data obtained in specific laboratories, although that is sometimes an important consideration. If Good Laboratory Practices regulations are in force, the staff is able and experienced, the facilities are appropriate, and the equipment is up to date, then the quality of the data collected should be acceptable and not be an issue. The issues raised by this question are (a) how consistent are the data obtained, (b) are the numbers of animals used in nonclinical safety studies sufficient to detect uncommon adverse events, and (c) are differences in interpretation of toxicity results among laboratories relatively common, and are such differences important? A number of other issues relating to the extrapolation of safety data to humans are discussed in Chapter 88 of Guide to Clinical Trials (Spilker 1991).
If the rates of false positives and false negatives for extrapolating safety data were less than 5%, one might take the position that toxicological data should be accepted as valid; the larger percentages of false positives and false negatives reported in the literature, however, mean that the toxicity of all potential drugs must be determined in humans. Nonetheless, only compounds with toxicity profiles judged as meeting certain regulatory standards may ethically be tested in humans. Therefore, some potentially valuable drugs are lost because their toxicity in animals is judged greater than what would be acceptable for testing in humans, even though some of those drugs are unlikely to be as toxic in humans as in animals.
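The bearing of false-positive and false-negative rates on the practical value of an animal finding can be illustrated with Bayes' rule. The 5% error rates and the assumed prevalence of true human toxicity below are hypothetical numbers chosen only to show the arithmetic, not values drawn from the literature:

```python
# Illustrative calculation: how often a positive animal finding would
# correspond to real human toxicity, given assumed error rates.
# All numbers are hypothetical, chosen only to show the arithmetic.

def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(toxic in humans | toxic finding in animals), by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Even with only 5% false positives and 5% false negatives, the
# predictive value of an animal finding depends strongly on how common
# true human toxicity is among the compounds screened.
ppv = positive_predictive_value(sensitivity=0.95, specificity=0.95,
                                prevalence=0.30)
print(round(ppv, 2))  # ~0.89 under these assumed numbers
```

The same function with a lower assumed prevalence yields a much lower predictive value, which is one way of seeing why even small error rates do not remove the need to confirm toxicity in humans.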
Nonclinical safety studies almost always employ relatively few animals, compared to the number of patients from which clinical safety data are obtained. In fact, for most drugs, far more humans are exposed during clinical development than animals exposed during nonclinical development. Consequently, if one assumes that rare adverse events in humans are also rare in animals, then many, if not most uncommon adverse events in humans will not be observed in nonclinical safety studies.
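The sample-size point can be made concrete. With n subjects and an adverse event of true incidence p, the probability of observing the event at least once is 1 − (1 − p)^n. The group sizes and incidence below are hypothetical, chosen only to illustrate the contrast between typical animal group sizes and clinical exposure:

```python
# Probability of observing at least one occurrence of an adverse event
# of true incidence p in a study with n subjects (hypothetical numbers).

def prob_detect(p, n):
    return 1 - (1 - p) ** n

# A 1-in-1,000 event is very unlikely to appear in a 50-animal group...
print(round(prob_detect(0.001, 50), 3))   # ~0.049
# ...whereas several thousand patients give a realistic chance of seeing it.
print(round(prob_detect(0.001, 3000), 2)) # ~0.95
```

Under these assumptions, an event occurring in 1 of 1,000 subjects has roughly a 5% chance of appearing even once in a 50-animal study, which is why uncommon human adverse events are generally first detected in the clinic.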
There is an old joke among pathologists that if you get five pathologists together, you get eight separate opinions on interpreting a histological slide. One of the reasons for this is that no consensus exists as to whether pathologists should read tissue slides and interpret specimens blinded or unblinded. The argument for reading slides unblinded states, in part, that knowledge of the clinical diagnosis helps the pathologist better interpret the data, since numerous types of interpretations could usually be made. An extremely defensive editorial in support of unblinded slide reading (Society of Toxicologic Pathology 1986) does not present objective evidence to support its actual position, but ironically presents reasons to support blinding (e.g., “The long-standing practice of open or nonblinded slide reading is based on the fact that morphologic diagnostic pathology is a highly subjective and complex discipline.”) and even fails to consider various methods for blinding slides (e.g., blinding only to treatment group). The argument for blinded reading is based on the notion that it minimizes the biases that readily enter data analysis and interpretation. See the paper by Crissman et al. (2004) in the additional readings list for a recent position paper on this topic.
What Do Literature Data Show about Extrapolating Animal Safety Data?
Ralph Heywood (1990) summarized the correlation between adverse events in humans and animal toxicology data and stated that it was in the range of 5% to 25%. One reason for such poor correlations is that many toxicology studies are conducted using standard study designs without full consideration of how they should be modified to account for human pharmacokinetics, metabolism, and methods of use (e.g., the manner of administering the drug or the frequency of administration). Heywood quoted other studies (Heywood 1981; Falahee et al. 1983) showing that the correlation between toxicological results in rats and a non-rodent species was about 30%.
Fletcher (1978) predicted, based on 45 drugs studied, that 25% of the toxic effects in animals would occur in humans.
Heywood (1990) states that only four of 22 major adverse events observed in humans since 1960 were predictable from animal studies, and another two adverse events were questionable. It is therefore apparent that most of this group of adverse events could not be predicted using animal studies.
Litchfield (1962) evaluated six compounds studied in humans, rats, and dogs and calculated the likelihood that (a) adverse events found in both rats and dogs would also be found in humans and (b) adverse events found in only one animal species would not be found in humans. He found that 68% of the toxic effects observed in both rats and dogs were also found in humans, and only 21% of toxic effects found in a single animal species were found in humans. For the specific drugs tested, the dog yielded better data than the rat for predicting human responses (Schein et al. 1961). The best correlations between animal and human data were reported for gastrointestinal complaints, especially vomiting. Schein et al. (1961) reported that Litchfield’s analysis overstated the results by not accounting for the large number of false negatives in animals, which accounted for 68% of the toxicity observed in humans. Selected reasons for false-positive and false-negative observations in toxicology studies are listed in Tables 13.2 and 13.3. Additional discussions of this topic are presented in Animal Toxicity Studies: Their Relevance for Man (Lumley and Walker 1990).
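Litchfield's two likelihoods are essentially concordance rates computed over sets of findings. A minimal sketch of that bookkeeping, using invented findings rather than his actual 1962 data:

```python
# Sketch of a Litchfield-style concordance calculation, using invented
# findings (not the actual 1962 data): which effects seen in animals
# recur in humans, split by whether one or both species showed them?

rat   = {"vomiting", "weight loss", "liver injury", "tremor"}
dog   = {"vomiting", "weight loss", "QT prolongation"}
human = {"vomiting", "tremor", "headache"}

both_species = rat & dog                    # found in both rat and dog
one_species  = (rat | dog) - both_species   # found in exactly one species

# Fraction of two-species findings that also appeared in humans
p_both = len(both_species & human) / len(both_species)
# Fraction of single-species findings that also appeared in humans
p_one  = len(one_species & human) / len(one_species)

print(p_both, round(p_one, 2))  # 0.5 0.33 for these invented sets
```

With real study data the sets would come from tabulated toxic effects per species, but the comparison structure — two-species concordance versus single-species concordance — is the same one underlying the 68% and 21% figures quoted in the text.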
Johnsson, Ablad, and Hansson (1984) gave three reasons why it is difficult to relate human adverse events to animal data: (a) subjective adverse events (e.g., dizziness, headache, and nausea) are not detectable in animals, (b) drug doses (and plasma levels) are often excessive in animal studies, and (c) immunological effects are difficult to detect in animals. A detailed discussion of this topic for a single hepatotoxic drug is given by Clarke et al. (1985).