Chapter 5
Health is necessarily a relative concept.27 However, to say that health is relative implies that the condition of individuals must be related to something. Ideally, an observed value in an individual should be related to relevant collections of reference values, such as values from healthy persons, from the undifferentiated hospital population, from persons with typical diseases, and from ambulatory individuals, along with previous values from the same subject.78 A patient’s laboratory result simply is not medically useful if appropriate data for comparison are lacking. Historically, the term normal values was frequently used to refer to medical data used for purposes of comparison. However, use of the term often leads to confusion because the word “normal” has several different connotations.59 Three medically important but very different meanings of “normal” are the following:
1. Statistical sense: Values are often qualified as “normal” if their observed distribution seems to follow closely the theoretical normal distribution of statistics, the Gaussian probability distribution. This use of “normal” has sometimes led people to believe that the distribution of biological data is symmetric and bell shaped, like the Gaussian distribution. On closer examination, however, this is usually not the case. To exorcise the “ghost of Gauss,” Elveback and colleagues recommend not using the term normal limits.20 For a similar reason, the term normal distribution should be avoided and replaced by the term Gaussian distribution.
2. Epidemiologic sense: Another meaning of “normal” is illustrated by the following statement: It is “normal” to find that the activity of gamma-glutamyltransferase (GGT) in serum is between 7 and 47 IU/L, whereas it is considered “abnormal” to have a serum GGT value outside these limits.
A more exact statement would read as follows: When the activity of GGT is measured in sera collected from individuals considered to be healthy, approximately 95% of the values obtained fall within the interval 7 to 47 IU/L. The obsolete concept of normal values in part carried this meaning. Alternative terms for “normal” in this sense are common, frequent, habitual, usual, and typical.
3. Clinical sense: The term “normal” is also often used to indicate that values show the absence of certain diseases or the absence of risks for the development of diseases. In this sense, a normal value is considered a sign of health. Better descriptive terms for such values are healthy, nonpathologic, and harmless.
To prevent the ambiguities inherent in the term normal values, the concept of reference values was introduced and implemented in the 1980s.28,78 This was an important event in establishing a scientific basis for clinical interpretation of laboratory data.84 The International Federation of Clinical Chemistry and Laboratory Medicine (IFCC) recommends the term reference values and related terms, such as reference individual, reference limit, reference interval, and observed value.42 The definitions given below and the presentation in the following sections of this chapter are in accordance with IFCC recommendations.* The definition of reference values is based on that of the reference individual42: A reference value may then be defined as follows42: The observed value is defined as follows42: The IFCC also defines other terms related to the concept of reference values: reference population, reference sample group, reference distribution, reference limit, and reference interval.42 Some of these terms are introduced in later sections of this chapter. The term reference range is sometimes used in place of the IFCC-recommended term reference interval. This usage is incorrect, because the statistical term range denotes the difference (a single value!)
between the maximum and minimum values in a distribution.45 The terms reference limits and clinical decision limits should not be confused.63,84 Reference limits are descriptive of the reference distribution; they tell us something about the observed variation of values in the selected subset of reference individuals. Comparison of new values with these limits conveys information about similarity to the given reference values. In contrast, clinical decision limits provide optimum separation among clinical categories. The latter limits may be based on analysis of reference values from several groups of individuals (healthy persons and patients with relevant diseases) and are used for the purpose of differential diagnosis.28,63,81 Alternatively, such limits are established on the basis of outcome studies and are used as clinical guidelines for treatment. Examples of current decision limits include the National Cholesterol Education Program guidelines for cholesterol,21 the American Diabetes Association recommendations for glycated hemoglobin,60 and the American Academy of Pediatrics guidelines on neonatal bilirubin5; each assumes that measurements of the involved analytes are accurate. In this context, it is critical to point out another difference between reference limits and clinical decision limits. For most analytes, a laboratory should establish (or verify) its own reference limits. This is especially true for new analytes. For other analytes, however, in particular those with clinical decision limits, physicians tend to use the national (or international) guidelines. The 2010 Clinical and Laboratory Standards Institute (CLSI) guidelines15 give this point much-deserved emphasis. For these analytes, laboratory efforts that once would have been dedicated to establishing or verifying reference intervals should be redirected toward establishing accuracy.
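To make these distinctions concrete, the following Python sketch (an illustration under stated assumptions, not a procedure prescribed by the IFCC or CLSI) estimates a nonparametric central 95% reference interval from hypothetical, right-skewed reference values, contrasts it with the range (a single number) and with Gaussian mean ± 1.96 SD limits, and then checks one observed value against the reference interval and, separately, against a decision limit. The data, the decision limit of 10, and all function names are invented for illustration; the percentile convention used (rank p(n + 1), as in Python's `statistics.quantiles` "exclusive" method) is one of several in use.

```python
import statistics

def reference_interval(values):
    """Nonparametric central 95% interval: 2.5th and 97.5th percentiles.

    statistics.quantiles(n=40) returns 39 cut points spaced 2.5% apart;
    the 'exclusive' method places percentile p at rank p * (n + 1)
    and interpolates linearly between neighboring ranks.
    """
    cuts = statistics.quantiles(values, n=40, method="exclusive")
    return cuts[0], cuts[-1]

def gaussian_interval(values):
    """Mean +/- 1.96 SD; valid only if the values really are Gaussian."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - 1.96 * sd, mean + 1.96 * sd

def value_range(values):
    """The statistical range: a single number (max - min), not an interval."""
    return max(values) - min(values)

def interpret(value, ref_interval, decision_limit):
    """Compare one observed value with a reference interval and,
    independently, with a (hypothetical) clinical decision limit."""
    lower, upper = ref_interval
    if value < lower:
        ref_flag = "below reference interval"
    elif value > upper:
        ref_flag = "above reference interval"
    else:
        ref_flag = "within reference interval"
    if value >= decision_limit:
        decision_flag = "at or above decision limit"
    else:
        decision_flag = "below decision limit"
    return ref_flag, decision_flag

# Hypothetical, right-skewed reference values (arbitrary units).
ref_values = [1] * 50 + [2] * 30 + [5] * 15 + [20] * 5

print(reference_interval(ref_values))  # (1.0, 20.0)
print(gaussian_interval(ref_values))   # lower limit is negative
print(value_range(ref_values))         # 19
print(interpret(12, reference_interval(ref_values), decision_limit=10))
```

With these invented data, the Gaussian lower limit is negative, which is impossible for a quantity such as enzyme activity, and the value 12 lies within the reference interval yet above the decision limit: the two kinds of limits answer different questions.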
It does little good to establish one’s own reference limits if physicians will (and should) use national guidelines. Methods for establishing the accuracy of a laboratory method are discussed in Chapters 2 and 8. Subject-based reference values are previous values from the same individual, obtained when he or she was in a known state of health. Population-based reference values are those obtained from a group of well-defined reference individuals; they are usually what is meant when the term reference values is used without qualification. This chapter deals primarily with population-based values. It should be noted, however, that for some tests, intraindividual variation may be small relative to interindividual differences. In such cases (e.g., creatinine,25 immunoglobulins80), population-based reference intervals may actually mask clinically significant intraindividual changes, as noted later in this chapter. The following conditions must be met for a valid comparison between a patient’s laboratory results and reference values19:
1. All groups of reference individuals should be clearly defined.
2. The patient examined should sufficiently resemble the reference individuals (in all groups selected for comparison) in all respects other than those under investigation.
3. The conditions under which the specimens were obtained and processed for analysis should be known.
4. All quantities compared should be of the same type.
5. All laboratory results should be produced using adequately standardized methods under sufficient analytical quality control (see Chapters 3 and 8).
To these general requirements one may add others that become necessary when more advanced techniques for decision making are applied.84
6. Stages in the pathogenesis of diseases that are the objectives of diagnosis should be demarcated.
For example, although some overlap occurs, the clinical grades of congestive heart failure (CHF) are distinguished by progressively higher concentrations of N-terminal pro-B-type natriuretic peptide (NT-proBNP).65
7. Clinical diagnostic sensitivity and specificity, prevalence, and clinical costs of misclassification should be known for all laboratory tests used.
For example, in some instances, one might want to know whether a given BNP (or NT-proBNP) value is “healthy,” in which case one would use reference values from age- and gender-matched individuals with no evidence of CHF. In contrast, faced with a patient complaining of shortness of breath in the emergency room, one might want to know not so much whether any degree of CHF is present, but whether the patient’s CHF is sufficiently advanced to be the cause of the shortness of breath.49,54
A set of selection criteria determines which individuals should be included in the group of reference individuals.42,78 Such selection criteria include statements describing the source population and specifications of criteria for health or for the disease of interest. Often, separate reference values for each sex and for different age groups,8A as well as other criteria, are necessary. The group of reference individuals may therefore have to be divided into more homogeneous subgroups. For this purpose, specific rules for the division, called stratification or partitioning criteria, are needed. There is an obvious requirement for health-associated reference values for quantities measured in the clinical laboratory. But the concept of health27 is problematic; much confusion may arise if the selection criteria for health are not clearly stated for a specific project.
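Requirement 7 (sensitivity, specificity, prevalence, and costs of misclassification) can be illustrated with Bayes' theorem. The Python sketch below uses a hypothetical test with 90% sensitivity and 90% specificity (invented figures, not BNP or NT-proBNP data) to show how the same test yields very different predictive values in a high-prevalence setting, such as symptomatic emergency patients, and a low-prevalence one, and how assumed misclassification costs combine into an expected cost per person tested. Function names and cost values are illustrative assumptions.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

def expected_misclassification_cost(sensitivity, specificity, prevalence,
                                    cost_false_pos, cost_false_neg):
    """Expected cost per person tested, counting only misclassifications."""
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    false_neg_rate = (1 - sensitivity) * prevalence
    return false_pos_rate * cost_false_pos + false_neg_rate * cost_false_neg

# Hypothetical test (90% sensitivity, 90% specificity) in two settings.
for prevalence in (0.5, 0.05):
    ppv, npv = predictive_values(0.9, 0.9, prevalence)
    cost = expected_misclassification_cost(0.9, 0.9, prevalence,
                                           cost_false_pos=1, cost_false_neg=10)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.2f}, "
          f"NPV {npv:.3f}, expected cost {cost:.3f}")
```

The drop in positive predictive value at low prevalence is one reason the same measured value may call for different reference groups depending on the clinical question.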
Gräsbeck suggested the following general definition of health, which summarizes the relative, privative, and goal-oriented aspects discussed previously27: Health is characterized by a minimum of subjective feelings and objective signs of disease, assessed in relation to the social situation of the subject and the purpose of the medical activity, and it is in the absolute sense an unattainable ideal state.
Several methods have been suggested for the selection of reference individuals. Table 5-1 shows a variety of concepts that may be used to describe a sampling scheme. The concepts of each pair are mutually exclusive; for example, the sampling may be either direct or indirect. One may, however, combine one concept from each of several pairs to obtain a more exact description. For example, the selection may be direct, a priori (or a posteriori), and nonrandom.
TABLE 5-1 Strategies for Selection of Reference Individuals
*Note: The terms a priori and a posteriori signify in this context “before” and “after” and refer to when inclusion criteria are applied.
Direct selection of reference individuals (see Table 5-1) is consistent with the concept of reference values as recommended by the IFCC,42 and it is the basis for the presentation in this chapter. Its only disadvantages are the problems and costs of obtaining a representative group of reference individuals. These practical problems have led to the search for simpler and less expensive approaches, such as the indirect method.6,28 This method is based on the observation that most analysis results produced in the clinical laboratory seem to be “normal.” An example of an indirect method is shown in Figure 5-1. As the figure shows, serum sodium concentrations from hospitalized patients have a distribution with a preponderant central peak and a shape similar to a Gaussian distribution. The underlying assumption of the indirect method is that this peak is composed mainly of normal values.
Advocates of the method therefore claim that the normal interval can be estimated by extracting the distribution of normal values from this overall distribution. However, as shown in Figure 5-1, normal limits determined by the indirect method on the basis of this distribution would be seriously biased compared with the health-associated reference limits. Note, for example, the substantial proportion of values below 135 mmol/L, the true health-associated lower reference limit. (The term “normal” is used here intentionally to distinguish between the concepts of normal values and reference values.) Several mathematical methods have been used to extract the distribution of normal values from routine laboratory data.6,28 The indirect method, however, has at least two major deficiencies:
1. Estimates of the lower and upper normal limits depend heavily on the particular mathematical method used and on its underlying assumptions.
2. The indirect method destroys the scientific basis for obtaining and comparing reference values. The results for each hospital would depend on the characteristics of the hospital’s patient group at that particular time. These results would vary not only across hospitals but also within the same hospital over time. The outcome would be a compilation of unstable values for each analyte.
Hospital databases may, however, be used to establish reference values that are fully concordant with IFCC recommendations.46,76 The requirement is that laboratory data be combined with information stored in clinical databases (i.e., a direct sampling strategy is applied instead of the distribution-based indirect method). Laboratory results are used as reference values only if stated clinical criteria are fulfilled. One may define criteria for selecting individuals who have a specified state of health or the disease for which reference data are necessary.
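The bias of the indirect method, and the direct a posteriori alternative, can be simulated. In the Python sketch below every number is invented: a hypothetical database of serum sodium results mixes patients without a relevant diagnosis with hyponatremic patients who are sampled repeatedly. The indirect estimate uses all results; the a posteriori estimate keeps only patients who meet a stated clinical criterion and only one result per patient. The `has_relevant_diagnosis` flag is a stand-in for whatever clinical criteria a real study would define, and the function names are hypothetical.

```python
import random
import statistics

def simulate_database(n_without=2000, n_with=600, seed=1):
    """Hypothetical hospital database of serum sodium results (mmol/L).

    Records are (patient_id, sodium, has_relevant_diagnosis). Patients with a
    relevant diagnosis get lower values and three results each, mimicking
    sicker inpatients who are sampled repeatedly. All numbers are invented.
    """
    rng = random.Random(seed)
    records = [(pid, rng.gauss(140, 2.5), False) for pid in range(n_without)]
    for pid in range(n_without, n_without + n_with):
        for _ in range(3):
            records.append((pid, rng.gauss(130, 5), True))
    return records

def central_95(values):
    """2.5th and 97.5th percentiles (nonparametric)."""
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]

def indirect_limits(records):
    """Indirect method: ignore clinical information and use every result."""
    return central_95([sodium for _, sodium, _ in records])

def a_posteriori_limits(records):
    """Direct, a posteriori selection: keep patients who meet the clinical
    criterion (here: no relevant diagnosis) and only their first result."""
    first_result = {}
    for pid, sodium, has_diagnosis in records:
        if not has_diagnosis and pid not in first_result:
            first_result[pid] = sodium
    return central_95(list(first_result.values()))

records = simulate_database()
print("indirect:     %.1f-%.1f" % indirect_limits(records))
print("a posteriori: %.1f-%.1f" % a_posteriori_limits(records))
```

Under these assumptions the indirect lower limit falls far below the a posteriori estimate, which lands near 135 mmol/L, echoing the bias described for Figure 5-1.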
Usually certain constraints are imposed on the use of the selected individuals’ laboratory results, such as allowing only one result for each analyte under study from each selected individual. Such reference values have a potential advantage over those based on direct sampling from other types of populations: hospital-based reference values are ideal for interpretation of results from hospitalized patients because they are produced under similar conditions. When carefully performed, both a priori and a posteriori sampling (see Table 5-1) may result in reliable reference values. The choice is often a question of practicality. Both require the same set of successive steps, but the order of some of these operations differs depending on the mode of selection: a priori or a posteriori.28 The first step in the process of producing reference values for a laboratory test should always be the collection of quantitative information about sources of biological variation for the analyte studied. A search through the relevant literature may yield the required information (see Chapter 6).71,85 If relevant information cannot be found in the literature, pilot studies may be necessary before the selection of reference individuals is planned in detail. A study performed in Kristianstad, Sweden,8 highlights a practical problem often met when reference individuals are selected: the number of subjects fulfilling the inclusion criteria may be too small. In this study, only 17% of the participants met the inclusion criteria used, leaving a reference sample group that was too small. The frequency of exclusion was higher among women and in older age groups. This problem has two possible solutions:
1. The exclusion criteria may be relaxed. As already discussed, the set of relevant sources of biological variation differs among analytes. One may define a minimum set of exclusion criteria for a given laboratory test.
In the Kristianstad study, the complete group of individuals could probably have been used to establish reference values for serum sodium, and most of the individuals would have been acceptable for determining reference values for several other analytes.8
2. A different design of the sampling procedure could reduce the practical problems and costs of obtaining a sufficiently large group of reference individuals. The Kristianstad study showed that 75% of excluded subjects could have been identified using only a simple questionnaire.8 In the upper age group, this percentage was even higher. Therefore, preliminary screening of a large number of individuals from the parent population, using a carefully designed autoanamnestic questionnaire (i.e., one in which individuals report their own current and previous medical history), would leave a much smaller sample of individuals to be examined clinically and by laboratory methods. If 3000 individuals had been prescreened in Kristianstad, and if only the individuals remaining in the reduced sample had been subjected to a closer examination, a group of 240 reference individuals would have been obtained.
For several reasons, most collections of reference values are, in fact, obtained by a nonrandom process.33 This means that not all possible reference individuals in the entire population under study have an equal chance of being chosen for inclusion in the usually much smaller sample of individuals studied. A strictly random sampling scheme is in most cases impossible for practical reasons: it would require examining the entire population (thousands or millions of persons), applying the inclusion criteria to all of them, and then randomly selecting a subset of individuals from among those accepted.
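The contrast between strictly random and nonrandom sampling can be sketched in Python as follows. The population, the inclusion criterion, and the function names are hypothetical; the point is that a strictly random scheme must first apply the criteria to everyone and then give each accepted individual the same chance, whereas a typical convenience scheme stops after the first qualifying candidates.

```python
import random

def strictly_random_sample(population, meets_criteria, k, seed=None):
    """Strict random scheme: apply the inclusion criteria to the entire
    population, then give every accepted individual the same chance."""
    accepted = [person for person in population if meets_criteria(person)]
    if k > len(accepted):
        raise ValueError("fewer accepted individuals than requested")
    return random.Random(seed).sample(accepted, k)

def convenience_sample(candidates, meets_criteria, k):
    """Common nonrandom scheme: examine candidates in arrival order and keep
    the first k who qualify; later candidates never get a chance."""
    chosen = []
    for person in candidates:
        if meets_criteria(person):
            chosen.append(person)
            if len(chosen) == k:
                break
    return chosen

def qualifies(person):
    """Hypothetical stand-in for real inclusion criteria."""
    return person % 3 != 0

population = list(range(1000))  # hypothetical numbered individuals
print(strictly_random_sample(population, qualifies, 10, seed=7))
print(convenience_sample(population, qualifies, 10))  # [1, 2, 4, 5, 7, 8, 10, 11, 13, 14]
```

The cost of the first function lies in its first line: every member of the population must be examined before any selection is made.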
The selection of reference individuals consists essentially of applying defined criteria to a group of examined candidate persons.42 The required characteristics of the reference values determine which criteria should be used in the selection process. Box 5-1 lists some important criteria to consider when the aim is to produce health-associated reference values. Similar problems affect the definition of hypertension in relation to the establishment of health-associated reference values and exclusion criteria based on laboratory examinations. It has been argued that the process becomes circular when laboratory tests are used to assess the health of subjects who are subsequently used as healthy control subjects for laboratory tests. In this context, however, there is no difference between measuring height, weight, and blood pressure and performing selected laboratory tests, provided that these laboratory tests are neither those for which reference values are being produced nor tests that are significantly correlated with them.27 It is particularly difficult to define selection criteria when establishing reference values for a geriatric population.22 In higher age groups, it is “normal” to have minor or major diseases and to take drugs. One solution is to collect values at one point in time and to use only the values of individuals who survive a defined number of years.27,61 It may also be necessary to define partitioning criteria for the subclassification of the set of selected reference individuals into more homogeneous groups (Box 5-2).42 (The question of determining when stratification of the reference sample group is necessary and justified is discussed in later sections.) In practice, the number of partitioning criteria should usually be kept as small as possible to ensure sample sizes sufficient for valid estimates.
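Partitioning reference individuals by sex and by equal-width age intervals, while flagging subgroups that become too small, might look like this in Python. The record layout `(sex, age, value)`, the function names, and the minimum of 120 values per subgroup are assumptions chosen for illustration, not requirements stated in this chapter.

```python
from collections import defaultdict

def age_decade(age_years):
    """Equal-width age partition by decade, e.g. 47 -> '40-49'."""
    start = (int(age_years) // 10) * 10
    return f"{start}-{start + 9}"

def partition(reference_data, min_size=120):
    """Split (sex, age, value) records into sex-by-decade subgroups.

    Subgroups with fewer than min_size values are returned separately,
    because too many partitions leave too few values for valid estimates.
    min_size=120 is an assumed threshold, not one given in this chapter.
    """
    groups = defaultdict(list)
    for sex, age, value in reference_data:
        groups[(sex, age_decade(age))].append(value)
    adequate = {key: vals for key, vals in groups.items() if len(vals) >= min_size}
    too_small = sorted(key for key, vals in groups.items() if len(vals) < min_size)
    return adequate, too_small

# Hypothetical reference data: (sex, age in years, result).
data = [("F", 25, 1.0)] * 130 + [("M", 27, 1.1)] * 50 + [("F", 47, 1.2)] * 120
adequate, too_small = partition(data)
print(sorted(adequate))  # [('F', '20-29'), ('F', '40-49')]
print(too_small)         # [('M', '20-29')]
```

A subgroup reported as too small would, in practice, have to be merged with a neighboring group or supplemented with more reference individuals.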
Age and gender are the most frequently used criteria for subgrouping, because several analytes vary notably among different age and gender groups (see Chapter 6).22,71,85 Age may be categorized by equal intervals (e.g., by decades) or by intervals that are narrower in the periods of life in which greater variation is observed. In some cases, it is more convenient to use qualitative age groups, such as (1) postnatal, (2) infancy, (3) childhood, (4) prepubertal, (5) pubertal, (6) adult, (7) premenopausal, (8) menopausal, and (9) geriatric. Height and weight also have been used as criteria for categorizing children. Several preanalytical factors influence the values of biological quantities, such as the concentrations of components in blood and other specimens and the amounts excreted in feces, urine, or sweat. This topic is covered elsewhere (see Chapter 6).* In this discussion, only aspects of special relevance to the generation of reliable reference values are highlighted.4,42 Preanalytical standardization of (1) the preparation of individuals before specimen collection, (2) the specimen collection procedure itself, and (3) the handling of the specimen before analysis may eliminate or minimize bias or variation from these factors. This reduces biological “noise” that might otherwise conceal important biological “signals” of disease, risk, or treatment effect.
1. Only factors that can be controlled relatively easily in the clinical setting should be part of the standardization when reference values are produced.
2. The rules for preanalytical standardization used when reference values are produced (Table 5-2) should also be applied in the clinical situation.
For example, it has been shown that these rules can be applied rather closely in the clinical setting for both hospitalized and ambulatory patients.78 The same philosophy forms the basis for recommendations concerning routine blood specimen collection.† The magnitudes of preanalytical sources of variation clearly differ among analytes (see Chapter 6).‡ In fact, some believe that only those factors that cause unwanted variation in the biological quantities for which reference values are being generated should be considered. For example, body posture during specimen collection is highly relevant for the establishment of reference values for nondiffusible analytes, such as albumin in serum, but irrelevant for the establishment of serum sodium values.23 On the other hand, several constituents are routinely analyzed in the same clinical specimen, so it would be impractical to devise special systems for every single type of quantity.78 Consequently, three standardized procedures for blood specimen collection by venipuncture have been recommended4,28: (1) collection in the morning from hospitalized patients, (2) collection in the morning from ambulatory patients, and (3) collection in the afternoon from ambulatory patients. Table 5-2 summarizes these procedures. However, such schemes have to be modified depending on local conditions and needs and on the intended use of the reference values produced. Published checklists42,78 may be helpful in the design of a scheme. A special problem is caused by drugs taken by individuals before specimen collection,44,86,93 and it may be necessary to distinguish between indispensable and dispensable medications. If possible, dispensable medication should always be withheld for at least 48 hours before specimen collection. The use of indispensable drugs, such as contraceptive pills or essential medication, may be a criterion for exclusion or partitioning.
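The medication rules above can be encoded as a simple screening step for each candidate reference individual. This Python sketch is a hypothetical decision rule, not a published one: any indispensable drug leads to exclusion or partitioning, and any dispensable drug taken within the 48-hour window leads to postponing specimen collection. The `Medication` record, its fields, and the returned labels are assumptions.

```python
from dataclasses import dataclass

WASHOUT_HOURS = 48  # from the text: withhold dispensable drugs >= 48 hours

@dataclass
class Medication:
    name: str
    indispensable: bool
    hours_since_last_dose: float

def medication_decision(medications):
    """Hypothetical screening rule for one candidate reference individual.

    Indispensable drugs -> exclude, or assign to a separate partition.
    Dispensable drugs taken within the washout window -> postpone collection.
    """
    if any(m.indispensable for m in medications):
        return "exclude or partition"
    if any(m.hours_since_last_dose < WASHOUT_HOURS for m in medications):
        return "postpone collection"
    return "collect"

print(medication_decision([]))                                             # collect
print(medication_decision([Medication("analgesic", False, 12)]))           # postpone collection
print(medication_decision([Medication("oral contraceptive", True, 10)]))  # exclude or partition
```

Whether "exclude" or "partition" is chosen for indispensable drugs depends on how many users there are and on the analytes studied, which is why the sketch leaves that choice open.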
In emergency or other unplanned clinical situations, even a partial application of the standardized collection procedure has been shown to be of great value.28 One empirical approach78 is to produce additional sets of reference values, such as postprandial, postexercise, or postpartum values.28 Such a method, however, is very expensive and does not cover all situations that could possibly arise. Another, more general solution to the problem is called the predictive approach.78 Starting from a set of ordinary reference values and using quantitative information on the effects of various factors, such as (1) intake of food, alcohol, and drugs; (2) exercise; (3) stress; or (4) posture, one can estimate expected reference values that fit the actual clinical setting (see Chapter 6).71,85 Essential components of the required definition of a set of reference values are specifications concerning (1) the analysis method (including information on equipment, reagents, calibrators, type of raw data, and calculation method), (2) quality control (see Chapter 8), and (3) reliability criteria (see Chapter 2).28,42 It is often claimed that analytical quality should be better when reference values rather than routine values are produced. This may be true for accuracy; all measures should be taken to eliminate bias. The question of imprecision is more difficult because the answer depends in part on the intended use of the reference values. Increases in analytical random variation widen the reference interval.28 For some special uses of reference values, the narrower reference interval obtained with a more precise analytical method may be appropriate. However, this is usually not true for routine clinical use of reference values. Interpretation is simplest if a patient’s values and reference values are comparable with regard to analytical imprecision.
For the same reason, it is advisable to analyze specimens from reference individuals in several runs to include between-run components of variation. A safe way to obtain comparability is to include these specimens in routine runs together with real patient specimens.
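Why analytical imprecision, including its between-run component, widens the reference interval can be shown numerically. The Python sketch below assumes a balanced design and Gaussian, independent biological and analytical components. It estimates within-run and between-run variance from hypothetical duplicate results in four runs (one-way analysis of variance), then shows how the width of a central 95% interval grows as analytical SD is added to an assumed biological SD of 2.5. All data and function names are illustrative; within-run replicates alone would understate the imprecision that routine patient results actually carry.

```python
import math
import statistics

def analytical_variance_components(runs):
    """Balanced one-way ANOVA on replicate results of one specimen.

    runs: list of analytical runs, each a list of n replicate results
    (same n in every run). Returns (within_run_var, between_run_var).
    """
    k, n = len(runs), len(runs[0])
    if k < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need >= 2 runs, each with the same n >= 2 replicates")
    run_means = [statistics.fmean(r) for r in runs]
    grand_mean = statistics.fmean(run_means)
    ms_within = sum((x - m) ** 2
                    for r, m in zip(runs, run_means) for x in r) / (k * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in run_means) / (k - 1)
    # Negative estimates are truncated to zero, a common convention.
    between_var = max(0.0, (ms_between - ms_within) / n)
    return ms_within, between_var

def interval_width(sd_biological, sd_analytical, z=1.96):
    """Width of a Gaussian central interval when independent biological
    and analytical variation add: SD_total = sqrt(SD_b^2 + SD_a^2)."""
    return 2 * z * math.hypot(sd_biological, sd_analytical)

# Hypothetical duplicate results of one control specimen in four runs.
runs = [[140.1, 140.5], [141.2, 140.8], [139.6, 139.9], [140.7, 141.1]]
within_var, between_var = analytical_variance_components(runs)
sd_within_only = math.sqrt(within_var)
sd_total_analytical = math.sqrt(within_var + between_var)
print(interval_width(2.5, 0.0))                  # no analytical variation
print(interval_width(2.5, sd_within_only))       # within-run only
print(interval_width(2.5, sd_total_analytical))  # within + between run
```

Because variances, not SDs, add, a modest analytical SD widens the interval less than proportionally; the between-run term still matters when it dominates the within-run term, as in these invented data.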
Establishment and Use of Reference Values
The Concept of Reference Values
Interpretation by Comparison
Normal Values—an Obsolete Term
Terminology
Clinical Decision Limits
Types of Reference Values
Subject-Based and Population-Based Reference Values
Requirements
Selection of Reference Individuals
Concept of Health in Relation to Reference Values
Strategies for Selection of Reference Individuals
Direct: Individuals are selected from a parent population using defined criteria.
Indirect: Individuals are not considered; instead, certain statistical methods are applied to analytical values in a laboratory database to obtain estimates with specified characteristics.

A priori*: Direct method (see above) in which individuals are selected for specimen collection and analysis if they fulfill defined inclusion criteria.
A posteriori: Direct method using an already existing database containing both analysis results and information on a large number of individuals. Values of individuals fulfilling defined inclusion criteria are selected.

Random: Process of selection giving each item (individual or test result) an equal chance of being chosen.
Nonrandom: Process of selection giving each item an unequal chance of being chosen.
Direct or Indirect Sampling?
A Priori or A Posteriori Sampling?
Random or Nonrandom Sampling?
Selection Criteria and Evaluation of Subjects
Partitioning of the Reference Group
Specimen Collection
Preanalytical Standardization
Analyte-Specific Considerations
The Necessity for Additional Information
Analytical Procedures and Quality Control