| Element | Risk | Mitigation |
|---|---|---|
| Antibody–epitope specificity | | |
| Cross-reactivity (shared epitopes) and neoepitopes | False or unexpected positives | Be aware of expected antibody specificity and reported patterns of “unexpected” staining; repeat tests with unexplained patterns |
| Conditional epitopes | False positives | Disregard nonspecific or unexpected staining in nontarget cell compartments (e.g., nuclear staining using an antibody to a membrane-based determinant) unless reproducible (idiosyncratic) or documented in the literature; repeat tests with unexpected results |
| Fixation and retrieval | | |
| Underfixation | False negatives | Adhere to minimum recommended fixation conditions with aldehyde chemistry in mind |
| Overfixation | No significant risk | If laboratory process and workflow allow, try to allow for at least 12 h of fixation |
| Excess or inappropriate retrieval | Creation of conditional antigens (see earlier in the text), altered limits of detection (false positives) | Review pretreatment protocols on a regular basis; evaluate and confirm retrieval conditions and antibody titers before use |
| Controls | | |
| Deletion of negative reagent controls | False positives, especially with the use of nonpolymer detection systems or polyclonal primary antibodies | Repeat test with appropriate controls; if background staining interferes with test interpretation, reinstate negative reagent controls until the issue is resolved; adhere to recommended uses of negative tissue and reagent controls |
| Wrong positive control | Cannot verify a negative test result | Repeat test with appropriate controls |
| Antibody optimization/calibration | | |
| Antibody optimized to limits of detection | False positives | Reoptimize with diagnostic or clinical target in mind (use to fit) |
| Antibody optimized to intended clinical use | False negatives | Use appropriate positive and negative controls to test limits of clinically relevant expression |
| Assay validation | | |
| Validation of a diagnostic immunohistochemical assay using only normal tissues | Risks relevant for all circumstances relating to validation: false positives and negatives due to unexpected/untested tissue heterogeneity | Use validation targets that reflect the range of clinical utility; normal ones are satisfactory when supplemented with high- and low-expressing diseased samples |
| Validation using only tissue microarrays (TMAs) | Poor overall and positive/negative concordance against a previously validated assay | Supplement TMAs with whole-tissue sections; be familiar with documented use of TMAs for target antigens that are typically heterogeneous or focal in distribution; avoid TMAs when the literature does not support their use |
| Validation against a small set of tissue samples | Failure to recognize nontarget compartment staining pattern that might be revealed by evaluation of a broader validation sample | Consider expanding a small validation set with additional positive cases; internal negatives can be used to provide a complementary increase in negative samples |
Antibody–Epitope Specificity
If we use IHC as illustrative of ancillary testing in general, we need to recognize that what constitutes the specificity and sensitivity of an immunohistochemical test is not necessarily the same as it might be for detection of an analyte in the chemistry laboratory. In the latter setting, the technical validity of a reagent is measured quantitatively against a known standard. In IHC, there often is no gold standard, and the technical validation of a new antibody reagent often relies on the expectation of positive and negative results based on histologic and clinical context or on cross-validation with a non-IHC methodology that assesses a presumably related phenomenon. The most relevant example of the latter, in current practice, is the comparison of IHC and fluorescence in situ hybridization (FISH) for Her2/neu [2], where one assumes that clinically relevant overexpression detected by IHC can be predicted with high reliability from gene amplification status detected by FISH. In short, technical validation in IHC almost never is based on the known presence or absence of a given marker as established by an extramorphologic chemical determination. With that in mind, we need to briefly explore certain elements of the antibody–epitope relationship in the clinical milieu, which, for the purposes of this discussion, is the application of antibodies to formalin-fixed, paraffin-embedded (FFPE) tissues.
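When an IHC assay is cross-validated against a non-IHC comparator such as FISH, the comparison is commonly summarized as overall, positive, and negative percent agreement. The following is a minimal Python sketch of that arithmetic. The class name, function name, and the counts in the usage line are hypothetical and are not drawn from the cited studies.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AgreementCounts:
    """2x2 counts for an IHC assay scored against a comparator (e.g., FISH)."""
    both_positive: int             # IHC positive, comparator positive
    ihc_only_positive: int         # IHC positive, comparator negative
    comparator_only_positive: int  # IHC negative, comparator positive
    both_negative: int             # IHC negative, comparator negative


def percent_agreement(c: AgreementCounts) -> Dict[str, float]:
    """Return overall, positive, and negative percent agreement.

    The comparator is treated as the reference by convention only;
    it is not a true gold standard.
    """
    comparator_pos = c.both_positive + c.comparator_only_positive
    comparator_neg = c.both_negative + c.ihc_only_positive
    if comparator_pos == 0 or comparator_neg == 0:
        raise ValueError(
            "need at least one comparator-positive and one comparator-negative case"
        )
    total = comparator_pos + comparator_neg
    return {
        "overall": 100.0 * (c.both_positive + c.both_negative) / total,
        "positive": 100.0 * c.both_positive / comparator_pos,
        "negative": 100.0 * c.both_negative / comparator_neg,
    }


# Hypothetical counts, for illustration only.
example = percent_agreement(AgreementCounts(45, 3, 5, 47))
```

Reporting the three figures separately is useful because a high overall agreement can mask poor agreement among the positive cases, which are usually the smaller group.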
Antibodies, however prepared for clinical use, are reasonably monospecific for single epitopes, though this statement needs to be qualified. Commercial polyclonal heteroantisera are of diminished use in today’s laboratory, though not without persistent exceptions: antisera to a variety of polypeptide hormones, selected infectious agents, and immunoglobulin heavy and light chains (in both immunofluorescent and IHC applications); selective markers of differentiation (such as prostate-specific antigen and Napsin A); and Her2/neu. It is a given that such reagents are not monospecific with respect to a given epitope, though any one of the antibody clones that comprise them will generally target only one epitope. The potential advantage of a polyclonal preparation, apart from being easier to prepare, is that more than one epitope on a given target is likely to be recognized. However, as the number of clones in such a preparation increases, the likelihood that one of them will recognize an epitope unique to or shared by an unrelated protein will also increase, and the overall specificity of the reagent may be diminished [3]. Taken individually, however, these constituent parts of a polyclonal reagent are not intrinsically less specific for the intended target than a monoclonal antibody raised against the same target. I would note, parenthetically, that this statement is only appropriate to affinity-purified polyclonal reagents, since crude heteroantisera (not uncommonly employed in the early days of diagnostic IHC) probably contained considerably more antibody clones that recognized nontarget than target, though few if any of these were present in sufficient quantity to actually label tissue in a meaningful way.
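The relationship between clone number and off-target risk can be illustrated with a simple probability sketch. It assumes, purely for illustration, that each clone has the same small and independent chance of recognizing an unrelated protein; neither assumption comes from the text, and the function name and the 1% per-clone value are hypothetical.

```python
def prob_any_offtarget_clone(n_clones: int, p_per_clone: float) -> float:
    """Probability that at least one of n clones binds an unrelated protein.

    Assumes every clone has the same, independent off-target probability,
    which is a simplification made only for this illustration.
    """
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    if not 0.0 <= p_per_clone <= 1.0:
        raise ValueError("p_per_clone must lie between 0 and 1")
    # Complement of "no clone is off-target".
    return 1.0 - (1.0 - p_per_clone) ** n_clones


# Hypothetical per-clone probability of 1%, for illustration only.
for n in (1, 10, 100):
    print(n, round(prob_any_offtarget_clone(n, 0.01), 3))
```

Under these assumptions the risk for the preparation as a whole rises with clone number even though no individual clone is any less specific, which is the distinction drawn in the paragraph above.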
Murine monoclonal antibodies, the initial product of hybridoma technology and for long the standard for most immunohistochemical applications in clinical practice, are now being supplanted, in turn, by rabbit monoclonal antibodies, in part because the testing environment in medical research and in clinical practice seems to benefit from having quality reagents that are not prone to species-specific adsorption in tissue (particularly relevant to the use of murine monoclonal antibodies in the study of mouse models of human disease). However, the real value of rabbit monoclonal antibodies stems from the immune environment in which they are produced. These reagents, when compared with murine products, often exhibit higher levels of sensitivity and specificity for target proteins and are apparently easier to generate against small molecules, potentially opening a broader set of proteins in human tissue to immunohistochemical study [4].
However, there are potential drawbacks to all such reagents, irrespective of source. Not all monoclonal antibodies are demonstrably monospecific; indeed, under in vitro testing conditions, many presumably specific reagents may, with varying degrees of affinity, bind to more than one epitope, and thus, in clinical practice, potentially label something other than the intended target. This infidelity—indeed, promiscuity (see Parnes [5] and Cohn [6])—is well known even in biological systems. IHC detection methodology that relies on high antibody concentration and low-stringency binding conditions (room temperature or heated environments) is at greatest risk in this regard, as such conditions foster an environment in which lower-affinity binding may occur. It is interesting to muse on the performance of IHC when higher stringency was part of the process (primary incubation with low antibody concentration at cold room temperatures for 18 or more hours) and how, for the most part, even polyclonal preparations performed reasonably well under these conditions. Contrast that with automated staining methods that, given their intended advantages (speed and reproducibility), cannot perform under similarly stringent conditions.
The performance of a selected antibody (or the availability of its epitope) may also change in a variety of disease settings, particularly malignant transformation, where functional changes in microenvironment or protein structure (or the creation of mimics sufficient to allow antibody binding) may yield elements that are not normally exposed to immune recognition, the so-called neoantigens/neoepitopes. Perhaps one of the best known of these is the neoepitope on the keratin 18 molecule that is exposed only after caspase cleavage in apoptotic cells. Recognized by the antibody M30, this neoepitope is a specific marker of apoptotic cell death [7]. We also know that tissue handling and processing, and even antigen retrieval methodology, may occasionally be associated with idiosyncratic patterns of reactivity with selected antibodies that may or may not reflect the actual distribution of the intended target [8]. These so-called conditional antigens account for a variety of unexpected results with antibody reagents, including, in my experience, nuclear staining for prostate-specific antigen and membrane-based staining for hepatitis B surface antigen. Even the intended targets of monoclonal antibodies in clinical and investigational samples may only emerge under certain conditions of tissue handling, fixation, or retrieval. A particularly good example of this phenomenon is the detection of keratin 7 in reactive myofibroblasts after heat-induced retrieval, a pattern of staining almost never encountered in tissues more traditionally “retrieved” by enzyme digestion (personal observation). Willingham has argued, in fact, that in the current era of heat-induced epitope retrieval, most stain targets might be seen as “conditional” [8].
Tissue Fixation and Epitope Retrieval
As just alluded to, heat-induced epitope retrieval, the current standard for enhancement of immunoreactivity in FFPE tissues, generally increases the sensitivity of an assay for a given epitope, and in some cases, is unambiguously necessary for useful labeling with certain antibody preparations [9–12]. Yet, it is also possible that retrieval may change the apparent sensitivity and specificity of a given antibody reagent in its diagnostic milieu. One need only understand that a variety of markers generally assumed to be selective for a given cell lineage or pattern of differentiation are occasionally expressed (with demonstrable gene transcription and translation) in low levels in other cell types. Low-molecular-weight keratins, for example, have been detected in a variety of “nonepithelial” cell populations (a matter we will return to shortly), increasing the likelihood that staining under nonstringent conditions, particularly at high antibody concentration, may yield unexpected results.
As noted elsewhere in this text, standardized protocols, when adhered to, largely mitigate the potential for methodology-sensitive analytic errors [13–17]. This is just as true for histochemical testing as it is for IHC, in situ hybridization, and more specific molecular techniques. Perhaps the most comprehensive source for IHC standards concerning test preparation and performance has been prepared by the Clinical and Laboratory Standards Institute (CLSI) [18]. However, established and emerging external quality assurance (QA) programs, including NordiQC [19], cIQc [20], and UK-NEQAS [21] (among others), through the dissemination and interpretation of targeted laboratory challenges, have generated useful data about variance in laboratory practice, the latter forming the basis for credible recommendations regarding the selection of antibody clones, best practices in retrieval methodology, preferred detection options, and objective evaluation of automation platform-based variance. The College of American Pathologists, through its IHC surveys programs, has the potential to be another important player in the external QA market, but has yet to provide the richness of feedback available through other QA sources. Attention to these recommendations, through participation in the available quality control (QC) challenges and through perusal of web-based summaries of the external QA programs, should result in fewer analytic and interpretative errors in daily practice [12, 13, 15, 22].
And yet, despite these attempts to harmonize IHC practices, many preanalytic and analytic variables remain uncontrolled in current diagnostic and investigative practice, including (and certainly not limited to) cold ischemic time before proper tissue preparation and fixation, tissue processing protocols (both reagents and times), choice of materials for controls (see later in the text), the handling of unstained slides (though this has more recently been the subject of specific recommendations for preparation, handling, and storage [23]), the choice of pretreatment protocols, the use of automated platforms, selection of primary antibody (differing clones, differing product presentation—ready to use versus concentrated), and the use of chromogens.
Fixation remains a particularly important focus of efforts to standardize practice because of the increasing clinical reliance on biomarkers predictive of treatment response that are interpreted in quantitative or semiquantitative terms [12, 13, 18, 24, 25]. Although this subject has been addressed elsewhere in this text, it is important to briefly revisit the impact of fixation on the biomarkers used in the clinical evaluation of breast carcinoma: estrogen receptor (ER) protein, progesterone receptor (PR) protein, and Her2/neu. From a basic perspective, formaldehyde fixation does not impart either particularly destructive or permanent alterations to the protein matrix [26]. Even after fixation to extinction (greater than 48 h), no more than 1% of total protein is insoluble [27], and the mild cross-linking that occurs is reportedly 90% reversible [28]. However, when fixation does not progress for at least 18–24 h, these cross-links may be rapidly broken down when the tissue is removed to another medium during tissue processing. This “unlinking” is exaggerated in tissues exposed to formalin for less than 8–12 h. The next step in most tissue processing protocols is exposure to ethanol, a reagent that typically results in extensive protein damage (as a protein coagulant) and loss of up to 40% of soluble proteins [27].
How does this relate to the immunohistochemical detection of ER protein? There can be no reconciliation of the literature on this point, although the American Society of Clinical Oncology (ASCO)/College of American Pathologists (CAP) Her2 [2] and ER/PR [29, 30] recommendation panels recognized the potential problems of underfixation and at least set a lower limit of acceptable fixation (in this case 6–8 h). That limit, unfortunately, falls within the range of fixation times that are likely to promote protein degradation during processing, and Goldstein et al. [25], in their study of ER reactivity and fixation times in a cohort of breast carcinomas, confirmed the potential susceptibility of ER to underfixation even at these fixation conditions.
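The fixation intervals quoted in this discussion can be lined up in a small Python sketch that shows why a specimen can satisfy the guideline minimum and still sit in the at-risk range. The text gives ranges (6–8 h, 8–12 h, 18–24 h), so the single cut points below are assumptions chosen for illustration, and the function name and band labels are hypothetical.

```python
# Cut points are assumptions: the text gives ranges, not single values.
GUIDELINE_MINIMUM_H = 6.0             # lower end of the 6-8 h guideline minimum
UNLINKING_EXAGGERATED_BELOW_H = 12.0  # upper end of the 8-12 h range
STABLE_CROSSLINKS_FROM_H = 18.0       # lower end of the 18-24 h range


def fixation_band(hours: float) -> str:
    """Place a formalin fixation time into one of the bands described in the text."""
    if hours < 0:
        raise ValueError("fixation time cannot be negative")
    if hours < GUIDELINE_MINIMUM_H:
        return "below guideline minimum"
    if hours < UNLINKING_EXAGGERATED_BELOW_H:
        return "meets guideline minimum; cross-link reversal exaggerated"
    if hours < STABLE_CROSSLINKS_FROM_H:
        return "cross-links may still reverse during processing"
    return "cross-links expected to persist through processing"


# A 7 h fixation satisfies the guideline minimum yet falls in the at-risk band.
print(fixation_band(7))
```

This is a reading aid rather than a laboratory rule; actual acceptance criteria should come from the current guideline documents.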
Interestingly, two often-cited studies provide contrary evidence to the notion that short fixation times pose a risk for suboptimal staining for ER [31] and Her2/neu [32]. These studies, however, remain problematic in the context of this discussion because each was based on the sequential analysis of a single case, sampled at regular time intervals, each chosen for its size, lack of neoadjuvant treatment, and known high level of biomarker expression. Though this point will be made again later, the assessment of biomarker IHC in a specimen enriched for that marker and using a highly sensitive detection system (current standard of practice) does not provide a testing environment in which fixation-related changes in accessible analyte concentration (even relatively large changes) can be easily recognized [33].
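A toy threshold model makes the point concrete: when the starting analyte level is far above the detection limit, even a large fixation-related loss leaves the stain positive. All quantities below are hypothetical, arbitrary units chosen for illustration; they are not measurements from the cited studies, and the function name is an assumption.

```python
def stains_positive(analyte_units: float, fraction_lost: float,
                    detection_limit: float) -> bool:
    """Toy model: a stain reads positive if the accessible analyte remaining
    after loss is at or above the detection limit of the method."""
    if not 0.0 <= fraction_lost <= 1.0:
        raise ValueError("fraction_lost must lie between 0 and 1")
    accessible = analyte_units * (1.0 - fraction_lost)
    return accessible >= detection_limit


# Hypothetical arbitrary units, for illustration only.
DETECTION_LIMIT = 5.0
LOSS = 0.6  # 60% loss of accessible analyte

high_expresser = stains_positive(100.0, LOSS, DETECTION_LIMIT)  # 40 units remain
low_expresser = stains_positive(10.0, LOSS, DETECTION_LIMIT)    # 4 units remain
print(high_expresser, low_expresser)
```

In this sketch the same proportional loss is invisible in the high-expressing sample and converts the low-expressing sample to a negative result, which is why a single highly expressing case is a weak test of fixation sensitivity.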
From the perspective of the practitioner attempting to gain insight into the potential utility of selected immunohistochemical reagents, even the manner in which these variables are reported (particularly in the peer-reviewed literature) is not held to uniform standards. Having said that, it must be acknowledged that considerable effort has been expended trying to provide clarity (through recommended standards) to all preanalytic, analytic, and even postanalytic elements of IHC [16, 17, 34].
Ad hoc and organized groups within the investigative and diagnostic pathology communities have also provided particularly useful recommendations for the reporting of methods and results themselves, providing standards for both immunohistochemical and molecular analyses presented in peer-reviewed forums. The minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE) initiative, a protocol patterned on earlier attempts to define minimum information about a microarray experiment (MIAME) [35, 36], should help create an environment that fosters more uniformity of practices, even approaching best practices, in both the investigative use of ancillary methodology and the purveyance of high-quality patient care.
Controls
Clive Taylor [37] famously coined the phrase “An exaltation of experts” (drawing on the witty and often poignant historical, etymological, fictional, even fantastical rendering of collective nouns offered by James Lipton (of “Actor’s Studio” fame) in his book An Exaltation of Larks [38]) to suggest perhaps the implicit dichotomy of both the coherent collective of these graceful birds and the cacophony of their collective voices as a metaphor for the value of expert consensus opinion in the practice of pathology. Perhaps the better approach, evidence-based practice, has only more recently taken center stage in attempts to provide clarity to ancillary testing protocols and pathology practice in general. Careful perusal of most recent recommendations of best practices in ancillary testing suggests that the exaltation still echoes a bit more loudly than it should, though one might reasonably argue that this reflects the paucity of evidence supporting elements of standard work in the laboratory practice of anatomic pathology. Nonetheless, careful integration of practice experience and evidence, through ad hoc and more formal associations of experts in the field, has provided important guidance, emphasizing the practical implications of proper selection and deployment of positive and negative controls [39, 40]. While there has perhaps been greater emphasis on reagent selection, methodology, and interpretative criteria in consensus works dedicated to the task of process improvement and risk mitigation in pathology practice [12, 13, 16–18, 29, 34, 41], attention to controls is no small matter [42]. The proper use of controls is an increasingly important element of error recognition and avoidance in ancillary testing because it is the basis for both the optimization/calibration and validation of these reagents in their clinical testing environments.
The negative control, seemingly almost anachronistic in the current era of polymer-based immunohistochemical detection systems, remains an important element in both test development and clinical application. Negative controls are generally discussed in terms of negative reagent controls (NRCs) and negative tissue controls (NTCs). NRCs are tests performed on serial sections of patient material and subjected to otherwise identical retrieval and detection conditions. They are further separated into “specific” NRCs, which test the vehicle in which the primary antibody is prepared by substituting antibody with species-specific serum (for polyclonal antibodies) or either ascites fluid or nonspecific antibody of the same heavy chain class (for monoclonal antibodies), and “nonspecific” NRCs, which test the influence of the detection system itself on the staining result by substituting elements of the detection system downstream of the primary antibody. NTCs are specific tissues known or expected to lack the target analyte, either internal to the test tissue or external and mounted on-slide. Together, these elements are effective monitors of the analytic and clinical specificity of a given reagent and the precision and limits of detection of the selected detection method [39]. Because currently employed polymer-based methods only rarely introduce unwanted background staining in most testing environments, the use of the NRC has been largely discontinued (this approach is in fact recommended by several agents of QA, including the College of American Pathologists). Based on recent recommendations from an ad hoc expert panel, however, there are a few important exceptions to this trend [39]:
NRCs should be utilized as part of the evaluation of any new antibody reagent, retrieval medium, or detection system.
NRCs should be used at the pathologist’s discretion when endogenous tissue pigment interferes with interpretation, when suitable internal NTCs are lacking in a clinical test sample, or when in the absence of an initial NRC, a false-positive result is suspected.
NRCs should be used if published guidelines for a given testing protocol specifically recommend their use.
NRCs should be used in the performance of any stand-alone diagnostic test or predictive biomarker unless the stain is deployed in a panel that includes sufficient alternative NTCs, or if the predictive marker is used as a screen for a confirmatory molecular test.
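The four exceptions listed above can be read as a checklist. The Python sketch below is one hypothetical encoding of that checklist, written only to make the conditions and their "unless" clauses explicit; the field and function names are assumptions, and the panel's recommendations [39], not this code, remain the reference.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StainContext:
    """Hypothetical description of one IHC test request."""
    new_antibody_retrieval_or_detection: bool = False
    pigment_interferes: bool = False
    internal_ntc_lacking: bool = False
    false_positive_suspected: bool = False
    guideline_recommends_nrc: bool = False
    standalone_or_predictive: bool = False
    panel_has_sufficient_ntcs: bool = False
    screen_for_confirmatory_molecular_test: bool = False


def nrc_reasons(ctx: StainContext) -> List[str]:
    """Return the reasons, if any, for running a negative reagent control."""
    reasons = []
    if ctx.new_antibody_retrieval_or_detection:
        reasons.append("evaluation of a new antibody, retrieval medium, or detection system")
    # The second exception is left to the pathologist's discretion.
    if ctx.pigment_interferes or ctx.internal_ntc_lacking or ctx.false_positive_suspected:
        reasons.append("pathologist's discretion")
    if ctx.guideline_recommends_nrc:
        reasons.append("published guideline recommends an NRC")
    # The fourth exception carries two "unless" clauses.
    exempt = ctx.panel_has_sufficient_ntcs or ctx.screen_for_confirmatory_molecular_test
    if ctx.standalone_or_predictive and not exempt:
        reasons.append("stand-alone diagnostic test or predictive biomarker")
    return reasons


# A predictive marker run alone, with no panel and no confirmatory test.
print(nrc_reasons(StainContext(standalone_or_predictive=True)))
```

An empty list means that none of the four exceptions applies under this encoding.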
The last of these exceptions explicitly draws a distinction between antibodies applied in routine diagnostic practice and those that are used as predictive or prognostic biomarkers in clinical practice. I highlight this distinction because it reflects an impression that has driven consensus guidelines for the detection and interpretation of selected biomarkers in recent years and was perhaps the critical driving force in the FDA’s decision to classify predictive biomarkers and other stand-alone IHC tests separately from analyte-specific reagents (ASRs) and IHC in vitro diagnostic devices (IVDs) used to corroborate histologic diagnoses [43]. The underlying assumption is that IHC methods for predictive biomarkers and stand-alone diagnostic tests need to be more strictly controlled. Indeed, the FDA reasoned that most diagnostic IVDs and ASRs pose only limited risk to the patient, and these were defined as Class I reagents (subject to good manufacturing principles and general controls), whereas predictive markers and stand-alone diagnostic tests, as they provide actionable test results independent of the other elements of the histopathologic evaluation, are of higher risk to the patient. These markers were defined as Class II reagents and were subject to more rigorous premarket documentation of clinical performance characteristics and demonstration of “substantial equivalence to existing validated tests” (premarket clearance). The third category (Class III—of highest risk to patient safety and requiring premarket approval), did not specifically include examples of antibody IVDs, but it is notable that selected vendors have chosen to gain premarket approval of predictive marker IHC test kits as Class III reagents prior to marketing for clinical use.
While I agree that there is inherent risk in the use of Class II and Class III reagents, I am not entirely sure that I agree that there should therefore be a relaxed standard of evaluation for Class I reagents. I will return to this thought later.
Positive controls, tissues known or expected to contain the analyte of interest, are (perhaps counterintuitively) somewhat harder to define and standardize than either tissue-negative or methodologic-negative controls due to a lack of consensus about how to define an appropriate control in differing clinical settings [40, 42]. Should a positive control for an analyte used to support a diagnosis of malignancy be prepared from representative neoplastic tissue? Should it include tissues expected to contain high, intermediate, or low concentrations of the analyte (or a combination of these)? Should cell lines with documented levels of analyte expression be used, or biologic tissue constructs that mimic the target tissue (the so-called histoids) [44]? Should the control (and its evaluation) be tailored to different uses of the same reagent (for example, ALK-1 IHC testing in lung adenocarcinoma, as opposed to hematopoietic neoplasms or inflammatory myofibroblastic tumor) [41]? Here, consulting consensus recommendations and external QC sources may be of value. NordiQC, for example, has drawn on results from multilaboratory challenges to discern patterns in control selection and staining quality, allowing for specific recommendations for the use of normal tissue with constitutive analyte expression in some settings [19]. These discussions have precipitated more focused consideration of how positive controls can be designed to facilitate a more uniform approach to reagent evaluation within and between laboratories, as a part of internal quality management programs; facilitate the design and creation of tissue microarrays for test and reagent development; and, by extension, facilitate the preparation and maintenance of controls of consistent quality for use in external QA and proficiency testing programs.
Such target-specific controls, referred to as “immunohistochemistry critical assay performance controls (iCAPS),” [40] have been proposed recently by an ad hoc expert panel. iCAPS ideally would be prepared from tissues selected for consistent and predictable patterns of analyte expression, levels of analyte expression, and cellular localization of expression. Such controls, if properly designed and disseminated, might reasonably mitigate error associated with methodologic variance in both translational research and clinical applications.


