Overview
Pathologists frequently use immunohistochemistry (IHC) to evaluate antigen location and expression patterns in tissue for diagnostic and/or prognostic purposes. This process often also requires quantification of these immunostains to assist in therapeutic drug selection. Traditional scoring systems for IHC have relied on manual subjective interpretation and semiquantification of immunostains by pathologists examining microscopic glass slides. With the advent of digital images, multispectral imaging, and immunofluorescent microscopy, automated image-analysis tools have been developed that can be used by pathologists for automated scoring of immunostains.
Image analysis is defined as the extraction of meaningful information from digital images by using image-processing tools. Computers are necessary to analyze large datasets and extract quantitative information. Most image-analysis tools operate by measuring the number of pixels showing staining for one or more antigens and then quantifying colocalization of these stains. They automate and quantify groups of pixels or a region of interest (ROI) with greater consistency and accuracy than light microscopy because they eliminate interobserver and intraobserver variations that occur with human interpretation. They also permit new staining protocols, such as multiplexed antibody studies, to be used that were previously impossible to accurately quantify by using analog-driven approaches.
Digital imaging also makes it easier to use immunofluorescence as a primary diagnostic tool. Quantitative immunofluorescence (QIF) has been used extensively in research laboratories and recently has also seen limited use in the clinical setting. Now that entire glass slides can be scanned with whole-slide digital scanners, digitized slides are increasingly used to perform image analysis and apply machine-learning techniques.
Improvements in computation power and development of sophisticated image-analysis algorithms continue to offer pathologists more innovative tools to perform computer-aided image analysis (CAIA), which has been most widely used for estrogen receptor (ER), progesterone receptor (PR), and HER2 assessments in breast cancer. However, in spite of decades of improvement, CAIA is still predominantly used as a research tool. It has been applied in a limited way in clinical practice, including quantification of breast carcinoma biomarkers, some use in proliferation markers, microvessel density (MVD) analysis, and other less commonly used applications.
This chapter provides an overview of image-analysis theory and technology and focuses on common examples used in current pathology practice.
Imaging Systems
A digital image is a numeric representation of an image captured with a device, such as a digital camera. Digital cameras can be attached (coupled with a C-mount adaptor) to light microscopes, or they may be more specialized for advanced imaging (e.g., fluorescence or multispectral imaging). Digital images are made up of thousands of small rectangular pixels (PIcture ELements). Each pixel contains binary data that stores values such as brightness and color. Digital images can range from static images (stills or “snapshots”) to whole-slide images of digitized slides, sometimes called “virtual” images.
Several vendors now offer whole-slide imaging (WSI) scanners with built-in robotics, optical microscopes, and digital cameras capable of automatically scanning glass slides at high speeds (1 to 4 minutes) to produce high-resolution whole-slide images. As a result, image analysis of tissue need not be limited to still images of selected fields of view (FOV) but can now be performed on the entire tissue section present on a digitized slide. Because WSI allows the entire slide to be analyzed, field selection can also be automated.
The digital imaging process involves four steps: (1) image acquisition, or capture; (2) archiving, or the saving, retrieval, and compression of digital files; (3) editing and postcapture manipulation that includes annotation; and (4) sharing the image for viewing, reporting, displaying, or printing. Unfortunately, no standards have been set regarding these various imaging steps in the field of pathology. For CAIA to be reliable, the image-acquisition step must be standardized. Static image acquisition may vary, however, because digital cameras are subject to drift over time. Regular calibration of digital cameras is therefore needed to adjust for variables such as the light source. This calibration is especially critical for standardization of quantitative analysis and QIF.
The use of WSI provides a method to standardize image acquisition for image analysis. However, unlike glass slides, whole-slide scanners are presently unforgiving when there are artifacts such as tissue folds, crushed cells, bubbles under the coverslip, or poor staining of material. Pathologists must be aware that such variation may affect the outcome of image analysis. Image compression does not seem to significantly compromise the accuracy of image analysis. Features (structures) in an image can be classified, or segmented, according to their shape (morphology), spatial arrangement (topology), texture (smoothness, roughness, or coarseness of the image), intensity (brightness), and color.
The term multispectral microscopic imaging refers to the capture of spectrally resolved data (i.e., wavelengths) from each pixel in an image by using bright-field and/or fluorescence modalities. Although it is feasible to capture a multispectral image of a whole slide, this is not done routinely because only a few dozen fields of information are presently required to create a statistically meaningful sample of a tissue slide. Spectral imaging systems allow pathology slides stained with multiple antibodies to be analyzed, and most spectral imaging systems are able to resolve three or more chromogens.
Analysis of multiple chromogenic stains begins by unmixing, or spectrally separating, the individual stains, which should take into account the counterstain. This requires specialized imaging hardware and software to automate both the image-acquisition process and the resolution of spectral (color) information across a broad range of visible and infrared bands. Determining the spectral patterns (signatures) and intensity of each individual stain in the image can then help analyze cells and/or tissue. Although spectral unmixing has been used in experimental settings, the inherent variability of hematoxylin has limited its utility in clinical settings. Furthermore, the physics of light absorption limits both the number of multiplex channels available for chromogens, even with spectral unmixing, and the dynamic range for image quantification.
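The unmixing step described above is commonly formulated as a linear mixing model based on the Beer-Lambert law. The following is a sketch under that assumption, not a description of any particular vendor's implementation. The symbols are introduced here for illustration: A is the absorbance measured in spectral band b, I and I_0 are the transmitted and incident intensities, c_k is the amount of stain k at the pixel, and ε_k(λ_b) is the spectral signature of stain k, assumed to have been measured from single-stain control slides.

```latex
% Linear (Beer-Lambert) mixing model for K stains measured in B spectral bands
\begin{align}
A(\lambda_b) &= -\log_{10}\frac{I(\lambda_b)}{I_0(\lambda_b)}
              = \sum_{k=1}^{K} c_k\,\varepsilon_k(\lambda_b),
              \qquad b = 1,\dots,B \\
\mathbf{a} &= E\,\mathbf{c},
\qquad E_{bk} = \varepsilon_k(\lambda_b),
\qquad \hat{\mathbf{c}} = \left(E^{\top}E\right)^{-1}E^{\top}\mathbf{a}
\end{align}
```

The least-squares estimate exists only when B is at least K and the stain signatures are linearly independent, which is one way to see why broadly overlapping chromogen spectra restrict the number of channels that can be separated. Because A grows as the transmitted intensity falls toward the noise floor of the camera, heavily stained pixels also become difficult to quantify, which is consistent with the limited dynamic range noted above.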
QIF allows more flexibility in multiplexing and can routinely accommodate five or more channels. Although not in common use, investigators have claimed to be able to multiplex up to 40 channels using QIF by bleaching, restaining, and reimaging. Fluorescence also has a broader dynamic range than chromogenic staining. Although this is widely accepted, limited data are available to compare approaches by using the same patient population. Fig. 23.1 shows an example of more than 200 cases of breast cancer read by using the Aperio pixel counter-based CAIA system for bright-field versus Automated Quantitative Analysis (AQUA)-based fluorescence. The limited dynamic range of the diaminobenzidine (DAB)-stained population is evidence of system saturation at the high end of the scale.
Software Algorithms
Image-analysis tools are used to analyze digital images and slides to provide accurate quantitative data about the amount and intensity of individual stains. Such analyses involve multiple computations based on mathematical and statistical algorithms. Several image-analysis tools are currently available that perform these tasks. These include open-source applications (ImageJ, ImmunoRatio, ImmunoMembrane) and commercial products, such as AQUA technology (Genoptix), Genetic Imagery Exploitation (Genie) from Aperio, Definiens TissueStudio, INform from Perkin Elmer, Olympus Cell Imaging Software, HALO from Indica Labs, Biotopix and Oncotopix modules from Visiopharm, HistoQuant from 3DHistech, HistoQuest from TissueGnostics, and others. Many researchers have also developed custom software applications and algorithms. Available open-source software for image analysis includes ImageJ ( https://imagej.nih.gov/ij/ ) from the National Institutes of Health (NIH) and CellProfiler ( http://cellprofiler.org/ ) from the Massachusetts Institute of Technology (MIT).
Image analysis is a multistep process that involves feature extraction, feature selection, and classification steps ( Figs. 23.2 and 23.3 ). Feature extraction transforms large sets of data, such as topology, into a reduced representation set of features, including graphs that represent structural and spatial information. Feature selection using heuristic algorithms helps determine which features are relevant at a given resolution, but features present within a dataset that may not be generalizable often limit this approach. The end result is the compilation of a set of features that can be used for image classification and/or quantification.
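The three steps named above can be illustrated with a deliberately small Python sketch. Everything in it is an assumption made for illustration: the three features, the darkness cutoff of 100, the two-class separation heuristic, and the nearest-centroid classifier are stand-ins chosen for brevity and are not taken from any of the tools discussed in this chapter.

```python
from statistics import mean, pstdev

def extract_features(region):
    """Feature extraction: reduce a region (list of gray levels) to a few numbers."""
    return {
        "mean_intensity": mean(region),
        "texture_sd": pstdev(region),  # crude texture proxy
        "dark_fraction": sum(p < 100 for p in region) / len(region),  # 100 is arbitrary
    }

def select_features(samples, labels, min_separation=1.0):
    """Feature selection: keep features whose class means differ by more than
    min_separation average within-class SDs (two-class heuristic only)."""
    a, b = sorted(set(labels))  # raises unless exactly two classes are present
    kept = []
    for name in samples[0]:
        va = [s[name] for s, lab in zip(samples, labels) if lab == a]
        vb = [s[name] for s, lab in zip(samples, labels) if lab == b]
        spread = (pstdev(va) + pstdev(vb)) / 2 or 1e-9  # avoid division by zero
        if abs(mean(va) - mean(vb)) / spread > min_separation:
            kept.append(name)
    return kept

def train_centroids(samples, labels, features):
    """Classification, training half: one mean feature vector per class."""
    centroids = {}
    for cls in set(labels):
        rows = [s for s, lab in zip(samples, labels) if lab == cls]
        centroids[cls] = {f: mean(r[f] for r in rows) for f in features}
    return centroids

def classify(sample, centroids, features):
    """Classification, prediction half: nearest centroid by squared distance.
    Features are not rescaled here, so large-valued features dominate."""
    def distance(cls):
        return sum((sample[f] - centroids[cls][f]) ** 2 for f in features)
    return min(centroids, key=distance)

# Toy training regions (hypothetical gray levels; darker means more stain)
tumor = [[60, 70, 80, 65], [55, 75, 85, 60]]
stroma = [[200, 210, 190, 205], [195, 215, 185, 200]]
samples = [extract_features(r) for r in tumor + stroma]
labels = ["tumor", "tumor", "stroma", "stroma"]
features = select_features(samples, labels)
centroids = train_centroids(samples, labels, features)
print(features, classify(extract_features([62, 72, 66, 70]), centroids, features))
```

With only a handful of training regions, the selected features reflect that particular dataset and may not generalize, which is the limitation of heuristic feature selection noted above.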
Manual, semiautomatic, and automatic selection can be used to determine features or ROIs for measurement. Segmentation algorithms can be based on intensity, texture, and/or colors. Algorithms have been developed to determine positive pixel counts—looking for positive, negative, and neutral areas—to quantify the amount and intensity of a specific stain present in a digital image. Some software packages include region segmentation algorithms (i.e., to identify ROIs) designed to classify regions to be analyzed based on a user training paradigm, in which an experienced end user trains the segmentation algorithm by showing the software a few example regions of different disease classes. Thereafter, the algorithm is able to classify the remaining image and additional images that might be needed for analysis. These systems have largely been used in research settings because generalization to the varied conditions of staining in clinical laboratories around the world has confounded even the best classifiers. Very few systems have been tested by using multicenter prospective experimental approaches. The highest level of evidence published for any classifier is the work by Beck and colleagues using a Definiens-based system called C-PATH.
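A positive pixel count of the kind described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm of any commercial package: each pixel is assumed to arrive as a pair of optical densities (chromogen, counterstain) that have already been separated, the three intensity thresholds and the tissue cutoff are hypothetical values, and "neutral" is taken here to mean a pixel with neither stain, such as glass background.

```python
def classify_pixel(dab_od, hema_od, thresholds=(0.15, 0.45, 0.85), tissue_od=0.05):
    """Label one pixel from its chromogen and counterstain optical densities.
    All cutoff values are hypothetical and would need tuning per assay."""
    weak, moderate, strong = thresholds
    if dab_od >= strong:
        return "strong"
    if dab_od >= moderate:
        return "moderate"
    if dab_od >= weak:
        return "weak"
    if hema_od >= tissue_od:
        return "negative"  # counterstained tissue without chromogen
    return "neutral"       # no stain at all, e.g. background

def positive_pixel_count(pixels, **kwargs):
    """Count pixels per class and return (counts, positivity).
    Positivity = positive pixels / (positive + negative); neutral is excluded."""
    counts = dict.fromkeys(("strong", "moderate", "weak", "negative", "neutral"), 0)
    for dab_od, hema_od in pixels:
        counts[classify_pixel(dab_od, hema_od, **kwargs)] += 1
    positive = counts["strong"] + counts["moderate"] + counts["weak"]
    stained = positive + counts["negative"]
    positivity = positive / stained if stained else 0.0
    return counts, positivity

# Four hypothetical pixels: strong, weak, negative, and neutral
example = [(0.90, 0.20), (0.30, 0.40), (0.02, 0.50), (0.01, 0.01)]
counts, positivity = positive_pixel_count(example)
print(counts, positivity)
```

Excluding neutral pixels from the denominator keeps empty glass from diluting the positivity fraction. That is a design choice of this sketch, and other definitions are possible.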
Once algorithms have been developed, or manual region selection has been applied to the image, further algorithms can be used for classification or quantification. The algorithm can be applied specifically for evaluating nuclear staining, such as for ER and PR, or membrane staining, such as for HER2. Membrane segmentation may be tricky in IHC tissue images because the cellular membranes are visible only in the stained tracts of the cell, whereas the unstained tracts are not visible. Cytoplasm can be detected by using specific cytoplasmic stains, or it can be detected by using computational methods that exploit the fact that cytoplasmic areas are between nuclear and membrane areas. For carcinomas, algorithms must differentiate epithelial parenchyma from desmoplastic stroma so that only the stain-expression levels in the epithelial regions are quantified.
To evaluate IHC in which the immunostains generate pixels with different colors, unmixing (separating colors) is often required. This can be achieved by color deconvolution, which separates the image into different channels that correspond to the actual colors of the stains used. This approach permits a pathologist to accurately measure the area or intensity of each stain separately, even when the stains are superimposed at the same location. These algorithms usually include control parameters, such as intensity settings, that the user can tailor to meet specific needs.
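Color deconvolution for a single RGB pixel can be sketched as follows. The sketch assumes the widely used optical-density formulation, in which RGB values are converted to optical density and multiplied by the inverse of a stain matrix. The hematoxylin and DAB vectors below are approximate values of the kind commonly quoted for this method and are assumptions here. In practice they should be measured from single-stain control slides. The third "residual" vector is constructed as the cross product of the first two so that the matrix can be inverted.

```python
import math

# Assumed (approximate) optical-density vectors in R, G, B order
HEMATOXYLIN = (0.65, 0.70, 0.29)
DAB = (0.27, 0.57, 0.78)

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def stain_matrix():
    """Rows are unit stain vectors: hematoxylin, DAB, residual."""
    h, d = normalize(HEMATOXYLIN), normalize(DAB)
    return (h, d, normalize(cross(h, d)))

def invert3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("stain vectors are not linearly independent")
    return (((e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det),
            ((f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det),
            ((d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det))

def build_unmixing_matrix():
    return invert3(stain_matrix())

def rgb_to_od(rgb):
    """Optical density per channel, taking 255 as unattenuated white light."""
    return [-math.log10(max(v, 1) / 255) for v in rgb]

def unmix_pixel(rgb, inverse):
    """Return (hematoxylin, DAB, residual) amounts for one RGB pixel."""
    od = rgb_to_od(rgb)
    return tuple(sum(od[k] * inverse[k][j] for k in range(3)) for j in range(3))

inverse = build_unmixing_matrix()
print(unmix_pixel((120, 90, 60), inverse))  # a brownish pixel, for illustration
```

Because the stain amounts are solved jointly, a pixel in which both stains overlap still yields a separate estimate for each, which is what allows superimposed stains to be measured independently.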
An alternative approach is segmentation or ROI definition by molecular colocalization. This approach is challenging in chromogen-based systems because even with unmixing, the number of chromogens that can be resolved is limited. Fluorescence-based systems often capitalize on this approach. For example, the AQUA technology or other QIF-based software can use colocalization with cytokeratin to define an ROI in epithelial neoplasms to avoid issues of inaccuracy associated with automated feature identification. This technology further uses molecular methods to define subcellular compartments, and then it quantifies the amount of protein expressed within the compartment by colocalization. Colocalization with 4′,6-diamidino-2-phenylindole (DAPI) staining can be used to define nuclei, and colocalization with CD31 can be used to define endothelial cells. Another example is the construction of a ratio between the nuclear and cytoplasmic levels of a protein, which revealed relationships to outcome that were consistent with biologic hypotheses but were not revealed by overall measures of expression. By inclusion of a series of cell line controls, the quantitative result achieved by AQUA is comparable to an enzyme-linked immunosorbent assay (ELISA) within a subcellular compartment. This approach has been tested on multiple machines with multiple users on different days and has been shown to have an average coefficient of variation of less than 5%. The key principles of this technology and how it works when applied to ER are illustrated in Fig. 23.4 .
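The idea of defining compartments by colocalization and then scoring a target within them can be sketched in Python. This is a simplified illustration and not the AQUA algorithm itself: the fixed thresholds, the flat per-pixel lists, and the function names are all hypothetical, and the sketch assumes that the cytokeratin signal at a tumor nucleus is still above threshold so that the nucleus falls inside the tumor mask.

```python
def threshold_mask(channel, threshold):
    """Binary mask: True where a marker channel is at or above the threshold."""
    return [value >= threshold for value in channel]

def compartment_score(target, mask):
    """Sum of target intensity inside the mask divided by the mask area."""
    area = sum(mask)
    if area == 0:
        return float("nan")
    return sum(t for t, inside in zip(target, mask) if inside) / area

def nuclear_cytoplasmic_scores(cytokeratin, dapi, target, ck_threshold, dapi_threshold):
    """Score a target in tumor nuclei and tumor cytoplasm.
    Tumor ROI = cytokeratin-positive pixels; nuclei = DAPI-positive pixels
    inside the ROI; cytoplasm = remaining ROI pixels."""
    tumor = threshold_mask(cytokeratin, ck_threshold)
    dapi_pos = threshold_mask(dapi, dapi_threshold)
    nuclear = [t and d for t, d in zip(tumor, dapi_pos)]
    cytoplasm = [t and not d for t, d in zip(tumor, dapi_pos)]
    return compartment_score(target, nuclear), compartment_score(target, cytoplasm)

# Five hypothetical pixels; the fourth is a stromal nucleus (DAPI high, cytokeratin low)
cytokeratin = [50, 60, 55, 5, 4]
dapi = [90, 10, 8, 80, 2]
target = [200, 40, 60, 250, 0]
nuc, cyt = nuclear_cytoplasmic_scores(cytokeratin, dapi, target,
                                      ck_threshold=20, dapi_threshold=30)
print(nuc, cyt, nuc / cyt)
```

The stromal nucleus carries a high target signal but lies outside the cytokeratin mask, so it does not contribute to either score. This illustrates how a molecularly defined ROI sidesteps morphologic feature identification.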
Strengths and Limitations
Scoring of an IHC stain involves a preanalytical phase that includes tissue preparation (fixation, processing, and staining), an analytical phase, and a postanalytical phase for the quantification and reporting of results. Significant considerations related to each of these phases must be addressed. Preanalytical variation is probably the most problematic. Delayed time before fixation of tissue, or cold ischemic time, is a significant problem. A range of work has shown that assessment of critical biomarkers may be altered by prolonged time to fixation. Nkoy and colleagues, in Elisabeth Hammond’s group, showed a higher percentage of ER-negative cases in surgeries done on Friday or Saturday, illustrating how delay to fixation that occurs over the weekend can have dramatic effects on patient care. Subsequently, a number of studies have suggested that ER and other markers show degradation with delayed fixation. This problem is particularly troublesome in applications where users wish to assess phosphorylation using phospho-specific antibodies. In attempts to address this problem, guidelines have set the target time to fixation in formalin to less than 1 hour.
Preanalytical variables also include tissue processing and immunostaining. This process has been dramatically improved over the years as a result of the introduction of automated staining devices. However, the preparation of tissue, also referred to as antigen retrieval, represents a daunting variable. Although standard methods with standard buffers are often used and desirable, some antigens require protease-mediated antigen retrieval, which is much more challenging to standardize. The failure of epidermal growth factor receptor (EGFR) as a biomarker for EGFR antibody therapeutics was likely due to this issue or to the issue of antibody validation. Antibody selection or validation of selected antibodies can also be a preanalytical variable. Studies that use different antibodies for both ER and HER2 have shown different results that may affect treatment. Antibody validation is also critical in the research setting, where new antibodies may not recognize the protein stated on the label.
Uncontrolled variables in the analytical/postanalytical phase are also a key limitation of IHC staining. As interpretation of immunostaining has shifted from qualitative (positive or negative) to semiquantitative (0, 1, 2, 3) to automated semiquantitative or truly quantitative methods, new challenges have arisen. Traditional “by eye” microscopy by pathologists is subject to variable interpretation and also to the imprecision of the human eye. A number of studies have revealed the weaknesses of traditional subjective scoring. For example, one study showed that the discrepancy between human epidermal growth factor receptor 2 (ERBB2) IHC and fluorescence in situ hybridization (FISH) was most often due to manual interpretation and not to reagent limitations. Manual scoring is susceptible to interobserver and intraobserver variability. The use of scales (0, 1+, 2+, 3+ staining) and H scores acknowledges the inherent imprecision and subjectivity involved. Human variability in scoring is particularly notable with borderline and weakly stained cases, and human scoring is also subject to fatigue. In fact, attempts to apply semiquantitative tools can result in compression of the scale; this has been seen in measurement of ER in breast cancer, in which a continuous distribution of scores appears bimodal when scored by eye. In addition, the evaluation of immunostaining may be influenced by the biologic heterogeneity of epitope expression that exists between tissue sections and blocks of a tumor.
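For readers unfamiliar with the H score mentioned above, it is conventionally computed as a weighted sum of the percentages of cells staining at each intensity level, giving a value between 0 and 300. A minimal Python sketch follows. The function name and the input validation are illustrative choices.

```python
def h_score(pct_weak, pct_moderate, pct_strong):
    """H score = 1 x (% cells 1+) + 2 x (% cells 2+) + 3 x (% cells 3+).
    Inputs are percentages of all scored cells; unstained cells contribute 0."""
    percentages = (pct_weak, pct_moderate, pct_strong)
    if any(p < 0 for p in percentages) or sum(percentages) > 100 + 1e-9:
        raise ValueError("percentages must be non-negative and total at most 100")
    return 1 * pct_weak + 2 * pct_moderate + 3 * pct_strong

# Example: 10% weak, 20% moderate, and 30% strong staining (40% unstained)
print(h_score(10, 20, 30))
```

The score looks continuous, but when it is generated by eye each of its inputs is still a visual estimate of an intensity category and a percentage, so the imprecision described above carries through to the result.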
Early studies showed that CAIA was no better than visual analysis. However, with advanced computer technology and image-analysis algorithms, newer published data have shown that CAIA is comparable or, in small studies, superior to manual methods. Whereas broad adoption awaits more comprehensive trials and higher levels of evidence, automated image-analysis methods appear to offer more objective, precise, and reproducible quantification on a continuous scale than manual scoring. Limited studies have compared different CAIA technologies to demonstrate the agreement between these systems. Problems that may be encountered with CAIA include discrepancies associated with low-level staining, artifacts such as dust particles, interfering nonspecific staining in selected areas, and erroneous low scores generated by small amounts of stained tissue. Furthermore, not all aspects of performing image analysis have been adequately addressed in the literature. These include the number of images (ROIs), appropriate tissue areas (highest labeling areas, or “hot spots”), and level of tissue magnification (×20 or ×40) to be used for analysis. Also, whereas WSI has been used to conduct quantitative image analysis, limited data are available to indicate that analyzing an entirely scanned slide, instead of several FOVs from a single slide, overcomes the problem of tumor heterogeneity and sampling issues. Whereas some CAIA systems can automate selecting the ROIs to be analyzed, one published study suggests that pathologists are perhaps better at selecting the appropriate areas of a slide to be analyzed. Several technical problems may be encountered; for example, algorithms may not work on all file formats or magnifications, and measurements may vary for similar algorithms from different vendors, especially if they are not appropriately calibrated.
However, a large multi-institutional study comparing CAIA using seven different scanner platforms coupled with 10 distinct image-analysis programs has been done to assess Ki67 in breast cancer. In spite of the wide range of platforms and software, the intraclass correlation coefficient for a subset of labs that all scanned on the Aperio platform was 0.89, comparable to, or exceeding, the 0.88 achieved by pathologists with specific training that required counting 500 cells. This study is among the first to suggest that automated platforms, once they pass a certain level of accuracy, may be more interchangeable and accurate than even the most accurate methods performed by pathologists.
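The intraclass correlation coefficient (ICC) cited above summarizes how much of the total variation in scores is attributable to true differences between cases and how much to disagreement among raters or platforms. Several forms of the ICC exist, and the form used in the study is not stated here. The Python sketch below implements the one-way random-effects form, ICC(1,1), purely as an assumed example.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC(1,1).
    ratings: one row per case, one column per rater or platform."""
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 cases and 2 raters, with no missing scores")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    case_means = [sum(row) / k for row in ratings]
    # Mean square between cases and mean square within cases
    ms_between = k * sum((m - grand_mean) ** 2 for m in case_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, case_means)
                    for x in row) / (n * (k - 1))
    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator

# Hypothetical Ki67 indices (%) for three cases scored on two platforms
print(icc_oneway([[12.0, 14.0], [30.0, 33.0], [55.0, 52.0]]))
```

Values near 1 indicate that the platforms rank and scale cases almost identically. An ICC is specific to the case mix studied, so values from different cohorts should be compared with caution.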
Image-analysis tools are not yet in broad clinical use. In clinical practice, they should be used by trained pathologists who have an understanding of the algorithm, the input parameters that may need to be adjusted, and potential pitfalls that may occur when running the algorithm (e.g., counting lymphocytes with nonspecific staining in regions of breast carcinoma designated for analysis). Unfortunately, very few prospective trials of automated analysis methods have been done, and none have been performed in accordance with recent guidelines for evaluation of levels of evidence for biomarkers. Given recent Food and Drug Administration (FDA) statements, it is likely that more comprehensive trials will be required before acceptance is widespread and reimbursement for CAIA is enhanced.