
Chapter 7
Image Processing—Chemometric Approaches to Analyze Optical Molecular Images


Thomas Bocklitz, Michael Schmitt, and Jürgen Popp


7.1 Introduction


The principle of spectroscopic pathology presented in Chapters 3–6 is shown in Figure 7.1 and compared with the common workflow of pathology/histology. Common clinical pathology usually deals with the direct microscopic evaluation of specially stained tissue specimens and, based on certain criteria and expert knowledge, establishes a diagnosis. The workflow of optical pathology is different. Here, first a hyperspectral dataset is recorded from the sample, based on the different spectroscopic contrast mechanisms reported in Chapters 3–6. This hyperspectral dataset is subsequently translated by multivariate statistics into an image or a diagnostic marker. Thus, in contrast to common clinical pathology, this is an indirect workflow using mathematical and statistical methods (chemometrics) as a translator of spectral features into diagnostically relevant information. The resulting chemometric image or diagnostic marker can then be inspected by a pathologist, or it can be correlated with disease properties.


Figure 7.1 The principle of chemometrics is sketched in comparison with the usual workflow in histology/pathology. Chemometrics is used to generate an image which can be inspected by a pathologist, or to calculate a diagnostic marker from a spectral dataset. Chemometrics can be seen as an interpreter translating spectral features into diagnostically relevant information. The common workflow in histology/pathology directly utilizes the microscopic image generated from a stained specimen, which is visually inspected by a pathologist; that is, the diagnosis is based on expert knowledge.


This chapter provides an introduction to the chemometrics workflow for the interested reader. The workflow usually starts with preprocessing of the spectroscopic data to correct for corrupting effects of the spectroscopic measurement method itself. For example, if Raman spectra are corrupted by a large fluorescence background, baseline correction methods have to be applied. The corresponding preprocessing methods are given in Section 7.3. Subsequent to this preprocessing procedure, the preprocessed spectral data are subjected to multivariate methods of statistical analysis. The multivariate procedures normally used for spectral imaging purposes, namely cluster analysis and factor methods, are introduced in Section 7.5. The plots visualizing the results of the multivariate image analysis can be further analyzed using image analysis procedures, which are introduced in Section 7.4. The scores calculated by image analysis methods and the properties of the images generated by cluster analysis and factor methods can be utilized for diagnostic purposes. This is done by classification and multivariate calibration methods (Section 7.5), which establish a statistical model for certain groups or different substance concentrations based on the spectral data or on the properties calculated by image analysis methods. These classification and multivariate calibration methods are mostly used for diagnostic purposes; nevertheless, in some cases, these supervised techniques can be used for image generation as well. A detailed description of the pretreatment methods, image analysis procedures, and analysis methods requires some statistical terminology; hence, we start with a short and general introduction to statistics (Section 7.2).


7.2 Introduction to Statistics


This section deals with various forms of statistics, as statistical analysis is an important issue not only for image analysis and for image generation from hyperspectral datasets recorded with the spectroscopic techniques presented in Chapters 3, 4, and 5, but also for clinical studies. In the following, we briefly address the use of statistics in medical imaging and diagnosis. We start with a discussion of some types of statistics and their mathematical properties. The first property used to group statistics is dimensionality, which depends on the type of data: univariate statistics can be applied if a univariate dataset is measured, whereas a hyperspectral dataset should be analyzed using multivariate statistics. The measured data also determine whether discrete or continuous statistics should be used. Another important property that differentiates the types of statistics is the distinction between descriptive and predictive statistics (statistical inference). While the former describes a dataset in terms of a statistical treatment, the latter aims to draw conclusions and predictions from a dataset, which can be applied to another dataset. The latter type of statistics is important for diagnostic purposes and image generation.


7.2.1 Univariate Statistics–Univariate Data


The statistical terminology introduced in the following may at first glance be quite confusing. Nevertheless, we would like to define some necessary terms and names that are used in the following sections. In statistics, a random variable is denoted by $X$ and can take values $x$ from a set $D$. The behavior of $X$ is mathematically described by the distribution function $f$, and from this function the expectation value, mean, variance, and covariance can be calculated. An estimation of these properties can be made using the sampled values from $X$, which are denoted as $x_1, x_2, \ldots, x_n$. With these sampled values, the (sample) mean $\bar{x}$ and (sample) variance $s^2$ can be calculated as follows:


$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \tag{7.1}$$

In the following, the difference between sample estimates and properties of the distribution is not further mentioned, as it should be clear from the context. Here, we would like to discuss two important distributions because they are needed in the following. The Poisson distribution describes the noise produced by a measurement with, for example, a charge-coupled device (CCD), while the Gaussian distribution is an important example of a continuous distribution. In the case of large numbers of CCD counts, the Poisson distribution can be approximated by a Gaussian distribution. Both distributions are often assumed for the error term of statistical methods, and so we include them here.


7.2.1.1 Poisson Distribution


One of the most important (discrete) distributions is the Poisson distribution. It arises as the limiting case of a repeated coin-toss (binomial) experiment and describes, for example, the probability of measuring a certain number of photons with a CCD camera; this is the origin of the shot noise of the CCD. The distribution function $f$ is given in Equation (7.2) and is shown in Figure 7.2a:

$$f(k; \lambda) = \frac{\lambda^k e^{-\lambda}}{k!} \tag{7.2}$$
The distribution function is parametrized by $k$ and $\lambda$. The behavior for low $\lambda$ (only a few photons measured on average) differs markedly from that for higher $\lambda$. While for low $\lambda$ values an exponential-like decay is visible (red/black), for higher $\lambda$ a clear bell-shaped curve is obvious (blue/green). The blue and green graphs illustrate the approximation of the Poisson distribution by a Gaussian distribution (see Section 7.2.1.2). In this approximation, the variance equals the mean intensity ($\sigma^2 = \lambda$), which can be seen in the half-widths of the bell-shaped curves.
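This convergence can be checked numerically. The following minimal sketch (assuming NumPy and SciPy are available; the λ values are illustrative) compares the Poisson probability mass function with the Gaussian density of the same mean and variance:

```python
import numpy as np
from scipy import stats

# Compare the Poisson pmf with its Gaussian approximation (mean = variance = lam).
for lam in (2, 50):
    k = np.arange(0, int(lam + 5 * np.sqrt(lam)) + 1)
    poisson_pmf = stats.poisson.pmf(k, mu=lam)
    gauss_pdf = stats.norm.pdf(k, loc=lam, scale=np.sqrt(lam))
    print(f"lambda={lam:3d}: max |Poisson - Gaussian| = "
          f"{np.max(np.abs(poisson_pmf - gauss_pdf)):.4f}")
# The deviation shrinks as lambda grows, i.e., for high CCD count rates.
```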


Figure 7.2 Probability distributions for a Poisson distribution (a) and a Gaussian distribution (b). See text for details.


7.2.1.2 Gaussian Distribution


By virtue of the central limit theorem, the sum of many independent random variables is approximately Gaussian-distributed under rather mild assumptions; that is why this distribution is of high importance. The statistical methods described in Sections 7.4 and 7.5 are often constructed assuming noise contributions governed by a Gaussian distribution. The distribution function is given in Equation (7.3) and shown in Figure 7.2b. A Gaussian distribution with the parameters $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right) \tag{7.3}$$
It is not within the scope of this chapter to give a comprehensive introduction to statistics, but two terms should be clarified. The (two-sided) t-test can be used as a measure of significance under the assumption of two Gaussian-distributed groups. The resulting p-value (of a two-sided t-test) can be interpreted as the probability that the measured difference (or a more extreme one) occurred through statistical fluctuations alone, under the assumption that both groups result from distributions with the same mean. A low p-value is thus an indicator of statistical significance.
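As a small illustration (assuming NumPy and SciPy; the group sizes, means, and random seed are arbitrary), the p-value of a two-sided t-test can be computed as follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two simulated groups with equal variance but different means.
group1 = rng.normal(loc=0.0, scale=1.0, size=30)
group2 = rng.normal(loc=1.0, scale=1.0, size=30)

t_stat, p_value = stats.ttest_ind(group1, group2)  # two-sided by default
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value means that a difference in means at least this large would be
# unlikely to arise from fluctuations alone if both groups shared the same mean.
```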


For the visualization of such random variables, a boxplot as shown in Figure 7.3 can be utilized. In such a boxplot, a few parameters of the corresponding distribution are sketched: the median corresponds to the black line, the box spans the central 50% of the data points, and the points outside the whiskers are called outliers. Figure 7.3 also makes the difference between statistical significance and predictive separability obvious: all group pairs differ significantly, but only group 3 is separated well enough from groups 1 and 2 to allow a prediction.


Figure 7.3 Boxplot of a score sampled for three groups. While the t-test gives the highest significance level (***) for all group combinations, a prediction is only possible for groups 1/2 versus group 3. A prediction based on group 1 against group 2 fails.


7.2.2 Multivariate Statistics–Hyperspectral Data


The analysis of a hyperspectral dataset requires the application of multivariate statistical methods. A hyperspectral dataset consists of more than one variable, for example, intensity values at different wavenumber or wavelength values. If, in addition to the spectral information, spatial information is also recorded, a hyperspectral image like the one displayed in Figure 7.4 is generated. Such an image, or a set of such images, is typically recorded in Raman, infrared (IR), and coherent anti-Stokes Raman scattering (CARS) spectroscopy (see Chapters 3 and 4).


Figure 7.4 Visualization of a hyperspectral image corresponding to a large number of stacked images.


In the following, we define a terminology for the description of such hyperspectral datasets. A hyperspectral dataset consists of $N$ different intensity measurements, which are typically functions of the wavenumber $\tilde{\nu}$ or the wavelength $\lambda$. Such a measurement is called a spectrum $S(\tilde{\nu})$ or, more generally, an observation $\vec{x}$. A number $M$ of observations can be arranged in a matrix, as:


$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1N} \\ x_{21} & x_{22} & \cdots & x_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ x_{M1} & x_{M2} & \cdots & x_{MN} \end{pmatrix} \tag{7.4}$$

Typically, additional information is available for every spectrum. This information changes the type of dataset and also the analytic methods that can be applied. The different analytic techniques are summarized in Figure 7.5, which also provides an overview of the following sections of this chapter. If the additional information is the spatial position $(x, y)$ at which the spectrum was recorded, a hyperspectral image can be constructed. Such a hyperspectral image dataset is best analyzed by cluster methods or factor methods. Cluster methods, described in Section 7.5.6, try to partition the dataset, while factor methods, explained in Section 7.5.5, aim to extract information.
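As a data-handling sketch (assuming NumPy; the cube dimensions are purely illustrative), a hyperspectral image cube can be rearranged into the matrix of Equation (7.4) with one spectrum per row:

```python
import numpy as np

# Hypothetical hyperspectral image: H x W pixels, N spectral channels.
H, W, N = 64, 64, 1024
cube = np.random.rand(H, W, N)                    # stand-in for measured data

# Arrange the M = H*W observations as rows of the data matrix X (Equation 7.4).
X = cube.reshape(-1, N)                           # shape (M, N)
positions = np.indices((H, W)).reshape(2, -1).T   # (x, y) position of each row
print(X.shape, positions.shape)                   # (4096, 1024) (4096, 2)
```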


Figure 7.5 Overview of the analysis methods described in this chapter.


If, besides the spectral information, class information is known, the analysis of a hyperspectral dataset becomes a classification task. Each spectrum belongs to one particular class, and classification methods try to model these class differences. These classification algorithms are described in Section 7.5.2. If, in addition to the spectra, another quantity is measured, such as the time $t$, temperature $T$, or concentration $c$, multivariate calibration methods are best suited to analyze such a hyperspectral dataset. These methods try to model the relationship between the external quantity ($t$, $T$, or $c$) and the spectra. The methods are introduced in Section 7.5.3.


Hyperspectral datasets can also be converted to univariate datasets and analyzed by means of univariate statistics. For hyperspectral images, this leads to images that can be analyzed using the image analysis methods reviewed in Section 7.4.
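One common way of doing this conversion (a sketch assuming NumPy; the wavenumber axis and the band limits, here the CH-stretching region, are illustrative) is to sum the intensities over a single spectral band, yielding one value per pixel:

```python
import numpy as np

# Hypothetical cube: 64 x 64 pixels, 1024 spectral channels.
cube = np.random.rand(64, 64, 1024)
wavenumbers = np.linspace(600.0, 3100.0, 1024)

# Select one band and integrate it, e.g., the CH-stretching region.
band = (wavenumbers > 2800.0) & (wavenumbers < 3000.0)
univariate_image = cube[:, :, band].sum(axis=2)   # simple band integration
print(univariate_image.shape)                     # (64, 64): one value per pixel
```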


Besides choosing the right analysis method for a given task, the pretreatment of the data is also important. These pretreatment methods aim to reduce corrupting effects, thus allowing for a reliable and robust analysis; they are reviewed in Section 7.3.


7.3 Pretreatment


In this section, pretreatment methods and standardization procedures are briefly described. All these methods aim at minimizing the influence of corrupting effects and of the measuring instrument itself. These corrections are necessary for comparing different observations. Without these corrections, a statistical analysis is not, or only to a limited degree, possible.


The pretreatment procedures differ for the different spectroscopic methods. In the following, the pretreatment procedures for IR, Raman, and nonlinear spectroscopic measurements are summarized, and the pretreatment of spectroscopic images is introduced.


7.3.1 Raman Spectroscopy


Because the Raman effect is a rather weak phenomenon, corrupting effects can be quite dramatic. Therefore, good preprocessing methods are necessary. In principle, the Raman preprocessing procedure consists of two parts: spectrometer calibration and spectral preprocessing. Spectrometer calibration aims at removing the influence of the spectrometer on the Raman spectra; hence, the differences between Raman spectra recorded with different spectrometers are minimized. This procedure is composed of wavelength, wavenumber, and intensity calibration steps. A detailed description of these spectrometer calibration procedures would go far beyond the scope of this chapter; the interested reader is referred to Refs [1, 2].


The second part of the pretreatment procedure of Raman spectra is the correction for sample-dependent effects and is called spectral preprocessing. The most prominent corrupting effects are cosmic spikes, fluorescence background, and Poisson noise, schematically shown in Figure 7.6. Several methods exist for the correction of these effects; here they are only listed, while a more detailed description can be found in the literature [3]. Background contributions are subtracted by baseline estimation methods, while cosmic spikes and shot noise are corrected by filtration methods. Besides these corrections, a scaling or normalization of the Raman spectra is often carried out in order to account for fluctuations of the excitation laser intensity.
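A deliberately simple sketch of such a spectral preprocessing chain is given below (assuming NumPy and SciPy; the despiking threshold and the plain polynomial baseline are illustrative choices, whereas practical work often uses more elaborate, e.g., iterative or asymmetric least-squares, baseline estimators):

```python
import numpy as np
from scipy.signal import medfilt

def preprocess_raman(wavenumber, intensity, poly_order=5, spike_factor=5.0):
    """One possible (simplified) Raman preprocessing recipe."""
    # 1. Despiking: replace points deviating strongly from a median-filtered copy.
    smooth = medfilt(intensity, kernel_size=5)
    residual = intensity - smooth
    spikes = np.abs(residual) > spike_factor * np.std(residual)
    cleaned = np.where(spikes, smooth, intensity)

    # 2. Baseline: a low-order polynomial as a crude fluorescence estimate.
    baseline = np.polyval(np.polyfit(wavenumber, cleaned, poly_order), wavenumber)
    corrected = cleaned - baseline

    # 3. Normalization: unit vector norm, compensating laser-intensity fluctuations.
    return corrected / np.linalg.norm(corrected)
```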


Figure 7.6 Raman spectra are influenced by corrupting effects such as cosmic spikes, fluorescence background, and noise. For these contributions, corrections have to be applied.


7.3.2 IR Spectroscopy


Because modern IR spectrometers are mostly FT-IR spectrometers, we restrict our discussion to pretreatment methods for FT-IR spectra. In FT-IR spectroscopy, an interferogram is recorded, from which the IR spectrum is recovered by an inverse fast Fourier transformation (FFT) [4]. Because the interferogram is recorded over a limited range, the FFT produces artifacts. These artifacts can be suppressed by weighting the interferogram with a window function [5], which is called apodization. Normally, this is already done by the spectrometer software, but the type of function can be chosen. After the response function is corrected for, the same preprocessing methods as for Raman spectra can be applied [6], if necessary.
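The following sketch illustrates apodization on a simulated interferogram (assuming NumPy; the Hann window is only one of several common apodization functions, and the simulated signal is purely illustrative):

```python
import numpy as np

n = 2048
x = np.arange(n)
# Simulated, symmetric interferogram: one cosine component with decaying envelope.
interferogram = (np.cos(2 * np.pi * 0.05 * (x - n // 2))
                 * np.exp(-np.abs(x - n // 2) / 400.0))

window = np.hanning(n)                            # apodization function
spectrum = np.abs(np.fft.rfft(interferogram * window))
print(spectrum.argmax())                          # dominant spectral channel (~102)
```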


7.3.3 Nonlinear Raman Spectroscopy


Currently, two nonlinear Raman spectroscopic approaches are widely utilized: CARS and stimulated Raman scattering (SRS) spectroscopy. Both are $\chi^{(3)}$ processes but feature different behaviors. While SRS spectra can mostly be treated like spontaneous Raman spectra (see above), for CARS spectra a so-called phase retrieval algorithm has to be applied first, followed by the same preprocessing steps as for spontaneous Raman spectra. Two common phase retrieval methods exist: the maximum entropy method (MEM) and the Kramers–Kronig (KK) approach. Both algorithms have advantages and disadvantages [7, 8]. While the MEM is based on statistics, the KK approach has a physical background. The problem with applying the KK method arises because CARS spectra are measured only over a limited energy or wavenumber interval; this is why the KK method sometimes fails. For the MEM algorithm, on the other hand, parameters have to be chosen before applying the algorithm, and these are sometimes not known. This drawback is an essential problem and might lead to a wrong phase. Nevertheless, recent publications show that both methods work equally well (for simulated spectra) [8].
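A compact sketch of a KK-style phase retrieval is shown below (assuming NumPy and SciPy; it uses the common Hilbert-transform formulation of the KK relation, the sign convention varies between implementations, and in practice the non-resonant background and the finite measurement window require additional care):

```python
import numpy as np
from scipy.signal import hilbert

def kk_phase_retrieval(cars_intensity):
    """Hilbert-transform (Kramers-Kronig) phase retrieval sketch for CARS."""
    modulus = np.sqrt(cars_intensity)          # |chi3| from the measured intensity
    # The phase is (up to the sign convention) the Hilbert transform of ln|chi3|.
    phase = np.imag(hilbert(np.log(modulus)))
    # The Raman-like resonant signal is the imaginary part of chi3.
    return modulus * np.sin(phase)
```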


7.3.4 Images


The pretreatment of spectroscopic multimodal images (see Chapter 5) is similar to the pretreatment of white-light images of stained samples (see Chapter 1). The pretreatment involves procedures to reduce the noise generated by the measurement device and procedures that change the type of noise as well as the scale of an image. The latter procedures are important for a comparison of different images, where standardization is essential. Such a standardization procedure might be a normalization to a standard region inside the images, for example, the background, or to some statistical measure of the image itself. If the intensity is scaled to the interval $[0, 1]$, the procedure is called clipping. The impact of the measurement device can be reduced by subtraction of, or normalization to, a characteristic feature of the measurement device.


Changing the type of noise is often accomplished by the Anscombe transformation, which converts the Poisson noise produced by photon-counting devices (CCDs, photomultiplier tubes (PMTs), etc.) into approximately Gaussian (normal) noise. The latter type of noise is an important assumption of various analysis methods, which work best if such an Anscombe transformation has been performed beforehand. The Anscombe transformation is given by:


$$y = 2\sqrt{x + \tfrac{3}{8}} \tag{7.5}$$

If the measured values $x$ are Poisson-distributed, the resulting values $y$ are approximately Gaussian-distributed with nearly constant variance. This statement requires a few assumptions, which are, however, neglected here.
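The variance-stabilizing effect can be verified numerically, as in the following sketch (assuming NumPy; the λ values and the sample size are arbitrary):

```python
import numpy as np

def anscombe(x):
    """Anscombe transformation (Equation 7.5)."""
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

rng = np.random.default_rng(1)
for lam in (5, 20, 100):
    counts = rng.poisson(lam, size=100_000)       # Poisson noise: variance = mean
    print(f"lambda={lam:3d}: var before={counts.var():7.2f}, "
          f"var after={anscombe(counts).var():.3f}")  # stabilized near 1
```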


It should be mentioned that the filtration methods introduced in Section 7.4.1 can be used both as analysis methods and as pretreatment methods.


7.4 Image Analysis


In this section, we discuss the common methods used for the analysis of images. These images can result from white-light measurements (see Chapters 1 and 2) or from multimodal imaging (see Chapter 5). This overview is by no means complete; for further reading, the interested reader is referred to Refs [9, 10].


The function of the methods will be demonstrated on an example image shown in Figure 7.7a. The image results from a magnetic resonance imaging (MRI) measurement and shows a right ankle. The image contrast corresponds to the intensity recorded for a specific spatial pixel of the image. In Figure 7.7a,b, two representations of such an image are given. While Figure 7.7a codes the intensity (ranging from 0 to 1) in gray scale, the image shown in Figure 7.7b is coded in false colors. It should be mentioned that the images presented here are clipped to the interval 0–1; all procedures described in the following are applicable even if such a clipping is not done.


Figure 7.7 An MRI image of an ankle is given. Based on this example image, all image analysis methods will be demonstrated in the following. (a) The intensities coded in gray scale from 0 to 1. (b) The intensity coded in false colors (blue to red). (c) The result of a convolutional filter (in false colors).


Before analyzing the images, the image quality is often enhanced by filter-based methods. Afterward, the image properties can be calculated by applying segmentation approaches and feature detection. These methods are described in the following in more detail.


7.4.1 Filtration Methods


Filtration methods are used to improve the quality of an image and thus to enhance its information content. The main effect that should be removed from images is the noise produced by the detector, for example, a CCD or PMT. The simplest filter is a convolutional filter, which selects a point in the image and calculates the weighted mean over a mask centered on that point; the new value at this position is the calculated mean, and this is done subsequently for all points of the original image. The result of such a filter is shown in Figure 7.7c for the image displayed in Figure 7.7a,b. An increase in smoothness is obvious, which, however, can also cancel out small image features.
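A minimal sketch of such filters (assuming NumPy and SciPy; the mask size and the Gaussian σ are illustrative, and a random image stands in for real data) is:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(2)
image = rng.random((128, 128))                    # stand-in for the MRI example

# 5x5 mean filter: every pixel becomes the equally weighted mean over its mask.
mask = np.full((5, 5), 1.0 / 25.0)
smoothed = ndimage.convolve(image, mask, mode="reflect")

# A Gaussian mask weights the mask center more strongly and blurs less harshly.
smoothed_gauss = ndimage.gaussian_filter(image, sigma=1.5)
```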


Another group of filters comprises the 2D FFT and 2D wavelet filters. These filters decompose the image in terms of a Fourier series or a wavelet series; the image is then reconstructed from only a subset of the calculated coefficients, or from weighted versions of these coefficients. With such filters, it is possible to remove either noise (low-pass) or offset variations (high-pass).
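A 2D FFT low-pass filter can be sketched as follows (assuming NumPy; the radial cut-off at 10% of the Nyquist frequency is an arbitrary illustrative choice):

```python
import numpy as np

def fft_lowpass(image, cutoff=0.1):
    """Keep only spatial frequencies below `cutoff` (fraction of Nyquist)."""
    f = np.fft.fftshift(np.fft.fft2(image))
    ny, nx = image.shape
    y, x = np.ogrid[-ny // 2:ny - ny // 2, -nx // 2:nx - nx // 2]
    radius = np.sqrt((y / (ny / 2)) ** 2 + (x / (nx / 2)) ** 2)
    f[radius > cutoff] = 0.0            # discard high-frequency coefficients
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```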


Another application of the 2D wavelet transform is the multiresolution analysis (MRA) of an image, which will be demonstrated in the following because it is also instructive for the filter application of the 2D wavelet analysis. An image is decomposed into a set of images, of which the first m images contain the details of the original image on a certain scale, while the last image represents the smoothed image without these details. This is demonstrated in Figure 7.8 for the example image of Figure 7.7a. Five detail images (Figure 7.8a–e) are given, which become coarser from the upper left to the lower right. The last image (f) in the lower row can be seen as the result of subtracting the detail images from the original one and is thus smoothed. The filter method mentioned above consists in weighting every detail image and then recombining the images. The detail images can also be used individually, especially if only details of a certain scale are of interest.
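Such a wavelet decomposition can be sketched with the PyWavelets package (an assumption, as are the choice of the db2 wavelet and the five levels; the random image stands in for real data):

```python
import numpy as np
import pywt  # PyWavelets

rng = np.random.default_rng(3)
image = rng.random((256, 256))

# Decompose into 5 detail levels plus one smoothed approximation (cf. Figure 7.8).
coeffs = pywt.wavedec2(image, wavelet="db2", level=5)

# Filtering sketch: discard the finest detail level, then reconstruct.
coeffs[-1] = tuple(np.zeros_like(d) for d in coeffs[-1])
smoothed = pywt.waverec2(coeffs, wavelet="db2")
```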


Figure 7.8 Multiresolution analysis (MRA). The images (a–e) correspond to the detail images; that is, only details of a certain scale are visible in each image. The last image (f) represents the original image without these details; thus, it is a smoothed version of the original image.


7.4.2 Property Calculation


After an image has been preprocessed (by filtering), certain properties of the image can be calculated. These properties can be the mean, median, or other parameters of the statistical distribution of the measured values. A well-known method for such a task is the gray-level co-occurrence matrix method [11]. This method involves no heavy computation, only counting and storing: each entry of a matrix, whose dimension corresponds to the number of gray levels, is filled with the number of occurrences of the corresponding gray-level combination. From this matrix, certain structures of the image can be highlighted; for example, fibrous structures can be monitored [12]. In general, this method is suitable for analyzing the short-range order of an image.
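A co-occurrence matrix and derived texture scores can be computed, for example, with scikit-image (an assumption; note that older scikit-image versions spell the functions greycomatrix/greycoprops, and the quantization to 8 gray levels here is illustrative):

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

rng = np.random.default_rng(4)
image = (rng.random((128, 128)) * 8).astype(np.uint8)   # quantized to 8 gray levels

# Count co-occurrences of gray-level pairs at distance 1 in horizontal direction.
glcm = graycomatrix(image, distances=[1], angles=[0], levels=8,
                    symmetric=True, normed=True)
print("contrast:   ", graycoprops(glcm, "contrast")[0, 0])
print("homogeneity:", graycoprops(glcm, "homogeneity")[0, 0])
```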


7.4.3 Segmentation Approaches


Another approach to extracting information from an image is segmentation. Segmentation uses local properties to group different regions of the image. Various methods utilize this principle; in this short overview, we present only two of them, namely the watershed algorithm and the local and global thresholding algorithms.


The watershed algorithm is a heuristic algorithm inspired by the filling of basins with rain [13]. The image can be plotted as a 3D surface, where the intensity is converted into the height of the surface (see Figure 7.9). Afterward, this surface is flooded by simulated rain. Regions that form a connected basin are color-coded in the same false color, and the pixels belonging to such a basin can be extracted. Subsequently, properties of such a group of pixels, such as the mean intensity value or the standard deviation, are calculated. The result of applying the watershed algorithm to the original image of Figure 7.7a is displayed in Figure 7.10a and shows that local similarities are used for building the basins, that is, the clusters.
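A watershed segmentation along these lines can be sketched with scikit-image and SciPy (assumptions, as are the smoothing scale, the seed spacing, and the synthetic image):

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed
from skimage.feature import peak_local_max

rng = np.random.default_rng(5)
image = ndimage.gaussian_filter(rng.random((128, 128)), sigma=4)  # smooth surface

# Seeds: pronounced local minima of the intensity surface (maxima of -image).
seed_coords = peak_local_max(-image, min_distance=10)
markers = np.zeros(image.shape, dtype=int)
markers[tuple(seed_coords.T)] = np.arange(1, len(seed_coords) + 1)

# Flood the surface from the seeds; each basin receives its own label.
labels = watershed(image, markers=markers)
mean_per_basin = ndimage.mean(image, labels=labels, index=np.unique(labels))
```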


Figure 7.9 The intensity of Figure 7.7a is shown as the height, that is, in the z-direction. Such a representation is advantageous for demonstrating the principle of the watershed algorithm. The surface is flooded and, based on certain criteria, the flooding is stopped. The different basins are subsequently coded in false colors and the image is segmented (see Figure 7.10a).


Figure 7.10 (a) The result of the watershed algorithm applied to the original image shown in Figure 7.7. (b) After the application of global thresholding to the original image shown in Figure 7.7, the image is converted to a contour plot by plotting only the contour lines for the intensity values of 0.2, 0.3, 0.5, 0.6, and 0.8.


Other common methods for segmentation are the local and global thresholding algorithms [14]. Both build up a partition based on the intensity values. Global thresholding uses fixed thresholds and produces a contour plot such as that given in Figure 7.10b; this is similar to a topographic map of a mountain, where distinct contour lines are shown. In the local thresholding algorithm, the cut-off values are not fixed but are calculated from the neighborhood of each pixel. The visualization is then done similarly to Figure 7.10a.
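Both variants can be sketched with scikit-image (an assumption, as are Otsu's criterion for the global cut-off and the block size of the local neighborhood):

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu, threshold_local

rng = np.random.default_rng(6)
image = ndimage.gaussian_filter(rng.random((128, 128)), sigma=3)

# Global thresholding: one fixed cut-off for the whole image.
global_mask = image > threshold_otsu(image)

# Local thresholding: the cut-off is computed from each pixel's neighborhood.
local_mask = image > threshold_local(image, block_size=35, offset=0.0)
```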


7.4.4 Object Recognition


The Hough transformation is a method for (linear) edge detection and can, after modifications, also be applied for circle detection [9, 10]. Here, we present only the line detection method. The basis is that a line can be described, or parameterized, by an angle $\theta$ and a distance $r$. The $(\theta, r)$ space (the Hough space) is searched over all combinations of $(\theta, r)$ values, and an accumulator matrix is built up, which stores the number of edge points consistent with a certain $(\theta, r)$ combination. This matrix is shown in Figure 7.11, where a high value is an indicator of a line with that angle and distance $(\theta, r)$. The value can be interpreted as the probability of a line with such a $(\theta, r)$ combination. After selecting the regions in the Hough space that exhibit strong accumulation, the reverse transformation can be applied, so that only the selected lines are visible.
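The line-detecting Hough transformation is available, for example, in scikit-image (an assumption, as is the synthetic test image containing a single horizontal line):

```python
import numpy as np
from skimage.feature import canny
from skimage.transform import hough_line, hough_line_peaks

image = np.zeros((128, 128))
image[64, :] = 1.0                                # synthetic horizontal line

edges = canny(image)                              # edge map as transform input
accumulator, angles, distances = hough_line(edges)

# Strong accumulator cells correspond to probable (angle, distance) lines.
for _, angle, dist in zip(*hough_line_peaks(accumulator, angles, distances)):
    print(f"line at angle {np.degrees(angle):.1f} deg, distance {dist:.1f} px")
```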


Figure 7.11 Result of the Hough transformation for the image shown in Figure 7.7a; that is, the accumulator matrix is shown. The red areas around (−100, 5°) and (100, 85°) indicate that edges at a distance of 100 pixels from the center, with angles of 5° and 85°, are common in the image.


7.5 Analysis Methods


In this section, multivariate statistical methods for the analysis of hyperspectral datasets are discussed. The methods feature different properties, which is why they are used for different purposes. Nevertheless, all these methods are, in a mathematical sense, optimization problems, and these problems and the corresponding optimization methods are therefore discussed first (Section 7.5.1). In the following paragraphs, supervised learning techniques are presented. The two main groups of supervised techniques are classification models (Section 7.5.2), which aim to model class differences, and multivariate calibration models, which model the dependence of the spectra on an external parameter (Section 7.5.3). These models have to be evaluated in order to assess their reliability (see Section 7.5.4). The last sets of methods to be presented are cluster algorithms (Section 7.5.6) and factor methods (Section 7.5.5). While cluster algorithms are used to separate the dataset into groups (clusters), factor methods allow the extraction of information from a hyperspectral dataset.


Here, we can only provide a short introduction and overview of all these methods. The interested reader is referred to specialized literature on data mining (see, e.g., [15, 16]).


7.5.1 Optimization Problems and Optimization Methods


Optimization problems involve two steps: in the first step, a criterion is defined, and in the second step, this criterion is optimized in the parameter space. The criterion is a mathematical definition of the task; for example, in order to maximize the separation between two groups, the criterion might be:

