Assessment Using Single-Trial Analysis of Brain Activity



Fig. 1
Typical N100 components from a normal control subject (left) and a schizophrenia patient (right)



Both the P50 [2, 7, 8, 17–22] and the N100–P200 complex [18, 22–26] exhibit gating effects, even though it is now believed that N100–P200 sensory gating could reflect different mechanisms than those reflected by P50 gating [5, 6] and thus could account for different brain functions [10].



2 EP Estimation


The ongoing EEG activity is of considerably higher amplitude than EPs, which are typically buried in the background processes and require averaging of a large number of single-trial responses to become visible. Averaging, however, does not allow the study of changes in EP amplitude, latency, or phase characteristics from trial to trial, and thus single-trial analysis would be better suited for analyzing the dynamics of brain activation.

In addition to neural activity, scalp recordings often contain activity from eye blinks, slow eye movements, muscle and cardiac activity, interference from power lines, as well as background activity unrelated to the experimental task under study. Ideally, one would like to remove all these noise processes and record only the useful signal, i.e., the true EP activity of the cortical generators activated by the experimental task. Early attempts at estimating EPs included Fourier-based filtering techniques [27, 28], which, however, were not well suited to the nonstationary nature of the signals. Thus, time-adaptive filters were introduced [29, 30] that were better adjusted to the EP signal characteristics. Wavelets, which follow a similar concept, have been extensively used in the analysis of EEG and average EPs [31, 32] and, more recently, single-trial EPs [33]. Neural networks [34] and principal component analysis have also been employed mostly to remove artifactual activity [35, 36]. Another technique that is based on a combination of Woody filtering [37] for latency correction in the time domain and wavelet denoising has been proposed to enhance single-trial EPs [38]. This method, however, was shown to work mostly with high-amplitude EPs, while for low signal-to-noise ratios (SNRs) its performance deteriorated [38].

To explore possible phase relationships among single trials that would help improve EP estimation, we developed a selective averaging technique based on fuzzy clustering [39, 40], in which each subject’s single trials were clustered into two mutually exclusive groups and two partial EPs were then computed by averaging the trials in each group, separately for the S1 and S2 responses.

In normal controls, we found that individual responses had approximately the same amplitude for both the S1 and the S2 parts. However, the two partial EPs were in phase around the S1 P50 latency and out of phase around the S2 P50 latency. Phase synchronization produced a high-amplitude average S1 response, while phase desynchronization yielded a lower amplitude average S2, as seen previously with ensemble averaging. Figure 2 shows typical EPs from a control subject for the S1 (left) and S2 (right) responses. Solid lines represent the ensemble average EPs, whereas dashed lines depict the two partial EPs computed from the same dataset.



Fig. 2
Partial (dashed lines) and ensemble-average (solid lines) P50 components obtained from a normal subject, corresponding to the S1 (left) and S2 (right) responses, respectively

In schizophrenia subjects, again partial EPs had approximately the same amplitude; however, partial EP synchronization was significantly reduced compared to normal controls, for both the S1 and the S2 responses. This reduced synchronization resulted in lower S1 amplitude and decreased S2 amplitude attenuation, as was typically seen in schizophrenia patients with ensemble averaging. Similar latency desynchronization effects were reported later [41] as increased latency jitter (i.e., latency variability) in S1 in schizophrenia subjects compared to controls, but not in S2, and could explain the decreased apparent amplitude of the average S1 response in patients. Overall, these studies show that phase synchronization occurs primarily in normal controls and to a lesser degree in schizophrenia patients, and this difference can explain the gating discrepancies observed in the two groups.
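To see why trial-to-trial latency jitter attenuates an ensemble average even when single-trial amplitudes are unchanged, consider the following minimal simulation. It is our own illustration, not from the original studies; the waveform shape, jitter magnitudes, and all parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1000                          # sampling rate (Hz), illustrative
t = np.arange(0, 0.2, 1 / fs)      # 200 ms epoch

def simulate_trials(jitter_sd, n_trials=64):
    """Gaussian-windowed 40 Hz wavelet (a 'P50-like' deflection) whose
    peak latency varies from trial to trial with SD jitter_sd (s)."""
    trials = []
    for _ in range(n_trials):
        lat = 0.05 + rng.normal(0.0, jitter_sd)      # nominal 50 ms peak
        trials.append(np.exp(-((t - lat) ** 2) / (2 * 0.01 ** 2))
                      * np.cos(2 * np.pi * 40 * (t - lat)))
    return np.array(trials)

for label, sd in [("synchronized (S1-like)", 0.001),
                  ("desynchronized (S2-like)", 0.010)]:
    avg = simulate_trials(sd).mean(axis=0)
    print(f"{label}: peak amplitude of average = {np.abs(avg).max():.2f}")
# Every single trial peaks at 1.0; only the jittered average is attenuated.
```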

Most of the aforementioned EP estimation techniques attempt to improve the entire brain response waveform at once, i.e., all EP components simultaneously, without paying attention to differences in the noise susceptibility of individual components. In contrast to these methods, we have proposed an alternative methodology [42, 43] that can extract individual components from the complete EP waveform. The method analyzes single-trial EPs based on independent component analysis (ICA) and relies on the hypothesis that brain activity resulting from an experimental stimulus is independent of neurophysiological artifacts and background activity [32, 44, 45]. The method provides clear estimates of individual EP components in single-trial responses, along with estimates of their amplitude, latency, and phase characteristics, and consequently allows studying the dynamics of the underlying cortical generators that give rise to specific EP components.

The method has been studied extensively with regard to its performance. In particular, we compared [46] the iterative ICA (iICA)-based single-trial analysis against traditional ensemble averaging using synthetic and real data from normal subjects and found that the iICA method provides improved estimates of the N100 component. We also showed that these findings are independent of the model assumed for EP generation and remain true for both the phase-resetting and additive models. Moreover, when compared with time–frequency wavelet denoising, iICA performed equally well or better over a wide range of experimental conditions and SNRs [47]. The iICA method has been used to analyze EP activity with data obtained at different labs and under different experimental protocols, including a simple N100 auditory task and typical paired stimulus P50 and N100 paradigms [48, 49]. The following sections describe the various steps of the iICA method along with application examples on data recorded from normal control subjects and schizophrenia patients.


3 Methods



3.1 Fuzzy Clustering


Clustering is an algorithmic procedure used to partition a set of observations, or patterns, into clusters so that patterns within a class are more similar to each other than to patterns in the other classes (in our case, a pattern is a single-trial EP). In conventional clustering, a pattern either belongs to a particular class or it does not: in the former case, its degree of membership to that class is one; in the latter, it is zero. In fuzzy clustering, on the other hand, every pattern belongs to all possible classes, but membership values are allowed to vary continuously from zero to one. The higher the membership of a pattern to a particular class, the more “typical” the pattern is of that class.

For each class of patterns, the clustering algorithm identifies a class representative or template. In conventional clustering a class template is computed as the average of the patterns assigned to that class. In contrast, in fuzzy clustering a class template is computed as a weighted average of all patterns, the weights being the membership values of the patterns to that class. In this way typical patterns contribute more to the template, whereas patterns that are less typical contribute less, and the resulting templates are more reliable representatives of their respective classes.

Mathematically, the above procedure can be described by assuming that a set of L patterns X_j, j = 1, 2, …, L (in our case, single-trial EPs of M time samples each), can be divided into K classes C_i with centroids V_i, i = 1, 2, …, K. The degree of membership μ_ij of pattern X_j to class C_i can be computed as



$$ {\mu}_{ij}=\frac{\left[\dfrac{1}{d^2\left(X_j,V_i\right)}\right]^{1/(q-1)}}{\sum_{k=1}^{K}\left[\dfrac{1}{d^2\left(X_j,V_k\right)}\right]^{1/(q-1)}}, $$
where d² is a distance measure between the j-th pattern X_j and the i-th template V_i, commonly the squared Euclidean distance
$$ d^2\left(X_j,V_i\right)=\sum_{t=1}^{M}\left[X_j(t)-V_i(t)\right]^2, $$
and q > 1 is the fuzziness index, whose value is determined empirically (in the limit q → 1, the procedure reduces to conventional “crisp” clustering). Class templates V_i and pattern membership values μ_ij are typically identified using an iterative procedure. Bezdek’s fuzzy K-means algorithm [50] computes membership values μ_ij so that
$$ \sum_{i=1}^{K}{\mu}_{ij}=1\qquad \text{for}\ j=1,2,\ldots,L. $$
The fuzzy centroids V_i, i = 1, 2, …, K, are updated iteratively through the relation



$$ {V}_i=\frac{\sum_{j=1}^{L}{\left({\mu}_{ij}\right)}^q X_j}{\sum_{j=1}^{L}{\left({\mu}_{ij}\right)}^q} $$
until the difference between two successive estimates of V_i is less than a predefined small value (e.g., 10⁻³) or until a predefined maximum number of iterations has been reached.
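As an illustration of this procedure, the following NumPy sketch implements the fuzzy K-means iteration above for single-trial EPs stored as rows of a matrix. The function name, the random initialization, and the tolerance value are our own choices, not those of [50] or [39, 40]:

```python
import numpy as np

def fuzzy_kmeans(trials, n_classes=2, q=2.0, tol=1e-3, max_iter=100):
    """Fuzzy K-means clustering of single trials (one trial per row).

    Returns class templates V (n_classes x n_samples) and the
    membership matrix mu (n_classes x n_trials), columns summing to 1.
    """
    rng = np.random.default_rng(0)
    L = trials.shape[0]
    # Initialize templates from randomly chosen trials
    V = trials[rng.choice(L, n_classes, replace=False)].copy()
    for _ in range(max_iter):
        # Squared Euclidean distance of every trial to every template
        d2 = ((trials[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)            # avoid division by zero
        # Membership mu_ij, normalized so that sum_i mu_ij = 1
        w = (1.0 / d2) ** (1.0 / (q - 1.0))
        mu = w / w.sum(axis=0, keepdims=True)
        # Update centroids as membership-weighted averages of all trials
        V_new = (mu ** q) @ trials / (mu ** q).sum(axis=1, keepdims=True)
        if np.abs(V_new - V).max() < tol:     # templates have stabilized
            return V_new, mu
        V = V_new
    return V, mu

# Example on synthetic data: 40 single trials of 128 samples each
trials = np.random.default_rng(1).normal(size=(40, 128))
templates, memberships = fuzzy_kmeans(trials)
```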


3.2 Iterative Independent Component Analysis


Independent component analysis [51] is a method for solving the blind source separation problem [52]: recovering N independent source signals, s = {s_1, …, s_N}, from N observations, x = {x_1, …, x_N}, that represent linear mixtures of the sources. The key assumption that allows the sources to be separated from the mixtures is that the sources are statistically independent, while the mixtures are not. Mathematically, the problem is described as a transformation x = As, where A is an unknown mixing matrix. The task then is to recover a version, u, of the original sources, similar to s, by estimating a matrix, W, that inverts the mixing process, i.e., u = Wx. The estimates u are called independent components (ICs). The extended infomax algorithm, which relies on a neural network formulation and information theory [53–56], is an efficient technique for solving this problem.
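The mixing model x = As and its inversion u = Wx can be demonstrated in a few lines. Since the analyses in this chapter rely on the extended infomax implementation in EEGLab (see below), the sketch uses scikit-learn’s FastICA merely as a convenient stand-in; the sources, mixing matrix, and all values are illustrative:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)

# Two statistically independent, non-Gaussian sources s = {s1, s2}
s = np.c_[np.sign(np.sin(2 * np.pi * 5 * t)),   # square wave
          rng.laplace(size=t.size)]             # impulsive noise

A = np.array([[1.0, 0.5],                       # unknown mixing matrix
              [0.4, 1.0]])
x = s @ A.T                                     # observations x = A s

# Estimate W and recover u = W x; sources come back up to an
# arbitrary permutation and scaling, as in any ICA method
u = FastICA(n_components=2, random_state=0).fit_transform(x)
```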

Briefly, for a discrete variable X the entropy H is defined as



$$ H(X)=-\sum_i P\left(X={a}_i\right)\log P\left(X={a}_i\right), $$
where a i are all possible values of X, whereas the mutual information between variables X and Y is a measure of the information that these random variables share and can be defined as



$$ I\left(X,Y\right)=H(X)+H(Y)-H\left(X,Y\right), $$
where H(X, Y) is the joint entropy of X and Y. For a random vector x = {x 1, x 2, …, x N }, the mutual information can also be seen as the distance between the joint distribution and the product of marginal distributions,



$$ I(x)=\int p(x)\log \frac{p(x)}{\prod_{i=1}^{N}{p}_i\left({x}_i\right)}\, dx. $$
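Numerically, these identities are easy to check on discrete data. The sketch below (our own, with arbitrary toy variables) estimates I(X, Y) = H(X) + H(Y) − H(X, Y) from a joint histogram:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution, ignoring zeros."""
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=10_000)
y = (x + rng.integers(0, 2, size=x.size)) % 4     # y depends on x

# Joint and marginal distributions from a 2-D histogram
joint, _, _ = np.histogram2d(x, y, bins=(4, 4))
joint /= joint.sum()
px, py = joint.sum(axis=1), joint.sum(axis=0)

# I(X, Y) = H(X) + H(Y) - H(X, Y); positive here because y depends on x
mi = entropy(px) + entropy(py) - entropy(joint.ravel())
print(f"I(X, Y) ≈ {mi:.2f} bits")
```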

The principle of maximum information preservation [57] states that the transformation of a vector x observed in the input layer of a neural network into a vector y produced in the output layer should maximize the average mutual information between x and y in the presence of processing noise. Information maximization can solve the blind source separation problem [53]: maximizing the joint entropy H(y) of the output y approximately minimizes the mutual information among the output components y_i = g_i(u_i), where g_i is an invertible monotonic nonlinearity, u = Wx, and W holds the synaptic weights of the neural processor. The joint entropy of the outputs is given by



$$ H\left({y}_1,\dots,{y}_N\right)=H\left({y}_1\right)+\dots +H\left({y}_N\right)-I\left({y}_1,\dots,{y}_N\right), $$
where H(y_1), …, H(y_N) are the marginal entropies of the output variables and I(y_1, …, y_N) is their mutual information. Maximizing the joint entropy consists of maximizing the marginal entropies and minimizing the mutual information. The former also tends to decrease the latter, since the mutual information is always nonnegative. In the limit, when I(y_1, …, y_N) = 0, the joint entropy equals the sum of the marginal entropies.

Two sets of parameters determine the maximum joint entropy: the nonlinearities y_i = g_i(u_i) and the synaptic weights W of the neural network. The nonlinearity can be a fixed logistic function [53], while the weights W are found by maximizing the joint entropy with respect to W. Computing the derivative of H with respect to W shows that an efficient way to maximize the joint entropy is to follow the “natural” gradient, obtained by multiplying the ordinary gradient by W^T W [58]:



$$ \Delta W\propto \frac{\partial H(y)}{\partial W}\,{W}^{T}W. $$

The ideal form for the nonlinearity is the cumulative distribution function of the independent sources, but in practice it is chosen to be a sigmoid function [53]. The extended infomax ICA algorithm, implemented in the software package EEGLab [59] and used in our analyses, allows for sources with sub-Gaussian or super-Gaussian distributions, and the update rule in these cases is given by [54]



$$ \Delta W\propto \left[I-K\tanh (u){u}^{T}-u{u}^{T}\right]W,\qquad \begin{cases} {k}_i=1 & \text{super-Gaussian} \\ {k}_i=-1 & \text{sub-Gaussian,} \end{cases} $$
where the k_i are the elements of an N-dimensional diagonal matrix K given by [55]
$$ {k}_i=\operatorname{sign}\left[E\left\{\operatorname{sech}^2\left({u}_i\right)\right\}E\left\{{u}_i^2\right\}-E\left\{\tanh\left({u}_i\right){u}_i\right\}\right], $$
which ensures stability of the learning rule.
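For concreteness, here is a minimal NumPy sketch (our own, simplified relative to the EEGLab implementation) of one extended infomax gradient step, combining the update rule and the stability switch k_i above; the learning rate and batch handling are illustrative:

```python
import numpy as np

def extended_infomax_step(W, x, lr=1e-3):
    """One extended-infomax update: dW ∝ [I - K tanh(u)u^T - uu^T] W.

    W : (N, N) current unmixing matrix
    x : (N, T) block of observed mixtures (channels x samples)
    """
    N, T = x.shape
    u = W @ x                                   # current source estimates
    # Sign switch per component: +1 super-Gaussian, -1 sub-Gaussian [55]
    k = np.sign(np.mean(1 / np.cosh(u) ** 2, axis=1) * np.mean(u ** 2, axis=1)
                - np.mean(np.tanh(u) * u, axis=1))
    K = np.diag(k)
    # Expectations E{.} are approximated by averages over the T samples
    grad = (np.eye(N) - K @ np.tanh(u) @ u.T / T - u @ u.T / T) @ W
    return W + lr * grad
```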

Our technique, termed iICA, is an iterative implementation of this algorithm and is applied to a set of recordings consisting of L single trials obtained from N recording channels. All single trials obtained from a particular channel are processed in the following seven steps [42, 43]:

1. Compute an average EP from all trials in the entire set.

2. ICA-transform all single trials in blocks of 10.

3. Compute the absolute correlation values between the current average EP and the ICs in all blocks, within a predefined window Wr.

4. Set to zero those ICs with correlation less than a predefined threshold r_th.

5. Inverse-transform the updated ICs back to the time domain, separately in each block.

6. Randomize the order of the updated single trials in the entire set.

7. Repeat steps 1–6 until a convergence criterion is met.

The above steps for processing one channel of data containing L single trials are illustrated graphically in Fig. 3. At the i-th iteration, the procedure starts with the computation of an initial template EP_i, which is considered a rough estimate of the true EP component; this estimate is gradually improved at subsequent iterations. The same procedure is then applied to the rest of the channels until all of them have been processed. Randomizing the order of single trials guarantees that each block will include different trials in the next iteration, so the resulting ICA system of equations will not be underdetermined. The window parameter values used in our studies were Wr = 15–90 ms poststimulus for the P50 waveform and Wr = 50–250 ms poststimulus for the N100–P200 complex, while the threshold was set to r_th = 0.15. The convergence criterion required the root-mean-square amplitude difference of the average EP between the current and the previous iteration to be less than 0.001; thus, the algorithm converged when the average EP morphology had stabilized.
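The seven steps can be sketched compactly as follows (again our own illustration): trials are rows of an (L × samples) array, the window Wr is a slice of sample indices, and scikit-learn’s FastICA stands in for the extended infomax algorithm used in the actual studies; a trailing partial block, if any, is simply skipped:

```python
import numpy as np
from sklearn.decomposition import FastICA

def iica_channel(trials, win, r_th=0.15, block=10, tol=1e-3, max_iter=50):
    """Iterative ICA enhancement of L single trials from one channel.

    trials : (L, n_samples) array of single-trial EPs
    win    : slice selecting the samples of the correlation window Wr
    """
    rng = np.random.default_rng(0)
    trials = trials.copy()
    avg = trials.mean(axis=0)                      # step 1: template EP
    for _ in range(max_iter):
        for start in range(0, len(trials) - block + 1, block):
            blk = trials[start:start + block]
            ica = FastICA(n_components=block, max_iter=1000, random_state=0)
            S = ica.fit_transform(blk.T)           # step 2: ICs as columns
            for k in range(block):                 # steps 3-4: screen ICs
                r = np.corrcoef(S[win, k], avg[win])[0, 1]
                if abs(r) < r_th:
                    S[:, k] = 0.0                  # drop low-correlation IC
            trials[start:start + block] = ica.inverse_transform(S).T  # step 5
        rng.shuffle(trials)                        # step 6: reshuffle trials
        new_avg = trials.mean(axis=0)
        if np.sqrt(np.mean((new_avg - avg) ** 2)) < tol:   # step 7: converged
            break
        avg = new_avg
    return trials, avg

# Example window: Wr = 15-90 ms poststimulus at 1 kHz sampling
# enhanced, template = iica_channel(trials, win=slice(15, 90))
```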



Fig. 3
iICA processing of a single-channel EEG at the i-th iteration of the algorithm: (a) L single trials are averaged to produce a template EP_i and then (b) ICA-transformed. The resulting ICs are compared with the template EP_i within a window Wr, and those ICs with correlation less than a predefined threshold are dropped, while the rest are (c) inverse-transformed, randomized in order, and (d) averaged to produce a new template EP_{i+1}, which is used in the next iteration i + 1
