Trends in ERP Analysis Using EEG and EEG/fMRI Synergistic Methods



$$\displaystyle{ x = As }$$

(1)

Any approach used to solve this kind of problem has to make assumptions about the source signals, the mixing process, or both. The main assumption of ICA is that the source signals are statistically independent. The success of ICA in decomposing many natural signals stems from the fact that the assumption of independence is more realistic than other mathematical constraints, such as orthogonality.

One of the first applications of PCA and ICA was the de-noising of the multi-channel EEG signal. PCA was the first technique employed for artifact correction and removal from EEG recordings [21, 22], but it was soon found that ICA is more efficient for this purpose [23]. PCA is commonly used for reducing the dimensions of the problem before applying ICA, or for data reduction and summarization of time-frequency transformed data [13, 24].

ICA has proven very successful in the analysis of EEG data and especially in exploring the dynamics of ERP data. It has been used successfully to remove artifacts from EEG recordings without losing relevant information. It has also been applied to continuous or event-related EEG to decompose it into a sum of spatially fixed and temporally independent components with distinct spatial distribution patterns, which in turn may be attributed directly to underlying cortical activity. The classical procedure for quantifying evoked responses is averaging over trials. This procedure enhances the task-related response and filters out the irrelevant background EEG. As we noted earlier, this approach assumes that the background EEG behaves as random noise and that the task-related response is approximately identical from trial to trial. Phase-related measures such as intertrial coherence (ITC) have been used to characterize the phase consistency of the detailed time-frequency content across trials [19]. ITC can be expressed as follows:



$$\displaystyle{ \mathrm{ ITC}(k,n) = \left \vert { 1 \over T} \sum _{i}{ X_{i}(n,k) \over \vert X_{i}(n,k)\vert }\right \vert }$$

(2)
In [16] a similar measure for quantifying phase-locked activity in ERP trials was proposed, called the phase intertrial coherence (PIC):



$$\displaystyle{ \mathrm{ PIC}(k,n) ={ \sum _{i}X_{i}(n,k) \over \sum _{i}\vert X_{i}(n,k)\vert } }$$

(3)
Though the two measures are similar, the PIC measure takes into account the amplitude of the measured signal and not only its phase [16]. The first measures for quantifying event-related oscillations were event-related desynchronization (ERD) and event-related synchronization (ERS), which measure the increase or decrease in the power of specific bands relative to some baseline pre-stimulus power [9]. Event-related spectral perturbation (ERSP) is an extension of these measures in the time-frequency domain, enhanced with significance tests [18]. In [16] a measure for evaluating consistent oscillations across trials is introduced under the term phase-shift intertrial coherence (PsIC), defined as follows:



$$\displaystyle{ \mathrm{ PsIC}(k,n) ={ \sum _{i}\vert X_{i}(n,k)\vert ^{2} \over \max _{k,n}\sum _{i}\vert X_{i}(n,k)\vert ^{2}} }$$

(4)
This measure cannot be directly compared with the other two, since it does not take into account variations in the power of a certain band but rather examines whether a narrow-band oscillation is consistently present in single trials. In Fig. 1 we attempt to illustrate how the different assumptions about the generation of the ERP are reflected in these measures, covering the spectrum between the two models. We can see that even though we can distinguish evoked from induced activations through the PIC measure, the underlying generative model cannot be distinguished using these measures alone. Further research is needed to shed light on this complex debate.
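The three measures above can be computed directly from single-trial time-frequency data. The sketch below assumes `X` holds complex time-frequency coefficients $X_i(n,k)$ (e.g., from a short-time Fourier transform) with shape (trials, freqs, times); the magnitude of the complex PIC ratio is taken, and a small `eps` guards against division by zero:

```python
import numpy as np

def tf_measures(X):
    """Compute ITC, PIC and PsIC from single-trial time-frequency data.

    X : complex array of shape (trials, freqs, times) holding the
        time-frequency coefficients X_i(n, k) of trial i.
    """
    T = X.shape[0]                      # number of trials
    absX = np.abs(X)
    eps = 1e-12                         # guard against division by zero

    # Eq. (2): phase consistency across trials, amplitude normalized out
    itc = np.abs(np.sum(X / (absX + eps), axis=0)) / T

    # Eq. (3): like ITC, but weighted by the single-trial amplitudes
    pic = np.abs(np.sum(X, axis=0)) / (np.sum(absX, axis=0) + eps)

    # Eq. (4): total band power relative to its maximum over (k, n)
    power = np.sum(absX ** 2, axis=0)
    psic = power / power.max()

    return itc, pic, psic

# Toy example: one band carries phase-locked (evoked) activity
rng = np.random.default_rng(0)
trials, freqs, times = 50, 20, 100
X = rng.standard_normal((trials, freqs, times)) \
    + 1j * rng.standard_normal((trials, freqs, times))
X[:, 10, :] += 5.0                      # constant-phase activity in band 10
itc, pic, psic = tf_measures(X)
```

On this toy data the phase-locked band shows a much higher ITC and PIC than the pure-noise bands, while PsIC peaks wherever band power is consistently high.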



Fig. 1
Summary of the different ERP activations and how they are reflected in the measures. We can see that the measures discussed above cannot sufficiently characterize the nature of the activations.




3 Methods for fMRI Analysis


The main goal of fMRI is to identify and map brain regions to certain brain functionalities. We will focus on fMRI using BOLD, as the majority of fMRI-related studies are based on this approach.

It is important to describe the basic principles behind fMRI, since this will allow us to better understand the nature of the measured signal and to identify the level at which the two modalities can be combined. BOLD fMRI takes advantage of the magnetic properties of hemoglobin, which is used as a natural contrast agent. Hemoglobin is the protein responsible for the transport of oxygen from the respiratory organs (lungs) to the rest of the body. Oxygenated hemoglobin has different magnetic properties from deoxygenated hemoglobin: deoxygenated hemoglobin is paramagnetic, while oxygenated hemoglobin is diamagnetic. In practice, this means that the magnetic field is distorted around blood vessels containing deoxygenated hemoglobin, and this distortion results in reduced relaxation time [3]. Tissues around blood vessels with different concentrations of (de)oxygenated hemoglobin will therefore have different relaxation times and hence different intensities in the final MR image. We will not go into the details of image formation, since this is beyond the scope of our study.

BOLD fMRI measures the flow, or the change in flow, of blood in the brain. Increased blood flow occurs as the brain requires energy, in the form of glucose, to be delivered to brain areas involved in some sort of processing. An active area requires more energy, which translates into an increase of oxygenated blood arriving in this area. Therefore, fMRI measures blood oxygenation levels, which change with the increased metabolic demands of active brain areas and form an indirect measure of neuronal activity [25].

In order to observe physiological or pathological changes in functional activity using fMRI, an experiment has to be set up that will help reveal and identify such activities. MR images are obtained while a subject performs a motor, cognitive, or sensory task designed to elicit specific brain activity. Using the obtained images, the next step is to find patterns of activity that correlate with the performed task. Voxels whose intensities over time correlate significantly with the time evolution of the experiment are considered related to the task and marked as active. The effect of noise and the need to perform complex experiments have led to the development of sophisticated preprocessing and analysis methods.

The design of the experiment falls into two distinct categories, depending on the way the stimuli are presented to the subject: block and event-related designs. In block designs the stimuli of a certain class are presented continuously for a certain period/block of time [26]. Usually, periods of rest alternate with periods of task. The idea behind this type of design is to increase the signal-to-noise ratio of the hemodynamic response by requesting the same response continuously, and therefore a steady level of BOLD signal. This comes at the expense of completely disregarding the temporal evolution of the hemodynamic response.

Event-related designs, on the other hand, present the stimuli in random order, separated by short periods of rest. The response to each stimulus is measured and the hemodynamic response function (HRF) can be estimated. Event-related designs allow the execution of more complex experiments, at the expense that it is not always possible to detect the activations due to the low signal-to-noise ratio [27].


3.1 fMRI Data Preprocessing


Analysis of fMRI data involves a series of preprocessing steps necessary before the actual statistical processing. The sequence and order of such steps are also known as the preprocessing pipeline [28]. We will not deal with preprocessing steps taken in k-space, only with methods used in the image space.

These preprocessing steps involve temporal and spatial registration of the acquired MR signals in order to compensate for noise and variations due to the measurement process [28]. Noise in fMRI can be introduced by MR signal strength fluctuations throughout the session, known as thermal noise [25]. This kind of noise is due to the thermal motion of electrons in the scanner circuitry and of ions in the tissue under examination. Thermal noise increases as the temperature and the strength of the magnetic field increase. Another source of noise originates from the system itself and is related to inhomogeneities of the magnetic field and variations of the gradient fields used to spatially target the measurement [25]. Thermal and system noise are unavoidable under a particular setup, do not relate to the experimental task, and their effects can be mitigated relatively easily.

A different source of noise is subject movement. Since we are dealing with a living organism, we cannot expect the subject to remain still throughout the whole session. Head movement is a major source of noise in fMRI studies, and excessive head motion during an experiment may render the data unusable [25]. Since the whole procedure relies on gradient fields and radio-frequency pulses in order to localize the recorded activity, even small movements will transfer activation from one location to nearby ones, resulting in blurring of the obtained signal.

An additional problem with head movement is that it is sometimes related to the task, and its effects can influence the final result. This effect is most apparent in visual tasks where the subject has to direct their gaze to various presented targets. Even though only eye movements are required, most of the time this movement is accompanied by a small involuntary displacement of the head. In addition, even small movements due to breathing or heartbeat introduce noise. The effect of breathing and heartbeat on the results is rather complex, since they make the brain move and expand inside the skull [29].

In order to compensate for noise and cancel any non-task related influences the preprocessing of the data includes noise correction, slice-timing correction, motion correction, and registration [28]. The final pipeline sequence depends largely on the experiment design, the strength of the magnetic field, and the pulse sequence used for acquiring the k-space data.

Slice-timing correction adjusts the data of each slice as if all slices were sampled at the same time. This procedure corrects for the fact that each slice is sampled separately, at a slightly different time than the previous one. Slices can be collected either sequentially, one adjacent slice after the other, or interleaved, odd slices first and then the even ones. If the time needed to collect the whole brain volume is TR, then depending on the manner in which the slices are collected, the last slice is acquired TR or TR∕2 after the first one. Interpolation between samples of the same slice acquired in adjacent time frames is used to estimate the signal at the desired acquisition time [30]. Different interpolation techniques have been proposed for slice-timing correction, such as linear, cubic, and spline interpolation [31]. Slice-timing correction is usually applied first, since it simplifies the subsequent preprocessing steps [28].
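A minimal sketch of linear slice-timing correction, the simplest of the interpolation schemes mentioned above. It assumes a 4-D data layout of (x, y, slice, volume) and a user-supplied `slice_order` giving each slice's acquisition position within the TR; real packages additionally handle edge volumes and higher-order interpolation:

```python
import numpy as np

def slice_timing_correct(data, tr, slice_order):
    """Resample every slice's time series to the start of each TR
    using linear interpolation.

    data        : array (x, y, n_slices, n_vols) of voxel intensities
    tr          : repetition time in seconds
    slice_order : acquisition position of each slice (0 = acquired first)
    """
    n_x, n_y, n_slices, n_vols = data.shape
    vol_times = np.arange(n_vols) * tr              # target sampling times
    out = np.empty(data.shape, dtype=float)
    for s in range(n_slices):
        # when this slice was actually acquired within every TR
        acq_times = vol_times + slice_order[s] * tr / n_slices
        flat = data[:, :, s, :].reshape(-1, n_vols)
        # np.interp clamps at the edges, so the first/last volumes are held
        fixed = np.array([np.interp(vol_times, acq_times, v) for v in flat])
        out[:, :, s, :] = fixed.reshape(n_x, n_y, n_vols)
    return out

# Interleaved acquisition of 4 slices: slices 0 and 2 first, then 1 and 3
data = np.random.default_rng(0).normal(size=(8, 8, 4, 20))
corrected = slice_timing_correct(data, tr=2.0, slice_order=[0, 2, 1, 3])
```

Note that the slice acquired first (offset zero) is left unchanged, which is why the reference slice is often chosen as the first one acquired.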

As mentioned, due both to system inaccuracies and to subject movement, voxels are displaced, and we therefore have to align them so that the time series of each voxel represents the BOLD signal from the same brain region throughout the session. Usually, rigid-body registration is used to correct head motion, and a large number of techniques have been proposed [28]. We can classify the algorithms into those that use intensity-based measures and those that use landmarks to register the slices together. The method used to calculate the rigid-body parameters and the interpolation scheme are further factors that differentiate the algorithms. Almost all fMRI analysis packages include their own implementations, such as mutual-information-based algorithms, normalized correlation, and AIR (automated image registration), to name a few [32–35]. All of these algorithms assume that all the slices of a single stack have been collected at the same time, and they consider a rigid movement of the whole brain volume. This is a reason why slice-timing correction usually precedes head motion correction.

Most fMRI studies use multiple subjects to infer information about activating regions. Since the size and shape of the brain vary across subjects, along with the relative locations of several anatomical brain structures, normalization and registration have to be performed before drawing conclusions from the results. This preprocessing step is achieved by bringing the different volumes into the same coordinate system and then using linear or nonlinear registration to a common brain atlas, so that their shape, size, and orientation are the same [27, 36]. Figure 2 displays the results of these preprocessing steps. A frequently used coordinate system is the Talairach stereotactic coordinate system.



Fig. 2
First row: example of affine registration using the FSL toolbox [43]. Panel (a) is the template brain we want to register to, panel (b) is the brain to be registered, and panel (c) shows the registration result. Affine registration is used as a first step before applying nonlinear techniques. Second row: panel (d) shows the time course of a specific voxel with high correlation with a block experimental design


3.2 fMRI Analysis Approaches


After the preprocessing steps have been carried out, the data are ready for statistical analysis. The data we have to work with consist of brain volumes captured at different time points. Alternatively, we can view the data as the time evolution of the intensity of each voxel in the captured brain slices. A typical fMRI dataset consists of thousands of time series; for example, 32 slices of 128 × 128 voxels each yield over half a million voxel time series.

Typical fMRI analysis involves the detection of changes in the mean signal intensity of the MR data during different behavioral conditions. Different statistical methods have been employed to accomplish this task, ranging from simple correlation analysis to more complex models that take into account the temporal and spatial correlations of the data [30]. The final output is an activation map generated for each condition or experimental task. Voxels that present increased activity during a certain condition are marked as active. Voxels that present decreased activity during the task condition are marked as deactivated; these are of less interest and are rarely exploited.

One approach for detecting significant changes is to use Student's t-test. The t-test is commonly used in fMRI analysis due to its simplicity and the easy interpretation of its results. Since we have multiple recordings in time, an alternative approach is to use the Kolmogorov–Smirnov (K–S) test to compare the distributions of the MR signal intensity between the two conditions [37]. This test can detect changes in the variance along with changes in the mean [38]. The split-half test is another option that has been used to determine significantly different voxels [39]. Many studies have extended these techniques by adding spatial or other constraints. Finally, an approach was proposed that used multiple linear regression in order to take into account the time course of the fMRI signal [40, 41].
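The voxel-wise t-test and K–S test can be sketched with `scipy.stats` on simulated data. The intensities, effect size, and Bonferroni correction below are illustrative assumptions, not values from the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_voxels, n_scans = 500, 40

# Simulated intensities per voxel under rest and task; 25 voxels "activate"
rest = rng.normal(100.0, 5.0, (n_voxels, n_scans))
task = rng.normal(100.0, 5.0, (n_voxels, n_scans))
task[:25] += 8.0                                   # mean shift in active voxels

# Voxel-wise two-sample t-test on the difference in mean intensity
t_stat, p_t = stats.ttest_ind(task, rest, axis=1)

# Voxel-wise K-S test: sensitive to variance changes as well as mean shifts
p_ks = np.array([stats.ks_2samp(task[v], rest[v]).pvalue
                 for v in range(n_voxels)])

# Crude Bonferroni-corrected activation map
active = p_t < 0.05 / n_voxels
```

With this effect size the t-test flags almost all of the shifted voxels and essentially none of the null ones, illustrating why it remains a popular baseline despite its simplicity.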

The formula of the general linear model is



$$\displaystyle{ Y = X \times b + n }$$

(5)
where $Y$ is the response, $X$ is the matrix of predictors, and $b$ holds the unknown coefficients of the predictors. The error $n$ is assumed to follow a normal distribution with zero mean and variance $s^{2}$. In our case, $Y$ represents the time courses of the fMRI signal in the voxels, and $X$ represents the design matrix of the experiment under which the measurements were obtained. Under the assumption that each measurement and each voxel is independent, the parameters $b$ can be obtained by least squares. The GLM is a more general model that encompasses the t-test and correlation approaches. The GLM makes assumptions about the data, such as that the voxels and their time courses are independent and that the same model, as described by the design matrix, is sufficient for all voxels. These assumptions do not hold in reality, and much effort has been devoted to extending the GLM.
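The least-squares fit of Eq. (5) can be done for all voxels at once. The sketch below assumes a toy on/off block regressor and simulated data; the design matrix, effect size, and voxel counts are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_scans, n_voxels = 120, 1000

# Design matrix X: a task regressor plus an intercept column
task = np.tile(np.repeat([0.0, 1.0], 10), 6)        # simple on/off block paradigm
X = np.column_stack([task, np.ones(n_scans)])

# Simulate voxel data Y = X b + n: the first 100 voxels respond to the task
b_true = np.zeros((2, n_voxels))
b_true[0, :100] = 2.0
Y = X @ b_true + rng.normal(0, 1, (n_scans, n_voxels))

# Ordinary least squares for all voxels simultaneously
b_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)       # b = (X'X)^-1 X'Y
```

The estimated task coefficients of the responding voxels cluster around the true effect, while those of the remaining voxels cluster around zero; in practice a t-statistic on each coefficient yields the activation map.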

The design matrix incorporates the different experimental conditions: it contains the modeled BOLD response for each experimental stimulus. The BOLD response is typically modeled as a linear time-invariant (LTI) system, where the stimulus is the input and the HRF acts as the impulse response of the system [37]. An LTI system can be fully characterized by its impulse response or transfer function. If we treat the brain as an LTI system, finding the transfer function of this system would allow us to predict the response of the brain under different conditions and complex inputs. Of course, no actual physical system is truly LTI, but it has been shown that an LTI approximation can characterize the behavior of many systems quite well.
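Under the LTI assumption, a design-matrix column is simply the stimulus train convolved with the HRF. The double-gamma shape below is a common approximation; its parameters here are illustrative and do not match any particular package exactly:

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Crude double-gamma HRF sampled at the TR: a positive response
    peaking around 5-6 s minus a smaller, later undershoot."""
    t = np.arange(0.0, duration, tr)
    h = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return h / h.max()

tr, n_scans = 2.0, 100
stimulus = np.zeros(n_scans)
stimulus[::20] = 1.0                 # one event every 40 s

# LTI model: predicted BOLD = stimulus train convolved with the
# impulse response (the HRF), truncated to the scan length
regressor = np.convolve(stimulus, canonical_hrf(tr))[:n_scans]
```

Each event produces a delayed, smoothed bump in the regressor; stacking one such column per condition (plus drift and intercept terms) yields the design matrix $X$ of Eq. (5).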

The GLM formulation described above is the most popular technique for detecting active regions in an fMRI experiment. Its main drawback is that it is a very strict model, and any mismodeling will increase the false-positive rate. The way the HRF is calculated also plays a significant role in the final result. Nonlinear models, like the Balloon model [42], which models changes in blood flow and volume and how these changes affect the BOLD response, are more realistic, but they require the estimation of many parameters and are sensitive to noise. The linear model remains very popular mainly due to its simplicity, robustness, and interpretability.

A different approach is to use the information in the data without imposing any strict, specific model. Such techniques are very popular since they allow the execution of experiments with complex stimuli, for which it is difficult to estimate the time of activation and therefore impractical to use models. Data-driven approaches have been used in psychological studies involving emotion, motivation, or memory. On top of that, they have enabled the study of resting-state fMRI.

Popular component decomposition techniques include PCA and ICA. We have already described the basic principles behind PCA and ICA in the EEG section. In the fMRI context there exist two different, mathematically symmetric approaches: PCA or ICA can be either temporal or spatial [44]. In the temporal case we look at the temporal structure of the data in order to find voxels/regions that present the same time evolution, while in the spatial case we look at the spatial structure of the data, searching for similar spatial patterns through time. PCA was one of the first techniques to be applied to the voxel time series in order to extract spatial regions with similar temporal evolution, allowing exploration of the functional connectivity of brain regions, as we will discuss later.

ICA is more popular and has numerous applications in all kinds of studies, from event-related to resting-state studies. Temporal and spatial ICA have both been used extensively, although spatial ICA is encountered more often in studies. In the spatial case the ICA model is $X = AS$, where $X$ is a $t \times n$ matrix with $t$ the number of time points and $n$ the number of voxels, and $A$ is a $t \times t$ temporal mixing matrix for the $t \times n$ spatially independent components/images in the rows of $S$. Temporal ICA presents the symmetric configuration, where $X$ is an $n \times t$ matrix and $A$ is an $n \times n$ spatial mixing matrix for the $n \times t$ independent time courses. Since the number of voxels is much larger than the number of time points, temporal ICA is much more computationally intensive than spatial ICA.
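The spatial-ICA orientation can be sketched with scikit-learn's `FastICA` on synthetic data; the dimensions and the sparse (Laplacian) source maps are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
t, n = 100, 2000                        # time points << voxels, as in fMRI

# Two spatially independent source maps mixed by temporal time courses
S = rng.laplace(size=(2, n))            # sparse spatial maps (rows of S)
A = rng.standard_normal((t, 2))         # temporal mixing matrix
X = A @ S                               # data matrix, t x n

# Spatial ICA: voxels play the role of samples, so we decompose X^T.
# FastICA expects (n_samples, n_features) = (voxels, time points) here.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X.T).T        # recovered maps, up to order and sign

# Temporal ICA would instead pass X directly (time points as samples),
# which is far more costly since the mixing matrix is then n x n with n >> t.
```

Up to the usual ICA ambiguities of component order and sign, the recovered maps correlate almost perfectly with the true source maps on this noiseless example.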

A complication with ICA is that, since it is a stochastic method, different runs may produce different results [45]. Before using the calculated independent components in our analysis, we have to evaluate their reliability. Different methods have been proposed for evaluating the consistency of the results. The most popular technique runs ICA multiple times on bootstrapped data and then clusters the resulting independent components. ICs in clusters with small intra-cluster and large inter-cluster distances are considered reliable for further evaluation [45].
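A stripped-down version of this reliability check (without the bootstrapping step, for brevity) runs ICA with several random initializations and measures how well each component reproduces across runs via absolute correlation; the data and run count below are illustrative:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(4)
S = rng.laplace(size=(3, 1500))            # ground-truth sources
A = rng.standard_normal((80, 3))
X = (A @ S).T                              # (samples, features) for sklearn

# Repeat the decomposition with different random initializations
runs = [FastICA(n_components=3, random_state=seed).fit_transform(X).T
        for seed in range(5)]

# For every component of the first run, find its best match (by absolute
# correlation) in each other run; values near 1 indicate stable components
ref = runs[0]
match = []
for other in runs[1:]:
    c = np.abs(np.corrcoef(np.vstack([ref, other]))[:3, 3:])
    match.append(c.max(axis=1))
stability = np.mean(match, axis=0)
```

The full procedure in [45] additionally resamples the data and clusters all components from all runs, but the matching logic is the same.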

In contrast to PCA, where the significance of each principal component (as variance explained) is known directly, in ICA each component has to be evaluated separately in order to distinguish task-related ICs from noise. In [46] spatial ICA was directly compared to the GLM and used as a way to solve the GLM problem without a fixed design matrix, which is instead computed directly by the ICA.

Analysis and evaluation of independent components can be divided into two approaches. One is inspired by the extensive work on EEG event-related experiments, where independent components are separated into task-related and noise components [14], and ICA is thus treated as a filtering procedure. In a different context, the filtering aspect of ICA was used to remove task-related activity in order to study the resting state of the brain [47].

Other methods that have been applied to fMRI data but are not presented here include canonical correlation analysis (CCA) [48], with extensions to accommodate group analysis as in [49]. These approaches led to new algorithms able to incorporate data not only from multiple subjects but also from multiple modalities [50].


4 Electroencephalography–Functional Magnetic Resonance Imaging


Combining information obtained from different modalities seems really promising, especially in the study of the brain. fMRI and EEG (or MEG) are complementary in nature and form an ideal candidate pair of modalities for such integration. EEG provides excellent temporal resolution of neural activations, while MRI/fMRI provides structurally and spatially accurate information about metabolic changes in different brain regions, which can be attributed to neural activation.

It is apparent that the two modalities describe and represent different phenomena, and there is no assurance that they are directly and uniquely associated/correlated. The most promising results supporting EEG–fMRI integration come from studies that combine fMRI with invasive electrode data [51]. These studies show significant correlations between the time courses of fMRI activations and electrophysiological signals.

On the other hand, several studies suggest that such integration is not as straightforward as it seems. A one-to-one correspondence between ERP peaks and fMRI activations cannot be assumed, as underlined in [52, 53]. In [54] simultaneous recordings from a single subject are used to demonstrate that significant EEG features, such as peak amplitude, are not necessarily correlated with BOLD signals.

Nevertheless, there are different views regarding the relationship between the local neuronal activity captured by the EEG and the related changes in cerebral blood flow. Another point that needs attention is how to treat the absence of any relation or correlation between the two modalities. This could be attributed to algorithmic limitations, meaning that we have a false-negative situation. The most difficult question, though, is whether a failure to associate the two modalities is by itself a significant finding or an indication of pathology. There are no straightforward answers to this problem, and research on these questions is ongoing. In general, we can assume that EEG activity is not necessarily co-localized with fMRI activations, and also that certain fMRI activations do not correlate with EEG. An illustrative model describing the overlap between EEG-explained activations and fMRI ones is shown in Fig. 3. It is obvious that parallel analysis of the two modalities will help us better understand the activations reflected by these modalities. In this section we discuss methods for identifying and characterizing brain activations using information from both modalities.



Fig. 3
Illustration of activities explained by each modality. Activities in the cross-section are reflected in both modalities and are the ones that we can use in fusion. Illustration is based on [55]

Towards this goal, several methods have been employed to take advantage of the extra information that each modality provides to the other. On the one hand, we have methodologies that use information from one modality to constrain or explain results derived from the other. This approach is known as information integration [55] and includes the methods that we will discuss in the next two sections. The other approach tries to find common patterns of activation in the two modalities in parallel. This approach is characterized as information fusion [55] and includes data-driven and model-based methods. We are mainly interested in the data-driven techniques that have been employed in this direction.


4.1 EEG Localization Through fMRI Constraints


One of the first attempts at EEG–fMRI integration was to use the fMRI spatial information to constrain the problem of EEG source localization. Early attempts used dipole modeling to solve the source localization problem [56], and regions extracted from the fMRI were then used to constrain the dipole locations inside the head [57, 58]. This methodology assumes that the possible EEG dipoles express hemodynamic changes reflected in the fMRI. As we mentioned earlier, this assumption does not hold in general. Moreover, dipole-based localization assumes that the observed EEG is created by a small number of dipoles, which seems to be an unrealistic assumption.

In order to overcome the limitations of dipole modeling, current density modeling approaches were employed for the source localization of the EEG [59]. LORETA [60] is the most popular technique for localization through current density modeling. A major disadvantage of the proposed methodologies is that, in order to compare the sources calculated from the EEG with the fMRI, we have to collapse the EEG in time, either by calculating the sources using the average over time or by using the average LORETA source estimations, thus canceling the temporal advantages of EEG. An important aspect was highlighted by the findings of [59]: fMRI regions and LORETA sources matched when the group mean data were used, but at the individual level only half of the subjects presented significant correspondences. The group finding shows that such relations can be established, but the individual results stress that caution should be exercised when trying to combine the two modalities.

Despite the fact that EEG source localization is an ill-posed problem and the obtained results should be treated with a certain degree of uncertainty, a lot of effort has been dedicated to this kind of analysis, extending beyond functional characterization to works that try to assess functional integration [61, 62]. Nonetheless, there still exist major issues in the application of this approach that seem difficult to overcome soon.


4.2 fMRI Prediction Using EEG Information


A different approach to the problem of EEG–fMRI integration has been gaining ground lately, primarily due to technological advances that allow simultaneous recording of EEG and fMRI. This approach uses EEG features in order to infer fMRI activity.

The work in [63] was the first to demonstrate that, using fMRI activations, we can localize EEG bands without resorting to complex and ambiguous methods of source localization. In this work, the authors used the alpha band power as a predictor in order to identify regions that changed with alpha band power modulation [63]. Subsequent works extended the study to other bands [64, 65]. This technique has been extremely useful in the analysis of the resting state of the brain and in complex experiments without a specific stimulus or task. An important application is the presurgical evaluation of pharmacoresistant focal epilepsy, where accurate localization of the epileptic region is needed [66–68]. In fact, this clinical application was the driving force behind the development of the hardware that allowed simultaneous recordings [69].

A different application is based on the examination of EEG–fMRI event-related single-trial covariation. The goal is to identify brain regions whose BOLD response shows the same modulation as a specific single-trial ERP component (peak). The basic idea is to extract features of the single-trial ERPs and use them as predictors for producing fMRI maps related to each single-trial feature [70, 71].
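The single-trial covariation idea amounts to building a parametric regressor from the ERP feature. The sketch below assumes hypothetical per-trial P300 amplitudes and an illustrative gamma-shaped HRF; everything here is a placeholder for real measurements:

```python
import numpy as np
from scipy.stats import gamma

# Hypothetical per-trial ERP feature (e.g., a single-trial P300 amplitude)
rng = np.random.default_rng(5)
n_trials, scans_per_trial, tr = 30, 5, 2.0
amplitudes = rng.normal(5.0, 1.0, n_trials)

# Weight each event by its (mean-centered) trial amplitude ...
n_scans = n_trials * scans_per_trial
events = np.zeros(n_scans)
events[::scans_per_trial] = amplitudes - amplitudes.mean()

# ... and convolve with an HRF to obtain a parametric regressor; voxels
# fitting this regressor in a GLM covary with the ERP feature trial by trial
t = np.arange(0.0, 32.0, tr)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
regressor = np.convolve(events, hrf)[:n_scans]
```

Entering this regressor alongside the usual task regressor in the GLM yields a map of regions whose BOLD amplitude tracks the trial-to-trial ERP variation.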


4.3 Data-Driven Fusion Approaches


The approaches described above use one modality to constrain or predict the other. Models with a common generative model explaining the data of both modalities would be the ideal solution to the fusion problem. Efforts have been made in this direction, but no model has reached a sufficient level of maturity [72–74]. On top of that, the complexity of these models makes their application difficult, and therefore many models exist only in the theoretical domain or have found limited application.

Recently, inspired by the advances in the application of multivariate methods to EEG and fMRI, data-driven fusion has been gaining a lot of attention. Much effort has been devoted to methods that extend the application of ICA to multi-modal data; in this category we have a series of ICA-based methods developed for this end [50, 75]. Other methods include multi-set CCA [76, 77], which has been applied to single-trial ERP and fMRI. The application of these methods has focused primarily on the analysis of ERP data, in an effort to explore and exploit trial-to-trial variations.

Based on the success of ICA in analyzing EEG and fMRI separately, there has been an effort to extend ICA for the fusion of EEG and fMRI. Some studies use ICA to decompose the EEG data and extract useful features that can be used to predict fMRI activation [70, 78]. Other studies used ICA to extract distributed regional networks from fMRI, whose BOLD signals were then correlated with power fluctuations in different EEG bands [79]. The aforementioned approaches apply ICA to one or the other modality and then follow an asymmetric approach like the ones described earlier. We will focus on algorithms that operate on both modalities and offer a symmetric approach to the problem.

In this context, joint ICA was proposed in [80]. The joint ICA algorithm assumes that the EEG and fMRI features share the same mixing matrix. In joint ICA we assume that the modalities are jointly temporally or spatially independent, and therefore increased BOLD activity will be reflected in an increased amplitude of a certain ERP peak. Joint ICA operates in the space defined by the features of both modalities. In order to avoid bias, we need to transform the EEG and fMRI data and bring them into the joint space, so that we can recover the common mixing matrix and the independent sources that explain the features. Usually the ERP data from selected channels are used as features, while fMRI activation maps extracted in a previous step (using the GLM, for example) are the corresponding fMRI features. The ERP data are upsampled in order to match the number of fMRI voxels, and the data from both modalities are normalized and concatenated into a single matrix, on which joint ICA is then applied.
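The normalize-and-concatenate step can be sketched as follows. Subject counts, feature lengths, and the random placeholder data are all illustrative assumptions; real joint ICA would use actual ERP waveforms and GLM beta maps:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(6)
n_subj = 40

# Hypothetical per-subject features: an upsampled ERP waveform and a
# flattened fMRI activation map (values here are random placeholders)
erp = rng.standard_normal((n_subj, 300))
fmri = rng.standard_normal((n_subj, 5000))

# Normalize each modality so neither dominates, then concatenate features
erp_z = (erp - erp.mean()) / erp.std()
fmri_z = (fmri - fmri.mean()) / fmri.std()
joint = np.hstack([erp_z, fmri_z])              # (subjects, all features)

# Joint ICA: independent sources live in the concatenated feature space
# and share a single (subjects x components) mixing matrix
ica = FastICA(n_components=5, random_state=0, max_iter=1000)
sources = ica.fit_transform(joint.T).T          # (components, features)
mixing = ica.mixing_                            # shared loadings per subject

# Each joint source splits back into an ERP part and an fMRI part
erp_part, fmri_part = sources[:, :300], sources[:, 300:]
```

Because the mixing matrix is shared, a subject loading highly on a component does so for both that component's ERP signature and its fMRI map, which is exactly the coupling assumption of joint ICA.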

The problem with joint ICA is that the assumptions regarding the generation of the observations are too strict and possibly not physiologically plausible. A method proposed for relaxing these assumptions and providing more flexible estimation is parallel ICA [75]. This method identifies components in both modalities simultaneously and constrains the solution so that maximum correlation is achieved between the mixing matrices of the two decompositions. The correlation constraint is defined by the maximally correlated components in each iteration; the indices of the constrained components, as well as their number, are allowed to vary from one iteration to the next. The correlation threshold is chosen manually, and prior knowledge about the experiment is required for choosing an appropriate threshold. Parallel ICA has been reported to provide stable results and has been used extensively for the fusion of other modality pairs as well [81].


Jun 25, 2017 | Posted in PATHOLOGY & LABORATORY MEDICINE
