f-Information Measures for Selection of Discriminative Genes from Microarray Data

A microarray gene expression data set can be represented as an expression table, where $$w_{ij} \in \mathfrak {R}$$ is the measured expression level of gene $${\fancyscript{A}}_i$$ in the $$j$$th sample, and $$m$$ and $$n$$ represent the total number of genes and samples, respectively. Each row in the expression table corresponds to one particular gene and each column to a sample [17]. However, for most gene expression data, the number of training samples is still very small compared to the large number of genes involved in the experiments [17]. For example, the colon cancer data set consists of 62 samples and 2,000 genes, while the leukemia data set contains 72 samples and 7,129 genes. The number of samples is likely to remain small for many areas of investigation, especially for human data, due to the difficulty of collecting and processing microarray samples [17]. When the number of genes is significantly greater than the number of samples, it is possible to find biologically relevant correlations of gene behavior with the sample categories [37].


However, among the large number of genes, only a small fraction is effective for performing a certain task. Also, a small subset of genes is desirable in developing gene expression-based diagnostic tools for delivering precise, reliable, and interpretable results. With the gene selection results, the cost of biological experiments and decision making can be greatly reduced by analyzing only the marker genes. Hence, identifying a reduced set of the most relevant genes is the goal of gene selection. The small number of training samples and the large number of genes make gene selection a particularly relevant and challenging problem in gene expression-based classification. This is an important problem in machine learning and is referred to as feature selection [9, 31].

In this regard, different feature selection methods [4, 9, 10, 31, 32, 34, 39, 55, 60] can be used to select discriminative genes from microarray data sets. A detailed survey of different feature selection algorithms is reported in Chap. 4. Many gene selection algorithms have also been developed to select differentially expressed genes [58]. One popular gene selection method is significance analysis of microarrays [64], which assigns a score to each gene on the basis of the change in gene expression relative to the standard deviation of repeated measurements. Other notable gene selection algorithms are reported in [38, 40, 48, 52, 59, 72].

Due to the high dimensionality of microarray data sets, fast, scalable, and efficient feature selection techniques such as univariate filter methods [3, 13, 25, 33, 36, 62] have attracted the most attention. Univariate methods can be either parametric [2, 15, 49, 63] or non-parametric [14, 41, 51, 53, 54, 64]. The simplicity of univariate techniques has made them dominant in the field of gene selection from microarray data. However, univariate selection methods have certain restrictions and may lead to less accurate classifiers as they do not take gene-gene interactions into account. Also, the gene sets obtained by these methods contain redundant or similar genes.

The application of multivariate filter methods ranges from simple bivariate interactions [5] to more advanced solutions exploring higher order interactions such as correlation-based feature selection [20, 61, 68, 73] and several variants of the Markov blanket filter method [16, 45, 70]. There also exist a number of feature selection algorithms that group correlated features to reduce the redundancy among the selected features [7, 8, 20, 21, 26, 30, 47]. The uncorrelated shrunken centroid [74] and minimum redundancy-maximum relevance (mRMR) [10, 55] algorithms are two important multivariate filter procedures, highlighting the advantage of using multivariate methods over univariate procedures in the gene expression domain. The mRMR method selects a subset of genes from the whole gene set by maximizing the relevance and minimizing the redundancy of the selected genes. An $$f$$-information measure-based method has been reported in [43] for selection of discriminative genes from microarray data using the mRMR criterion. In this regard, it should be noted that the mRMR criterion is also used in [23] and [44] for gene selection, based on the concepts of neighborhood mutual information and fuzzy-rough sets, respectively.

Gene selection using wrapper or embedded methods offers an alternative way to perform a multivariate gene subset selection, incorporating the classifier's bias into the search and thus offering an opportunity to construct more accurate classifiers. In the context of microarray analysis, most wrapper methods use population-based, randomized search heuristics [4, 29, 35, 50], although some methods use sequential search techniques [24, 71]. An interesting hybrid filter-wrapper approach is introduced in [57], integrating a univariately preordered gene ranking with an incrementally augmenting wrapper method. The embedded capacity of several classifiers to discard input features and thus propose a subset of discriminative genes has been exploited by several authors. Examples include the use of random forests, a classifier that combines many single decision trees, in an embedded way to calculate the importance of each gene [6, 28, 65]. Another line of embedded feature selection techniques uses the weights of each feature in linear classifiers such as the support vector machine [19] and logistic regression [42]. These weights reflect the relevance of each gene in a multivariate way, and thus allow for the removal of genes with very small weights.

In the gene selection process, an optimal gene subset is always relative to a certain criterion. In general, different criteria may lead to different optimal gene subsets. However, every criterion tries to measure the discriminating ability of a gene or a subset of genes to distinguish different class labels. To measure the gene-class relevance, different statistical and information theoretic measures such as the $$F$$-test, $$t$$-test [10, 34], entropy, information gain, mutual information [10, 55], normalized mutual information [39], and $$f$$-information measures [43] are typically used, and the same or a different metric, such as mutual information, $$f$$-information, the $$L_1$$ distance, Euclidean distance, or Pearson's correlation coefficient [10, 27, 55], is employed to calculate the gene-gene redundancy. However, as the $$F$$-test, $$t$$-test, Euclidean distance, and Pearson's correlation depend on the actual gene expression values of the microarray data, they are very sensitive to noise or outliers in the data set [10, 22, 27, 55]. On the other hand, as information measures depend only on the probability distribution of a random variable rather than on its actual values, they are more effective for evaluating both gene-class relevance and gene-gene redundancy [18, 39, 55].

Measures of the distance between a joint probability distribution and the product of the marginal distributions are known as information measures [43, 56, 66]. Information measures constitute a subclass of the divergence measures, which are measures of the distance between two arbitrary distributions. A specific class of information (respectively, divergence) measures, of which mutual information is a member, is formed by the $$f$$-information (respectively, $$f$$-divergence) measures [43, 56, 66]. In this chapter, several $$f$$-information measures are compared with mutual information by applying them to the selection of genes from microarray data. The performance of different information measures is studied using the predictive accuracy of the naive Bayes classifier, the K-nearest neighbor rule, and the support vector machine. The effectiveness of different $$f$$-information measures, along with a comparison with mutual information, is demonstrated on three cancer microarray data sets, namely, the breast cancer, leukemia, and colon cancer data sets.

The structure of the rest of this chapter is as follows: The problem of gene selection from microarray data sets using several information theoretic measures is described in Sect. 5.2, along with a brief description of different $$f$$-information measures. A few case studies and a comparison among different $$f$$-information measures are reported in Sect. 5.3. Concluding remarks are given in Sect. 5.4.



5.2 Gene Selection Using $$f$$-Information Measures


In microarray data analysis, the data set may contain a number of redundant genes with low relevance to the classes. The presence of such redundant and nonrelevant genes leads to a reduction in the useful information. Ideally, the selected genes should have high relevance with the classes, while the redundancy among them should be as low as possible. The genes with high relevance are expected to be able to predict the classes of the samples. However, the prediction capability is reduced if many redundant genes are selected. In contrast, a data set that contains genes not only with high relevance with respect to the classes but also with low mutual redundancy is more effective in its prediction capability. Hence, to assess the effectiveness of the genes, both relevance and redundancy need to be measured quantitatively. In this chapter, the minimum redundancy-maximum relevance framework of Ding and Peng [10, 55] is used to select a set of relevant and nonredundant genes from microarray gene expression data sets.


5.2.1 Minimum Redundancy-Maximum Relevance Criterion


Let $${\mathbb {C}}=\{{\fancyscript{A}}_1,\ldots , {\fancyscript{A}}_i,\ldots ,{\fancyscript{A}}_j, \ldots ,{\fancyscript{A}}_m\}$$ be the set of $$m$$ genes of a given microarray gene expression data set and let $${\mathbb {S}}$$ be the set of selected genes. Define $$\hat{f}({\fancyscript{A}}_i,{\mathbb {D}})$$ as the relevance of the gene $${\fancyscript{A}}_i$$ with respect to the class label $${\mathbb {D}}$$ and $${\tilde{f}}({\fancyscript{A}}_i,{\fancyscript{A}}_j)$$ as the redundancy between two genes $${\fancyscript{A}}_i$$ and $${\fancyscript{A}}_j$$. The total relevance of all selected genes is, therefore, given by


$$\begin{aligned} {\fancyscript{J}}_\mathrm{relev}= \sum _{{\fancyscript{A}}_i \in {\mathbb {S}}} \hat{f}({\fancyscript{A}}_i,{\mathbb {D}}) \end{aligned}$$

(5.1)
while the total redundancy among the selected genes is


$$\begin{aligned} {\fancyscript{J}}_\mathrm{redun}= \sum _{{\fancyscript{A}}_i,{\fancyscript{A}}_j \in {\mathbb {S}}} {\tilde{f}}({\fancyscript{A}}_i,{\fancyscript{A}}_j). \end{aligned}$$

(5.2)
Therefore, the problem of selecting a set $${\mathbb {S}}$$ of relevant and nonredundant genes from the whole set $${\mathbb {C}}$$ of $$m$$ genes is equivalent to maximizing $${\fancyscript{J}}_\mathrm{relev}$$ and minimizing $${\fancyscript{J}}_\mathrm{redun}$$, that is, to maximizing the objective function $${\fancyscript{J}}$$, where


$$\begin{aligned} {\fancyscript{J}}={\fancyscript{J}}_\mathrm{relev}- {\fancyscript{J}}_\mathrm{redun}; \end{aligned}$$

(5.3)



$$\begin{aligned} \text{ that } \text{ is, }~~ {\fancyscript{J}}=\sum _i \hat{f}({\fancyscript{A}}_i,{\mathbb {D}})- \sum _{i,j} {\tilde{f}}({\fancyscript{A}}_i,{\fancyscript{A}}_j). \end{aligned}$$

(5.4)
To solve the above problem, the following greedy algorithm is widely used [10, 55]:

1. Initialize $${\mathbb {C}} \leftarrow \{{\fancyscript{A}}_1,\ldots , {\fancyscript{A}}_i,\ldots ,{\fancyscript{A}}_j,\ldots ,{\fancyscript{A}}_m\}, {\mathbb {S}} \leftarrow \emptyset $$.

2. Calculate the relevance $$\hat{f}({\fancyscript{A}}_i,{\mathbb {D}})$$ of each gene $${\fancyscript{A}}_i \in {\mathbb {C}}$$.

3. Select the gene $${\fancyscript{A}}_i$$ with the highest relevance $$\hat{f}({\fancyscript{A}}_i,{\mathbb {D}})$$ as the most relevant gene. In effect, $${\fancyscript{A}}_i \in {\mathbb {S}}$$ and $${\mathbb {C}}={\mathbb {C}} \setminus {\fancyscript{A}}_i$$.

4. Repeat the following two steps until the desired number of genes is selected.

5. Calculate the redundancy between the already selected genes of $${\mathbb {S}}$$ and each of the remaining genes of $${\mathbb {C}}$$.

6. From the remaining genes of $${\mathbb {C}}$$, select the gene $${\fancyscript{A}}_j$$ that maximizes

$$\begin{aligned} \hat{f}({\fancyscript{A}}_j,{\mathbb {D}})- \frac{1}{|{\mathbb {S}}|} \sum _{{\fancyscript{A}}_i \in {\mathbb {S}}} {\tilde{f}}({\fancyscript{A}}_i,{\fancyscript{A}}_j). \end{aligned}$$

(5.5)

As a result, $${\fancyscript{A}}_j \in {\mathbb {S}}$$ and $${\mathbb {C}}={\mathbb {C}} \setminus {\fancyscript{A}}_j$$.

7. Stop.
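A minimal sketch of this greedy procedure is given below, assuming the expression values have already been discretized as in Sect. 5.2.3 and using mutual information ($$I_{1.0}$$) for both the relevance $$\hat{f}$$ and the redundancy $${\tilde{f}}$$; any other measure of Sect. 5.2.2 can be passed in its place. The function names are illustrative and are not taken from the original $$f$$-mRMR implementation.

```python
import numpy as np

def mutual_information(x, y):
    """Estimate I(x; y) for two discrete vectors from their joint frequencies."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            p_xy = np.mean((x == a) & (y == b))
            p_x, p_y = np.mean(x == a), np.mean(y == b)
            if p_xy > 0:
                mi += p_xy * np.log(p_xy / (p_x * p_y))
    return mi

def mrmr_select(genes, labels, k, measure=mutual_information):
    """Greedy mRMR gene selection (steps 1-7 above).

    genes  : (m, n) array of discretized expression values, one row per gene
    labels : length-n array of class labels D
    k      : number of genes to select
    """
    m = genes.shape[0]
    # Step 2: relevance of every gene with respect to the class label.
    relevance = np.array([measure(genes[i], labels) for i in range(m)])
    # Step 3: start with the single most relevant gene.
    selected = [int(np.argmax(relevance))]
    remaining = set(range(m)) - set(selected)
    # Steps 4-6: repeatedly add the gene that maximizes relevance minus
    # mean redundancy with the already selected genes (criterion (5.5)).
    while len(selected) < k and remaining:
        scores = {j: relevance[j] -
                     np.mean([measure(genes[i], genes[j]) for i in selected])
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```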

 


5.2.2 $$f$$-Information Measures for Gene Selection


In this chapter, different $$f$$-information measures are used to compute both gene-class relevance and gene-gene redundancy for the selection of genes from microarray data. The $$f$$-information measures calculate the distance between a given joint probability $$p_{ij}$$ and the joint probability $${p_i}{p_j}$$ obtained when the variables are independent. In the following analysis, it is assumed that all probability distributions are complete, that is, $$\displaystyle {\sum _i {p_i}= \sum _j {p_j}=\sum _{i,\,j} p_{ij}=1}$$.

The extent to which two probability distributions differ can be expressed by a so-called measure of divergence. Such a measure will reach a minimum value when the two probability distributions are identical and the value increases with increasing disparity between the two distributions. A specific class of divergence measures is the set of $$f$$-divergence measures [56, 66]. For two discrete probability distributions $$P=\{p_i|~i=1,2,\ldots ,n\}$$ and $$Q=\{q_i|~i=1,2,\ldots ,n\}$$, the $$f$$-divergence is defined as


$$\begin{aligned} f(P||Q)=\sum _i q_i f\left( \frac{p_i}{q_i}\right) . \end{aligned}$$

(5.6)
The demands on the function $$f$$ are that

1. $$f:[0,\infty )\rightarrow (-\infty ,\infty ]$$;

2. $$f$$ is continuous and convex on $$[0,\infty )$$;

3. finite on $$(0,\infty )$$; and

4. strictly convex at some point $$x \in (0,\infty )$$.
The following definition completes the definition of $$f$$-divergence for the two cases for which (5.6) is not defined:


$$\begin{aligned} q_i f\left( \frac{p_i}{q_i}\right) =\left\{ \begin{array}{ll} 0, &{} \text{ if } p_i=q_i=0\\ p_i \displaystyle {\lim _{x \rightarrow \infty }} \frac{f(x)}{x}, &{} \text{ if } p_i>0, q_i=0.\\ \end{array} \right. \end{aligned}$$

(5.7)

A special case of the $$f$$-divergence measures is the class of $$f$$-information measures. These are defined similarly to $$f$$-divergence measures, but apply only to specific probability distributions, namely, the joint probability of two variables $$P$$ and the product of their marginal probabilities $$P_1 \times P_2$$. Thus, the $$f$$-information is a measure of dependence: it measures the distance between a given joint probability and the joint probability when the variables are independent [56, 66]. The frequently used functions that can form $$f$$-information measures include $$V$$-information, $$I_{\alpha }$$-information, $$M_{\alpha }$$-information, and $$\chi ^{\alpha }$$-information. On the other hand, Renyi's distance measure does not fall in the class of $$f$$-divergence measures as it does not satisfy the definition of $$f$$-divergence. However, it is a divergence measure in the sense that it measures the distance between two distributions, and it is directly related to $$f$$-divergence.
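As an illustration of how (5.6) and (5.7) can be evaluated in practice, the following sketch computes the $$f$$-divergence of two discrete distributions for an arbitrary generator $$f$$; the function and parameter names are hypothetical, not part of any library discussed here.

```python
import numpy as np

def f_divergence(p, q, f, f_slope_at_inf):
    """f-divergence of two discrete distributions p and q, as in (5.6).

    f              : the convex generator f(x)
    f_slope_at_inf : lim_{x -> inf} f(x) / x, needed for the q_i = 0 case of (5.7)
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if q_i > 0:
            total += q_i * f(p_i / q_i)
        elif p_i > 0:                  # q_i = 0 and p_i > 0, second case of (5.7)
            total += p_i * f_slope_at_inf
        # p_i = q_i = 0 contributes nothing, first case of (5.7)
    return total

# Example: the generator f(x) = |x - 1| has lim f(x)/x = 1 and yields the
# V-information of Sect. 5.2.2.1 when p is a joint distribution and q the
# product of its marginals.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(f_divergence(p, q, lambda x: abs(x - 1), f_slope_at_inf=1.0))  # 0.2
```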


5.2.2.1 V-Information


One of the simplest measures of dependence can be obtained using the function $$V=|x-1|$$, which results in the $$V$$-information [56, 66]


$$\begin{aligned} V(P||P_1 \times P_2)=\sum _{i,j} |{p_{ij}}-{p_i}{p_j}| \end{aligned}$$

(5.8)
where $$P_1=\{{p_i}|~i=1,2,\ldots ,n\}$$, $$P_2=\{{p_j}|~j=1,2,\ldots ,n\}$$, and $$P=\{{p_{ij}}|~i=1,2,\ldots ,n; j=1,2,\ldots ,n\}$$ represent two marginal probability distributions and their joint probability distribution, respectively. Hence, the $$V$$-information calculates the absolute distance between the joint probability of two variables and the product of their marginal probabilities.


5.2.2.2 $$I_\alpha $$-Information


The $$I_\alpha $$-information is defined as [56, 66]


$$\begin{aligned} I_\alpha (P||P_1 \times P_2)=\frac{1}{\alpha (\alpha -1)} \left( \sum _{i,j}\frac{{({p_{ij}})}^\alpha }{{({p_i}{p_j})}^{\alpha -1}}-1 \right) \end{aligned}$$

(5.9)
for $$\alpha \ne 0, \alpha \ne 1$$. The class of $$I_\alpha $$-information includes mutual information, which equals $$I_\alpha $$ for the limit $$\alpha \rightarrow 1$$. That is,


$$\begin{aligned} I_1(P||P_1 \times P_2)=\sum _{i,j}{p_{ij}}\mathrm{log}\left( \frac{{p_{ij}}}{{p_i}{p_j}} \right) ~\mathrm{for}~~\alpha \rightarrow 1. \end{aligned}$$

(5.10)


5.2.2.3 $$M_\alpha $$-Information


The $$M_\alpha $$-information, defined by Matusita [56, 66], is based on the function


$$\begin{aligned} M_\alpha (x)=|x^\alpha -1|^{\frac{1}{\alpha }},~~0 < \alpha \le 1. \end{aligned}$$

(5.11)
When applying this function in the definition of an $$f$$-information measure, the resulting $$M_\alpha $$-information measures are


$$\begin{aligned} M_\alpha (P||P_1 \times P_2)=\sum _{i,j}|{({p_{ij}})}^\alpha - {({p_i}{p_j})}^{\alpha }|^{\frac{1}{\alpha }} \end{aligned}$$

(5.12)
for $$0 < \alpha \le 1$$. These constitute a generalized version of $$V$$-information. That is, the $$M_\alpha $$-information is identical to $$V$$-information for $$\alpha = 1$$.


5.2.2.4 $$\chi ^\alpha $$-Information


The class of $$\chi ^\alpha $$-information measures, proposed by Liese and Vajda [66], is based on the function


$$\begin{aligned} \chi ^\alpha (x) = \left\{ \begin{array}{ll} |1-x^\alpha |^{\frac{1}{\alpha }} &{} \text{ for } 0 < \alpha \le 1\\ |1-x|^{\alpha } &{} \text{ for } \alpha > 1.\\ \end{array} \right. \end{aligned}$$

(5.13)

For $$0 < \alpha \le 1$$, this function equals the $$M_\alpha $$ function. The $$\chi ^\alpha $$-information and $$M_\alpha $$-information measures are, therefore, also identical for $$0 < \alpha \le 1$$. For $$\alpha > 1$$, the $$\chi ^\alpha $$-information can be written as


$$\begin{aligned} \chi ^\alpha (P||P_1 \times P_2)=\sum _{i,j} \frac{|{p_{ij}}-{p_i}{p_j}|^\alpha }{({p_i}{p_j})^{\alpha -1}}. \end{aligned}$$

(5.14)


5.2.2.5 Renyi Distance


The Renyi distance, a measure of information of order $$\alpha $$ [56, 66], can be defined as


$$\begin{aligned} R_{\alpha }(P||P_1 \times P_2)=\frac{1}{\alpha -1}\mathrm{log}\sum _{i,j} \frac{({p_{ij}})^{\alpha }}{({p_i}{p_j})^{\alpha -1}} \end{aligned}$$
for $$\alpha \ne 0, \alpha \ne 1$$. It reaches its minimum value when $${p_{ij}}$$ and $${p_i}{p_j}$$ are identical, in which case the summation reduces to $$\sum {p_{ij}}$$. As a complete probability distribution is assumed, this sum equals one and the minimum value of the measure is, therefore, zero. The limit of Renyi's measure for $$\alpha $$ approaching 1 equals $$I_1(P||P_1 \times P_2)$$, which is the mutual information.
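The sketch below illustrates how these measures can be computed for two discrete variables, for example a discretized gene and the class label: it builds the joint distribution and the product of the marginals and then evaluates (5.8), (5.9)-(5.10), (5.12), (5.14), and the Renyi distance. The helper names are illustrative and are not taken from the $$f$$-mRMR source code.

```python
import numpy as np

def joint_and_marginals(x, y):
    """Joint probability table p_ij and the product of marginals p_i * p_j."""
    xs, ys = np.unique(x), np.unique(y)
    joint = np.array([[np.mean((x == a) & (y == b)) for b in ys] for a in xs])
    prod = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return joint, prod

def v_information(x, y):                        # Eq. (5.8)
    p, pq = joint_and_marginals(x, y)
    return np.abs(p - pq).sum()

def i_alpha_information(x, y, alpha):           # Eqs. (5.9) and (5.10)
    p, pq = joint_and_marginals(x, y)
    mask = p > 0
    if np.isclose(alpha, 1.0):                  # limit alpha -> 1: mutual information
        return np.sum(p[mask] * np.log(p[mask] / pq[mask]))
    return (np.sum(p[mask] ** alpha / pq[mask] ** (alpha - 1)) - 1) / (alpha * (alpha - 1))

def m_alpha_information(x, y, alpha):           # Eq. (5.12), 0 < alpha <= 1
    p, pq = joint_and_marginals(x, y)
    return np.sum(np.abs(p ** alpha - pq ** alpha) ** (1.0 / alpha))

def chi_alpha_information(x, y, alpha):         # Eq. (5.14), alpha > 1
    p, pq = joint_and_marginals(x, y)
    mask = pq > 0
    return np.sum(np.abs(p[mask] - pq[mask]) ** alpha / pq[mask] ** (alpha - 1))

def renyi_distance(x, y, alpha):                # Renyi distance of order alpha
    p, pq = joint_and_marginals(x, y)
    mask = p > 0
    return np.log(np.sum(p[mask] ** alpha / pq[mask] ** (alpha - 1))) / (alpha - 1)
```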


5.2.3 Discretization


In microarray gene expression data sets, the class labels of samples are represented by discrete symbols, while the expression values of genes are continuous. Hence, to measure both the gene-class relevance of a gene with respect to the class labels and the gene-gene redundancy between two genes using information theoretic measures such as mutual information [10, 55], normalized mutual information [39], and $$f$$-information measures [43], the continuous expression values of a gene are divided into several discrete partitions. The a priori (marginal) probabilities and their joint probabilities are then calculated to compute both gene-class relevance and gene-gene redundancy using the definitions for discrete cases. In this chapter, the discretization method reported in [10, 43, 55] is employed to discretize the continuous gene expression values. The expression values of a gene are discretized using the mean $$\mu $$ and standard deviation $$\sigma $$ computed over the $$n$$ expression values of that gene: any value larger than $$(\mu + \sigma /2)$$ is transformed to state 1; any value between $$(\mu - \sigma /2)$$ and $$(\mu + \sigma /2)$$ is transformed to state 0; any value smaller than $$(\mu - \sigma /2)$$ is transformed to state $$-1$$. These three states correspond to the over-expression, baseline, and under-expression of genes, respectively.
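A minimal sketch of this discretization step, assuming the expression values are stored one gene per row, is shown below; the function name is illustrative.

```python
import numpy as np

def discretize(expression):
    """Map each gene's continuous expression values to the states {-1, 0, 1}.

    expression : (m, n) array with one gene per row.
    Values above mu + sigma/2 become 1 (over-expression), values below
    mu - sigma/2 become -1 (under-expression), the rest become 0 (baseline),
    where mu and sigma are computed per gene over its n expression values.
    """
    mu = expression.mean(axis=1, keepdims=True)
    sigma = expression.std(axis=1, keepdims=True)
    states = np.zeros(expression.shape, dtype=int)
    states[expression > mu + sigma / 2] = 1
    states[expression < mu - sigma / 2] = -1
    return states
```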


5.3 Experimental Results


The performance of different $$f$$-information measures is extensively compared with that of mutual information and normalized mutual information. Based on the argumentation given in Sect. 5.2.2, the following information measures are included in the study:

1. $$I_{\alpha }$$- and $$R_{\alpha }$$-information measures for $$\alpha \ne 0$$ and $$\alpha \ne 1$$;

2. mutual information ($$I_{1.0}$$- and $$R_{1.0}$$-information);

3. $$M_{\alpha }$$-information measure for $$0 < \alpha \le 1$$;

4. $$\chi ^{\alpha }$$-information measure for $$\alpha > 1$$; and

5. normalized mutual information $$U$$.
In this chapter, these measures are applied to calculate both gene-class relevance and gene-gene redundancy. The minimum redundancy-maximum relevance (mRMR) criterion [10, 55] is used for gene selection. The source code of the $$f$$-information based mRMR ($$f$$-mRMR) algorithm [43], written in the C language, is available at http://www.isical.ac.in/~bibl/results/fmRMR/fmRMR.html. All the information measures are implemented in C and run in a LINUX environment on a machine with a Pentium IV processor, 3.2 GHz clock speed, 1 MB cache, and 1 GB RAM.

To analyze the performance of different $$f$$-information measures, experiments are carried out on three microarray gene expression data sets. The major metric for evaluating the performance of different measures is the classification accuracy of the support vector machine (SVM) [67], K-nearest neighbor (K-NN) rule [12], and naive Bayes (NB) classifier [12].


5.3.1 Gene Expression Data Sets


In this chapter, three public cancer microarray data sets are used. Since binary classification is a typical and fundamental issue in diagnostic and prognostic prediction of cancer, different $$f$$-information measures are compared using the following binary-class data sets.


5.3.1.1 Breast Cancer Data Set


The breast cancer data set contains expression levels of 7,129 genes in 49 breast tumor samples [69]. The samples are classified according to their estrogen receptor (ER) status: 25 samples are ER positive while the other 24 samples are ER negative.


5.3.1.2 Leukemia Data Set


The leukemia data set is an Affymetrix high-density oligonucleotide array that contains 7,070 genes and 72 samples from two classes of leukemia: 47 acute lymphoblastic leukemia and 25 acute myeloid leukemia [17]. The data set is publicly available at http://www.broad.mit.edu/cgibin/cancer/datasets.cgi.


5.3.1.3 Colon Cancer Data Set


The colon cancer data set contains expression levels of 2,000 genes and 62 samples from two classes [1]: 40 tumor and 22 normal colon tissues. The data set is available at http://microarray.princeton.edu/oncology/affydata/index.html.


5.3.2 Class Prediction Methods


The SVM [67], K-NN rule [12], and NB classifier [12] are used to evaluate the performance of different $$f$$-information measures. A brief introduction to the SVM is reported in Chaps. 3 and 4. In this work, linear kernels are used in the SVM to construct the decision boundary. Descriptions of the K-NN rule and the NB classifier are reported next.


5.3.2.1 K-Nearest Neighbor Rule


The K-nearest neighbor (K-NN) rule [12] is used for evaluating the effectiveness of the reduced feature set for classification. It classifies a sample based on its closest training samples in the feature space: the sample is assigned, by a majority vote of its K nearest neighbors, to the class most common amongst them. The value of K chosen for the K-NN rule is the square root of the number of samples in the training set.
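A minimal sketch of this rule, assuming Euclidean distance in the space of the selected genes, is given below; the function name is illustrative.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, test_x):
    """Classify each test sample by a majority vote among its K nearest
    training samples, with K set to the square root of the training set size."""
    k = max(1, int(round(np.sqrt(len(train_y)))))
    predictions = []
    for x in test_x:
        distances = np.linalg.norm(train_x - x, axis=1)   # Euclidean distances
        nearest = np.argsort(distances)[:k]               # indices of the K closest
        votes = Counter(train_y[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)
```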


5.3.2.2 Naive Bayes Classifier


The naive Bayes (NB) classifier [12] is one of the oldest classifiers. It is obtained by using the Bayes rule and assuming that features or variables are independent of each other given the class. For the $$j$$th sample $$x_j$$ with $$m$$ gene expression levels $$\{w_{1j}, \ldots , w_{ij}, \ldots , w_{mj}\}$$ for the $$m$$ genes, the posterior probability that $$x_j$$ belongs to class $$c$$ is


$$\begin{aligned} p(c|x_j) \propto \prod _{i=1}^m p(w_{ij}|c) \end{aligned}$$

(5.15)
where $$p(w_{ij}|c)$$ are conditional probability tables or conditional densities estimated from the training examples.
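The sketch below applies (5.15) to discretized expression states; the Laplace smoothing of the conditional probability tables and the inclusion of the class prior are assumptions added here to keep the sketch well defined, and the function names are illustrative.

```python
import numpy as np

def nb_train(train_x, train_y, states=(-1, 0, 1)):
    """Estimate class priors and per-gene conditional tables p(w | c)."""
    classes = np.unique(train_y)
    priors = {c: np.mean(train_y == c) for c in classes}
    tables = {}                 # tables[c][gene, state_index] = p(state | c)
    for c in classes:
        xc = train_x[train_y == c]                     # samples of class c
        counts = np.stack([(xc == s).sum(axis=0) for s in states], axis=1)
        tables[c] = (counts + 1) / (counts.sum(axis=1, keepdims=True) + len(states))
    return priors, tables, list(states)

def nb_predict(x, priors, tables, states):
    """Return the class maximizing log p(c) + sum_i log p(w_i | c), cf. (5.15)."""
    idx = np.array([states.index(v) for v in x])       # state index of each gene
    genes = np.arange(len(x))
    scores = {c: np.log(priors[c]) + np.log(tables[c][genes, idx]).sum()
              for c in priors}
    return max(scores, key=scores.get)
```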


5.3.3 Performance Analysis


The experimental results on the three microarray data sets are presented in Tables 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8 and 5.9. Subsequent discussions analyze the results with respect to the prediction accuracy of the NB, SVM, and K-NN classifiers. Tables 5.1, 5.4, and 5.7 report the performance of different $$f$$-information measures using the NB classifier, Tables 5.2, 5.5, and 5.8 using the SVM, while Tables 5.3, 5.6, and 5.9 show the results using the K-NN rule. The values of $$\alpha $$ investigated for the $$f$$-information measures are 0.2, 0.5, 0.8, 1.5, 2.0, 3.0, and 4.0. Some measures resemble mutual information for $$\alpha =1.0$$ ($$I_{\alpha }$$ and $$R_{\alpha }$$) and some resemble another measure ($$M_{1.0}$$ and $$\chi ^{1.0}$$ equal $$V$$). To compute the prediction accuracy of the NB, SVM, and K-NN, leave-one-out cross-validation is performed on each gene expression data set. The number of selected genes ranges from 2 to 50, and each data set is preprocessed by standardizing each sample to zero mean and unit variance.
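A sketch of this leave-one-out protocol is given below; the gene selector and the classifier are passed in as callables (for example, the illustrative mrmr_select and knn_predict sketched earlier), and the function is a hypothetical stand-in for the original evaluation code.

```python
import numpy as np

def loo_accuracy(genes, labels, select, classify, k):
    """Leave-one-out cross-validation accuracy for a gene selector and a classifier.

    genes    : (m, n) discretized expression matrix (genes x samples)
    labels   : length-n array of class labels
    select   : callable(genes, labels, k) -> list of selected gene indices
    classify : callable(train_x, train_y, test_x) -> predicted labels
    """
    n = genes.shape[1]
    correct = 0
    for held_out in range(n):
        train = np.delete(np.arange(n), held_out)
        # Gene selection is repeated on every training fold in this sketch.
        chosen = select(genes[:, train], labels[train], k)
        train_x = genes[np.ix_(chosen, train)].T        # samples x selected genes
        test_x = genes[chosen, held_out].reshape(1, -1)
        pred = classify(train_x, labels[train], test_x)[0]
        correct += int(pred == labels[held_out])
    return 100.0 * correct / n
```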


Table 5.1
Performance on breast cancer data set using NB classifier ($$f$$-information measures versus the number of selected genes)

Measure          2     5     8     10    15    20    25    30    35    40    45    50
$$I_{0.2}$$      95.9  98.0  98.0  98.0  100   100   100   100   100   100   98.0  98.0
$$I_{0.5}$$      95.9  98.0  98.0  98.0  100   100   100   98.0  98.0  98.0  98.0  98.0
$$I_{0.8}$$      95.9  100   95.9  98.0  98.0  98.0  95.9  93.9  91.8  91.8  89.8  89.8
$$I_{1.0}$$      95.9  98.0  95.9  100   98.0  93.9  93.9  89.8  87.8  87.8  87.8  87.8
$$I_{1.5}$$      95.9  98.0  95.9  93.9  93.9  91.8  91.8  89.8  85.7  83.7  83.7  81.6
$$I_{2.0}$$      95.9  95.9  95.9  93.9  91.8  91.8  91.8  87.8  87.8  83.7  83.7  81.6
$$I_{3.0}$$      95.9  95.9  95.9  93.9  91.8  91.8  89.8  87.8  87.8  83.7  83.7  81.6
$$I_{4.0}$$      95.9  95.9  95.9  91.8  91.8  89.8  87.8  87.8  85.7  83.7  81.6  81.6
$$M_{0.2}$$      85.7  95.9  95.9  98.0  100   100   100   100   98.0  98.0  98.0  98.0
$$M_{0.5}$$      95.9  98.0  98.0  98.0  100   100   100   98.0  98.0  98.0  98.0  98.0
$$M_{0.8}$$      95.9  93.9  95.9  98.0  93.9  91.8  91.8  87.8  85.7  85.7  85.7  79.6
$$M_{1.0}$$      87.8  89.8  83.7  85.7  89.8  87.8  87.8  83.7  85.7  85.7  83.7  83.7
$$\chi ^{1.5}$$  95.9  98.0  95.9  98.0  93.9  89.8  89.8  85.7  83.7  81.6  79.6  79.6
$$\chi ^{2.0}$$  95.9  95.9  95.9  93.9  91.8  91.8  91.8  87.8  87.8  83.7  83.7  81.6
$$\chi ^{3.0}$$  95.9  95.9  95.9  93.9  93.9  93.9  93.9  89.8  87.8  85.7  83.7  83.7
$$\chi ^{4.0}$$  95.9  98.0  100   95.9  95.9  93.9  93.9  89.8  85.7  85.7  85.7  85.7
$$R_{0.2}$$      95.9  98.0  98.0  98.0  100   100   100   100   100   100   98.0  98.0
$$R_{0.5}$$      95.9  98.0  98.0  98.0  100   100   100   98.0  98.0  98.0  98.0  98.0
$$R_{0.8}$$      95.9  100   95.9  95.9  98.0  98.0  95.9  93.9  91.8  91.8  89.8  89.8
$$R_{1.0}$$      95.9  98.0  95.9  100   98.0  93.9  93.9  89.8  87.8  87.8  87.8  87.8
$$R_{1.5}$$      95.9  98.0  95.9  93.9  91.8  91.8  91.8  89.8  87.8  83.7  83.7  83.7
$$R_{2.0}$$      95.9  91.8  95.9  95.9  91.8  91.8  91.8  89.8  85.7  83.7  83.7  81.6
$$R_{3.0}$$      93.9  89.8  93.9  93.9  93.9  91.8  91.8  91.8  89.8  85.7  83.7  79.6
$$R_{4.0}$$      93.9  93.9  91.8  91.8  91.8  91.8  89.8  89.8  87.8  83.7  83.7  81.6
$$U$$            95.9  98.0  98.0  100   95.9  93.9  93.9  91.8  91.8  89.8  89.8  89.8



Table 5.2
Performance on breast cancer data set using SVM ($$f$$-information measures versus the number of selected genes)

Measure          2     5     8     10    15    20    25    30    35    40    45    50
$$I_{0.2}$$      81.6  100   95.9  98.0  98.0  100   95.9  95.9  98.0  98.0  98.0  95.9
$$I_{0.5}$$      81.6  100   100   100   95.9  95.9  100   95.9  95.9  95.9  98.0  98.0
$$I_{0.8}$$      81.6  98.0  100   100   98.0  95.9  95.9  98.0  98.0  95.9  98.0  95.9
$$I_{1.0}$$      81.6  98.0  100   100   98.0  95.9  95.9  93.9  93.9  93.9  95.9  95.9
$$I_{1.5}$$      85.7  91.8  98.0  100   98.0  100   95.9  95.9  95.9  95.9  93.9  93.9
$$I_{2.0}$$      85.7  95.9  98.0  100   100   100   95.9  95.9  95.9  93.9  93.9  93.9
$$I_{3.0}$$      85.7  95.9  98.0  100   100   95.9  95.9  95.9  95.9  95.9  93.9  93.9
$$I_{4.0}$$      85.7  89.8  100   98.0  100   95.9  95.9  95.9  95.9  95.9  95.9  95.9
$$M_{0.2}$$      77.6  95.9  91.8  89.8  87.8  93.9  93.9  95.9  95.9  95.9  95.9  98.0
$$M_{0.5}$$      81.6  100   100   100   95.9  95.9  100   95.9  95.9  95.9  98.0  98.0
$$M_{0.8}$$      85.7  89.8  93.9  89.8  93.9  95.9  93.9  93.9  93.9  91.8  93.9  93.9
$$M_{1.0}$$      83.7  81.6  87.8  91.8  87.8  83.7  83.7  83.7  85.7  83.7  87.8  85.7
$$\chi ^{1.5}$$  85.7  87.8  91.8  89.8  93.9  91.8  95.9  95.9  93.9  93.9  93.9  93.9
$$\chi ^{2.0}$$  85.7  95.9  98.0  100   100   100   95.9  95.9  95.9  93.9  93.9  93.9
$$\chi ^{3.0}$$  85.7  89.8  100   95.9  98.0  95.9  98.0  93.9  93.9  93.9  93.9  93.9
$$\chi ^{4.0}$$  85.7  91.8  100   100   98.0  95.9  95.9  95.9  95.9  95.9  95.9  95.9
$$R_{0.2}$$      81.6  100   95.9  98.0  98.0  98.0  95.9  95.9  95.9  98.0  98.0  98.0
$$R_{0.5}$$      81.6  100   100   100   95.9  95.9  100   95.9  95.9  93.9  98.0  98.0
$$R_{0.8}$$      81.6  98.0  100   100   98.0  95.9  95.9  98.0  98.0  95.9  98.0  95.9
$$R_{1.0}$$      81.6  98.0  100   100   98.0  95.9  95.9  93.9  93.9  93.9  95.9  95.9
$$R_{1.5}$$      85.7  91.8  98.0  100   98.0  100   95.9  95.9  95.9  95.9  93.9  93.9
$$R_{2.0}$$      85.7  89.8  95.9  95.9  98.0  100   95.9  95.9  95.9  95.9  93.9  93.9
$$R_{3.0}$$      87.8  87.8  100   100   93.9  95.9  93.9  95.9  95.9  95.9  95.9  95.9
$$R_{4.0}$$      87.8  89.8  89.8  93.9  98.0  100   100   98.0  98.0  95.9  95.9  93.9
$$U$$            81.6  98.0  100   100   98.0  95.9  98.0  95.9  95.9  95.9  95.9  93.9



Table 5.3
Performance on breast cancer data set using K-NN rule ($$f$$-information measures versus the number of selected genes)

Measure          2     5     8     10    15    20    25    30    35    40    45    50
$$I_{0.2}$$      89.8  93.9  93.9  95.9  98.0  95.9  95.9  93.9  95.9  98.0  98.0  98.0
$$I_{0.5}$$      89.8  93.9  95.9  95.9  98.0  98.0  95.9  95.9  98.0  98.0  95.9  98.0
$$I_{0.8}$$      89.8  98.0  95.9  95.9  98.0  95.9
