Overcoming Adverse Effects of Correlations in Microarray Data Analysis

Linlin Chen, Haiyan Su

Abstract


Because microarray gene expression levels are strongly correlated, the procedures commonly used to select genes that are differentially expressed between two or more phenotypes suffer from two main problems: high instability of the number of false discoveries and low power. The underlying biology is complex, so it may be impossible to understand these correlations completely. Gordon et al. [1] proposed a multiple testing procedure that balances type I and type II errors in an optimal way. However, the correlation structure of microarray data remains the main obstacle for gene selection procedures. To remove this obstacle, we improve the statistical methodology by exploiting the weak dependence between the elements of the so-called delta-sequence proposed in Klebanov et al. [4]. Our study shows a similar behavior: both the mean and the standard deviation of the number of false positives decrease monotonically as functions of the threshold parameter. In addition, working with gene pairs substantially reduces both quantities, which means that the new approach gains power and stability.
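The abstract does not spell out how the delta-sequence is built, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes that genes are ordered by sample variance, grouped into non-overlapping neighbouring pairs, and replaced by the within-pair differences of their log-expression levels; the ordering rule, the function name `delta_sequence`, and the toy numbers are illustrative assumptions.

```python
from statistics import variance


def delta_sequence(log_expr):
    """Form non-overlapping gene pairs and their log-expression differences.

    log_expr: list of genes, each a list of log-expression values (one per array).
    Returns (pairs, deltas), where pairs[k] = (i, j) holds gene indices and
    deltas[k][s] = log_expr[i][s] - log_expr[j][s] for array s.
    """
    # ASSUMPTION: genes are ordered by sample variance before pairing neighbours.
    order = sorted(range(len(log_expr)), key=lambda g: variance(log_expr[g]))
    pairs, deltas = [], []
    # Step by 2 so each gene appears in at most one pair; an odd gene out is dropped.
    for k in range(0, len(order) - 1, 2):
        i, j = order[k], order[k + 1]
        pairs.append((i, j))
        deltas.append([x - y for x, y in zip(log_expr[i], log_expr[j])])
    return pairs, deltas


# Toy data: 4 genes measured on 3 arrays (made-up numbers, for illustration only).
toy = [
    [1, 2, 3],
    [1, 1, 1.5],
    [0, 5, 10],
    [2, 2, 2.2],
]
pairs, deltas = delta_sequence(toy)
```

Under these assumptions, each row of `deltas` would play the role that a single gene's expression vector plays in a gene-level test, and any effect added equally to both genes of a pair on the same array cancels in the difference.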


Keywords


Correlation Structure; Microarray Gene Expression Data Analysis; Resampling

References


A. Gordon, L. Chen, G. Glazko and A. Yakovlev, Balancing type one and two errors in multiple testing for differential expression of genes, Computational Statistics & Data Analysis 53(5) (2009), 1622 – 1629, DOI: 10.1016/j.csda.2008.04.010.

L. Klebanov and A. Yakovlev, Diverse correlation structures in microarray gene expression data and their utility in improving statistical inference, Annals of Applied Statistics 1(2) (2007), 538 – 559, DOI: 10.1214/07-AOAS120.

L. Klebanov and A. Yakovlev, How high is the level of technical noise in microarray data?, Biology Direct 2 (2007), Article 9, DOI: 10.1186/1745-6150-2-9.

L. Klebanov, C. Jordan and A. Yakovlev, A new type of stochastic dependence revealed in gene expression data, Statistical Applications in Genetics and Molecular Biology 5(1) (2006), Article 7, DOI: 10.2202/1544-6115.1189.

L. Klebanov, L. Chen and A. Yakovlev, Revisiting adverse effects of cross-hybridization in Affymetrix gene expression data: Do they matter for correlation analysis?, Biology Direct 2 (2007), Article 28, DOI: 10.1186/1745-6150-2-28.

M. J. Okoniewski and C. J. Miller, Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations, BMC Bioinformatics 7 (2006), Article 276, DOI: 10.1186/1471-2105-7-276.

A. Ploner, L. D. Miller, P. Hall, J. Bergh and Y. Pawitan, Correlation test to assess low-level processing of high-density oligonucleotide microarray data, BMC Bioinformatics 6 (2005), Article 80, DOI: 10.1186/1471-2105-6-80.

X. Qiu and A. Yakovlev, Comments on probabilistic models behind the concept of false discovery rate, Journal of Bioinformatics and Computational Biology 4 (2007), 963 – 975, DOI: 10.1142/S0219720007002965.

X. Qiu, L. Klebanov and A. Y. Yakovlev, Correlation between gene expression levels and limitations of the empirical Bayes methodology for finding differentially expressed genes, Statistical Applications in Genetics and Molecular Biology 4 (2005), Article 34, DOI: 10.2202/1544-6115.1157.

E.-J. Yeoh, M. E. Ross, S. A. Shurtleff, W. K. Williams, D. Patel, R. Mahfouz, F. G. Behm, S. C. Raimondi, M. V. Relling, A. Patel, C. Cheng, D. Campana, D. Wilkins, X. Zhou, J. Li, H. Liu, C.-H. Pui, W. E. Evans, C. Naeve, L. Wong and J. R. Downing, Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling, Cancer Cell 1(2) (2002), 133 – 143, DOI: 10.1016/S1535-6108(02)00032-6.




DOI: http://dx.doi.org/10.26713/jims.v11i2.1214

eISSN 0975-5748; pISSN 0974-875X