Friday, July 4, 2014

Fwd: Gene set analysis: limitations in popular existing methods and proposed improvements

Fwd: please follow footer link

Gene set analysis: limitations in popular existing methods and proposed improvements: "

Motivation: Gene set analysis is the analysis of a set of genes that collectively contribute to a biological process. Most popular gene set analysis methods are based on empirical P-value that requires large number of permutations. Despite numerous gene set analysis methods developed in the past decade, the most popular methods still suffer from serious limitations.

Results: We present a gene set analysis method (mGSZ) based on Gene Set Z-scoring function (GSZ) and asymptotic P-values. Asymptotic P-value calculation requires fewer permutations, and thus speeds up the gene set analysis process. We compare the GSZ-scoring function with seven popular gene set scoring functions and show that GSZ stands out as the best scoring function. In addition, we show improved performance of the GSA method when the max-mean statistics is replaced by the GSZ scoring function. We demonstrate the importance of both gene and sample permutations by showing the consequences in the absence of one or the other. A comparison of asymptotic and empirical methods of P-value estimation demonstrates a clear advantage of asymptotic P-value over empirical P-value. We show that mGSZ outperforms the state-of-the-art methods based on two different evaluations. We compared mGSZ results with permutation and rotation tests and show that rotation does not improve our asymptotic P-values. We also propose well-known asymptotic distribution models for three of the compared methods.

Availability and implementation: mGSZ is available as R package from

Supplementary information: Available at


(Via Bioinformatics - Advance Access.)