Friday, October 24, 2014

Fwd: Error estimates for the analysis of differential expression from RNA-seq count data

Error estimates for the analysis of differential expression from RNA-seq count data: "

A number of algorithms exist for analysing RNA-sequencing data to infer profiles of differential gene expression. The problems inherent in building algorithms around statistical models of overdispersed count data are formidable. They frequently lead to non-uniform p-value distributions for null-hypothesis data and to inaccurate estimates of false discovery rates (FDRs), which in turn can give an inaccurate measure of significance and a loss of power to detect differential expression.

Researchers from the Australian National University use synthetic and real biological data to assess the ability of several available R packages to accurately estimate FDRs. The packages surveyed are based on statistical models of overdispersed Poisson data and include edgeR, DESeq, DESeq2, PoissonSeq and QuasiSeq. Also tested is Polyfit, an add-on package to edgeR and DESeq that the authors introduce. Polyfit aims to address the problem of a non-uniform null p-value distribution for two-class datasets by adapting the Storey-Tibshirani procedure.
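For readers unfamiliar with the Storey-Tibshirani procedure mentioned above, here is a minimal Python sketch of its standard form: estimate the proportion of true nulls (pi0) from the p-values above a cutoff, then convert p-values to q-values. This is not Polyfit's adaptation, whose details are not described here. The function names and the fixed cutoff `lam=0.5` are assumptions made for illustration.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the proportion of true nulls from the p-values above lam.

    Assumes null p-values are uniform on [0, 1], so the fraction above lam,
    rescaled by 1 / (1 - lam), approximates the null proportion.
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def storey_qvalues(pvalues, lam=0.5):
    """Return (pi0, q-values), with q-values in the input order.

    Hypothetical helper: standard Storey-Tibshirani with one fixed lam.
    """
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values
    # are non-decreasing in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, q)
        qvalues[i] = running_min
    return pi0, qvalues


pi0, qvals = storey_qvalues([0.01, 0.02, 0.03, 0.04, 0.6, 0.8])
```

The pi0 estimate is only as good as the uniformity assumption on the null p-values, which is exactly the assumption the post says is often violated for overdispersed count models.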

The researchers find that the best-performing package, in the sense that it achieves a low FDR that is accurately estimated over the full range of p-values, is the QLSpline implementation of QuasiSeq, albeit with a very slow run time. This finding holds provided the number of biological replicates in each condition is at least 4. The next best-performing packages are edgeR and DESeq2. When the number of biological replicates is sufficiently high (approximately 6 or more per condition), and within a range accessible to multiplexed experimental designs, the Polyfit extension improves the performance of DESeq, making it comparable with that of edgeR and DESeq2 in the authors' tests with synthetic data.

Summary of performance of the packages edgeR, DESeq and their Polyfit extensions in estimating the FDR for genes out to a significance point corresponding to half the number of truly DE genes.
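One way to read the comparison in this caption, sketched in Python: with synthetic data the truly DE genes are known, so the true FDR among the top-ranked genes can be set against the FDR the package reports at the same cutoff. The function name, the argument layout, and the choice to cut the ranked list at half the number of truly DE genes are assumptions drawn from the caption wording, not the authors' code.

```python
def fdr_at_half_true_de(pvalues, qvalues, is_de):
    """Compare true and estimated FDR for the top-ranked genes.

    pvalues, qvalues: per-gene values reported by a package.
    is_de: per-gene booleans, True where the gene is truly DE (known
           only for synthetic data).
    Returns (k, true_fdr, estimated_fdr), where k is half the number
    of truly DE genes (an assumed reading of the caption).
    """
    k = sum(1 for flag in is_de if flag) // 2
    if k == 0:
        raise ValueError("need at least two truly DE genes")
    # Rank genes by p-value and call the k most significant ones.
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    called = order[:k]
    false_pos = sum(1 for i in called if not is_de[i])
    true_fdr = false_pos / k
    # The reported FDR at this cutoff is the largest q-value among calls.
    estimated_fdr = max(qvalues[i] for i in called)
    return k, true_fdr, estimated_fdr


k, true_fdr, est_fdr = fdr_at_half_true_de(
    [0.001, 0.002, 0.003, 0.5, 0.004, 0.9],
    [0.01, 0.02, 0.03, 0.6, 0.04, 0.9],
    [True, False, True, True, True, False],
)
```

A package estimates the FDR accurately when the two returned rates are close. In this toy input the estimate is far below the true rate, which is the kind of discrepancy the comparison is meant to expose.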

Availability - Polyfit can be downloaded from https://github.com/cjb105/Polyfit.


Burden CJ, Qureshi SE, Wilson SR. (2014) Error estimates for the analysis of differential expression from RNA-seq count data. PeerJ 2:e576.

"

(Via RNA-Seq Blog.)