Saturday, April 30, 2011

Association studies for next-generation sequencing [METHODS]

Genome-wide association studies (GWAS) have become the primary approach for identifying genes with common variants influencing complex diseases. Despite considerable progress, the common variants identified by GWAS account for only a small fraction of disease heritability and are unlikely to explain the majority of the phenotypic variation of common diseases. A potential source of the missing heritability is the contribution of rare variants. Next-generation sequencing technologies will detect millions of novel rare variants, but the resulting data have three defining features: a large number of rare variants, a high proportion of sequence errors, and a large proportion of missing data. These features raise challenges for testing the association of rare variants with phenotypes of interest. In this report, we use a genome continuum model and functional principal components as a general principle for developing novel and powerful association analysis methods designed for resequencing data. We use simulations to calculate the type I error rates and the power of six alternative statistics: two functional principal component analysis (FPCA)-based statistics, the generalized T², the collapsing method, the CMC method, and the individual test. We also examine the impact of sequence errors on their type I error rates. Finally, we apply the six statistics to a published resequencing dataset from ANGPTL4 in the Dallas Heart Study. We report that the FPCA-based statistics have higher power to detect association of rare variants and a stronger ability to filter sequence errors than the other four methods. Our work represents a shift from the current single-marker association analysis paradigm to sequence-based association analysis.
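For readers unfamiliar with the collapsing method named in the abstract, the general idea can be sketched as follows: each individual is reduced to a single indicator of whether they carry any rare allele in the region, and carrier frequencies are then compared between cases and controls. This is only a minimal illustration and not the authors' implementation. The function names, the 0/1/2 genotype coding, and the use of a Pearson chi-square test on the 2x2 table are all assumptions made for the example.

```python
import math


def collapse_carriers(genotypes):
    """Collapse each individual's rare-variant genotypes into one indicator.

    `genotypes` is a list of rows, one per individual; each row holds the
    minor-allele count (0, 1, or 2) at every rare variant site (assumed coding).
    Returns 1 for individuals carrying at least one rare allele, else 0.
    """
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def collapsing_test(case_genotypes, control_genotypes):
    """Pearson chi-square test (1 df) on carrier status, cases vs. controls.

    Hypothetical helper for illustration; returns (chi2, p_value).
    """
    a = sum(collapse_carriers(case_genotypes))      # carrier cases
    b = len(case_genotypes) - a                     # non-carrier cases
    c = sum(collapse_carriers(control_genotypes))   # carrier controls
    d = len(control_genotypes) - c                  # non-carrier controls
    n = a + b + c + d

    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A margin of the 2x2 table is empty, so the test is uninformative.
        return 0.0, 1.0

    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Toy data: 4 cases and 4 controls, 3 rare variant sites each.
    cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 2]]
    controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]
    chi2, p = collapsing_test(cases, controls)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

With samples as small as the toy data here, an exact test would be a more sensible choice than the chi-square approximation; the approximation is used only to keep the sketch free of external dependencies.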

(Via Genome research (advanced).)