Saturday, May 22, 2010

Inferring combined CNV/SNP haplotypes from genotype data

Motivation: Copy number variations (CNVs) are increasingly recognized as an substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase and none of these are applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and for inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, haplotypic phase of CNVs and SNPs are inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate.


Results: We generated diploid phase-known CNV–SNP genotype datasets by pairing male X chromosome CNV–SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset—a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets.


Availability: Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin


Contact: l.coin@imperial.ac.uk


Supplementary information: Supplementary data are available at Bioinformatics online.


(this Post content was reproduced from: http://bioinformatics.oxfordjournals.org/cgi/content/short/26/11/1437?rss=1, Via Bioinformatics - current issue.)