Monday, July 25, 2011

Variant calling comparison CASAVA1.8 and GATK

This work addresses the question of whether the new CASAVA1.8, which boasts improvements such as local realignment of reads, is on par with the well-accepted pipeline of BWA mapping, duplicate removal, local realignment, recalibration and variant calling using GATK. We therefore run both methods on chromosome 21 of a Yoruba trio and compare the results to the genotypes identified by the 1000 Genomes Project. We find that the mapping performance is the same for CASAVA1.8 and the academic pipeline, resulting in a mean coverage of about 22x. CASAVA1.8 and GATK both call about 70,000 SNPs per individual, of which 80% are shared among CASAVA1.8, GATK and the 1000 Genomes Project. This stands in contrast to the indel calling performance, where CASAVA1.8 calls about 12,000 indels while GATK calls 16,000. Furthermore, CASAVA1.8 has a higher Mendelian error rate and frequently reports more than one alternative allele per locus, indicating a non-optimal alignment. We conclude that CASAVA1.8 has come a long way and can be considered a mature SNP calling approach. However, CASAVA1.8 does not deliver the same quality in its indel call set as the newly incorporated Dindel algorithm of GATK. Hence it remains best practice to use CASAVA1.8 for producing fastq files and to switch at that stage to the academic tools for mapping, alignment improvement and variant calling.

(Via Browsing Bioinformatics : Nature Precedings.)
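
The abstract relies on two summary measures: the overlap between call sets and the Mendelian error rate in the trio. The sketch below shows one plausible way to compute both. It is not the authors' code. The function names, the genotype representation (a pair of allele strings per individual) and the choice of the union of all three call sets as the overlap denominator are assumptions made for illustration, and the abstract does not say how its 80% figure was normalised.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """Return True if the child's genotype can be formed by taking
    one allele from the mother and one from the father.

    Genotypes are 2-tuples of allele strings, e.g. ("A", "G").
    This representation is an assumption of this sketch.
    """
    child_sorted = tuple(sorted(child))
    return any(
        tuple(sorted((m, f))) == child_sorted
        for m, f in product(mother, father)
    )


def mendelian_error_rate(trio_calls):
    """Fraction of sites whose child genotype violates Mendelian inheritance.

    trio_calls maps a site key, e.g. ("21", position), to a
    (child, mother, father) tuple of genotypes.
    """
    if not trio_calls:
        return 0.0
    errors = sum(
        1
        for child, mother, father in trio_calls.values()
        if not is_mendelian_consistent(child, mother, father)
    )
    return errors / len(trio_calls)


def three_way_overlap(calls_a, calls_b, calls_c):
    """Share of variants found in all three call sets.

    Each argument is a set of hashable variant keys. The denominator
    is the union of the three sets (an assumption of this sketch).
    """
    union = calls_a | calls_b | calls_c
    if not union:
        return 0.0
    return len(calls_a & calls_b & calls_c) / len(union)


# Toy data, invented purely to exercise the functions.
toy_trio = {
    ("21", 100): (("A", "G"), ("A", "A"), ("G", "G")),  # consistent
    ("21", 200): (("G", "G"), ("A", "A"), ("A", "G")),  # mother has no G
    ("21", 300): (("A", "A"), ("A", "G"), ("A", "A")),  # consistent
    ("21", 400): (("C", "T"), ("C", "C"), ("C", "C")),  # T in neither parent
}
toy_error_rate = mendelian_error_rate(toy_trio)
toy_overlap = three_way_overlap({1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6})
```

In practice the variant keys would be parsed from each pipeline's output, for example as (chromosome, position, reference allele, alternative allele) tuples, so that the same site called with a different alternative allele is not counted as shared.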