Friday, April 30, 2010

The Genome Analysis Toolkit - BROAD:GATK

I was recently pointed to this site. It looks like a nice platform for integrating existing NGS tools and building on them.
What is the GATK?

The GATK is, first, a structured software library that makes it easy to write efficient analysis tools for next-generation sequencing data and, second, a suite of tools for working with human medical resequencing projects such as 1000 Genomes and The Cancer Genome Atlas. These tools include depth of coverage analyzers, a quality score recalibrator, a SNP/indel caller and a local realigner.
We aim to work well with both samtools and Picard by providing tools that complement those available in the two packages. Our SNP calling pipeline (Q score recalibration -> multiple sequence realignment -> SNP/indel calling) is a particular area of focus, and we have been pushing to make these capabilities as general-purpose and powerful as possible. My group's mandate is to ensure the success of the human medical resequencing projects we've undertaken at the Broad over the next 2-3 years. That involves providing a robust, production-quality development library that underlies tools for common analysis problems (like SNP calling) and also enables exploratory research on NGS data.
Take a look at File:CBBO 100709 v3.pptx.pdf to view a presentation that provides an introduction to some of the capabilities of the GATK and its application to the 1000 Genomes project.
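To make one of the tools listed above a bit more concrete, here is a minimal sketch of what a depth of coverage analyzer computes: how many aligned reads overlap each base of a region. This is plain Python for illustration only and does not use the GATK, samtools or Picard APIs. The function name `depth_of_coverage`, the representation of reads as half-open `(start, end)` intervals, and the toy data are all assumptions made up for this example.

```python
def depth_of_coverage(reads, region_start, region_end):
    """Return per-base depth over the half-open region [region_start, region_end).

    `reads` is an iterable of (start, end) half-open alignment intervals.
    This is a hypothetical representation, not a real BAM/SAM record.
    """
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region so out-of-range bases are ignored.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


# Toy input: three reads on a 10-base region (made-up coordinates).
reads = [(0, 5), (3, 8), (4, 6)]
depth = depth_of_coverage(reads, 0, 10)
mean_depth = sum(depth) / len(depth)
print(depth, mean_depth)
```

A real analyzer would read alignments from a BAM file and would have to handle gaps, mapping quality filters and much larger regions. The sketch only shows the counting idea.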

