Sunday, July 1, 2012

Fwd: TAPDANCE: An Automated tool to identify and annotate Transposon insertion CISs and associations between CISs from next generation sequence data

Background:
Next generation sequencing approaches applied to the analyses of transposon insertion junction fragments generated in high throughput forward genetic screens have created the need for clear informatics and statistical approaches to deal with the massive amount of data currently being generated. Previous approaches utilized to 1) map junction fragments within the genome and 2) identify Common Insertion Sites (CISs) within the genome are not practical due to the volume of data generated by current sequencing technologies. Previous approaches applied to this problem also required significant manual annotation.
Results:
We describe Transposon Annotation Poisson Distribution Association Network Connectivity Environment (TAPDANCE) software, which automates the identification of CISs within transposon junction fragment insertion data. Starting with barcoded sequence data, the software identifies and trims sequences and maps putative genomic sequence to a reference genome using the bowtie short read mapper. Poisson distribution statistics are then applied to assess and rank genomic regions showing significant enrichment for transposon insertion. Novel methods of counting insertions are used to ensure that the results presented have the expected characteristics of informative CISs. A persistent MySQL database is generated and utilized to keep track of sequences, mappings and common insertion sites. Associations between phenotypes and CISs are also identified using Fisher's exact test with multiple testing correction. In a case study using previously published data we show that the TAPDANCE software identifies CISs as previously described, prioritizes them based on p-value, allows holistic visualization of the data within genome browser software and identifies relationships present in the structure of the data.
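To make the two statistical steps above concrete, here is a minimal Python sketch. It is not TAPDANCE's code and does not reproduce its insertion-counting methods. It assumes a uniform background insertion rate, fixed non-overlapping windows, and Bonferroni correction, none of which the abstract specifies. The names poisson_sf, rank_windows and fisher_exact_two_sided are hypothetical.

```python
import math
from collections import Counter

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summing the upper tail directly."""
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow in k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total, i = 0.0, k
    while term > 0.0 and (i == k or term > total * 1e-17):
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(1.0, total)

def rank_windows(insertions, genome_size, window):
    """Rank windows by Bonferroni-adjusted Poisson p-value (assumed scheme).

    insertions: list of (chromosome, position) tuples.
    Returns (chromosome, window_start, count, adjusted_p), most significant first.
    """
    counts = Counter((chrom, pos // window) for chrom, pos in insertions)
    lam = len(insertions) * window / genome_size  # expected insertions per window
    n_windows = math.ceil(genome_size / window)   # number of tests corrected for
    rows = [
        (chrom, idx * window, n, min(1.0, poisson_sf(n, lam) * n_windows))
        for (chrom, idx), n in counts.items()
    ]
    return sorted(rows, key=lambda r: r[3])

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    For example: rows = phenotype present/absent, columns = CIS hit/not hit.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = math.comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one
    tail = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, tail)
```

With a window of 10,000 bases, calling rank_windows on four insertions clustered in the first window of chr1 and one elsewhere places the chr1 window first. The Fisher p-values for several phenotype and CIS pairs would then need their own multiple testing correction, the method for which is again an assumption here.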
Conclusions:
The TAPDANCE process is fully automated, performs similarly to previous labor-intensive approaches, provides consistent results at a wide range of sequence sampling depth, has the capability of handling extremely large datasets, enables meaningful comparison across datasets and enables large scale meta-analyses of junction fragment data. The TAPDANCE software will greatly enhance our ability to analyze these datasets in order to increase our understanding of the genetic basis of cancers.

(Original Post: BMC Bioinformatics - Latest articles.)