Saturday, December 12, 2009

deepBase: a database for deeply annotating and mining deep sequencing data

Advances in high-throughput next-generation sequencing technology have reshaped the transcriptomic research landscape. However, exploring these massive data sets remains a daunting challenge. In this study, we describe a novel database, deepBase, which we developed to facilitate the comprehensive annotation and discovery of small RNAs from transcriptomic data. The current release of deepBase contains deep sequencing data from 185 small RNA libraries from diverse tissues and cell lines of seven organisms: human, mouse, chicken, Ciona intestinalis, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana. By analyzing ~14.6 million unique reads that map perfectly to more than 284 million genomic loci, we annotated and identified ~380,000 unique ncRNA-associated small RNAs (nasRNAs), ~1.5 million unique promoter-associated small RNAs (pasRNAs), ~4.0 million unique exon-associated small RNAs (easRNAs) and ~6 million unique repeat-associated small RNAs (rasRNAs). Furthermore, 2038 miRNA and 1889 snoRNA candidates were predicted using miRDeep and snoSeeker, respectively. All of the mapped reads can be grouped into about 1.2 million RNA clusters. To support comparative analysis, deepBase provides an integrative, interactive and versatile display. A convenient search option, related publications and other useful information are also provided for further investigation. deepBase is available at:
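The abstract does not say how deepBase assigns reads to categories or builds its RNA clusters. The Python below is a minimal, hypothetical sketch of one common approach, not deepBase's actual pipeline. It labels each mapped read by overlap with annotated genomic intervals and then merges overlapping reads into clusters. The interval format (0-based, half-open), the category names, the priority order used when a read overlaps several features, and the `max_gap` parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical annotation interval: (chrom, start, end, category), 0-based half-open.
Feature = Tuple[str, int, int, str]
# Cluster record: (chrom, start, end, number_of_reads).
Cluster = Tuple[str, int, int, int]


@dataclass
class Read:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def classify(read: Read, features: List[Feature], priority: List[str]) -> str:
    """Return the highest-priority category whose interval overlaps the read."""
    hits = {cat for chrom, s, e, cat in features
            if chrom == read.chrom and s < read.end and read.start < e}
    for cat in priority:  # assumed tie-breaking order when several features overlap
        if cat in hits:
            return cat
    return "unannotated"


def cluster(reads: List[Read], max_gap: int = 0) -> List[Cluster]:
    """Merge same-chromosome reads that overlap, touch, or lie within max_gap bases."""
    clusters: List[Cluster] = []
    for r in sorted(reads, key=lambda r: (r.chrom, r.start)):
        if clusters and clusters[-1][0] == r.chrom and r.start <= clusters[-1][2] + max_gap:
            chrom, s, e, n = clusters[-1]
            clusters[-1] = (chrom, s, max(e, r.end), n + 1)  # extend the open cluster
        else:
            clusters.append((r.chrom, r.start, r.end, 1))  # start a new cluster
    return clusters


# Toy usage with made-up coordinates (not deepBase data).
features = [("chr1", 100, 200, "ncRNA"), ("chr1", 150, 400, "exon"),
            ("chr1", 1000, 1300, "repeat"), ("chr2", 50, 120, "promoter")]
priority = ["ncRNA", "promoter", "exon", "repeat"]
reads = [Read("chr1", 110, 132), Read("chr1", 125, 147), Read("chr1", 300, 322),
         Read("chr1", 1100, 1121), Read("chr2", 60, 80), Read("chr3", 10, 30)]

labels = [classify(r, features, priority) for r in reads]
clusters = cluster(reads)
```

Sorting by chromosome and start position lets clustering run as a single linear sweep after the sort. A real pipeline would also have to decide how to count reads that map to many loci, which this sketch ignores.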

(source URL, Via NAR - Advance Access.)