Saturday, May 22, 2010

The Genomedata format for storing large-scale functional genomics data

Summary: We present a format for efficient storage of multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. We have also developed utilities to load data into this format. We show that retrieving data from this format is more than 2900 times faster than a naive approach using wiggle files.
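
The speedup over wiggle files comes from random access. The sketch below is a minimal illustration under stated assumptions. It is not Genomedata's actual on-disk layout or API. It stores one hypothetical track as contiguous 32-bit floats, one per base, so a region can be read by seeking directly to its byte offset. The naive path has to parse a `fixedStep` wiggle file line by line until it reaches the region. The function names, the chromosome name `chr1`, and the single-header `start=1 step=1` wiggle layout are all assumptions made for the example.

```python
import array
import io

# Hypothetical sketch only: NOT Genomedata's real storage format.
# It contrasts fixed-width binary storage (seekable) with a text wiggle scan.

ITEM = array.array("f").itemsize  # bytes per C float (4 on common platforms)


def write_track(values):
    """Serialize per-base values as contiguous C floats (hypothetical layout)."""
    return array.array("f", values).tobytes()


def read_region(buf, start, end):
    """Random access: jump straight to byte offset start * ITEM, read [start, end)."""
    stream = io.BytesIO(buf)
    stream.seek(start * ITEM)
    out = array.array("f")
    out.frombytes(stream.read((end - start) * ITEM))
    return list(out)


def read_region_wiggle(wig_text, start, end):
    """Naive approach: scan a fixedStep wiggle track (assumed start=1, step=1)
    line by line, returning 0-based positions [start, end)."""
    lines = iter(wig_text.splitlines())
    next(lines)  # skip the single 'fixedStep chrom=... start=1 step=1' header
    out = []
    for pos, line in enumerate(lines):
        if pos >= end:
            break
        if pos >= start:
            out.append(float(line))
    return out


if __name__ == "__main__":
    values = [0.0, 0.5, 1.25, 2.0, 3.5, 4.0]
    buf = write_track(values)
    wig = "fixedStep chrom=chr1 start=1 step=1\n" + "\n".join(str(v) for v in values)
    print(read_region(buf, 2, 5))          # prints [1.25, 2.0, 3.5]
    print(read_region_wiggle(wig, 2, 5))   # prints [1.25, 2.0, 3.5]
```

Both functions return the same values. The binary read costs the same no matter where the region lies. The wiggle scan's cost grows with the region's distance from the start of the file. This toy does not reproduce the 2900-fold figure reported above.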


Availability and Implementation: A reference implementation, with Python and C components, is available at http://noble.gs.washington.edu/proj/genomedata/ under the GNU General Public License.


Contact: william-noble@uw.edu


(This post was reproduced from http://bioinformatics.oxfordjournals.org/cgi/content/short/26/11/1458?rss=1, via Bioinformatics, current issue.)