Tuesday, June 25, 2013

Fwd: The Human Genome Contracts Again

The Human Genome Contracts Again

Summary: The number of individuals whose genomes have been completely sequenced has increased rapidly in recent years. Storing complete genomes and transferring them between computers so that applications and analysis tools can be run on them will soon become a major hurdle, hindering the analysis phase. Therefore, there is a growing need to compress this data efficiently. Here we describe a technique to compress human genomes based on entropy coding, utilizing a reference genome and known SNPs (Single Nucleotide Polymorphisms) retrieved from dbSNP [1]. Furthermore, we explore several intrinsic features of genomes, as well as information in other genomic databases, to further improve the compression attained. Using these methods we compress James Watson's genome to 2.5 MB, improving upon recent work by 37%. Similar compression is obtained for most genomes available from the 1000 Genomes Project [2]. Our biologically-inspired techniques promise even greater gains for genomes of lower organisms, and for human genomes as more genomic data becomes available.

Availability: code is available at biozon.org/software/GenomeZip/

Contact: dmitrip@stanford.edu, tsachy@stanford.edu, golan.yona@stanford.edu

Supplemental Information: see biozon.org/software/GenomeZip/
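
The general idea in the Summary (store only the differences from a reference genome, and encode known SNPs more cheaply than novel ones) can be illustrated with a toy sketch. This is not the GenomeZip code or the authors' actual scheme: the function names, the dictionary standing in for dbSNP, the substitutions-only restriction, and the zeroth-order entropy estimate are all assumptions made for illustration.

```python
import math
from collections import Counter


def diff_against_reference(reference, genome):
    """Return (position, base) for each site where genome differs from reference.

    Assumption: equal-length sequences, so substitutions only (no indels).
    """
    if len(reference) != len(genome):
        raise ValueError("this sketch handles substitutions only")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, genome)) if a != b]


def split_by_known_snps(variants, known_snps):
    """Split variants into a presence bitmap over known SNPs and a novel list.

    known_snps maps position -> alternate base (a hypothetical stand-in for dbSNP).
    A known SNP costs one flag in the bitmap; a novel variant needs its
    position and base stored explicitly.
    """
    found = {pos for pos, base in variants if known_snps.get(pos) == base}
    bitmap = [1 if pos in found else 0 for pos in sorted(known_snps)]
    novel = [(pos, base) for pos, base in variants if pos not in found]
    return bitmap, novel


def entropy_bits(symbols):
    """Empirical zeroth-order entropy of a symbol sequence, in total bits.

    This approximates the size an ideal entropy coder would reach for
    independent symbols; it is an estimate, not an actual encoder.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def reconstruct(reference, known_snps, bitmap, novel):
    """Invert the encoding to check that it is lossless."""
    seq = list(reference)
    for flag, pos in zip(bitmap, sorted(known_snps)):
        if flag:
            seq[pos] = known_snps[pos]
    for pos, base in novel:
        seq[pos] = base
    return "".join(seq)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    genome = "ACGAACGTCC"
    known_snps = {3: "A", 6: "T"}  # made-up catalogue entries

    variants = diff_against_reference(reference, genome)
    bitmap, novel = split_by_known_snps(variants, known_snps)
    assert reconstruct(reference, known_snps, bitmap, novel) == genome
    print("bitmap:", bitmap, "novel:", novel)
    print("estimated bitmap size in bits:", entropy_bits(bitmap))
```

In a sketch like this, the bitmap over catalogue positions is highly skewed for a real genome (most known SNPs are absent in any one individual), which is what makes entropy coding effective on it.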


(Via Bioinformatics - Advance Access.)