Saturday, January 30, 2010

BEDTools: A flexible suite of utilities for comparing genomic features

Motivation: Testing for correlations between different sets of genomic features is a fundamental task in genomics research. However, searching for overlaps between features with existing web-based methods is complicated by the massive datasets that are routinely produced with current sequencing technologies. Fast and flexible tools are therefore required to ask complex questions of these data in an efficient manner.

Results: This article introduces a new software suite for the comparison, manipulation, and annotation of genomic features in BED and GFF format. BEDTools also supports the comparison of sequence alignments in BAM format to both BED and GFF features. The tools are extremely efficient and allow the user to compare large datasets (e.g., next-generation sequencing data) with both public and custom genome annotation tracks. The individual tools can be combined with one another as well as with standard UNIX commands, thus facilitating routine genomics tasks as well as pipelines that can quickly answer intricate questions of large genomic datasets.

Availability and implementation: BEDTools was written in C++. Source code and a comprehensive user manual are freely available at http://code.google.com/p/bedtools.

Supplementary information: An additional figure is available online at Bioinformatics.

List of implemented BEDTools (2010-01-30)
  • intersectBed Returns overlaps between two BED files.
  • pairToBed Returns overlaps between a paired-end BED file and a regular BED file.
  • pairToPair Returns overlaps between two paired-end BED files.
  • bamToBed Converts alignments in BAM format to BED or BEDPE format.
  • windowBed Returns overlaps between two BED files within a user-defined window.
  • closestBed Returns the closest feature to each entry in a BED file.
  • subtractBed Removes the portion of an interval that is overlapped by another feature.
  • mergeBed Merges overlapping features into a single feature.
  • coverageBed Summarizes the depth and breadth of coverage of features in one BED file versus genomic intervals defined in another.
  • genomeCoverageBed Creates either a histogram or a "per base" report of genome coverage.
  • fastaFromBed Creates FASTA sequences from intervals defined in a BED file.
  • maskFastaFromBed Masks a FASTA file based on BED coordinates.
  • shuffleBed Randomly permutes the locations of the features in a BED file (-i) within a genome.
  • slopBed Adjusts each BED entry by a requested number of base pairs.
  • sortBed Sorts a BED file by genomic position or size.
  • linksBed Creates an HTML file of links to the UCSC or a custom browser.
  • complementBed Returns all genomic intervals not spanned by the features in a BED file.
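
To make the interval operations in the list above more concrete, here is a minimal Python sketch of what three of them compute: reporting overlaps (as intersectBed does), merging overlapping features (as mergeBed does), and finding uncovered intervals (as complementBed does). It is an illustration only, not the BEDTools implementation, which is written in C++. The function names, the tuple representation of a feature, and the sample intervals are all hypothetical. The sketch assumes BED's 0-based, half-open coordinates and uses a naive pairwise comparison rather than an efficient index.

```python
from collections import defaultdict

# A feature is a (chrom, start, end) tuple in 0-based, half-open coordinates.
# All names and sample values below are hypothetical, for illustration only.


def intersect(a_features, b_features):
    """Return the shared portion of every overlapping A/B pair (naive O(n*m))."""
    overlaps = []
    for chrom_a, start_a, end_a in a_features:
        for chrom_b, start_b, end_b in b_features:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # the two features share at least one base
                overlaps.append((chrom_a, start, end))
    return overlaps


def merge(features):
    """Merge overlapping features on each chromosome into single features.

    Assumption of this sketch: adjacent (book-ended) features stay separate.
    """
    merged = []
    for chrom, start, end in sorted(features):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def complement(features, chrom_sizes):
    """Return the intervals of each chromosome not spanned by any feature."""
    by_chrom = defaultdict(list)
    for chrom, start, end in merge(features):
        by_chrom[chrom].append((start, end))
    gaps = []
    for chrom in sorted(chrom_sizes):
        pos = 0  # first position not yet known to be covered
        for start, end in by_chrom[chrom]:
            if start > pos:
                gaps.append((chrom, pos, start))
            pos = max(pos, end)
        if pos < chrom_sizes[chrom]:
            gaps.append((chrom, pos, chrom_sizes[chrom]))
    return gaps


a = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
b = [("chr1", 180, 250), ("chr2", 0, 60)]
sizes = {"chr1": 400, "chr2": 100}

print(intersect(a, b))       # [('chr1', 180, 200), ('chr1', 180, 250), ('chr2', 50, 60)]
print(merge(a))              # [('chr1', 100, 300), ('chr2', 50, 80)]
print(complement(a, sizes))  # [('chr1', 0, 100), ('chr1', 300, 400), ('chr2', 0, 50), ('chr2', 80, 100)]
```

The naive double loop in `intersect` is only practical for small inputs. On datasets of the size the abstract describes, some form of indexing or sorted sweep would be needed.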
(This post's content was reproduced from http://bioinformatics.oxfordjournals.org/cgi/content/short/btq033v1?rss=1, via Bioinformatics - Advance Access.)