Thursday, August 26, 2010

bedtools 2.9.0 - great new commands!!


Latest news (Version 2.9.0, 16-August-2010)

  • New unionBedGraphs tool that combines multiple BEDGRAPH files into a single file so that coverage (and other text values) can be compared across multiple samples.
  • New "distance feature" (-d) added to closestBed. In addition to finding the closest feature to each feature in A, the -d option will report the distance to the closest feature in B. Overlapping features have a distance of 0.
  • New "per-base depth feature" (-d) added to coverageBed. This reports the per-base coverage (1-based) of each feature in file B, computed from the features found in file A. For example, this could report the per-base depth of sequencing reads (-a) across each capture target (-b).
  • New groupBy tool. This utility mimics the "groupBy" clause in database systems. Given a file or stream that is sorted by the appropriate "grouping columns", groupBy will compute multiple statistics/operations on other columns in the file or stream. It works with output from all BEDTools as well as any other tab-delimited file or stream. Please see the tool's help for examples.
  • Native, "mix and match" support for BED, GFF, VCF (v4.0), BAM, and BEDPE files. All input files can be "gzipped"; such files are auto-detected.
  • Proper support for "split" BAM alignments and "blocked" BED (aka BED12) features. By using the "-split" option, intersectBed, coverageBed, genomeCoverageBed, and bamToBed will now correctly compute overlaps/coverage solely for the "split" portions of BAM alignments or the "blocks" of BED12 features such as genes.
  • New "-tag" option in bamToBed. Allows one to choose a numeric BAM tag to be used to populate the score field. For example, one could populate the score field with the alignment score with "-tag AS".
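
To make the groupBy idea above concrete, here is a minimal Python sketch of grouping pre-sorted, tab-delimited rows and summarizing one column per group. It illustrates the concept only and is not the BEDTools implementation; the function name group_by_sorted, its parameters, and the sample rows are all hypothetical. As with the tool described above, the input is assumed to be already sorted by the grouping columns.

```python
from itertools import groupby


def group_by_sorted(rows, group_cols, value_col, op):
    """Yield (key, op(values)) for each run of consecutive rows with the same key.

    rows       -- iterable of lists of strings (tab-split lines)
    group_cols -- 0-based indices of the grouping columns (hypothetical API)
    value_col  -- 0-based index of the numeric column to summarize
    op         -- callable applied to the list of floats in each group
    """
    def key(row):
        return tuple(row[i] for i in group_cols)

    # itertools.groupby only merges *adjacent* rows with equal keys,
    # which is why the input must already be sorted by the grouping columns.
    for k, grp in groupby(rows, key=key):
        values = [float(r[value_col]) for r in grp]
        yield k, op(values)


# Made-up example data: chrom, start, end, name, score
lines = [
    "chr1\t100\t200\tgeneA\t5",
    "chr1\t150\t250\tgeneA\t7",
    "chr1\t300\t400\tgeneB\t2",
]
rows = [line.split("\t") for line in lines]

# Sum the score column (index 4) for each (chrom, name) group.
result = list(group_by_sorted(rows, group_cols=[0, 3], value_col=4, op=sum))
print(result)
# [(('chr1', 'geneA'), 12.0), (('chr1', 'geneB'), 2.0)]
```

Passing a different callable for op (for example max or min) gives other per-group summaries; unsorted input would produce one output row per run of equal keys rather than one per distinct key.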

