Sunday, January 23, 2011

NEW: bedtools 2.11.0 (check the original page for more info)

bedtools - Project Hosting on Google Code: "Please cite the following article if you use BEDTools in your research:

Quinlan, AR and Hall, IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.
Latest news (Version 2.11.0, 21-January-2011)

Support for zero length features (i.e., start = end) such as insertions in the reference genome.
Both 8 and 9 column GFF files are now supported.
slopBed can now extend the size of features by a percentage of their size (-pct) instead of just a fixed number of bases.
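
To make the percentage-based extension concrete, here is a minimal Python sketch. It is not the BEDTools implementation: the function name `slop_pct`, the truncating rounding, and the clamping to chromosome bounds are assumptions made for illustration, and coordinates are taken to be 0-based, half-open as in BED.

```python
def slop_pct(start, end, chrom_size, left_pct, right_pct):
    """Extend the interval [start, end) by a fraction of its own length.

    Hypothetical helper: the rounding (truncation) and the clamping to
    [0, chrom_size] are illustrative assumptions, not BEDTools behaviour.
    """
    length = end - start
    new_start = max(0, start - int(length * left_pct))
    new_end = min(chrom_size, end + int(length * right_pct))
    return new_start, new_end
```

With these assumptions, a 100-base feature extended by 50% on each side grows by 50 bases in each direction, unless it hits a chromosome boundary first.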

Two improvements to shuffleBed:
A new -f (overlapFraction) parameter defines the maximum overlap that a randomized feature can have with an -excl feature. That is, if a chosen locus has more than -f overlap with an -excl feature, a new locus is sought.
A new -incl option (thanks to Michael Hoffman and Davide Cittaro) defines intervals in which the randomized features should be placed. This is used instead of placing the features randomly in the genome. Note that a genome file is still required so that a randomized feature does not go beyond the end of a chromosome.
bamToBed can now optionally report the CIGAR string as an additional field.
pairToPair can now report the entire paired feature from the B file when overlaps are found.
complementBed now reports all chromosomes, not just those with features in the BED file.
Improved randomization seeding in shuffleBed. This prevents identical output for runs of shuffleBed that occur in the same second (often the case).
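
The overlap rule for shuffleBed described above can be pictured as rejection sampling. The Python sketch below is only an illustration of that idea on a single chromosome: the names `overlap_fraction` and `shuffle_feature`, the `max_tries` limit, and the choice to measure overlap as a fraction of the randomized feature's length are all assumptions, not details taken from BEDTools. It also assumes the feature has nonzero length.

```python
import random


def overlap_fraction(start, end, excl_start, excl_end):
    """Fraction of [start, end) that is covered by [excl_start, excl_end)."""
    overlap = max(0, min(end, excl_end) - max(start, excl_start))
    return overlap / (end - start)


def shuffle_feature(length, chrom_size, excluded, max_frac, rng, max_tries=1000):
    """Pick a random locus whose overlap with every excluded interval is <= max_frac.

    Hypothetical helper: a locus with more than max_frac overlap is rejected
    and a new one is drawn, up to max_tries times (an assumed safety limit).
    """
    for _ in range(max_tries):
        # Keep the feature inside the chromosome, as a genome file would ensure.
        start = rng.randint(0, chrom_size - length)
        end = start + length
        if all(overlap_fraction(start, end, s, e) <= max_frac for s, e in excluded):
            return start, end
    raise RuntimeError("no acceptable locus found")
```

Passing an explicit `random.Random(seed)` instance makes the role of the seed visible: two runs that derive the same seed (for example, from a clock with one-second resolution) would produce identical placements.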

New annotateBed tool that annotates one BED/VCF/GFF file with the coverage and number of overlaps observed from multiple other BED/VCF/GFF files. In this way, it allows one to ask to what degree one feature coincides with multiple other feature types with a single command.
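
As a rough illustration of what "coverage and number of overlaps" means here, the following Python sketch annotates one feature against several interval lists. The function name `annotate`, the single-chromosome simplification, and the exact definition of coverage (fraction of the feature's bases covered at least once) are assumptions for this example, not a description of annotateBed's internals.

```python
def annotate(feature, tracks):
    """For each track, return (number of overlapping intervals, fraction covered).

    feature: (start, end), 0-based half-open, nonzero length.
    tracks:  list of interval lists, one list per annotation file (hypothetical layout).
    """
    start, end = feature
    results = []
    for intervals in tracks:
        # Clip every overlapping interval to the feature's bounds.
        hits = [(max(start, s), min(end, e))
                for s, e in intervals if s < end and e > start]
        # Sweep the clipped hits so bases covered twice are counted once.
        covered = 0
        cur_end = start
        for s, e in sorted(hits):
            if e > cur_end:
                covered += e - max(s, cur_end)
                cur_end = e
        results.append((len(hits), covered / (end - start)))
    return results
```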

New unionBedGraphs tool that combines multiple BEDGRAPH files into a single file such that one can compare coverage (and other text-values) across multiple samples.
Support for writing uncompressed BAM output with the -ubam option.
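
The idea of combining several BEDGRAPH files into one can be sketched as splitting the chromosome at every interval boundary and reporting one value per input for each resulting segment. The Python below is a simplified illustration under stated assumptions: the function name `union_bedgraphs`, the single-chromosome input, the `filler` value for samples with no data, and the choice to skip segments that no input covers are all hypothetical.

```python
def union_bedgraphs(tracks, filler=0):
    """Merge per-sample (start, end, value) lists into rows of (start, end, v1, v2, ...).

    Each track is assumed to hold non-overlapping intervals on one chromosome.
    """
    # Every interval boundary in any track becomes a breakpoint.
    points = sorted({p for track in tracks for s, e, _ in track for p in (s, e)})
    rows = []
    for left, right in zip(points, points[1:]):
        values = []
        covered = False
        for track in tracks:
            value = filler
            for s, e, v in track:
                if s <= left and right <= e:
                    value = v
                    covered = True
                    break
            values.append(value)
        if covered:  # skip segments that no sample covers
            rows.append((left, right) + tuple(values))
    return rows
```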

New 'distance feature' (-d) added to closestBed. In addition to finding the closest feature to each feature in A, the -d option will report the distance to the closest feature in B. Overlapping features have a distance of 0.
New 'per base depth feature' (-d) added to coverageBed. This reports the per base coverage (1-based) of each feature in file B based on the coverage of features found in file A. For example, this could report the per-base depth of sequencing reads (-a) across each capture target (-b).
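
Both -d options can be illustrated with a few lines of Python. These are sketches only: the function names are hypothetical, intervals are assumed to be 0-based, half-open and on a single chromosome, and the distance is measured as the number of bases in the gap between two features, which may differ from the exact convention BEDTools uses.

```python
def closest_with_distance(a, b_features):
    """Return the feature in b_features closest to a, plus its distance (0 if overlapping)."""
    a_start, a_end = a

    def distance(b):
        b_start, b_end = b
        if b_start < a_end and a_start < b_end:
            return 0  # overlapping features have a distance of 0
        return b_start - a_end if b_start >= a_end else a_start - b_end

    best = min(b_features, key=distance)
    return best, distance(best)


def per_base_depth(target, reads):
    """Return (1-based position within target, depth) for every base of the target."""
    start, end = target
    depth = [0] * (end - start)
    for r_start, r_end in reads:
        # Only count the part of the read that falls inside the target.
        for pos in range(max(start, r_start), min(end, r_end)):
            depth[pos - start] += 1
    return [(i + 1, d) for i, d in enumerate(depth)]
```

In the second function, `reads` plays the role of file A (for example, sequencing reads) and `target` the role of a feature in file B (for example, a capture target).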

New groupBy tool. This utility mimics the 'group by' clause in database systems. Given a file or stream that is sorted by the appropriate 'grouping columns', groupBy will compute multiple statistics/operations on other columns in the file or stream. This will work with output from all BEDTools as well as any other tab-delimited file or stream. Please see the tool's help for examples.
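
The requirement that the input be sorted by the grouping columns is the same one Python's `itertools.groupby` has, which makes it a convenient way to sketch the idea. The function below is an illustration only; its name, its 0-based column indices, and its conversion of the operated column to `float` are assumptions, not the groupBy tool's interface.

```python
from itertools import groupby


def group_by(rows, group_cols, op_col, op):
    """Apply op to the values of op_col within each run of equal grouping columns.

    rows must already be sorted by group_cols; otherwise equal keys that are
    not adjacent end up in separate groups.
    """
    def key(row):
        return tuple(row[c] for c in group_cols)

    out = []
    for k, grp in groupby(rows, key=key):
        values = [float(row[op_col]) for row in grp]
        out.append(k + (op(values),))
    return out
```

For example, `group_by(rows, [0, 1], 2, sum)` would total column 2 for each distinct combination of columns 0 and 1.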

New freqdesc and freqasc operations for groupBy. These compute histograms of the values observed in a column in a file or stream.
Native, 'mix and match' support for BED, GFF, VCF (v4.0), BAM, and BEDPE files. All input files can be 'gzipped'; such files are auto-detected.
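
The freqdesc and freqasc operations mentioned above amount to counting how often each value occurs and ordering the counts. A minimal Python sketch follows; the function name `freq` and the `value:count` output layout are assumptions for illustration and may not match the tool's exact output.

```python
from collections import Counter


def freq(values, descending=True):
    """Return a 'value:count' histogram string ordered by count."""
    counts = Counter(values)
    # Python's sort is stable, so values with equal counts keep first-seen order.
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=descending)
    return ",".join(f"{value}:{n}" for value, n in ordered)
```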

Proper support for 'split' BAM alignments and 'blocked' BED (aka BED12) features. By using the '-split' option, intersectBed, coverageBed, genomeCoverageBed, and bamToBed will now"

