Friday, October 10, 2014

Fwd: [Samtools-help] Picard Release 1.122

from a post by G. Grant

Fwd: please follow footer link

Picard Release 1.122
8 October 2014

- New Command Line Program "GenotypeConcordance"
-- Calculates the concordance between genotype data for two samples in two different VCFs: one is treated as the truth (or reference), the other as the call. The concordance is broken into separate results sections for SNPs and indels. Summary and detailed statistics are reported.
Note that for any pair of variants to compare, only the alleles for the samples under interrogation are considered, and the MNP, Symbolic, and Mixed classes of variants are not included.
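
To make the idea of genotype concordance concrete, here is a minimal sketch in Python. It is not Picard's implementation: the function name, the dictionary layout, and the choice to compare only sites present in both inputs are assumptions made for illustration, and it does not split results by SNP and indel or skip any variant classes.

```python
def genotype_concordance(truth, call):
    """Return (matching, compared, fraction) over sites present in both inputs.

    Hypothetical data layout: truth and call map (contig, position) to a
    genotype, given as a tuple of allele strings. Allele order is ignored.
    """
    shared = sorted(set(truth) & set(call))
    matching = sum(
        1 for site in shared if sorted(truth[site]) == sorted(call[site])
    )
    fraction = matching / len(shared) if shared else float("nan")
    return matching, len(shared), fraction


# Made-up genotypes for one sample in a truth set and a call set.
truth = {
    ("chr1", 100): ("A", "G"),
    ("chr1", 200): ("C", "C"),
    ("chr1", 300): ("T", "A"),
}
call = {
    ("chr1", 100): ("G", "A"),  # same genotype, different allele order
    ("chr1", 200): ("C", "T"),  # discordant
    ("chr1", 400): ("G", "G"),  # site absent from truth, not compared
}

matching, compared, fraction = genotype_concordance(truth, call)
print(matching, compared, fraction)
```

Sites found in only one of the two inputs are ignored here; a fuller comparison would count them as missed or extra calls.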

- New Command Line Program "UpdateVcfDictionary"
-- Updates the sequence dictionary of a VCF from another file (SAM, BAM, VCF, dictionary, interval_list, fasta, etc).

- New Command Line Program "VcfToIntervalList"
-- Creates an interval list from a VCF.
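
As a rough illustration of what converting VCF records to intervals involves, the sketch below turns VCF data lines into (contig, start, end, strand, name) tuples. This is not the tool's code. The five-field layout, the "+" strand, the rule that an interval spans the REF allele, and the fallback name are all assumptions; the release notes do not describe how VcfToIntervalList names or merges intervals.

```python
def vcf_records_to_intervals(vcf_lines):
    """Convert VCF data lines to 1-based, closed intervals.

    Assumption: each interval covers the REF allele, so
    end = POS + len(REF) - 1. Header lines (starting with '#') are skipped.
    """
    intervals = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        contig, pos, var_id, ref = fields[0], int(fields[1]), fields[2], fields[3]
        end = pos + len(ref) - 1
        # Hypothetical naming rule: use the ID column, else contig:pos.
        name = var_id if var_id != "." else f"{contig}:{pos}"
        intervals.append((contig, pos, end, "+", name))
    return intervals


example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\trs1\tA\tG\t.\t.\t.",
    "chr1\t200\t.\tACG\tA\t.\t.\t.",
]

for interval in vcf_records_to_intervals(example_vcf):
    print("\t".join(str(field) for field in interval))
```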

- New Command Line Program "MarkDuplicatesWithMateCigar"
-- A new tool with which to mark duplicates:
This tool can replace MarkDuplicates if the input SAM/BAM has Mate CIGAR (MC) optional tags
pre-computed (see the tools RevertOriginalBaseQualitiesAndAddMateCigar and
FixMateInformation). This allows the new tool to perform a streaming duplicate
marking routine (i.e. a single pass). This tool cannot be used with
alignments that have large gaps or reference skips, which occur
frequently in RNA-seq data.
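
One way to check whether a set of alignments falls under this restriction is to scan CIGAR strings for the reference-skip operator (N) and for long deletions. The sketch below is a hypothetical pre-check, not part of Picard; the function names are invented for illustration, and the release notes do not say what size counts as a "large" gap, so no threshold is applied.

```python
import re

# One CIGAR element: a length followed by a SAM operator character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def has_reference_skip(cigar):
    """True if the CIGAR contains an N (reference skip) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def largest_gap(cigar):
    """Length of the longest D or N operation, or 0 if there is none."""
    return max(
        (int(length) for length, op in CIGAR_OP.findall(cigar) if op in "DN"),
        default=0,
    )


for cigar in ("100M", "40M2D60M", "50M1000N50M"):
    print(cigar, has_reference_skip(cigar), largest_gap(cigar))
```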

There were many refactors of the old MarkDuplicates and
MarkDuplicatesWithMateCigar, since they share common code.
EstimateLibraryComplexity was caught up in this too.

Many, many, many unit tests were added to prove
equivalency of MarkDuplicatesWithMateCigar to MarkDuplicates. This also
exposed a few one-in-a-million corner cases in MarkDuplicates, both in
duplicate marking and in optical duplicate detection. As a result,
MarkDuplicates needs to write slightly larger temporary files when
running. SamFileTester was also improved to handle the various test
cases for duplicate marking.

- Updates to IntervalList:
-- Added capacity to create a simple interval list from a string (the name of the contig)
-- Added the capacity to subtract one interval list from another (currently
it would only work if they were both wrapped inside a container)
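
The subtraction of one interval list from another can be sketched as follows. This is an illustration of the general operation, not the IntervalList code: it handles a single contig, represents intervals as plain (start, end) tuples with 1-based closed coordinates, and the function name is invented.

```python
def subtract_intervals(keep, remove):
    """Return the parts of `keep` not covered by any interval in `remove`.

    Both arguments are lists of (start, end) tuples on one contig,
    using 1-based, closed coordinates.
    """
    result = []
    remove = sorted(remove)
    for start, end in sorted(keep):
        cursor = start  # first position not yet decided
        for r_start, r_end in remove:
            if r_end < cursor:
                continue  # removal lies entirely before the cursor
            if r_start > end:
                break  # removal lies entirely after this interval
            if r_start > cursor:
                result.append((cursor, r_start - 1))  # keep the uncovered stretch
            cursor = max(cursor, r_end + 1)
            if cursor > end:
                break
        if cursor <= end:
            result.append((cursor, end))
    return result


print(subtract_intervals([(1, 100)], [(10, 20), (50, 60)]))
```

Extending this to several contigs would amount to grouping both lists by contig name and applying the same routine to each group.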

- Updates to SamLocusIterator
-- Performance optimizations gaining about 35% speed up...

- Updates to MarkDuplicates:
-- Removed unnecessary storage of a string in the Read Ends in Mark
-- Clarified the size of ReadEndsForMarkDuplicates

- Changed the minimum number of times that BAIT_INTERVALS (in CalculateHsMetrics) and TARGET_INTERVALS (in CollectTargetedMetrics) must be set to one.

- Moved CollectHiSeqPfFailMetrics into picard public

- Updates to documentation generation (internal):
-- changed link to IntervalList.java documentation
-- updated how _includes/command-line-usage.html is generated

- Moved SAMSequenceDictionaryExtractor and tests from picard to htsjdk

- George

(Via Samtools-help mailing list.)