Sunday, November 20, 2011

Fwd: Now available: data for CNV and SV baseline genome set

Hi everyone,

Many of you have asked us to make available the CNV baseline set that is used to normalize CNV data. We are happy to report that it is now available. You can download the composite files for the 52 genomes that make up the baseline set here:  At the same location, you will find documentation that describes the file content and format.

 The CNV baseline set data may be useful for a number of reasons, including:

  • Manual review of patterns of coverage in the baseline genomes at the site of a CNV call (or lack thereof) may be useful in deciding whether the call, or underlying data, is trustworthy; for instance, the median scaled GC-corrected or normalized data can be used to create a multi-genome coverage plot.
  • Regeneration of normalized coverage on older Complete Genomics datasets.
  • Development of an alternative normalization profile, perhaps as input to an external CNV-calling tool.
  • Identification of regions where coverage is artificially low because reads match a high-copy repeat that the Complete Genomics mapping process does not resolve.
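
 As a rough illustration of the normalization idea behind the uses above, here is a minimal Python sketch. It is not the Complete Genomics pipeline: the function names, the `min_profile` cutoff, and the toy numbers are all hypothetical, and it assumes you have already read per-window GC-corrected coverage out of the composite files (whose actual format is described in the documentation, not here). Each genome is scaled by its own median, a per-window median across the baseline genomes serves as the profile, and a sample is expressed as a ratio to that profile.

```python
from statistics import median


def median_scale(coverage):
    """Divide each window's coverage by the genome's median coverage."""
    m = median(coverage)
    if m == 0:
        raise ValueError("median coverage is zero; cannot scale")
    return [c / m for c in coverage]


def baseline_profile(baseline_genomes):
    """Per-window median of median-scaled coverage across baseline genomes."""
    scaled = [median_scale(g) for g in baseline_genomes]
    return [median(window) for window in zip(*scaled)]


def normalize(sample, profile, min_profile=0.1):
    """Ratio of the sample's scaled coverage to the baseline profile.

    Windows whose baseline profile falls below min_profile (a hypothetical
    cutoff) are returned as None: coverage there is low in the baseline
    itself, so a ratio would not be meaningful.
    """
    scaled = median_scale(sample)
    return [s / p if p >= min_profile else None
            for s, p in zip(scaled, profile)]


# Toy data (made up): three "baseline genomes", four windows each.
toy_baseline = [
    [10, 20, 10, 0],
    [20, 40, 20, 0],
    [30, 60, 30, 3],
]
toy_sample = [20, 20, 20, 20]

profile = baseline_profile(toy_baseline)
ratios = normalize(toy_sample, profile)
print(profile)
print(ratios)
```

 In this toy case the second window comes out at half the baseline level, which is the kind of signal a CNV caller would look at, and the fourth window is masked because the baseline itself has almost no coverage there. Plotting the scaled values for each baseline genome side by side would give the multi-genome coverage plot mentioned in the first bullet.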

 We are also providing the SV baseline composite files that are used to annotate each reported junction with its frequency of detection in the baseline set.  These files, and the documentation that describes their content and format, can be found here:

 I hope you will find these resources helpful for your CNV and SV analysis!  Please let us know if you have any questions or feedback. 


Pam Tangvoranuntakul, PhD
Senior Product Manager, Analysis Pipeline

(Original Post: CGI user forum.)