Saturday, November 26, 2011

Fwd: finding reference sequences and interpretation of data from RepeatMasker UCSC tables - BioStar

Important question raised + quite interesting progress report so far.
Fwd: please follow footer link

I (Casey Bergman) downloaded the RepeatMasker track for the mm9 genome from the Tables section of the UCSC genome browser. I get entries like these:

#bin swScore milliDiv milliDel milliIns genoName genoStart genoEnd genoLeft strand repName repClass repFamily repStart repEnd repLeft id
607 687 174 0 0 chr1 3000001 3000156 -194195276 - L1_Mur2 LINE L1 -4311567 1413 1
My question is: how can I find the reference sequence that this particular genomic location was aligned to? I understand that the Smith-Waterman alignment score results from aligning this piece of the genome to the reference, but it is the actual reference sequence of that particular repeat that I'm trying to find. How can it be accessed?

Also, are the coordinates repStart, repEnd, repLeft in the coordinate space of the reference or of the genome? It sounds to me from googling that it is the former, but in that case it seems impossible to interpret without having the reference sequence -- we don't know how long it is, for example, just by looking at this table, right?

Finally, I was hoping someone can explain what the milliDiv, milliDel, and milliIns fields are and what their units mean.
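
A minimal sketch on the units, assuming the UCSC schema description for the rmsk table, which gives milliDiv, milliDel, and milliIns as base mismatches, deleted bases, and inserted bases in parts per thousand. Under that assumption, dividing by 10 yields a percentage. The values are the first five columns of the example row above; the helper name per_mille_to_percent is made up for illustration.

```python
# First five column names and values, copied from the example row above.
FIELDS = ["bin", "swScore", "milliDiv", "milliDel", "milliIns"]
ROW = "607 687 174 0 0"


def per_mille_to_percent(value):
    """Convert a parts-per-thousand value to a percentage (hypothetical helper)."""
    return value / 10.0


# Pair each column name with its integer value.
record = dict(zip(FIELDS, (int(token) for token in ROW.split())))

# Assumption: the three milli* columns are all in parts per thousand.
percentages = {
    name: per_mille_to_percent(record[name])
    for name in ("milliDiv", "milliDel", "milliIns")
}

for name, pct in percentages.items():
    print(f"{name}: {record[name]} per thousand = {pct:.1f}%")
```

Read this way, the example hit would differ from its reference at roughly 17.4% of aligned bases, with no deleted or inserted bases reported.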
...



(Original Post by Casey Bergman: finding reference sequences and interpretation of data from RepeatMasker UCSC tables - BioStar.)