Tuesday 12 October 2010

Discarding quality scores for small RNA analysis.

Going through CLCbio's Small RNA analysis using Illumina data, I got to this part:
"Make sure the Discard quality scores and Discard read names checkboxes are checked. Information about quality scores and read names are not used in this analysis anyway, so it will just take up disk space when importing the data."
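For reference, a FASTQ file spends four lines on every read: the read name, the sequence, a "+" separator, and a quality string exactly as long as the sequence. The minimal Python sketch below is my own illustration of what "discarding" amounts to, not how CLCbio stores imported reads; the function name `fastq_sequences` and the two example reads are hypothetical. It keeps only the sequence line of each record and reports how many bytes survive.

```python
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the sequence of each 4-line FASTQ record.

    The read name (@ line), the '+' separator and the quality string
    are read and validated, then thrown away.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected a FASTQ header, got {header!r}")
        try:
            seq = next(it).rstrip("\n")
            plus = next(it).rstrip("\n")
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        if not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError(f"malformed FASTQ record {header!r}")
        yield seq


# Two hypothetical 22 nt reads with made-up names and qualities.
reads = {"read1": "TGAGGTAGTAGGTTGTATAGTT", "read2": "TAGCTTATCAGACTGATGTTGA"}
example_lines = []
for name, seq in reads.items():
    example_lines += [f"@{name}\n", f"{seq}\n", "+\n", "I" * len(seq) + "\n"]

seqs = list(fastq_sequences(example_lines))
fastq_bytes = sum(len(line) for line in example_lines)
kept_bytes = sum(len(s) + 1 for s in seqs)  # one sequence per line
print(seqs)
print(f"kept {kept_bytes} of {fastq_bytes} bytes")
```

Because the quality line is as long as the sequence, dropping names and qualities always leaves less than half of each record; with reads this short, the name line makes the saving a bit larger still.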

That got me thinking. The reads are short to begin with, so I would expect more information to always be better. But in some cases, I guess a second metric could be confusing when there is:
1) a sequencing error
2) a bona fide SNP
3) relatively low quality scores anyway (how would one weight the sequence quality fairly?)
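One common way to weight base quality, loosely modeled on Bowtie's `-n` mode (which adds up the quality values at mismatched positions), is to sum the Phred scores only where the read disagrees with the reference. A mismatch at a low-quality base (more likely a sequencing error) then costs little, while one at a high-quality base costs a lot. This minimal Python sketch is not what CLCbio does. It assumes Phred+33 encoding (older Illumina 1.3 to 1.7 files used an offset of 64) and a hypothetical cutoff `MAX_MISMATCH_QUAL_SUM = 70` that echoes Bowtie's default for `-e/--maqerr`. Bowtie itself applies further adjustments to the quality values that are left out here.

```python
from typing import List

MAX_MISMATCH_QUAL_SUM = 70  # hypothetical cutoff, echoing Bowtie's -e/--maqerr default


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (assumed Phred+33 unless told otherwise)."""
    return [ord(c) - offset for c in qual]


def error_probability(q: int) -> float:
    """Phred definition Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mismatch_quality_sum(read: str, ref: str, qual: str, offset: int = 33) -> int:
    """Sum the Phred scores at positions where the read disagrees with the reference."""
    if not len(read) == len(ref) == len(qual):
        raise ValueError("read, reference and qualities must have equal length")
    return sum(
        q for r, g, q in zip(read, ref, phred_scores(qual, offset)) if r != g
    )


ref = "TGAGGTAGTAGGTTGTATAGTT"
read = ref[:20] + "AA"      # two mismatches at the last two bases
low_q = "I" * 20 + "##"     # '#' is Q2 in Phred+33: probably sequencing errors
high_q = "I" * 22           # 'I' is Q40: confident calls, maybe real SNPs

for label, qual in [("low", low_q), ("high", high_q)]:
    total = mismatch_quality_sum(read, ref, qual)
    print(label, total, total <= MAX_MISMATCH_QUAL_SUM)
```

The two mismatches cost 4 at Q2 but 80 at Q40, so only the low-quality version stays under the cutoff. That is the dilemma in the list above in miniature: the rule forgives likely sequencing errors but penalizes confident mismatches, which are exactly the ones that might be bona fide SNPs.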


I believe CLCbio uses the Burrows-Wheeler transform (BWT) to index and compress the genomes being searched. I am curious how their approach differs from BWA and Bowtie, though.
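As a toy illustration of why a BWT-based index is attractive, here is a naive Python sketch of the transform and of FM-index backward search, which counts the occurrences of a read using only the transformed text. It builds the BWT by sorting all rotations (fine for toy inputs, far too slow for a genome) and recomputes occurrence counts on the fly. It is not CLCbio's, BWA's, or Bowtie's implementation, and the function names are my own.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform by sorting all rotations (naive, toy inputs only).

    '$' marks the end of the text and sorts before A, C, G and T.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def fm_count(bwt_str: str, pattern: str) -> int:
    """Count occurrences of pattern via FM-index backward search on the BWT."""
    # C[c]: how many characters in the text sort strictly before c.
    freq = {}
    for ch in bwt_str:
        freq[ch] = freq.get(ch, 0) + 1
    C, running = {}, 0
    for ch in sorted(freq):
        C[ch] = running
        running += freq[ch]

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; real indexes store sampled counts instead.
        return bwt_str.count(ch, 0, i)

    # [lo, hi) is the block of sorted rotations that start with the suffix matched so far.
    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


def naive_count(text: str, pattern: str) -> int:
    """Brute-force count, including overlapping matches, for comparison."""
    return sum(text.startswith(pattern, i) for i in range(len(text)))


genome = "TGAGGTAGTAGGTTGTATAGTT"  # toy "genome"
index = bwt(genome)
for p in ["GTAG", "TT", "A", "CC"]:
    print(p, fm_count(index, p), naive_count(genome, p))
```

Backward search touches only the BWT string plus the small `C` table and occurrence counts, which is what lets a tool search the transformed (and compressible) genome directly. BWA and Bowtie both build on this idea and add further machinery, such as sampled suffix arrays to report match positions and backtracking to tolerate mismatches.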

Datanami, Woe be me