Kevin's GATTACA World: ConDeTri - A Content Dependent Read Trimmer for Illumina Data. and RAPSearch2: a fast and memory-efficient protein similarity search tool for next generation sequencing data.

Tuesday 8 November 2011


1.	Proteomic Analysis of Excretory-Secretory Products of Heligmosomoides polygyrus Assessed with Next-Generation Sequencing Transcriptomic Information.
	Moreno Y, Gros PP, Tam M, Segura M, Valanparambil R, Geary TG, Stevenson MM.
	PLoS Negl Trop Dis. 2011 Oct;5(10):e1370. Epub 2011 Oct 25.
	PMID: 22039562 [PubMed - in process]

2.	ConDeTri - A Content Dependent Read Trimmer for Illumina Data.
	Smeds L, Künstner A.
	PLoS One. 2011;6(10):e26314. Epub 2011 Oct 19.
	PMID: 22039460 [PubMed - in process]
	Abstract: During the last few years, DNA and RNA sequencing have started to play an increasingly important role in biological and medical applications, especially due to the greater amount of sequencing data yielded from the new sequencing machines and the enormous decrease in sequencing costs. Particularly, Illumina/Solexa sequencing has had an increasing impact on gathering data from model and non-model organisms. However, accurate and easy to use tools for quality filtering have not yet been established. We present ConDeTri, a method for content dependent read trimming for next generation sequencing data using quality scores of each individual base. The main focus of the method is to remove sequencing errors from reads so that sequencing reads can be standardized. Another aspect of the method is to incorporate read trimming in next-generation sequencing data processing and analysis pipelines. It can process single-end and paired-end sequence data of arbitrary length and it is independent from sequencing coverage and user interaction. ConDeTri is able to trim and remove reads with low quality scores to save computational time and memory usage during de novo assemblies. Low coverage or large genome sequencing projects will especially gain from trimming reads. The method can easily be incorporated into preprocessing and analysis pipelines for Illumina data. AVAILABILITY AND IMPLEMENTATION: Freely available on the web at http://code.google.com/p/condetri.
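
To make the idea of trimming on per-base quality scores concrete, here is a minimal Python sketch. It is not ConDeTri's algorithm, which the abstract does not spell out. It only shows the general technique of decoding Phred scores and cutting low-quality bases off the 3' end. The function names, the threshold of 20, the 80% cutoff and the Phred+33 offset are all assumptions made for illustration; Illumina pipelines from versions 1.3 to 1.7 encoded qualities with an offset of 64, so the offset is left as a parameter.

```python
def phred_scores(quality, offset=33):
    """Decode an ASCII quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20, offset=33):
    """Remove bases from the 3' end until one reaches min_q (hypothetical helper)."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality string differ in length")
    scores = phred_scores(quality, offset)
    end = len(scores)
    # Walk backwards over the low-quality tail.
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def keep_read(quality, min_q=20, min_fraction=0.8, min_len=1, offset=33):
    """Keep a trimmed read only if enough of its bases reach min_q (assumed rule)."""
    scores = phred_scores(quality, offset)
    if len(scores) < min_len:
        return False
    good = sum(1 for s in scores if s >= min_q)
    return good / len(scores) >= min_fraction


if __name__ == "__main__":
    # 'I' decodes to Q40 and '#' to Q2 under Phred+33.
    seq, qual = trim_3prime("ACGTACGT", "IIIIII##")
    print(seq, qual, keep_read(qual))
```

For paired-end data a pipeline would apply the same test to both mates and decide what to do when only one of them survives. That decision is left out of this sketch.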

3.	Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next generation sequence data.
	Gusnanto A, Wood HM, Pawitan Y, Rabbitts P, Berri S.
	Bioinformatics. 2011 Oct 28. [Epub ahead of print]
	PMID: 22039209 [PubMed - as supplied by publisher]

4.	RAPSearch2: a fast and memory-efficient protein similarity search tool for next generation sequencing data.
	Zhao Y, Tang H, Ye Y.
	Bioinformatics. 2011 Oct 28. [Epub ahead of print]
	PMID: 22039206 [PubMed - as supplied by publisher]
	Abstract: SUMMARY: With the wide application of next generation sequencing (NGS) techniques, fast tools for protein similarity search that scale well to large query datasets and large databases are highly desirable. In a previous work, we developed RAPSearch, an algorithm that achieved a ~20-90 fold speedup relative to BLAST while still achieving similar levels of sensitivity for short protein fragments derived from NGS data. RAPSearch, however, requires a substantial memory footprint to identify alignment seeds, due to its use of a suffix array data structure. Here we present RAPSearch2, a new memory-efficient implementation of the RAPSearch algorithm that uses a collision-free hash table to index a similarity search database. The utilization of an optimized data structure further speeds up the similarity search by another 2-3 times. We also implemented multi-threading in RAPSearch2, and the multi-thread modes achieve significant acceleration (e.g., 3.5X for 4-thread mode). RAPSearch2 requires up to 2G memory when running in single thread mode, or up to 3.5G memory when running in 4-thread mode. IMPLEMENTATION AND AVAILABILITY: Implemented in C++, the source code is freely available for download at the RAPSearch2 website: http://omics.informatics.indiana.edu/mg/RAPSearch2/.
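
The abstract describes indexing the search database with a hash table in order to find alignment seeds. The Python sketch below illustrates that general idea with a plain k-mer index. It is not the RAPSearch2 implementation: a Python dict is an ordinary hash table rather than the collision-free one the authors describe, and the function names, the seed length of 4 and the toy sequences are assumptions chosen only for illustration.

```python
from collections import defaultdict


def build_index(database, k=4):
    """Map every k-mer in the database to the (protein name, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def find_seeds(query, index, k=4):
    """Return (query offset, protein name, protein offset) for each exact k-mer match."""
    hits = []
    for q in range(len(query) - k + 1):
        # .get avoids adding empty entries to the defaultdict on a miss.
        for name, pos in index.get(query[q:q + k], ()):
            hits.append((q, name, pos))
    return hits


if __name__ == "__main__":
    # Toy database of two made-up peptide sequences.
    database = {"p1": "MKVLAAGIV", "p2": "GGMKVLT"}
    index = build_index(database)
    for hit in find_seeds("AMKVLA", index):
        print(hit)
```

Each seed lookup is a constant-time dictionary access on average, which is why a hash index suits large query sets. A real search tool would then extend and score the seeds, a step this sketch omits.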

5.	Genetic diagnosis of neuroacanthocytosis disorders using exome sequencing.
	Walker RH, Schulz VP, Tikhonova IR, Mahajan MC, Mane S, Arroyo Muniz M, Gallagher PG.
	Mov Disord. 2011 Oct 28. doi: 10.1002/mds.24020. [Epub ahead of print]
	PMID: 22038564 [PubMed - as supplied by publisher]
	Abstract: BACKGROUND: Neuroacanthocytoses are neurodegenerative disorders marked by phenotypic and genetic heterogeneity. There are several associated genetic loci, and many defects, including gene deletions and insertions, and missense, nonsense, and splicing mutations, have been found spread over hundreds of kilobases of genomic DNA. In some cases, specific diagnosis is unclear, particularly in the early stages of disease or when there is an atypical presentation. Determination of the precise genetic defect allows assignment of the diagnosis and permits carrier detection and genetic counseling. The objective of this report was to utilize exome sequencing for genetic diagnosis in the neuroacanthocytosis syndromes. METHODS: Genomic DNA from 2 patients with clinical features of chorea-acanthocytosis was subjected to targeted exon capture. Captured DNA was subjected to ultrahigh throughput next-generation sequencing. Sequencing data were assembled, filtered against known human variant genetic databases, and results were analyzed. RESULTS: Both patients were compound heterozygotes for mutations in the VPS13A gene, the gene associated with chorea-acanthocytosis. Patient 1 had a 4-bp deletion that removes the 5' donor splice site of exon 58 and a nucleotide substitution that disrupts the 5' donor splice site of exon 70. Patient 2 had a dinucleotide deletion in exon 16 and a dinucleotide insertion in exon 33. No mutations were identified in the XK, PANK2, or JPH3 gene loci. CONCLUSIONS: Exome sequencing is a valuable diagnostic tool in the neuroacanthocytosis syndromes. These studies may provide a better understanding of the function of the associated proteins and provide insight into the pathogenesis of these disorders. © 2011 Movement Disorder Society.
