Wednesday 29 February 2012

Translational Genomics Research Institute, personalised genomics to improve chemotherapy, cloud computing for pediatric cancer


I think it's fantastic that this is happening right now. Given that the cost of sequencing and computing is still relatively high, I can see how the first wave of personalized medicine will be led by non-profit organizations. I am personally curious how this will pan out: will it ultimately be cost-effective for the patients, and will they be able to quantify that?
Kudos to Dell for being a part of this exercise, though I wonder if they could have donated more to the data center, or alternatively set up a mega cloud center and donated compute resources instead, since I think the infrastructure and knowledge gleaned will be useful for their marketing and sales.




http://www.hpcinthecloud.com/hpccloud/2012-02-29/cloud_computing_helps_fight_pediatric_cancer.html

Cloud technology is being used to speed computation, as well as to manage and store the resulting data. The cloud also enables the high degree of collaboration that is necessary for science research at this level. The scientists hold video-conferences where they work off of "tumor boards" to make clinical decisions for the patients in real time. Previously they would have had to ship hard drives to each other to achieve that degree of collaboration; now the data is always accessible through the cloud platform.


"We expect to change the way that the clinical medicine is delivered to pediatric cancer patients, and none of this could be done without the cloud," Coffin says emphatically. "With 12 cancer centers collaborating, you have to have the cloud to exchange the data."


Dell relied on donations to build the initial 8.2 teraflop high-performance machine. A second round of donations has meant a doubling in resources for this important work, up to an estimated 13 teraflops of sustained performance.


"Expanding on the size of the footprint means we can treat more and more patients in the clinic trial so this is an exciting time for us. This is the first pediatric clinic trial using genomic data ever done. And Dell is at the leading edge driving this work from an HPC standpoint and from a science standpoint."


The donated platform is comprised of Dell PowerEdge Blade Servers, PowerVault Storage Arrays, Dell Compellent Storage Center arrays and Dell Force10 Network infrastructure. It features 148 CPUs, 1,192 cores, 7.1 TB of RAM, and 265 TB Disk (Data Storage). Dell Precision Workstations are available for data analysis and review. TGen's computation and collaboration capacity has increased by 1,200 percent compared to the site's previous clinical cluster. In addition, the new system has reduced tumor mapping and analysis time from a matter of months to days.

Tuesday 28 February 2012

SPE, 32 bit wxPython, Python on Mac

Jumped through a couple of hoops to install SPE, my favourite Python IDE, on the Mac.
Python 2.7 ships with Lion (yeah!)

I needed to install wxPython, but only a 32-bit build is available due to the Carbon API (boo).

The nice thing is that you can set an environment variable to make it work,
so my bash script for starting SPE is:

#!/bin/bash
# installed 32-bit version of wxPython
export VERSIONER_PYTHON_PREFER_32_BIT=yes
python _spe/SPE.py
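
Saved as spe.sh (a name I'm making up; call it anything) and made executable, it launches straight from Terminal:

chmod +x spe.sh
./spe.sh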




It works! 

SPE v0.8.4.i (c)2003-2008 www.stani.be
If spe fails to start:
 - type "pythonw SPE.py --debug > debug.txt 2>&1" at the command prompt
   (or if you use tcsh: "pythonw SPE.py --debug >& debug.txt")
 - send debug.txt with some info to spe.stani.be[at]gmail.com




Python on the Macintosh

http://www.python.org/getit/mac/

Python comes pre-installed on Mac OS X, but due to Apple's release cycle, it's often one or even two years old. The overwhelming recommendation of the "MacPython" community is to upgrade your Python by downloading and installing a newer version from the Python standard release page.

:) That's outdated ...
on Lion:

Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Monday 27 February 2012

Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions. [Sci Rep. 2011] - PubMed - NCBI

http://www.ncbi.nlm.nih.gov/pubmed/22355574

Abstract

The rapid development of next generation sequencing (NGS) technology provides a new chance to extend the scale and resolution of genomic research. How to efficiently map millions of short reads to the reference genome and how to make accurate SNP calls are two major challenges in taking full advantage of NGS. In this article, we reviewed the current software tools for mapping and SNP calling, and evaluated their performance on samples from The Cancer Genome Atlas (TCGA) project. We found that BWA and Bowtie are better than the other alignment tools in comprehensive performance for Illumina platform, while NovoalignCS showed the best overall performance for SOLiD. Furthermore, we showed that next-generation sequencing platform has significantly lower coverage and poorer SNP-calling performance in the CpG islands, promoter and 5'-UTR regions of the genome. NGS experiments targeting for these regions should have higher sequencing depth than the normal genomic region.

Monozygotic twins: genes are not the destiny? [Bioinformation. 2011] - PubMed - NCBI

http://www.ncbi.nlm.nih.gov/pubmed/22355239

Abstract

Monozygotic twins are considered to be genetically identical, yet can show high discordance in their phenotypes and disease susceptibility. Several studies have emphasized the influence of external factors and the role of epigenetic polymorphism in conferring this variability. However, some recent high-resolution studies on DNA methylation show contradicting evidence, which poses questions on the extent of epigenetic variability between twins. The advent of next-generation sequencing technologies now allow us to interrogate multiple epigenomes on a massive scale and understand the role of epigenetic modification, especially DNA methylation, in regulating complex traits. This article briefly discusses the recent key findings, unsolved questions in the area, and speculates on the future directions in the field.

Identification of genetic risk variants for... [BMC Med Genomics. 2012] - PubMed - NCBI

http://www.ncbi.nlm.nih.gov/pubmed/22353194

Abstract


BACKGROUND:

Next-generation DNA sequencing is opening new avenues for genetic association studies in common diseases that, like deep vein thrombosis (DVT), have a strong genetic predisposition still largely unexplained by currently identified risk variants. In order to develop sequencing and analytical pipelines for the application of next-generation sequencing to complex diseases, we conducted a pilot study sequencing the coding area of 186 hemostatic/proinflammatory genes in 10 Italian cases of idiopathic DVT and 12 healthy controls.

RESULTS:

A molecular-barcoding strategy was used to multiplex DNA target capture and sequencing, while retaining individual sequence information. Genomic libraries with barcode sequence-tags were pooled (in pools of 8 or 16 samples) and enriched for target DNA sequences. Sequencing was performed on ABI SOLiD-4 platforms. We produced >12 gigabases of raw sequence data to sequence at high coverage (average: 42X) the 700-kilobase target area in 22 individuals. A total of 1876 high-quality genetic variants were identified (1778 single nucleotide substitutions and 98 insertions/deletions). Annotation on databases of genetic variation and human disease mutations revealed several novel, potentially deleterious mutations. We tested 576 common variants in a case-control association analysis, carrying the top-5 associations over to replication in up to 719 DVT cases and 719 controls. We also conducted an analysis of the burden of nonsynonymous variants in coagulation factor and anticoagulant genes. We found an excess of rare missense mutations in anticoagulant genes in DVT cases compared to controls and an association for a missense polymorphism of FGA (rs6050; p=1.9 x 10-5, OR 1.45; 95% CI, 1.22-1.72; after replication in >1400 individuals).

CONCLUSIONS:

We implemented a barcode-based strategy to efficiently multiplex sequencing of hundreds of candidate genes in several individuals. In the relatively small dataset of our pilot study we were able to identify bona fide associations with DVT. Our study illustrates the potential of next-generation sequencing for the discovery of genetic variation predisposing to complex diseases.

PLoS ONE: Direct Comparisons of Illumina vs. Roche 454 Sequencing Technologies on the Same Microbial Community DNA Sample

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0030087
Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities but whether or not different NGS platforms recover the same diversity from a sample and their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R2>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data processing protocols for future metagenomic studies.

Saturday 25 February 2012

Omics! Omics!: Why Oxford Nanopore Needs to Release Some Data Pronto (Besides Bailing Me Out)

Keith Robison makes a very compelling call for Oxford Nanopore to release actual data. I totally agree with his point about not repeating the mistakes of the other laggards in sequencing tech, especially since I think the bioinformatics (or data analysis) will make or break a sequencing platform.

http://omicsomics.blogspot.com/2012/02/why-oxford-nanopore-needs-to-release.html?m=1

Friday 24 February 2012

bash_profile bashrc Mac users think differently

Been trying to hack my MacBook to replace my Ubuntu work environment.
Although MOST things are portable, I am finding that Mac users actually live with a lot of inconveniences for which there are hacks / solutions ..

Miss the Aero Snap feature in Win7?
Do yourself a favour and get this
https://github.com/fikovnik/ShiftIt


One thing that brings a chuckle to my face:
there are a lot of sites that 'solve' the missing .bashrc problem by telling you to put everything into the .bash_profile that 'Mac uses instead'.
http://superuser.com/questions/147043/where-to-find-the-bashrc-file-on-mac-os-x-snow-leopard-and-lion

Oh gosh, just rename your .bashrc to .bash_profile and hope your bash customizations aren't Mac-averse.
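
A cleaner hack, assuming your customizations live in ~/.bashrc like mine, is a ~/.bash_profile that simply sources it, so the same file works on both Mac and Linux:

# ~/.bash_profile: delegate everything to ~/.bashrc so Mac and Linux share one config
if [ -f ~/.bashrc ]; then
    source ~/.bashrc
fi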

I am still trying to find out how to write to NTFS in Lion .... I can't believe that, for something that works fine in Ubuntu, I might possibly have to fork out money. Seriously, Apple should just pay for the NTFS licence to make the point that Mac is friendlier to multi-platform setups, OR allow users to enable NTFS write with the caveat that it's a reverse-engineered hack.


Update:
vim syntax highlighting isn't turned on by default (WHY??)
Quick fix:
cp /usr/share/vim/vim73/vimrc_example.vim ~/.vimrc
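
If you don't want the whole example config, the one setting that matters (assuming highlighting is all you're after) is:

# append just the syntax setting to your vimrc instead of copying the full example
echo "syntax on" >> ~/.vimrc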


Context coloring in the bash Terminal (why would Mac ship with a default B&W colour scheme?)
http://superuser.com/questions/324207/how-do-i-get-context-coloring-in-mac-os-x-terminal
Check out the link above;
essentially, insert these 2 lines

export CLICOLOR=1
export LSCOLORS=GxFxCxDxBxegedabagaced

into your .bash_profile (not .profile).
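
If you'd rather not set environment variables, an alias gives you the same colours for ls alone (BSD ls's -G flag does what CLICOLOR does):

# colourize just ls, without touching CLICOLOR
alias ls='ls -G'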

Note: when working with vim, try to remember that ctrl works as control and command is as per Linux / Win (not fun to mix up the keys on important documents).

Adding this alias will keep you from pulling your hair out when working across Mac and Linux environments:
alias md5sum='md5 -r'
Note that it will mean nothing on Linux.
But if you like to use 'md5sum -c' like me, you might have to install md5sum proper :( I am putting that off, but comparing md5sums by eye is neither fun nor accurate.)
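
In the meantime, here is a rough stand-in for 'md5sum -c' built on the Mac's md5 -r (a sketch assuming the usual 'hash filename' checksum format and no spaces in file names; checksums.md5 is a hypothetical file):

# verify each file listed in checksums.md5 against its recorded hash
while read hash file; do
  calc=$(md5 -r "$file" | awk '{print $1}')
  if [ "$hash" = "$calc" ]; then echo "$file: OK"; else echo "$file: FAILED"; fi
done < checksums.md5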

Wednesday 22 February 2012

BioNumber of the month

http://bionumbers.hms.harvard.edu/

In BioNumbers we aim to enable you to find in one minute any useful molecular biology number that can be important for your research. BioNumbers currently attracts >3000 visitors a month from over 50 countries.

To cite BioNumbers please refer to: Milo et al. Nucl. Acids Res. (2010) 38 (suppl 1): D750-D753.

The BioNumbers database started in 2007 by Ron Milo, Paul Jorgensen and Mike Springer while sharing a bay at the Systems Biology department in Harvard. It was inspired by a table comparing values of key properties in bacteria, yeast and a mammalian cell line in Uri Alon’s book – Introduction to systems biology and by the CyberCell Project .

BioNumbers is coordinated and developed at the Milo lab in the Weizmann Institute in Israel. Feel free to write us a note at BioNumbers@gmail.com. The current database format was designed and implemented by Griffin Weber at Harvard. The full version was programmed and is being developed by Zaztech and ProperDev. The BioNumbers logo was designed by Ricardo Vidal.

It is our hope that the database will facilitate quantitative analysis and reasoning in a field of research where numbers tend to be “soft” and difficult to vouch for. Financial as well as moral support for the effort is being given by the Systems biology department in Harvard and by the Weizmann Institute.




Amazon S3 for temporary storage of large datasets?

Just did a rough calculation on AWS calculator, the numbers are quite scary!

For a hypothetical 50 TB dataset (a single S3 object can now be up to 5 TB, so the dataset would have to be split across multiple objects anyway),
it costs $4,160.27 to store it for a month!

Transferring it out costs another $4,807.11!

Over 3 years, the cost of storage alone is roughly $149,000, for which I guess you could buy an enterprise storage solution instead, with zero transfer costs.
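
A quick back-of-envelope check on those numbers (a sketch assuming flat ~$0.08/GB-month storage and ~$0.09/GB transfer-out rates that I am eyeballing from the calculator; real S3 pricing is tiered):

# rough sanity check of the S3 figures for 50 TB
echo "50 TB = $((50*1024)) GB"
echo "storage : ~\$$((50*1024*8/100)) per month"   # ~ $4096, close to $4160.27
echo "transfer: ~\$$((50*1024*9/100)) one-off"     # ~ $4608, close to $4807.11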

At this point in time, I guess one can't really use AWS S3 for sequence archival. I wonder if data deduplication could help reduce cloud storage costs ... I am sure that, byte-wise, BAM files should be quite similar .. no?


Next Generation Genetic Testing for retinitis pigmentosa (RP) [Hum Mutat. 2012] - PubMed - NCBI

http://www.ncbi.nlm.nih.gov/pubmed/22334370

Hum Mutat. 2012 Feb 14. doi: 10.1002/humu.22045. [Epub ahead of print]

Next Generation Genetic Testing for Retinitis Pigmentosa.

Source

Department of Human Genetics, Radboud University Nijmegen Medical Centre, 6525 GA Nijmegen, The Netherlands; Institute for Genetic and Metabolic Disease, 6525 GA Nijmegen, The Netherlands.

Abstract

Molecular diagnostics for patients with retinitis pigmentosa (RP) has been hampered by extreme genetic and clinical heterogeneity, with 52 causative genes known to date. Here, we developed a comprehensive next generation sequencing approach (NGS) for the clinical molecular diagnostics of RP. All known inherited retinal disease genes (n=111) were captured and simultaneously analyzed using NGS in 100 RP patients without a molecular diagnosis. A systematic data analysis pipeline was developed and validated to prioritize and predict the pathogenicity of all genetic variants identified in each patient, which enabled us to reduce the number of potential pathogenic variants from ∼1,200 to 0-9 per patient. Subsequent segregation analysis and in silico predictions of pathogenicity resulted in a molecular diagnosis in 36 RP patients, comprising 27 recessive, 6 dominant and 3 X-linked cases. Intriguingly, de novo mutations were present in at least 3 out of 28 isolated cases with causative mutations. This study demonstrates the enormous potential and clinical utility of NGS in molecular diagnosis of genetically heterogeneous diseases such as RP. De novo dominant mutations appear to play a significant role in patients with isolated RP, having major implications for genetic counselling.

© 2012 Wiley Periodicals, Inc.


PMID: 22334370 [PubMed - as supplied by publisher]

standardized-velvet-assembly-report - a set of scripts and a Sweave report used to iterate through parameters and generate a report on Velvet-generated sequence assemblies - Google Project Hosting

Where was this when I was playing with Velvet?


http://code.google.com/p/standardized-velvet-assembly-report/


Requirements:

  • velvet (velveth,velvetg should be in your PATH)
  • R (with Sweave, usually included)
  • R libraries (from the R prompt, type install.packages(c("ggplot2","proto","xtable")))
  • pdflatex (usually part of TeTeX)
  • Perl (with PerlIO::gzip)

Optional:

  • To generate alignments against a reference genome, use either
    • BLAT (add to your PATH)
    • BLAST (add to your PATH)


To Run:

  • Edit permute.sh to your liking, paying particular attention to the kmer, cvCut, expCov, and other crucial flags like shortPaired (see the sketch after this list for the kind of sweep it performs)
  • perl fastaAllSize mysequences.fa > mysequences.stat
  • ./permute.sh mysequences (leave out the .fa)
  • If NOT using a reference genome skip this section
    • If using Blat:
      • faToTwoBit myrefgenome.fa myrefgenome.2bit
      • gfServer start localhost 9999 myrefgenome.2bit
      • for f in out*dir; do if [ ! -e $f/contigsVsRef.psl ]; then echo $f; gfClient localhost 9999 ./ $f/contigs.fa $f/contigsVsRef.psl; fi; done
    • If using BLAST:
      • formatdb -i myrefgenome -p F
      • for f in out*dir; do if [ ! -e $f/contigsVsRef.m8 ]; then echo $f; blastall -i $f/contigs.fa -p blastn -d myrefgenome -m 8 -o $f/contigsVsRef.m8; fi; done
  • for f in out*dir; do if [ ! -e $f/metadata.txt ]; then perl generateAssemblyStats.pl $f > $f/metadata.txt; fi; done
  • for f in out*dir; do echo "groupDir<-\"$f\";statFile<-\"mysequences\";statTab<-\"$f/stats.txt\";metaTab<-\"$f/metadata.txt\";source(\"calculateStats.R\")" | R --no-save --quiet; done
  • Choose one of three report formats
    • If you wish to skip the individual contig length histograms (much quicker)
    • echo "assmName<-\"mysequences\";statFile<-\"mysequences\"; Sweave(\"shortReport.Rnw\",output=\"mysequences.tex\");" | R --no-save --quiet
    • If using no reference genome:
    • echo "assmName<-\"mysequences\";statFile<-\"mysequences\"; Sweave(\"report.Rnw\",output=\"mysequences.tex\");" | R --no-save --quiet
    • If using the reference genome alignments:
    • echo "refName<-\"My reference genome\";assmName<-\"mysequences\";statFile<-\"mysequences\"; Sweave(\"refReport.Rnw\",output=\"mysequences.tex\");" | R --no-save --quiet
  • pdflatex mysequences.tex
  • View the pdf report mysequences.pdf
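
For reference, here is a minimal sketch of the kind of parameter sweep a permute.sh-style script performs (hypothetical ranges and output naming; the real permute.sh in the repo handles more flags such as -shortPaired):

# one velvet assembly per kmer / coverage-cutoff combination
for k in 21 31 41; do
  for cov in 5 10 auto; do
    outdir="out_k${k}_cov${cov}dir"
    velveth "$outdir" $k -fasta -short mysequences.fa
    velvetg "$outdir" -cov_cutoff $cov -exp_cov auto
  done
done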

Tuesday 21 February 2012

DNA from the Beginning - An animated primer of 75 experiments that made modern genetics.

DNA from the Beginning 

is a collection of 75 experiments that made modern genetics. 

The science behind each concept is explained by:
animation, image gallery, video interviews, problem, biographies, and links.
http://www.dnaftb.org/

Sunday 19 February 2012

Comparison between Normalised and Unnormalised 454-Sequencing Libraries for Small-Scale RNA-Seq Studies.

Comp Funct Genomics. 2012;2012:281693. Epub 2012 Jan 26.

Comparison between Normalised and Unnormalised 454-Sequencing Libraries for Small-Scale RNA-Seq Studies.

Ekblom R, Slate J, Horsburgh GJ, Birkhead T, Burke T.

Source

Department of Ecology and Genetics, Uppsala University, Norbyvägen 18 D, 75236 Uppsala, Sweden.

Abstract

Next-generation sequencing of transcriptomes (RNA-Seq) is being used increasingly in studies of nonmodel organisms. Here, we evaluate the effectiveness of normalising cDNA libraries prior to sequencing in a small-scale study of the zebra finch. We find that assemblies produced from normalised libraries had a larger number of contigs but used fewer reads compared to unnormalised libraries. Considerably more genes were also detected using the contigs produced from normalised cDNA, and microsatellite discovery was up to 73% more efficient in these. There was a positive correlation between the detected expression level of genes in normalised and unnormalised cDNA, and there was no difference in the number of genes identified as being differentially expressed between blood and spleen for the normalised and unnormalised libraries. We conclude that normalised cDNA libraries are preferable for many applications of RNA-Seq and that these can also be used in quantitative gene expression studies.

http://www.ncbi.nlm.nih.gov/pmc/articles/pmid/22319409/?tool=pubmed

The empirical power of rare variant association methods: results from Sanger sequencing in 1,998 individuals.

PLoS Genet. 2012 Feb;8(2):e1002496. Epub 2012 Feb 2.

The empirical power of rare variant association methods: results from Sanger sequencing in 1,998 individuals.

Source

Department of Human Genetics, McGill University, Montreal, Canada.

Abstract

The role of rare genetic variation in the etiology of complex disease remains unclear. However, the development of next-generation sequencing technologies offers the experimental opportunity to address this question. Several novel statistical methodologies have been recently proposed to assess the contribution of rare variation to complex disease etiology. Nevertheless, no empirical estimates comparing their relative power are available. We therefore assessed the parameters that influence their statistical power in 1,998 individuals Sanger-sequenced at seven genes by modeling different distributions of effect, proportions of causal variants, and direction of the associations (deleterious, protective, or both) in simulated continuous trait and case/control phenotypes. Our results demonstrate that the power of recently proposed statistical methods depend strongly on the underlying hypotheses concerning the relationship of phenotypes with each of these three factors. No method demonstrates consistently acceptable power despite this large sample size, and the performance of each method depends upon the underlying assumption of the relationship between rare variants and complex traits. Sensitivity analyses are therefore recommended to compare the stability of the results arising from different methods, and promising results should be replicated using the same method in an independent sample. These findings provide guidance in the analysis and interpretation of the role of rare base-pair variation in the etiology of complex traits and diseases.


Saturday 18 February 2012

Oxford Nanopore megaton announcement: “Why do you need a machine?” – exclusive interview for this blog!


http://pathogenomics.bham.ac.uk/blog/2012/02/oxford-nanopore-megaton-announcement-why-do-you-need-a-machine-exclusive-interview-for-this-blog/

Woke up this morning to a whole bunch of excited tweets about Oxford Nanopore, and I can totally understand why. This is the real democratization of DNA sequencing. Move over, benchtop / desktop sequencers, and make way for 'laptop sequencers'!

Hmmm or a cluster of sequencers, on your compute cluster ... !

Using a USB-powered sequencer and a pipette to load the dsDNA, you might have your sequence reads in FASTQ written directly to your laptop.
No known limit to read length.
4% sequencing error (the good thing is that the form of the error is known and therefore correctable)


Do read the url above for more info, here's the excerpted executive summary for the impatient

Executive Summary
  • Nanopore have announced a strand sequencing method, made possible by a heavily modified biological nanopore and an industrially-fabricated polymer
  • DNA passes through the nanopore and tri-nucleotides in contact with the pore are detected through electrochemistry
  • Demonstrated 2x50kb sense & anti-sense of same molecules (lambda phage) – no theoretical read length limit
  • Can sequence direct from blood without need for sample preparation
  • Two products announced:
    • MinIon – USB disposable sequencer for ~ $900 has 512 nanopores – target 150mb/hour
    • MinIon can run at 120-1000 bases/minute per pore for up to 6 hours
    • GridIon – two versions of rack-mountable sequencer with 2000 nanopores (2nd half 2012), 8000 nanopores (2013)
    • GridIons can be racked in parallel, 20 could do a whole human genome in 15 minutes
    • Each GridIon can do "tens of gigabases" over 24 hours
  • Both machines commercially available 2nd half 2012
  • Sequencing can be paused, sample recovered, replaced, started again
  • Accuracy is 96%, errors are deletions, error profile will improve through software


Check out the Forbes interview with 454 / PGM inventor Jon Rothberg

"Rothberg noted that Ion Torrent’s new machine, the Proton, the company showed three completed human genomes yesterday at AGBT. More importantly, he had the machine – not a mock-up or a design – on the stage. “That’s where you need to be to ship mid-year,” he writes."


Over at Genomes Unzipped,
Oxford Nanopore CTO Clive Brown related how sequencing library prep can be as simple as diluting rabbit's blood with water. Now that is impressive!




This post is getting too long because I keep updating it.
Over at BioITWorld, there's an interview with Clive Brown which cites other interesting info.
First is the opening paragraph, which is amusing in light of ONT's rivals' comments:
"Clive Brown, vice president of development and informatics for Oxford Nanopore Technologies (ONT), a.k.a “the most honest guy in all of next-gen sequencing,” as dubbed by The Genome Center's David Dooling, is hoping to catch lightning in a bottle again. "


Oxford Nanopore has not yet revealed details of its future platform, but in early 2009 it published a lovely paper in Nature Nanotechnology showing that its alpha-hemolysin nanopores can discriminate between the four bases of DNA (not to mention a fifth, methyl C).




Directly get methylation information from your sequencing sans complicated sample prep? That has to be another selling point. 


Not sure whether Nanopore is truly vaporware; however, gauging by the excitement across the blogosphere and the hit rates for the first to blog about it, I think Nanopore is upping the ante for the next 'it' sequencer.
Maybe we can only survive two more AGBTs like this, and AGBT might fizzle out as new sequencing technologies fade, with advances in computation trailing behind the ability to generate ever more data.
Maybe you will see scientists start attending Big Data tech conferences, or AGBT's main draw will be fancy new software to assemble, align and make sense of all the data being generated ...




This picture tells quite a story (Wordle constructed from 3,386 tweets and retweets tagged #AGBT with @s removed).
No prizes for guessing the winner ... 

Everybody is talking about #nanopore

Woke up to read a whole bunch of tweets on nanopore.

Guess this is going to be the kind of technology that enables and inspires science ...
100 kb reads + 4% error ...
Not sure what else I missed on a sleepy Saturday morning ...
Here's a tweet with videos:

@pathogenomenick: I've embedded all the nanopore videos on the blog here http://bit.ly/w6dDaU #AGBT
Shared via TweetCaster


@galaxyproject: MACS 1.4 (ChIP-Seq peak calling) now available in Galaxy Tool Shed (Main has 1.0.1) http://bit.ly/gxyshed #usegalaxy
Shared via TweetCaster

Friday 17 February 2012

a tour of various bioinformatics functions in Avadis NGS

Not affiliated with Avadis, but this might be useful for you.




We are hosting an online seminar series on the alignment and analysis of genomics data from “benchtop” sequencers, i.e. MiSeq and Ion Torrent. Our webinar panelists will give a tour of various bioinformatics functions in Avadis NGS that will enable researchers and clinicians to derive biological insights from their benchtop sequencing data.

Seminar #1: MiSeq Data Analysis

Avadis NGS 1.3 provides special support for analyzing data generated by MiSeq™ sequencers. In this webinar, we will describe how the data in a MiSeq generated “run folder” is automatically loaded into the Avadis NGS software during small RNA alignment and DNA variant analysis. This is especially helpful in processing the large number of files generated when the TruSeq™ Amplicon Kits are used. We will describe how to use the Quality Control steps in Avadis NGS to check if the amplicons have sufficient coverage in all the samples. Regions with unexpected coverages can easily be identified using the new region list clustering feature. Webinar attendees will learn how to use the “Find Significant SNPs” feature to quickly identify high-confidence SNPs present in a majority of the samples, rare variants, etc.


Seminar #2: Ion Torrent Data Analysis

Avadis NGS 1.3 includes a new aligner – COBWeb – that is fully capable of aligning the long, variable-length reads generated by Ion Torrent sequencers. In this webinar, we will show the pre-alignment QC plots and illustrate how they can be used to set appropriate alignment parameters for aligning Ion Torrent reads. For users who choose to import the BAM format files generated by the Ion Torrent Server, we will describe the steps needed for importing amplicon sequencing data into Avadis NGS. Users of the Ion AmpliSeq™ Cancer Panel will learn how to easily import the targeted mutation list and verify the genotype call at the mutation sites. We will also show the new “Find Significant SNPs” feature which helps quickly identify high-confidence SNPs present in a majority of the samples, rare variants, etc.


Free registration - http://www.avadis-ngs.com/webinar

Datanami, Woe be me