Friday, 28 September 2012

Familial cosegregation of rare genetic variants with disease in complex disorders


1. Eur J Hum Genet. 2012 Sep 26. doi: 10.1038/ejhg.2012.194. [Epub ahead of print]

Familial cosegregation of rare genetic variants with disease in complex disorders.

Helbig I, Hodge SE, Ottman R.


Department of Neuropediatrics, Christian-Albrechts-University Kiel, Kiel, Germany.


Family-based designs are increasingly being used for identification of rare variants in complex disorders. This paper addresses two questions related to the utility of these designs. First, under what circumstances are rare disease-related variants expected to cosegregate with disease in families? Second, under what circumstances is a disease-variant association expected to be greater in studies restricted to familial cases than in studies of unselected cases? To investigate these questions, we developed a probability model of disease causation involving two loci. To address cosegregation, we examined the probability that an affected first-degree relative of a variant-carrying proband would also carry the variant. We find that this probability increases with increasing odds ratio (OR) for the variant, but declines with increasing sibling recurrence risk ratio (λ(s)). For example, under reasonable assumptions, the 15q13.3 microdeletion in idiopathic generalized epilepsy, with an OR estimate of 68 in large case-control studies, is expected to be present in >95% of affected first-degree relatives of variant-carrying probands. However, for a variant with OR=5, the probability an affected relative has the variant ranges from 82% (when λ(s)=2) to 58% (when λ(s)=50). We also find that restriction of a study to familial cases does not necessarily increase a rare variant's association with disease, especially if λ(s) is high and the variant contributes little to overall disease familial aggregation. These findings provide guidance for the design of family-based studies of rare variants in complex disorders. European Journal of Human Genetics advance online publication, 26 September 2012; doi:10.1038/ejhg.2012.194.

PMID: 23010752 [PubMed - as supplied by publisher]

A metagenome-wide association study of gut microbiota in type 2 diabetes : Nature : Nature Publishing Group

A metagenome-wide association study of gut microbiota in type 2 diabetes

Received 30 August 2011; accepted 27 July 2012; published online 26 September 2012


Assessment and characterization of gut microbiota has become a major research area in human disease, including type 2 diabetes, the most prevalent endocrine disease worldwide. To carry out analysis on gut microbial content in patients with type 2 diabetes, we developed a protocol for a metagenome-wide association study (MGWAS) and undertook a two-stage MGWAS based on deep shotgun sequencing of the gut microbial DNA from 345 Chinese individuals. We identified and validated approximately 60,000 type-2-diabetes-associated markers and established the concept of a metagenomic linkage group, enabling taxonomic species-level analyses. MGWAS analysis showed that patients with type 2 diabetes were characterized by a moderate degree of gut microbial dysbiosis, a decrease in the abundance of some universal butyrate-producing bacteria and an increase in various opportunistic pathogens, as well as an enrichment of other microbial functions conferring sulphate reduction and oxidative stress resistance. An analysis of 23 additional individuals demonstrated that these gut microbial markers might be useful for classifying type 2 diabetes.

Backup your entire harddisk with bitcasa

If you are not in the habit of regularly backing up your HDD (or lack software like Time Machine on Mac OS), you might want to check out Bitcasa, which promises infinite storage (at least for the beta period).
Having a synced, secure offsite copy of your data and scripts is definitely a lifesaver, especially if you are using an HDD that is already a few years old.

Filesystem      Size   Used  Avail Capacity  Mounted on
/dev/disk0s2   465Gi  365Gi  100Gi    79%    /
My Infinite      0Bi    0Bi    0Bi   100%    /Users/kevin/Bitcasa/My Infinite
MYLINUXLIVE    512Ti    0Bi  512Ti     0%    /Users/kevin/Bitcasa/MYLINUXLIVE

I want to share infinite storage with you. Sign up
for @Bitcasa for free!

How much data can I manage using Bitcasa?
During the beta program, you can manage as much data using Bitcasa as you want. Absolutely no limits.

EPACTS v2.1 is released

Dear EPACTS Users,

A new version of EPACTS is released at 

Thursday, 27 September 2012

NAVER Ndrive gives you 30 Gb online storage - Android Apps on Google Play

Definitely will give Dropbox and SugarSync a run for their money! 
Unfortunately it's in Japanese, but with Google Translate you should be able to make things out.

30GB free online file storage space. You can view valuable files offline.

Free 30GB file storage space on the Web and in the Android app: that is NAVER Ndrive.
You can access your files anytime, anywhere with the Ndrive app.

- Easily upload photos and videos from your Android phone.
※ Each file is limited to under 500MB.

- You can view doc, xls, ppt and pdf files, even in offline mode.

- Sync files on your Android phone to Ndrive.

- Works well as a music player: shows your music files by artist or album, and also shows cover images.
* Only mp3 files are available in the Ndrive app.

- View photos as a slide show.

※ NAVER member registration is required to use this application.

using gzipped geno files for snptest

It seems that using gzipped input files doesn't noticeably affect snptest's analysis time.
I will repeat this with binary geno (bgen) files and post an update.

$ grep 'User'
User time (seconds): 669.87
User time (seconds): 662.10
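To put the two timings side by side, here is a minimal Python sketch that pulls the `User time (seconds)` values out of that kind of output and reports the relative difference. The function name `user_times` and the `log` string are illustrative assumptions; the post does not record which run used the gzipped input.

```python
import re

def user_times(text):
    """Return every 'User time (seconds)' value found in the text, in order."""
    return [float(v) for v in re.findall(r"User time \(seconds\):\s*([0-9.]+)", text)]

# Sample text copied from the two timings above.
log = "User time (seconds): 669.87\nUser time (seconds): 662.10\n"

first, second = user_times(log)
# Relative difference of the second run with respect to the first.
rel_diff = (first - second) / first
print(f"{first:.2f}s vs {second:.2f}s: {rel_diff:.2%} difference")
```

The two runs differ by roughly 1%, which is small enough that it could simply be run-to-run noise; repeating each run a few times would be needed to say more.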

Wednesday, 26 September 2012

Next Genetics Blog

Ah chanced on an interesting blog by Damian 

Read his 'about me' and saw a nice quote 

I also strongly believe that science should be accessible. Rational thinking, discussion of ideas, and curiosity should be encouraged and not be relegated to esotericism. I refer to this quote by Neil deGrasse Tyson:

“I would teach how science works as much as I would teach what science knows. I would assert (given that essentially, everyone will learn to read) that science literacy is the most important kind of literacy they can take into the 21st century. I would undervalue grades based on knowing things and find ways to reward curiosity. In the end, it's the people who are curious who change the world.” 

― Neil deGrasse Tyson

he posts code with examples for various tasks like 

Next-generation Phylogenomics Using a Target Restricted Assembly Method.

Very interesting to turn the assembly problem backwards ... though I suppose it has limited applications outside of phylogenomics, since you need to have the protein sequences available in the first place. 

I am not sure if there are tools that can easily extract mini-assemblies from BAM files, i.e. extract aligned reads in their entirety instead of trimmed to the region you specify. 
That would be useful when looking at assemblies in particular regions and trying to add new reads or information to them. (Do we need a phrap/consed for NGS de novo assembly?) 
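As a rough illustration of what "extract aligned reads in their entirety" could look like, here is a small Python sketch that works on SAM text lines (for example, a BAM file converted to SAM). It keeps every record whose alignment overlaps a region and returns the whole record untrimmed. The function names are my own, and the sketch ignores mate pairs, flags and indexing, so treat it as a starting point rather than a real tool.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) a read covers on the reference."""
    # Only M, D, N, = and X operations consume reference bases.
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")
    return pos, pos + length - 1

def reads_overlapping(sam_lines, chrom, start, end):
    """Yield whole SAM records whose alignment overlaps chrom:start-end."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        rname, pos, cigar = fields[2], int(fields[3]), fields[5]
        if rname != chrom or cigar == "*":
            continue  # different reference or unaligned read
        read_start, read_end = reference_span(pos, cigar)
        if read_start <= end and read_end >= start:
            yield line.rstrip("\n")  # full record, sequence untrimmed

# Made-up records: r1 spans 95-104 and overlaps the region, r2 does not.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t95\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
hits = list(reads_overlapping(sam, "chr1", 100, 150))
print([h.split("\t")[0] for h in hits])
```

The selected reads could then be written out as FASTQ and fed to a local assembler, which is the "mini-assembly" step this sketch does not attempt.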

 2012 Sep 18. pii: S1055-7903(12)00364-8. doi: 10.1016/j.ympev.2012.09.007. [Epub ahead of print]

Next-generation Phylogenomics Using a Target Restricted Assembly Method.


Illinois Natural History Survey, University of Illinois, 1816 South Oak Street, Champaign, IL 61820, USA.


Next-generation sequencing technologies are revolutionizing the field of phylogenetics by making available genome scale data for a fraction of the cost of traditional targeted sequencing. One challenge will be to make use of these genomic level data without necessarily resorting to full-scale genome assembly and annotation, which is often time and labor intensive. Here we describe a technique, the Target Restricted Assembly Method (TRAM), in which the typical process of genome assembly and annotation is in essence reversed. Protein sequences of phylogenetically useful genes from a species within the group of interest are used as targets in tblastn searches of a data set from a lane of Illumina reads for a related species. Resulting blast hits are then assembled locally into contigs and these contigs are then aligned against the reference "cDNA" sequence to remove portions of the sequences that include introns. We illustrate the Target Restricted Assembly Method using genomic scale datasets for 20 species of lice (Insecta: Psocodea) to produce a test phylogenetic data set of 10 nuclear protein coding gene sequences. Given the advantages of using DNA instead of RNA, this technique is very cost effective and feasible given current technologies.
Copyright © 2012. Published by Elsevier Inc.

[PubMed - as supplied by publisher]

CORTEX update contains scripts for assembly of large numbers of samples with large genomes - i.e. for the 1000 Genomes project.

cortex_var is a tool for genome assembly and variation analysis from sequence data. You can use it to discover and genotype variants on single or multiple haploid or diploid samples. If you have multiple samples, you can use Cortex to look specifically for variants that distinguish one set of samples (eg phenotype=X, cases, parents, tumour) from another set of samples (eg phenotype=Y, controls, child, normal). See our Nature Genetics paper and the documentation for detailed descriptions.

The Cortex paper is now out in Nature Genetics! 

cortex_var features

  • Variant discovery by de novo assembly - no reference genome required
  • Supports multicoloured de Bruijn graphs - have multiple samples loaded into the same graph in different colours, and find variants that distinguish them.
  • Capable of calling SNPs, indels, inversions, complex variants, small haplotypes
  • Extremely accurate variant calling - see our paper for base-pair-resolution validation of entire alleles (rather than just breakpoints) of SNPs, indels and complex variants by comparison with fully sequenced (and finished) fosmids - a level of validation beyond that demanded of any other variant caller we are aware of - currently cortex_var is the most accurate variant caller for indels and complex variants.
  • Capable of aligning a reference genome to a graph and using that to call variants
  • Support for comparing cases/controls or phenotyped strains
  • Typical memory use: 1 high coverage human in under 80Gb of RAM, 1000 yeasts in under 64Gb RAM, 10 humans in under 256 Gb RAM
23rd August 2012: Bugfix release v1.0.5.11. Get it here. The main change in this release is in the scripts/1000genomes directory, which I have not advertised previously. It contains scripts for running Cortex on large numbers (tens, hundreds) of samples with large genomes - i.e. for the 1000 Genomes project. These are to allow collaborators across the world to reliably run a consistent Cortex pipeline on human populations. However this is the first time people other than me have done this, so I expect there may be some smoothing-out of issues in the near future. You can see a PDF describing the pipeline here. I've had enough people ask me about running Cortex on lots of samples with big genomes, that I thought people would find it useful to see the process. This release is a bugfix for a script in that 1000 Genomes directory, plus fixes for a few potential bugs-in-waiting (array overflow errors) in Cortex itself.
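To make the "multicoloured de Bruijn graph" feature a bit more concrete, here is a toy Python sketch of the underlying idea: every k-mer is tagged with the set of samples ("colours") it was seen in, and k-mers private to one colour point at candidate differences between samples. This is only my illustration, not how cortex_var is implemented; it ignores reverse complements, sequencing errors and the graph traversal needed to call actual variants, and all names in it are made up.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_coloured_graph(samples, k):
    """Map each k-mer to the set of sample names ('colours') containing it."""
    graph = defaultdict(set)
    for colour, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                graph[kmer].add(colour)
    return graph

def private_kmers(graph, colour):
    """Return the k-mers seen only in the given colour."""
    return sorted(k for k, colours in graph.items() if colours == {colour})

# Two made-up samples differing by a single base (T vs A at position 5).
samples = {
    "case": ["ACGTTGCA"],
    "control": ["ACGTAGCA"],
}
graph = build_coloured_graph(samples, k=4)
print(private_kmers(graph, "case"))
```

In this toy case the single-base difference makes every k-mer that covers it private to one sample, while the shared prefix k-mer `ACGT` carries both colours.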

Saturday, 22 September 2012

NGS of microdroplet-based PCR amplified target exons.

 2012 Sep 20;13(1):500. [Epub ahead of print]

Accurate variant detection across non-amplified and whole genome amplified DNA using targeted next generation sequencing.




Many hypothesis-driven genetic studies require the ability to comprehensively and efficiently target specific regions of the genome to detect sequence variations. Often, sample availability is limited requiring the use of whole genome amplification (WGA). We evaluated a high-throughput microdroplet-based PCR approach in combination with next generation sequencing (NGS) to target 384 discrete exons from 373 genes involved in cancer. In our evaluation, we compared the performance of six non-amplified gDNA samples from two HapMap family trios. Three of these samples were also preamplified by WGA and evaluated. We tested sample pooling or multiplexing strategies at different stages of the tested targeted NGS (T-NGS) workflow.


The results demonstrated comparable sequence performance between non-amplified and preamplified samples and between different indexing strategies [sequence specificity of 66.0% +/- 3.4%, uniformity (coverage at 0.2x of the mean) of 85.6% +/- 0.6%]. The average genotype concordance maintained across all the samples was 99.5% +/- 0.4%, regardless of sample type or pooling strategy. We did not detect any errors in the Mendelian patterns of inheritance of genotypes between the parents and offspring within each trio. We also demonstrated the ability to detect minor allele frequencies within the pooled samples that conform to predicted models.
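For readers unfamiliar with the two metrics quoted above, here is a short Python sketch of how they might be computed. It assumes "uniformity (coverage at 0.2x of the mean)" means the fraction of targets covered to at least 0.2 times the mean depth, and that genotype concordance is the fraction of shared sites with identical calls; the paper's exact definitions may differ, and all numbers below are made up.

```python
def uniformity(coverages, fraction=0.2):
    """Fraction of targets whose coverage is at least `fraction` x the mean."""
    mean = sum(coverages) / len(coverages)
    threshold = fraction * mean
    return sum(c >= threshold for c in coverages) / len(coverages)

def concordance(calls_a, calls_b):
    """Fraction of sites called in both samples that have identical genotypes."""
    shared = [site for site in calls_a if site in calls_b]
    return sum(calls_a[s] == calls_b[s] for s in shared) / len(shared)

# Made-up per-target depths: mean is 80, so the threshold is 16.
cov = [100, 120, 80, 10, 90]
print(uniformity(cov))

# Made-up genotype calls for the same sample with and without WGA.
native = {"s1": "AA", "s2": "AG", "s3": "GG", "s4": "AG"}
wga = {"s1": "AA", "s2": "AG", "s3": "GG", "s4": "GG"}
print(concordance(native, wga))
```

Here four of five targets pass the depth threshold and three of four genotype calls agree.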


Our described PCR-based sample multiplex approach and the ability to use WGA material for NGS may enable researchers to perform deep resequencing studies and explore variants at very low frequencies and cost.
[PubMed - as supplied by publisher]

An informatics approach to analyzing the incidentalome.

'incidental finding' OME ....
 2012 Sep 20. doi: 10.1038/gim.2012.112. [Epub ahead of print]

An informatics approach to analyzing the incidentalome.


[1] Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA [2] Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA [3] Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA.


Purpose: Next-generation sequencing has transformed genetic research and is poised to revolutionize clinical diagnosis. However, the vast amount of data and inevitable discovery of incidental findings require novel analytic approaches. We therefore implemented for the first time a strategy that utilizes an a priori structured framework and a conservative threshold for selecting clinically relevant incidental findings. Methods: We categorized 2,016 genes linked with Mendelian diseases into "bins" based on clinical utility and validity, and used a computational algorithm to analyze 80 whole-genome sequences in order to explore the use of such an approach in a simulated real-world setting. Results: The algorithm effectively reduced the number of variants requiring human review and identified incidental variants with likely clinical relevance. Incorporation of the Human Gene Mutation Database improved the yield for missense mutations but also revealed that a substantial proportion of purported disease-causing mutations were misleading. Conclusion: This approach is adaptable to any clinically relevant bin structure, scalable to the demands of a clinical laboratory workflow, and flexible with respect to advances in genomics. We anticipate that application of this strategy will facilitate pretest informed consent, laboratory analysis, and posttest return of results in a clinical context. Genet Med advance online publication 20 September 2012. Genetics in Medicine (2012); doi:10.1038/gim.2012.112.
[PubMed - as supplied by publisher]

Friday, 21 September 2012

Release of 23andMe Personal Genome API

Wow! 23andMe has released their API. I think it's a great move for the company: Apple's App Store and Google Play took off by making devices more appealing through letting the public write apps, rather than relying on an in-house army of software developers to produce what they think would be popular. 

They have roughly followed Apple's model by reviewing applications first, but the API is free (for now). I'm not sure whether third-party developers can earn from this model or whether 23andMe will take a share. 
I also wonder whether the T&C would allow one to create a mega genetic survey by simply asking for anonymous contributions of phenotype and genotype (essentially copying data from 23andMe's client base).
Imagine a 23andMe 'volunteer' track alongside the 1000 Genomes track on the UCSC browser.

If you're interested in creating a great genome app, visit Developers who'd like to use the API must first apply for authorization from 23andMe, describing the proposed application and the information it will request from end users. While 23andMe will not endorse or promote specific applications, 23andMe will individually evaluate applications on a case-by-case basis. 23andMe customers can decide for themselves whether to grant access to an app that uses the API. View the current API terms here.

The Personal Genome API is a work in progress and is available for free during this early access period. The scope and terms of the API will evolve as we gather feedback and learn from developers, users and others.

We are excited to welcome third-party developers as our partners in building new features for personal genomes. There is no app too small — so if you'd like to create some "genetic code", we invite you to apply!

Read more in our press release, or get started at

Managing Big Bad Biology data: BioTeam Inc. Blog

I have always loved BioTeam's blog posts and slides (perhaps my true calling lies in that direction). Sometimes I find myself advocating something that goes against the grain of user convenience/perception for the 'greater good', and often I back down simply because I feel I don't have a real case: I do not have actual numbers to show that the benefits outweigh the costs. 
I think what the guys at BioTeam are doing is great: cutting through user perceptions and the technology maze to come up with usable, practical suggestions. (Although sometimes I think that they mainly serve bigger labs, and tips for small labs with big dreams would be cool too!) 

btw my neighbouring lab has a backblaze 'clone' for secondary storage most probably inspired by BioTeam's posts .... hahah shall investigate further .. 

BioTeam Inc. Blog

Posted: 19 Sep 2012 04:00 PM PDT
Intel hosted a seminar series event at the Broad Institute in Cambridge, MA on 9/6/2012. The topic was Managing Big Data in Life Sciences and Healthcare, a critically important topic in life sciences at the moment. I was invited to give a talk about how BioTeam sees the state of the art and how we approach the management of this expansive sea of data. Since we see an incredibly broad range of implementations of data management solutions, my presentation covered the 10,000ft overview of how I see the state of big data storage and the new challenges that are being faced. Since we often approach data management issues by implementing our own MiniLIMS software, I demonstrated how a system like MiniLIMS can help manage sprawling file systems from a research perspective.


Wednesday, 19 September 2012

GenomeBrowse™ by Golden Helix: a competitor to Broad's IGV?

Golden Helix has released a genome browser that gives you a visual representation of variants on a genome whilst pulling variant data from various sources "from the cloud". I'm not sure whether that means Golden Helix is hosting the databases or just pulling from the respective database sites. 

should be interesting to see how this matches up with IGV


VCF File Format Support

  • Lightning fast, direct VCF visualization as a sample-based variant plot.
  • Automatically sort and index files for quick access.

Tabular View of Any Data Source

  • Step through a list of putative causal variants or highly differentiated genes.
  • Get an infinitely scroll-able tabular view of any data source with details on each feature.
  • Use tables to zoom GenomeBrowse and focus on a feature.

Build Custom Annotation Tracks

  • Convert any text file, whether a BED file or a tabular dump of UCSC, into your own annotation track.
  • Curate reference sequences and genomes of new species.

Access Your Large Cloud-Based Sequencing Data

  • Illumina BaseSpace account integration coming soon to view your alignments and variant calls directly streamed from the cloud.

GenomeBrowse was launched on September 12th via a live webcast.


In a one-hour webcast launch on September 12th, Gabe Rudy, Vice President of Product Development, will showcase GenomeBrowse including showing you how to:

  • View cloud-based public and private NGS samples with the context of public annotations like the 1000 Genomes variant list, NHLBI Exome Sequencing Project, and OMIM catalog.

  • Validate putative causal variants by investigating the read-based evidence from BAM files. Mismatch emphasis, read depth, and quality scores give you confidence in your variants and let you throw out false positives.

  • Analyze a trio to browse variant inheritance between parents and child. Follow up on putative recessive, de novo, or compound heterozygous variants to ensure their quality.

  • Investigate differentially expressed genes and splicing structure through the coverage profiles and pile-ups of RNA-seq data.

  • Navigate and fluidly browse from the single base to whole genome view without losing the context of what your data is telling you and without the disorienting jitters of other browsers.

webinar: Join Mathematica Virtual Conference 2012


Whether you are new to Mathematica or an experienced user, this free virtual conference will help you get the most out of the Mathematica platform.


Wed, 26 September 12:00am-3:45am Singapore Time (or 25 September 12:00pm-3:45pm EDT)

Wed, 26 September 8:00am-11:45am Singapore Time (or 25 September 8:00pm-11:45pm EDT)

Detailed schedule can be found here:


To register visit:

Register early as virtual seats are limited! Please feel free to pass this information along to your team, co-workers, and anyone in your network who might be interested in Mathematica.

X PRIZE Announces Bioinformatics Challenge!

From: Grant Campany

Dear Colleagues:

It is my pleasure to announce the Archon Genomics X PRIZE presented by Express Scripts "Bioinformatics Challenge."

Please forward this email to your colleagues.

Good Luck!

Grant R. Campany | Senior Director & Prize Lead Archon Genomics X PRIZE presented by Express Scripts

Need for Speed HDD Fwd: My Book VelociRaptor Duo

Western Digital just announced the My Book VelociRaptor Duo yesterday, featuring 2 1TB VelociRaptor HDDs and 2 Thunderbolt ports for ultra-fast transfer and daisy chaining - so fast, it can transfer a 22GB HD movie in just 65 seconds!


On the WDC Facebook page it says: "My Book Thunderbolt Duo transferred data at the rate of 250MB/s. In contrast, the new My Book VelociRaptor Duo handles data at 400MB/s. That's a blazing 60% increase in transfer speed!"
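A quick sanity check of the quoted figures, as a small Python sketch. It assumes decimal units (1 GB = 1000 MB), which is my assumption rather than something the announcement states; the function names are just for illustration.

```python
def throughput_mb_s(size_gb, seconds):
    """Average throughput in MB/s, taking 1 GB = 1000 MB."""
    return size_gb * 1000 / seconds

def percent_increase(old, new):
    """Increase from old to new, as a percentage of old."""
    return (new - old) / old * 100

# 22GB movie in 65 seconds, from the product announcement.
print(f"{throughput_mb_s(22, 65):.0f} MB/s")

# 250MB/s versus 400MB/s, from the Facebook post.
print(f"{percent_increase(250, 400):.0f}%")
```

So the "22GB in 65 seconds" claim works out to about 338 MB/s sustained, somewhat below the 400MB/s headline figure, and the 60% increase over 250MB/s checks out.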

In my own (extremely flaky) benchmark using dd, I think I managed 250 MB/s at most on a 12 TB RAID 5 local volume of SATA drives. 
I think the current pain of needing big-RAM machines for de novo assembly could be eased by moving the 'memory' requirements to disk (where SSDs could make up for some of the speed lost relative to RAM).

Extreme speed of WD VelociRaptor drives inside.
With the extreme speed of two 10,000 RPM WD VelociRaptor drives inside united with the revolutionary speed of Thunderbolt technology your creative inspirations have never moved so fast.

Enhanced workflow efficiency.
The dual Thunderbolt ports make it easy to daisy chain more drives for even greater speeds and higher capacity. Add peripherals to further enhance your productivity.

Exome sequencing and complex disease: practical aspects of rare variant association studies

Exome sequencing and complex disease: practical aspects of rare variant association studies

R Do, S Kathiresan, GR Abecasis - Human Molecular Genetics, 2012


Genetic association and linkage studies can provide insights into complex disease biology, guiding the development of new diagnostic and therapeutic strategies. Over the past decade, genetic association studies have largely focused on common, easy to measure genetic variants shared between many individuals. These common variants typically have subtle functional consequence and translating the resulting association signals into biological insights can be challenging. In the last few years, exome sequencing has emerged as a cost-effective strategy for extending these studies to include rare coding variants, which often have more marked functional consequences. Here, we provide practical guidance in the design and analysis of complex trait association studies focused on rare, coding variants.

Fwd: [GATK-Forum] Welcome Aboard!

anyone else still facing probs?

---------- Forwarded message ----------
From: GATK-Forum <>
Date: 19 September 2012 02:29
Subject: [GATK-Forum] Welcome Aboard!

Hello kevin!

You have successfully registered for an account at GATK-Forum. Here is your information:

  Username: kevin

You can access the site at

You need to confirm your email address before you can continue. Please confirm your email address by clicking on the following link:

Have a great day!

Separating the Pseudo From Science - The Chronicle Review - The Chronicle of Higher Education

Shadows are also an inevitable consequence of light. Carl Sagan and
other anti-Velikovskians believed that greater scientific literacy
could "cure" the ill of pseudoscience. Don't get me wrong—scientific
literacy is a wonderful thing, and I am committed to expanding it. But
it won't eradicate the fringe, and it won't prevent the proliferation
of doctrines the scientific community decries as pseudoscience.

Nevertheless, something needs to be done. Demarcation may be an
activity without rules, a historically fluctuating marker of the
worries of the scientific community, but it is also absolutely vital.
Not everything can or should be taught in science courses in school.
Not every research proposal can or should receive funds. When
individuals spread falsehood and misinformation, they must be exposed.

We can sensibly build science policy only upon the consensus of the
scientific community. This is not a bright line, but it is the only
line we have. As a result, we need to be careful about demarcation, to
notice how we do it and why we do it, and stop striving for a goal of
universal eradication of the fringe that is frankly impossible. We
need to learn what we are talking about when we talk about

I used to think correlation implied causation ...

Alerted to this by 


Biologist at UC Berkeley & HHMI; open access advocate and co-founder of Public Library of Science
Berkeley, CA ·

A genetic variant near olfactory receptor genes influences cilantro preference | Haldane's Sieve|ArXived

OMG, my wife might actually have a debilitating genetic reason for hating Chinese parsley (which I love, btw). Very tempted to secretly sequence her blood to find out if it's true for her. 
Good work, 23andMe, for propagating genetic tolerance of oft-misunderstood genetic preferences that have led to divides in human culture!

Our paper: A genetic variant near olfactory receptor genes influences cilantro preference

For our next guest post Nick Eriksson (@nkeriks) writes about his ArXived paper with other 23andMe folks: A genetic variant near olfactory receptor genes influences cilantro preference ArXived here

First a little background about research at 23andMe. We have over 150,000 genotyped customers, a large proportion of whom answer surveys online. We run GWAS on pretty much every trait you can think of (at least everything that is easily reported and possibly related to genetics). Around 2010, we started to ask a couple of questions about cilantro: if people like it, and if they perceive a soapy taste to it.

Fast forward a couple of years, and we have tens of thousands of people answering these questions. We start to see an interesting finding: one SNP significantly associated with both cilantro dislike and perceiving a soapy taste. Best of all, it was in a cluster of olfactory receptor genes.

The sense of smell is pretty cool. Humans have hundreds of olfactory receptor genes that encode G protein-coupled receptors. We perceive smells due to the binding of specific chemicals ("odorants") to these receptors. There are maybe 1000 total olfactory receptors in various mammalian genomes, but it's not totally clear which are pseudogenes. There has probably been some loss of these genes in humans as our sense of smell has become less critical. These genes appear in clusters in the genome, which makes it pretty hard for GWAS to pick out a specific gene. For example, in the first 23andMe paper, we identified a variant in a different cluster of olfactory receptors that affected whether you perceive a certain smell in your urine after eating asparagus. However, we still don't know what the true functional variant in that region is.

Datanami, Woe be me