Monday 3 September 2012

Using your Mac for Seq analysis Updated thread 'Hello - I use to think I was good with a computer'



Jon_Keats has just replied to a thread you have subscribed to entitled - Hello - I use to think I was good with a computer - in the Introductions forum of SEQanswers.

This thread is located at:
http://seqanswers.com/forums/showthread.php?t=4589&goto=newpost

Here is the message that has just been posted:
***************
Hello everyone, it's been a very long time since I updated this thread.  Over the last year I've had people in my lab maintain a detailed protocol on how to build up a new machine to do most NGS analysis, or at least the analyses we do in my group.  So thanks to David and Teja we have a pretty comprehensive instruction set.  Sadly, some parts are already a bit out of date, so I'll try to have them tested on the next machine in our lab and update as needed.

FULL INSTRUCTION LIST:


How To Transform Your Mac Into A Sequencing Analysis Machine

*Introduction*

I'm a newly hired RA in Jonathan Keats's lab who will be helping with a bunch of new sequencing stuff. I have been working on installing the suite of sequencing programs on our new workstation. Before I started, I knew virtually nothing about Terminal, Unix or manipulating sequencing files. (In my mind, Terminal was where you board trains and Unix was some Talaxian from Star Trek: Voyager.) The learning curve has been steep, obviously, but Jonathan's previous posts have been invaluable in making the adjustment.

I wanted to update those posts, however, because (a) some of the instructions have changed as newer versions of applications have appeared; (b) posts could be combined into one gigantic "master-post"; (c) some of the instructions are much more advanced/complicated than others; and (d) some helpful instructions for certain applications weren't included.

To make things easier on the next bright-eyed generation of programming-illiterate biologists, I have included specific code instructions at practically every step of the installation process. After a couple times mentioning a particular command, I will stop including it to save space, so if you're starting from the middle of the instruction set, refer to previous instructions for more information.

If you find this compilation of instructions frustratingly simplistic, then I suggest you read through the previous posts, if only to read through Jonathan's wry comments about the entire bioinformatics process. Hopefully, this post will be helpful to extreme sequencing/Unix novices like myself.

Please let me know if you have any questions, good luck, and happy hunting!

David K. Edwards V and Jonathan Keats

*Before You Begin: Programs*

_Unix_

Before you begin, you should familiarize yourself with Terminal (Applications>Utilities>Terminal). Or better yet, you should invest some time working through the Unix portion of the "Unix and Perl for Biologists" course (http://groups.google.com/group/unix-...for-biologists), made public by Keith Bradnam and Ian Korf at UC Davis. Or preorder their book on Amazon: http://www.amazon.com/UNIX-Perl-Rescue-Sciences-Data-rich/dp/0521169828/ref=sr_1_1?ie=UTF8&qid=1330189572&sr=8-1. Tell your PI it will be the best $50 investment of their career!

The course really helps beginners understand non-GUI file manipulation and gives you a good list of important Unix commands. (If you're completely new to programming, it might be too confusing or complicated, but nobody said this was going to be easy.)

Download the entire course package: http://korflab.ucdavis.edu/Unix_and_Perl/index.html.

Here is a general list of helpful Unix commands:

[ul]
[li]To get a manual on any command, type "man command". Type "space" to page down, "b" to back-up, and "q" to quit. [/li]
[li]To see what folder you are in currently, type "pwd".[/li]
[li]To see what folders and files exist in the current directory, type "ls".[/li]
[li]To move into a folder in the current directory, type "cd myfolder". (Note: You can move multiple levels downstream with "cd myfolder/myfolder2".)[/li]
[li]To go back one directory, type "cd ..". (Note: You can move back multiple levels upstream with "cd ../..".) [/li]
[li]To copy a file from the current directory to a downstream folder, type "cp myfile myfolder/". (Note: You can copy a file up one directory with "cp myfile ../".) [/li]
[li]To move a file from the current directory, type "mv" instead of typing "cp". [/li]
[li]A folder immediately downstream of the root directory (i.e. absolute top of the tree) is always defined by "command /folder". (This means if you type "cd /something", it looks for the folder "something" downstream of the root directory.) [/li]
[li]To refer to the current directory in a command, type ".".[/li]
[li]To change the permissions of the compiled applications, type "chmod 755 myfile". (This makes the file readable and executable by everyone but only writable by you. To allow everybody to do everything to the file, type "chmod 777 myfile".) [/li]
[li]To become a super user for a particular command (and become Superman!), type "sudo".[/li]
[li]To decompress a tarball file, type "tar -xvzf file.tar.gz", where "file.tar.gz" is the compressed file. (For a ".tar.bz2" file, type "tar -xvjf file.tar.bz2" instead.)[/li]
[/ul]
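
If you want to try these commands without risking any real files, here is a minimal practice session. It is only a sketch: the scratch folder and the names "myfile", "myfolder" and "myfolder2" are placeholders, and "mkdir" (used later in this protocol) creates the practice folders.

```shell
# Practice session in a throwaway folder (all names here are placeholders).
SCRATCH=$(mktemp -d "${TMPDIR:-/tmp}/ngs_practice.XXXXXX")  # temporary folder to play in
cd "$SCRATCH"
pwd                            # show the folder you are in

mkdir -p myfolder/myfolder2    # make a folder with a subfolder inside it
echo "hello" > myfile          # make a small file to practice on
ls                             # list files and folders here

cp myfile myfolder/            # copy the file into a downstream folder
mv myfile myfolder/myfolder2/  # move (not copy) the original two levels down

cd myfolder/myfolder2          # move two levels downstream
cd ../..                       # move back two levels upstream

chmod 755 myfolder/myfile      # readable/executable by all, writable only by you
ls -l myfolder                 # myfile should now show as -rwxr-xr-x
```

Because everything happens inside the temporary folder, you can delete that folder afterwards and nothing else on your machine is touched.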

_Xcode_ (http://developer.apple.com/technolog...ols/xcode.html)

*NOTE: Xcode 4.2 no longer comes with the essential C compiler GCC! (For more information, please visit: http://ask.metafilter.com/200231/How-to-install-gcc-42-on-a-macbook-with-Xcode-42.)*

To install GCC after installing Xcode, follow these instructions (and see some of the subsequent instructions for specific explanations on how to "decompress" and "navigate to" files):

GCC (https://github.com/kennethreitz/osx-gcc-installer/blob/master/README.rst)

[ol]
[li]Click on the link above and download the "GCC-10.7.pkg" file.[/li]
[li]Move the "GCC-10.7.pkg" file to your "ngs/applications" folder.[/li]
[li]Double click the "GCC-10.7.pkg" file to launch the installer.[/li]
[li]Follow the installer prompts. (A ".pkg" file is a standard Mac installer package, so there should be nothing to decompress or compile by hand.)[/li]
[/ol]

You need to install Xcode on your computer so you can compile the various applications.

The newest version available on the App Store, Xcode 4.2, is only compatible with OS X Lion (10.7.x). If you have Leopard (10.5.x) or Snow Leopard (10.6.x), then you can install the package from your OS installation disks. Insert Mac OS X Install Disc 2, open the "Xcode Tools" folder, and double click "XCodeTools.mpkg". Otherwise, you need to sign up to be a developer and download it from the website.

_MacPorts_ (http://www.macports.org/)

You need to install some packages to run certain applications. There are two programs that install those packages, Fink and MacPorts. There isn't much difference between the two; in general, Fink is more conservative about upgrading packages than MacPorts, but both are perfectly acceptable. I simply chose MacPorts for this protocol.

_R and Bioconductor_

R: You will need R to perform statistical computations and generate graphs from your data. To install, visit http://www.r-project.org/, then select your preferred CRAN mirror and follow the instructions.

Bioconductor: You will probably need Bioconductor to analyze your high-throughput genomic data. To install Bioconductor, you must have the most recent release version of R. The most common packages you will need to install are affy, simpleaffy, and gcrma.

To install these packages, starting first with affy, simply start R and type in the following:


Code:
---------
source("http://bioconductor.org/biocLite.R")
biocLite("affy")
---------
Press enter. R will automatically install the dependencies 'Biobase', 'affyio', and 'preprocessCore' during this installation.

To install simpleaffy, replace "affy" with "simpleaffy" in the above code and press enter. R will automatically install the dependencies 'DBI', 'RSQLite', 'xtable', 'IRanges', 'AnnotationDbi', 'annotate', 'Biostrings', 'genefilter', and 'gcrma'.

There are three other packages you should install:

[ul]
[li]DESeq: Follow instructions above and type in "DESeq". (For more information, please visit: http://www-huber.embl.de/users/anders/DESeq/.)[/li]
[li]edgeR: Follow instructions above and type in "edgeR". (For more information, please visit: http://www.bioconductor.org/packages/release/bioc/html/edgeR.html.)[/li]
[li]cummeRbund: Follow instructions above and type in "cummeRbund".[/li]
[/ul]

(NOTE: The version of cummeRbund that is installed through the current Bioconductor release version is 1.0.0.  The latest version, version 1.1.3, will be available as part of Bioconductor version 2.10 (currently the development version), which will be made available in April 2012. For more information, please visit: http://compbio.mit.edu/cummeRbund/index.html.)

For more installation instructions, visit http://www.bioconductor.org/install/. (For this protocol, the current release version of R is 2.14, and the currently released Bioconductor version is 2.9.)

*Before You Begin: Folders*

You should establish a series of folders to manage your sequencing data and move around after each step is completed. You don't necessarily have to follow this system of folders and subfolders, but all of our instructions for installing programs are based on this file hierarchy, so if you want to avoid confusion, and jump on our awesome folder-managing bandwagon, then read carefully!

Here is our system of folders and subfolders:

We have a main working directory called "ngs" in our $HOME directory (/Users/YourUserName/). This is our home base for data analysis, and all of our steps and scripts will be called from this folder. Here are our subfolders within "ngs":

[ul]
[li]ngs/{applications,bwa,run_parameters,scripts,temp,tophat,tophat_fusion}[/li]
[li]ngs/analyzed_read_files/{chipseq,exomes,genomes,matepair,rnaseq}[/li]
[li]ngs/finaloutputs/{chipseq,exomes,genomes,matepair,rnaseq}[/li]
[li]ngs/refgenomes/{bfast_indexed,bowtie_indexed,bwa_indexed,downloads}[/li]
[li]ngs/refgenomes/downloads/{ncbi36_hg18,grch37_hg19}[/li]
[li]ngs/refgenomes/downloads/ncbi36_hg18/{annotation_files,reference_sequences}[/li]
[li]ngs/refgenomes/downloads/grch37_hg19/{annotation_files,reference_sequences}[/li]
[/ul]

Each of these subfolders has its own subfolders, so instead of listing everything here, please visit the script "create_ngs_directorystructure_v4.sh" (http://seqanswers.com/forums/showthread.php?t=4589&page=4, post #61) for more information. (To run the script, simply copy and paste the code included in that post immediately after you start Terminal, or when you are in the home directory. The corresponding files and folders will be created.)
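
To give a feel for what such a script does, here is a minimal sketch that creates only the folders listed above. It is not the "create_ngs_directorystructure_v4.sh" script from post #61, which builds a deeper tree. The NGS_ROOT variable is an assumption made for this example: it defaults to a scratch location so you can do a dry run, and you would set it to "$HOME/ngs" to build the real thing.

```shell
# Sketch: create the top levels of the "ngs" folder tree listed above.
# NGS_ROOT defaults to a scratch location; set NGS_ROOT="$HOME/ngs" for real use.
NGS_ROOT="${NGS_ROOT:-$(mktemp -d "${TMPDIR:-/tmp}/ngs_tree.XXXXXX")/ngs}"

# Folders directly inside "ngs"
for d in applications bwa run_parameters scripts temp tophat tophat_fusion; do
    mkdir -p "$NGS_ROOT/$d"
done

# One folder per experiment type, for both analyzed reads and final outputs
for d in chipseq exomes genomes matepair rnaseq; do
    mkdir -p "$NGS_ROOT/analyzed_read_files/$d" "$NGS_ROOT/finaloutputs/$d"
done

# Indexed reference genomes, one folder per aligner
for d in bfast_indexed bowtie_indexed bwa_indexed; do
    mkdir -p "$NGS_ROOT/refgenomes/$d"
done

# Downloaded references, split by genome build and file type
for build in ncbi36_hg18 grch37_hg19; do
    for d in annotation_files reference_sequences; do
        mkdir -p "$NGS_ROOT/refgenomes/downloads/$build/$d"
    done
done

find "$NGS_ROOT" -type d | sort   # list every folder that now exists
```

"mkdir -p" creates any missing parent folders and does not complain if a folder already exists, so the sketch can be re-run safely.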

*Before You Begin: Picking Genome Files*

[Maq is no longer included in this protocol because of recent improvements to BWA. If you need to install Maq, please see Jon's preceding post on how to install it.]

This step is important and can be the source of most issues.  You need to pick a single source for all genome sequence files and annotations.  We use Ensembl over UCSC for many reasons.  For human genome reference files, we recommend the 1000 Genomes versions.  The 1000 Genomes groups think about the human genome much more than you do, so give them some credit. Besides, many of the applications you will use are published by those groups, so running them is streamlined and less complicated.

We will be using BWA to align our sequencing data against the reference genome (see BWA installation instructions under "Installing Applications"). You might think to use Ensembl (http://www.ensembl.org/info/data/ftp/index.html) to get your reference genome, but the full human genome file (Homo_sapiens.GRCh37.66.dna_rm.toplevel.fa.gz) exceeds the maximum character length allowed by BWA's index command.

Instead, you should use the 1000 Genomes reference genome (ftp://ftp.sanger.ac.uk/pub/1000genom...ect_reference/). You need to save the reference genome onto your computer:

[ol]
[li]Copy the file "human_g1k_v37.fasta.gz" to your "ngs/refgenomes" folder.[/li]
[li]Decompress the file by double clicking on it.[/li]
[/ol]
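
If you would rather decompress from Terminal, "gunzip" does the same job as double clicking. The sketch below is illustrative only: it builds a tiny stand-in FASTA file (the name "toy_reference.fasta" and its contents are made up) so the commands can be tried without the large download, then counts the sequence entries as a quick check that the decompressed file looks sane.

```shell
# Sketch: decompress a gzipped FASTA file and count its sequences.
# The toy file stands in for human_g1k_v37.fasta.gz; its contents are invented.
WORK=$(mktemp -d "${TMPDIR:-/tmp}/ngs_fasta.XXXXXX")
cd "$WORK"
printf '>1\nACGTACGT\n>2\nGGCCTTAA\n>MT\nACGT\n' > toy_reference.fasta
gzip toy_reference.fasta                   # produces toy_reference.fasta.gz

gunzip toy_reference.fasta.gz              # decompress; leaves toy_reference.fasta
N_SEQ=$(grep -c '^>' toy_reference.fasta)  # every sequence starts with a ">" header line
echo "sequences found: $N_SEQ"
grep '^>' toy_reference.fasta | head -n 5  # peek at the first few sequence names
```

On the real reference you would run only the "gunzip" and "grep" lines, replacing the toy file name with "human_g1k_v37.fasta.gz" and "human_g1k_v37.fasta".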

*Installing Applications*

Welcome to the meat-and-potatoes of this somewhat bloated post: program installation. This section has been written in chronological order, meaning that I started with the first program and proceeded onward to the last program. Some of the programs require that you have installed other programs, and unfortunately, unless explicitly mentioned, I don't know which programs have those requirements.

Therefore, I recommend you follow the same installation order for your own computer. This will certainly make things simpler for newbies like myself, especially since I included the commonly used programs (e.g. BWA) before the less commonly used programs (e.g. Cairo).

As mentioned above, if you're skipping around, I have noted next to each application whether it requires one of the preceding applications. However, I can't be sure that this information is correct, so if you encounter a problem during installation, please let us know and we can amend our instructions.

Final note: The version numbers of programs might be out-of-date, so please change the instructions based on those new version numbers. We will try to update this document periodically to avoid this problem, but you should be forewarned!

_Setting Your Path Directory_

To run many of the applications, you will need to either place the applications in the PATH, define additional PATH locations, or note the location of the application each time you call it. To find the current PATH directories used by Unix, type "echo $PATH". You should see something similar to the following:


Code:
---------
/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin
---------


These folders are directly below the root directory and represent the places Unix looks when running an application. If you want to run any of these applications, you must download and compile the application. Before you begin installing any application, do the following (thanks to Nils Homer for the suggestion):

[ol]
[li]Create a directory in your home directory for the applications:


Code:
---------
mkdir -p $HOME/local/bin
---------
[/li]

[li]You will edit your .profile file so this directory is in your PATH directories. First, list all files in your home directory, including hidden ones (you should see a file called ".profile"):


Code:
---------
ls -a
---------
[/li]

[li]Open with nano by typing:


Code:
---------
nano .profile
---------
[/li]

[li]Add the following line to your .profile file, but *DO NOT* remove anything already in the file (you don't need "sudo"):


Code:
---------
export PATH=$HOME/local/bin:$PATH
---------
[/li]

[li]Save your changes by typing "control O", then press enter to confirm the file name.[/li]
[li]Exit nano by typing "control X".[/li]
[/ol]

Additionally, when you install applications, place the executable files in this directory so they are in a $PATH directory. You can either copy the application to the directory $HOME/local/bin or set the install location when configuring with "./configure --prefix=$HOME/local".
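
If you prefer not to open nano, the same edit can be made from the command line. The following is a minimal sketch under stated assumptions: the PROFILE variable and the "add_path_line" helper are names invented for this example, and PROFILE defaults to a scratch file so you can try it without touching your real settings.

```shell
# Sketch: add $HOME/local/bin to PATH via a profile file, without duplicating the line.
# PROFILE defaults to a scratch file; set PROFILE="$HOME/.profile" for real use.
PROFILE="${PROFILE:-$(mktemp "${TMPDIR:-/tmp}/profile_test.XXXXXX")}"
PATH_LINE='export PATH=$HOME/local/bin:$PATH'  # single quotes keep $HOME and $PATH unexpanded

add_path_line() {
    # append the line only if an identical line is not already in the file
    grep -qxF "$PATH_LINE" "$PROFILE" 2>/dev/null || printf '%s\n' "$PATH_LINE" >> "$PROFILE"
}

add_path_line
add_path_line                 # running it a second time changes nothing

. "$PROFILE"                  # load the profile into the current shell
echo "$PATH" | tr ':' '\n'    # print the PATH folders one per line
```

Appending with ">>" leaves everything already in the file untouched, which is the same precaution as not deleting existing lines in nano.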

_BWA_ (http://sourceforge.net/projects/bio-bwa/files/; change naming in instructions based on BWA version)
NOTE: Reference indexes created with earlier versions do not work with version 0.6, so you need to re-index each reference if you have worked with earlier versions or, more importantly, if you are setting up a new machine and someone is providing you with pre-indexed reference files.
[ol]
[li]Click on the link above and download the newest version (called "bwa-0.6.1.tar.bz2").[/li]
[li]Move the "bwa-0.6.1.tar.bz2" file to your "ngs/applications" folder.[/li]
[li]Decompress the file by double clicking on it.[/li]
[li]Open Terminal (if previously open, ensure you are in your home directory).[/li]
[li]Navigate to the decompressed folder by typing:


Code:
---------
cd ngs/applications/bwa-0.6.1
---------
[/li]

[li]Compile the application by typing:


Code:
---------
make
---------
[/li]

[li]Lines of output will start appearing under your command. Make sure that no errors are listed![/li]
[/ol]

You can confirm that the installation was successful by typing:


Code:
---------
./bwa
---------


This should print the BWA command options in Terminal. (The first line is "Program: bwa (alignment via Burrows-Wheeler transformation)".)

_SAMtools_ (http://sourceforge.net/projects/samtools/; change naming in instructions based on SAMtools version)

[ol]
[li]Click on the link above and download the newest version (called "samtools-0.1.18.tar.bz2").[/li]
[li]Move the "samtools-0.1.18.tar.bz2" file to your "ngs/applications" folder.[/li]
[li]Decompress the file by double clicking on it.[/li]
[li]Open Terminal (if previously open, ensure you are in your home directory).[/li]
[li]Navigate to the decompressed folder by typing:


Code:
---------
cd ngs/applications/samtools-0.1.18
---------
[/li]

[li]Compile the application by typing:


Code:
---------
make
---------
[/li]

[li]Lines of output will start appearing under your command. Make sure that no errors are listed![/li]

You can confirm that the installation was successful by typing:


Code:
---------
./samtools
---------


This should print the SAMtools command options in Terminal. (The first line is "Program: samtools (Tools for alignments in the SAM format)".)

[li]Copy "samtools" to your path directory by typing:


Code:
---------
cp samtools $HOME/local/bin
---------
(We are assuming you followed our path directory here. If not, then change "$HOME/local/bin" to your location of choice.)[/li]
[/ol]
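
Once an executable has been copied into a PATH folder, you can call it from anywhere without the "./" prefix. Here is a small sketch for checking which tools the shell can currently find; the "check_tool" helper is a name invented for this example. Note that the BWA steps above stop after compiling, so "bwa" will be reported as missing unless you also copy it to your path directory the same way as "samtools".

```shell
# Sketch: report which command-line tools the shell can find on PATH.
check_tool() {
    # print the location of the tool named in $1, or a warning if it is missing
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1 -> $(command -v "$1")"
    else
        echo "missing: $1 (is it compiled and copied to a PATH folder?)"
        return 1
    fi
}

for tool in bwa samtools; do
    check_tool "$tool" || true   # keep going even if one tool is missing
done
```

The same loop works for any of the later applications; just add their names to the list.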

*Note: To save space, we have reduced the number of specific instructions, so instead of writing the exact lines of code required for commands, we will simply summarize them. This applies to decompressing the file, navigating to the decompressed folder, compiling the application, and copying to your path directory.*

_GATK_ (ftp://ftp.broadinstitute.org/pub/gsa/GenomeAnalysisTK/GenomeAnalysisTK-latest.tar.bz2)

According to the website (http://www.broadinstitute.org/gsa/wiki/index.php/Downloading_the_GATK, "Outside the Broad Institute"), before you install GATK, you need to install three applications: JVM (Java Virtual Machine), Apache Ant, and Git. GATK requires that your version of JVM is 1.6 or greater, and your version of Apache Ant is 1.7.1 or greater.

JVM (Java Virtual Machine)

You should have JVM already installed on your computer. To confirm this, open Terminal and type:


Code:
---------
java -version
---------
Three lines of output should appear, starting with java version "1.6.0_29". To update Java, search "Java" on the Apple website and find the most recent version that corresponds to your operating system.

Ant (http://ant.apache.org/)

You should already have Apache Ant installed on your computer. To confirm this, open Terminal and type:


Code:
---------
ant -version
---------
You should see something like this: "Apache Ant(TM) version 1.8.2 compiled on October 14 2011". If that doesn't work, here's how to install Ant manually:

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "apache-ant-1.8.2-bin.tar.bz2".)[/li]
[li]Move the "apache-ant-1.8.2-bin.tar.bz2" file to your "ngs/applications" folder.[/li]
[li]Decompress the file.[/li]
[li]Follow the somewhat complex instructions in the manual. To access the manual, click on the decompressed folder and look under docs/manual/install.html.[/li]
[/ol]

Git (http://git-scm.com/download)

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "git-1.7.9.1-intel-universal-snow-leopard.dmg".)[/li]
[li]Install like any ordinary Mac application. (You thought it would be more complicated, right? You're welcome!)[/li]
[/ol]
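
Comparing version numbers by eye is easy to get wrong (is 1.10 newer than 1.9?), so here is a minimal sketch of a comparison helper. The "version_ge" function is a name invented for this example, and the version strings are simply the ones quoted in the example outputs above; substitute whatever your own machine reports.

```shell
# Sketch: check an installed version number against a required minimum.
version_ge() {
    # succeed (exit 0) if dotted version $1 is >= dotted version $2;
    # fields are compared left to right as numbers, missing fields count as 0,
    # and a suffix such as "_29" on a field is ignored
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, "."); m = split(need, r, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            hv = (i <= n) ? h[i] + 0 : 0
            rv = (i <= m) ? r[i] + 0 : 0
            if (hv > rv) exit 0
            if (hv < rv) exit 1
        }
        exit 0
    }'
}

# Version strings copied from the example outputs above; substitute your own.
version_ge "1.6.0_29" "1.6"   && echo "Java is new enough for GATK"
version_ge "1.8.2"    "1.7.1" && echo "Ant is new enough for GATK"
version_ge "1.5.0"    "1.6"   || echo "a 1.5.0 JVM would be too old"
```

Comparing each field as a number is what makes 1.10 count as newer than 1.9, which a plain text comparison would get backwards.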

Now, onto installing GATK:

[ol]
[li]Click on the link above to download the file.[/li]
[li]Move the "GenomeAnalysisTK-latest.tar.bz2" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[/ol]

To confirm the installation, type in "java -jar GenomeAnalysisTK.jar --help".  (If you copy this text, make sure the hyphens are still plain hyphens; some programs convert them to dashes, which Terminal will not accept.)

You should see a message like: The Genome Analysis Toolkit (GATK) v1.4-30-gf2ef8d1, Compiled 2012/02/17 20:18:04.


_Bowtie_ (http://sourceforge.net/projects/bowtie-bio/files/bowtie)

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "bowtie-0.12.7-src.zip".)[/li]
[li]Move the "bowtie-0.12.7-src.zip" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]Compile the application ("make").[/li]
[li]Copy "bowtie", "bowtie-build", and "bowtie-inspect" to your path directory.[/li]
[/ol]

To test the installation, navigate to the bowtie folder and type:


Code:
---------
bowtie indexes/e_coli reads/e_coli_1000.fq
---------
You should see a bunch of information stream onto the screen, and at the bottom, you should see:


Code:
---------
# reads processed: 1000
# reads with at least one reported alignment: 699 (69.90%)
# reads that failed to align: 301 (30.10%)
Reported 699 alignments to 1 output stream(s)
---------

_Boost_ (http://www.boost.org/) [Prerequisites: SAMtools, $PATH configuration.]

*WARNING: Do not download the newest version of Boost (i.e., version 1.48.0)! This version will not natively work with this protocol.  Instead, install any earlier version of Boost (we recommend version 1.47.0) and follow the instructions below. (For more information, and instructions on how to modify the latest version of Boost, please visit: http://seqanswers.com/forums/showthread.php?t=16637.)*

[ol]
[li]Click on the link above and download version 1.47.0. (MAKE SURE THIS IS VERSION "boost_1_47_0.tar.bz2" OR EARLIER.)[/li]
[li]Move the "boost_1_47_0.tar.bz2" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]Build/bootstrap the package by typing:


Code:
---------
./bootstrap.sh
---------
[/li]

[li]Type in the following command:


Code:
---------
./bjam --prefix=$HOME/local --toolset=darwin architecture=x86 address-model=32_64 link=static runtime-link=static --layout=versioned stage install
---------


This command will take a while, so take your coworkers out for cappuccinos or something while you wait. Once it's finished, the command will create "include" and "lib" subfolders in $HOME/local. You might get some error messages about targets that failed or were skipped, but ignore them because they won't affect your other applications.[/li]
[li]In the new "include" folder, create a subfolder "bam".[/li]
[li]Using Terminal, navigate to the SAMtools folder within ngs/applications.[/li]
[li]Copy the "libbam.a" file in the SAMtools folder to $HOME/local/lib by typing:


Code:
---------
cp libbam.a $HOME/local/lib
---------
[/li]

[li]Copy the header files (files ending in .h) in the SAMtools folder to $HOME/local/include/bam by typing:


Code:
---------
cp *.h $HOME/local/include/bam
---------
[/li]
[/ol]
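
The last four steps (creating the "bam" folder and copying the library and header files) can be collected into one small function. This is only a sketch: "stage_bam_files" is a name invented for this example, and the dry run at the bottom uses empty stand-in files rather than a real SAMtools build.

```shell
# Sketch: copy the SAMtools library and headers into an install prefix.
stage_bam_files() {
    # $1 = compiled SAMtools folder, $2 = install prefix (e.g. $HOME/local)
    src=$1; prefix=$2
    if [ ! -f "$src/libbam.a" ]; then
        echo "libbam.a not found in $src (run 'make' in the SAMtools folder first)" >&2
        return 1
    fi
    mkdir -p "$prefix/lib" "$prefix/include/bam"   # create the target folders if needed
    cp "$src/libbam.a" "$prefix/lib/"              # the compiled library
    cp "$src"/*.h "$prefix/include/bam/"           # all header files
}

# Real use, assuming the folder names from this protocol:
#   stage_bam_files "$HOME/ngs/applications/samtools-0.1.18" "$HOME/local"

# Dry run with empty stand-in files so the function can be tried anywhere:
DEMO=$(mktemp -d "${TMPDIR:-/tmp}/bam_stage.XXXXXX")
mkdir -p "$DEMO/samtools"
touch "$DEMO/samtools/libbam.a" "$DEMO/samtools/bam.h" "$DEMO/samtools/sam.h"
stage_bam_files "$DEMO/samtools" "$DEMO/local" && ls -R "$DEMO/local"
```

The check for "libbam.a" is there because that file only exists after SAMtools has been compiled, which is why SAMtools is listed as a prerequisite.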

_Tophat_ (http://tophat.cbcb.umd.edu/) [Prerequisites: Bowtie, SAMtools.]

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "tophat-1.4.1.tar.gz". Click on the option that says "Source Code.")[/li]
[li]Move the "tophat-1.4.1.tar.gz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]Build the package by typing:


Code:
---------
./configure --prefix=$HOME/local --with-bam=$HOME/local
---------
[/li]

[li]Compile the application (by typing "make").[/li]
[li]Make the executable available in your $PATH directory by typing:


Code:
---------
make install
---------
[/li]
[/ol]

To test the Tophat installation, please visit the download website (http://tophat.cbcb.umd.edu/tutorial.html; search under "Testing the installation") and follow these instructions:

[ol]
[li]Click on the link above and download the file. (This file will probably be "test_data.tar.gz".)[/li]
[li]Decompress the folder and navigate to it.[/li]
[li]To process the data, type:


Code:
---------
tophat -r 20 test_ref reads_1.fq reads_2.fq
---------
[/li]
[/ol]

You should see lines of output after your command, beginning with something like the following:


Code:
---------
[Mon May  4 11:07:23 2009] Beginning TopHat run (v1.1.1)
-----------------------------------------------
---------

_Cufflinks_ (http://cufflinks.cbcb.umd.edu/tutorial.html) [Prerequisites: Boost (SAMtools).]

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "cufflinks-1.3.0.tar.gz". Click on the option that says "Source Code.")[/li]
[li]Move the "cufflinks-1.3.0.tar.gz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]Build the package (with Boost, so different from Tophat instructions!) by typing:


Code:
---------
./configure --prefix=$HOME/local --with-boost=$HOME/local --with-bam=$HOME/local
---------
[/li]

[li]Compile the application (by typing "make").[/li]
[li]Make the executable available in your $PATH directory by typing:


Code:
---------
make install
---------
[/li]
[/ol]

To test the installation, you will need to download the Cufflinks test data (http://cufflinks.cbcb.umd.edu/tutorial.html#ref; look under "Testing the installation"). You can download the test text file anywhere (e.g. within your username folder) and navigate to that folder.

Process the test data by typing:


Code:
---------
cufflinks test_data.sam
---------
You should see the following at the beginning of your output:


Code:
---------
You are using Cufflinks v1.3.0, which is the most recent release.
[bam_header_read] EOF marker is absent. The input is probably truncated.
---------

_VarScan_ (http://varscan.sourceforge.net/) [Prerequisites: SAMtools?]

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "VarScan.v2.2.8.jar".)[/li]
[li]Move the "VarScan.v2.2.8.jar" file to your "ngs/applications" folder.[/li]
[li]Navigate to your "applications" folder.[/li]
[/ol]

To test the installation, type:


Code:
---------
java -jar VarScan.v2.2.8.jar
---------
You should see the following at the beginning of your output:


Code:
---------
VarScan v2.2

USAGE: java net.sf.varscan.VarScan [COMMAND] [OPTIONS]
---------

_Picard_ (http://picard.sourceforge.net/)

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "picard-tools-1.62.zip".)[/li]
[li]Move the "picard-tools-1.62.zip" file to your "ngs/applications" folder.[/li]
[li]Decompress the file.[/li]
[li]Navigate to the decompressed folder and copy all .jar files to your $PATH directory by typing:


Code:
---------
cp *.jar $HOME/local/bin
---------
While this step isn't required, it makes things easier and the pipelines we provide use this concept.[/li]
[/ol]

_snpEff_ (http://snpeff.sourceforge.net/download.html)

To install snpEff, you must install both the program and the corresponding reference genome. These instructions include installing the most recent human genome from Ensembl (which is provided on their website). If you use a different genome, make sure that your genome version matches your snpEff version. (In other words, in this example, the genome version is for "v2_0_5" and the snpEff version is for "v2_0_5d".)

[ol]
[li]Click on the link above and download the latest version of snpEff. (This version will probably be "snpEff_v2_0_5d_core.zip".)[/li]
[li]Move the "snpEff_v2_0_5d_core.zip" file to your "ngs/applications" folder.[/li]
[li]Decompress the file.[/li]
[li]From the link above, download the latest version of the reference genome. (This version will probably be "snpEff_v2_0_5_GRCh37.65.zip".)[/li]
[li]Move the "snpEff_v2_0_5_GRCh37.65.zip" file to your "ngs/applications" folder.[/li]
[li]Decompress the file.[/li]
[/ol]


At this point, you're probably feeling comfortable with these instructions, maybe even patting yourself on the back for understanding them. Well, prepare for more confusion, because we're entering the wonderful seafaring world of ports!

For the following applications, you will need to install additional ports on your computer. There are two package managers you can use to install them: MacPorts (http://www.macports.org/) and fink (http://www.finkproject.org/). The difference between them is that, in general, fink is more conservative about upgrading packages than MacPorts, so while the MacPorts version will be newer, the fink version might be more stable. We selected MacPorts for installing our packages, so our instructions will be tailored toward that program.

_MacPorts_ (http://www.macports.org/install.php) [Prerequisites: Xcode.]

To install MacPorts, please visit that website. Choose your operating system under the "Mac OS X Package (.pkg) Installer" section. Install like any ordinary software application.

To test the installation, completely quit Terminal (do not just close the window) and restart it. To begin the program, type in "sudo port". You should see:


Code:
---------
MacPorts 2.0.3
Entering interactive mode... ("help" for help, "quit" to quit)
---------
To install any port, type:


Code:
---------
install program
---------
where "program" is the name of the port you're installing. This is the method for installing any of the ports used by the subsequent applications. As the program indicates, to exit MacPorts, type "quit" and press enter.

_FastX_ (http://hannonlab.cshl.edu/fastx_toolkit/download.html)

MacPorts: Install "pkgconfig". (The program is called "pkgconfig 0.26", found on page 171 of the MacPorts website.)

[ol]
[li]Click on the link above and download libgtextutils. (This version will probably be "libgtextutils-0.6.tar.bz2".)[/li]
[li]Move the "libgtextutils-0.6.tar.bz2" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]To install the program, like installing previous programs, type in "./configure" and press enter.[/li]
[li]To compile the application completely, type in "make" and press enter, then type in "sudo make install" and press enter.[/li]
[li]Make sure the program can identify "gtextutils" by typing:


Code:
---------
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
---------
[/li]
[li]Once that command is processed, type:


Code:
---------
pkg-config --cflags gtextutils-0.1
---------


You should see the following response:


Code:
---------
-I/usr/local/include/gtextutils-0.1/
---------
(If you have any questions about this step, or have any troubleshooting concerns about installing this application, please visit: http://hannonlab.cshl.edu/fastx_toolkit/pkg_config_email.txt.)[/li]

[li]Click on the link above and download the latest version of FastX. (This version will probably be "fastx_toolkit-0.0.13.tar.bz2".)[/li]
[li]Move the "fastx_toolkit-0.0.13.tar.bz2" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]To install the program, like installing previous programs, type in "./configure", then "make", then "make install".[/li]
[/ol]

_Circos_ (http://mkweb.bcgsc.ca/circos/software/download/)

Before installing Circos, you will need to add all of Circos's required packages to your Perl distribution. To install the packages, type the following in Terminal:


Code:
---------
sudo perl -MCPAN -e shell
---------
When it asks if you would like the program to configure things automatically, and choose the best CPAN mirror sites, type "yes".

To install any package, type:


Code:
---------
install program
---------
where "program" is the name of the package you're installing. Before installing these packages, however, you will need to install GD. (I know, it's like Inception, with a program installation within a program installation within a program….)

GD (http://code.google.com/p/google-desktop-for-linux-mirror/downloads/detail?name=gd-2.0.35.tar.gz&can=2&q)

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "gd-2.0.35.tar.gz".)[/li]
[li]Move the "gd-2.0.35.tar.gz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]To install the program, like installing previous programs, type in "./configure", then "make", then "sudo make install".[/li]
[/ol]

Here is a list of the packages you will need to install (please install them in the following order because some of the packages require other packages):

YAML
Config::General (v2.50 or later)
GD::Polyline (requires YAML)
List::MoreUtils
Math::Bezier
Math::Round
Math::VecStat
Params::Validate
Readonly
Regexp::Common
Set::IntSpan (v1.16 or later)
Clone
Text::Format

Also, if you get the message that says something like this:


Code:
---------
New CPAN.pm version (v1.9800) available.
  [Currently running version is v1.9456]
---------
then type "install CPAN", then "reload CPAN", to update to the latest CPAN version. (This process takes a couple minutes.)
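
If you lose track of which packages from the list have already been installed, a quick check from the regular Terminal prompt (not the CPAN shell) can help. This is only a sketch: the function name "missing_perl_modules" is made up for this example, and it tests only whether each module loads, not whether it meets the minimum versions noted in the list.

```shell
# Print each Perl module from the argument list that cannot be loaded.
missing_perl_modules() {
    for module in "$@"; do
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

# Example with the first few packages from the list above:
# missing_perl_modules YAML Config::General GD::Polyline List::MoreUtils
```

Any module name that gets printed still needs to be installed from the CPAN shell; no output means every module you listed was found.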

All right, here are the instructions for installing Circos:

[ol]
[li]Click on the link above and download the bug-fix version. (This version will be something like "circos-0.56-1.tgz".)[/li]
[li]Move the "circos-0.56-1.tgz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file.[/li]
[li]Click on the link above and download the latest version. (This version will probably be "circos-0.56.tgz".)[/li]
[li]Move the "circos-0.56.tgz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file.[/li]
[li]Drag the contents of the decompressed bug-fix folder into the decompressed folder of the latest Circos version. When prompted, choose "replace file".[/li]
[/ol]

To test the Circos installation, please visit this website (http://circos.ca/software/download/tutorials/) and follow these instructions:

[ol]
[li]Click on the link above and download the tutorial file. (This version will be something like "circos-tutorials-0.56.tgz".)[/li]
[li]Move the "circos-tutorials-0.56.tgz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file.[/li]
[li]Drag the decompressed tutorial folder into the folder of the latest Circos version. When prompted, choose "replace file".[/li]
[li]Navigate to the "circos-0.56" folder.[/li]
[li]Access the tutorial by typing:


Code:
---------
cd tutorials/2/2
---------
[/li]
[li]Test the tutorial by typing:


Code:
---------
../../../bin/circos -conf ./circos.conf
---------
[/li]
[/ol]

You should see a series of commands flash onto the screen, eventually ending with:


Code:
---------
debuggroup summary,output 4.85s created PNG image ./circos.png (839 kb)
debuggroup summary,output 4.86s created SVG image ./circos.svg (356 kb)
---------
If you navigate to that folder manually ("circos-0.56/tutorials/2/2") and click on the "circos.png" file, you should see a circular graph of each human chromosome in different colors.

Finally, copy the binary and library files to your PATH directory so you can just type "circos" instead of "bin/circos" each time you run the program. If you followed our folder hierarchy, type the following commands in sequential order:


Code:
---------
cd ngs/applications/circos-0.56/bin
cp circos $HOME/local/bin
cd ../lib
cp circos.pm $HOME/local/lib
---------
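
Copying files into "$HOME/local/bin" only helps if that folder is actually listed in your PATH. Here is a small sketch for checking that; the function name "path_contains" is made up for this example.

```shell
# Succeed if directory $1 is one of the colon-separated entries in $2.
# $2 defaults to the current PATH.
path_contains() {
    case ":${2:-$PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example: warn if the folder used in this guide is not on the PATH.
# path_contains "$HOME/local/bin" || echo "add $HOME/local/bin to your PATH"
```

Wrapping the PATH in extra colons means the first and last entries are matched the same way as the middle ones, and a folder such as "/opt" is not confused with "/optional".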
Also, within the circos folder, to create a couple directories for your personal use, type the following commands in sequential order:


Code:
---------
cd ngs/applications/circos-0.56
mkdir my_plots
mkdir my_reference_files
mkdir my_config_files
mkdir my_data_files
---------
Once you've created those directories, you need to populate your reference files. (For more information, please visit: http://circos.ca/tutorials/.) When you visit that website, you can download the hg19 karyotype, decompress the corresponding file, and drag it into your newly created "my_reference_files" folder.
_BEDTools_ (http://code.google.com/p/bedtools/)

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "BEDTools.v2.15.0.tar.gz".)[/li]
[li]Move the "BEDTools.v2.15.0.tar.gz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it. (NOTE: The file will be renamed to something like "BEDTools-Version-2.15.0".)[/li]
[li]To install the program, type in "make clean", then "make all". You should see a series of commands being processed.[/li]
[li]To list the available binaries and confirm that they installed, type "ls bin". You should see columns of files beginning with "annotateBed" in the upper lefthand corner and ending with "windowMaker" in the lower righthand corner.[/li]
[li]Copy the binaries to your PATH directory by typing:

Code:
---------
cp bin/* $HOME/local/bin
---------
[/li]
[/ol]
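
To double-check that the copied binaries can be found from any folder, something like the following sketch may help. The function name "missing_commands" is made up for this example; "annotateBed" and "windowMaker" are the two binaries named in the steps above.

```shell
# Print each command name from the argument list that the shell cannot find.
missing_commands() {
    for name in "$@"; do
        command -v "$name" >/dev/null 2>&1 || echo "$name"
    done
}

# Example with two of the BEDTools binaries:
# missing_commands annotateBed windowMaker
```

No output means every command you listed was found on your PATH. The same check works for any of the other programs in this guide.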
_Pairoscope_ (http://pairoscope.sourceforge.net/) [Prerequisite: SAMTools]
Truthfully, installing this program is difficult, so brace yourselves, folks. Or as Samuel Jackson says in Jurassic Park, "hold onto your butts."

Before installing pairoscope, you need to install Cairo. To install Cairo, type:


Code:
---------
sudo port install cairo
---------
You should get the following response:


Code:
---------
--->  Computing dependencies for cairo
--->  Cleaning cairo
---------
Also, before installing pairoscope, you need to install CMake (http://www.cmake.org/cmake/help/install.html). To install the program, click on the link above and download the latest version. (This version will probably be "cmake-2.8.7-Darwin64-universal.dmg".) Simply install like you would a normal application. (Oh, and when the bouncing colorful triangle appears on your Dock, click to "install command line links".)

Finally, here are the instructions to install pairoscope:

[ol]
[li]Click on the link above and download the latest version. (This version will probably be "pairoscope-0.2.tgz".)[/li]
[li]Move the "pairoscope-0.2.tgz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to the applications folder.
To install pairoscope, type:


Code:
---------
ccmake pairoscope-0.2
---------
The screen will transform and you will see a series of capitalized instructions on the left and corresponding answers written in white text on the right. To toggle advanced mode, type "t".

Scroll all the way down with the arrow keys until you reach "Page 2 of 2". (NOTE: The following series of instructions are based on our folder architecture, and assume that you followed our instructions for installing SAMTools. If your folder architecture is different, please point ccmake to your corresponding SAMTools directories.)

To edit the Samtools include and library locations, follow these instructions:

[ul]
[li]Under "Samtools_INCLUDE_DIR", type "/-----/local/include/bam".[/li]
[li]Under "Samtools_LIBRARY", type "/-----/local/lib/libbam.a".[/li]
[/ul]

where "-----" is the exact folder hierarchy of your computer. (To find that exact hierarchy, type "cd" in the command line, then type "pwd". Paste the resulting path into the "-----" section described above.)

To configure, type "c". You should see a warning appear that starts with:


Code:
---------
CMake Warning (dev) in CMakeLists.txt:
---------
You can ignore this warning, so type "e". To generate and exit, type "g".

Now, pairoscope is ready. To make pairoscope, navigate to the "applications" folder and type:


Code:
---------
cmake pairoscope-0.2
---------
You should see a series of commands ending with:


Code:
---------
 -- Build files have been written to: /-----/ngs/applications
---------
where the "-----" is the same prefix described above.

A new folder called "CMakeFiles" has been created in the "applications" folder. To make, navigate to the "applications" folder and type "make". You will see a bunch of purple and green commands beginning with:


Code:
---------
Scanning dependencies of target pairoscope
---------
Copy the newly-created pairoscope program to your $PATH by typing:


Code:
---------
cp pairoscope $HOME/local/bin
---------
To test the installation, type "pairoscope". You should see a series of commands beginning with the following:


Code:
---------
Usage:   pairoscope [options] <align.bam> <chr> <start> <end> <align2.bam> <chr2> <start2> <end2>
---------
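
If you would rather not retype the two Samtools paths in the ccmake screen, cmake's "-D" option can set the same entries from the command line. The following is only a sketch: the function name "samtools_cmake_flags" is made up for this example, and it assumes your SAMTools headers and library sit under "$HOME/local" as in our folder hierarchy. We have not tested this route with pairoscope ourselves.

```shell
# Print the -D options that set the two Samtools entries
# for an install prefix (for example "$HOME/local").
samtools_cmake_flags() {
    prefix=$1
    printf '%s %s\n' \
        "-DSamtools_INCLUDE_DIR=$prefix/include/bam" \
        "-DSamtools_LIBRARY=$prefix/lib/libbam.a"
}

# Example, run from the "applications" folder
# (a home folder path containing spaces would need extra quoting):
# cmake $(samtools_cmake_flags "$HOME/local") pairoscope-0.2
```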
_FastQC_ (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/)
[ol]
[li]Click on the link above and download the latest version. (This version will probably be "Source Code for FastQC v0.10.0 (zip file)". Please download the Source Code version.)[/li]
[li]Move the "fastqc_v0.10.0_source.zip" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it. (NOTE: The file will be renamed "FastQC".)[/li]

And that's it! (Seriously! According to the installation files: "Once unzipped it's ready to go.")
_HTSeq_ (http://pypi.python.org/pypi/HTSeq)
[ol]
[li]Click on the link above and download the latest version. (This version will probably be "HTSeq-0.5.3p3.tar.gz".)[/li]
[li]Move the "HTSeq-0.5.3p3.tar.gz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]Install the program by typing:

Code:
---------
sudo python setup.py install
---------
[/li]
[/ol]

You should see a series of commands being processed and ending with:

Code:
---------
Finished processing dependencies for HTSeq==0.5.3p3
---------
(For more information about program installation, please visit: http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html.)
_chimerascan_ (http://code.google.com/p/chimerascan/)
[ol]
[li]Click on the link above and download the latest version. (This version will probably be "chimerascan-0.4.5a.tar.gz".)[/li]
[li]Move the "chimerascan-0.4.5a.tar.gz" file to your "ngs/applications" folder.[/li]
[li]Decompress the file and navigate to it.[/li]
[li]Build the program by typing:

Code:
---------
python setup.py build
---------
[/li]
[li]Install the program by typing:

Code:
---------
sudo python setup.py install
---------
[/li]

To test the installation, you need to access python. To do that, leave the directory (you can type "cd ../" to move into the "applications" folder) and type:


Code:
---------
python
---------
You should see something like:


Code:
---------
Python 2.6.1
Type "help", "copyright", "credits" or "license" for more information.
---------
To test that the chimerascan libraries are in your PYTHONPATH, type "import chimerascan", then "chimerascan.__version__". (Just in case that last command is obscured, you should type in "chimerascan" followed by a period, followed by two underscores, then "version", then two underscores.) You should see the following:


Code:
---------
'0.4.5'
---------
Success! To exit python, type:


Code:
---------
exit()
---------
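
The same kind of import test can be run straight from the Terminal prompt without opening python. Here is a sketch; the function name "python_has_module" is made up for this example, and it assumes the interpreter you installed the packages with is the one called "python" unless you name another.

```shell
# Succeed if the module named in $1 can be imported.
# $2 is the interpreter to use and defaults to "python".
python_has_module() {
    "${2:-python}" -c "import $1" >/dev/null 2>&1
}

# Example for the two Python packages installed above:
# python_has_module HTSeq && python_has_module chimerascan && echo "both import fine"
```

If the message does not appear, run the import by hand inside python as described above to see the actual error.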
Congratulations! You now have a working computer that can handle just about any sequencing data you throw into it!
If you have any problems during the installation process, I recommend that you search online for the error message you received. That's how I managed to resolve many of the difficulties I encountered during this whole process.
Additionally, you should read the README files (you can do so by typing "less README" when you are in the program's directory) when you have problems, because they might give you helpful information about what's going wrong with that program.
Finally, please remember that this document is a work in progress. Right now, we have created a system that can manage the installation of the current application versions, but these versions often change, and with those changes come new program requirements or permissions. If you encounter any problems with future versions, please respond to this thread (preferably with a solution!) and we will make the corresponding updates.
(This document was made with help from Venkata Yellapantula.)
Last updated: March 7, 2012
***************

