Computing Resources

The computing resources of the Center for Comparative Genomics include a computer lab with 4 powerful desktop computers and many molecular and phylogenetic analysis applications, including MacClade, Geneious, Sequencher, PAUP*, RAxML, MrBayes, TreeView, FigTree, Clustal, MAFFT, MUSCLE, MS Office, and Adobe CS3, among others.

The CCG features an Apple Xserve High Performance Computing Cluster with 17 nodes and 136 cores, with 8GB of RAM per node (136GB total). The cluster applications may be accessed via iNquiry, a convenient web interface designed by BioTeam.

The CCG PhyloCluster is a 136-core Apple Xserve High Performance Computing Cluster with 8GB RAM/node (136 GB total). The cluster runs on the Apple Snow Leopard server OS.

Access to the cluster is via the BioTeam iNquiry web interface, which permits submission of various analyses to the cluster through a web browser. Currently available applications include PAUP*, MrBayes, RAxML, GARLI, BEAST, Clustal, and MAFFT. To request access to the PhyloCluster, contact the CCG Lab Manager.

Nerdy Specifications

Processors
11 of the 17 nodes have dual quad-core Intel Xeon 2.8GHz processors with 8MB L3 cache per processor. The remaining 6 nodes have dual quad-core 2.26GHz Intel Xeon 5500 series “Nehalem” processors with 8MB L3 cache per processor.
Memory
All 17 nodes have 8 GB of 1066MHz DDR3 ECC memory for a total of 136 GB of RAM.
Storage
The head node has 1x3TB 7200 rpm SATA drives with 32MB disk cache, configured as RAID 5. The computing nodes each have 1TB SATA drives. All data is read from and written to an NFS-mounted NetApp FAS 3140 file server with more than 100TB of disk space and a maximum capacity of 420TB.
OS
MacOS X Server v10.6 Snow Leopard
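
The core and memory totals quoted above follow directly from the per-node figures. As a minimal sketch in Python, the arithmetic can be checked like this (the variable names are illustrative only and are not part of any cluster software):

```python
# Node groups as listed in the specifications above:
# (node count, processors per node, cores per processor, GB of RAM per node)
node_groups = [
    (11, 2, 4, 8),  # dual quad-core 2.8GHz Xeon nodes
    (6, 2, 4, 8),   # dual quad-core 2.26GHz "Nehalem" nodes
]

total_nodes = sum(n for n, _, _, _ in node_groups)
total_cores = sum(n * cpus * cores for n, cpus, cores, _ in node_groups)
total_ram_gb = sum(n * ram for n, _, _, ram in node_groups)

print(f"{total_nodes} nodes, {total_cores} cores, {total_ram_gb} GB RAM")
```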

The CCG Computing Lab has numerous computers available with the latest molecular and phylogenetic analysis applications. There is a computer room with 3 powerful desktop computers (2 PCs and a Mac Pro) and a printer.

VALUES in RED should be edited to match YOUR DATA!

Sample NEXUS Sets Block

begin sets;
  CHARSET COI=1-688;
  CHARSET 16S=689-1250;
  CHARSET morph=1251-1322;

  CHARSET COIpos1=1-688\3;
  CHARSET COIpos2=2-688\3;
  CHARSET COIpos3=3-688\3;

   TAXSET outgroup=taxon1 taxon2 taxon3;
   TAXSET NoMorph=taxon33 taxon38 taxon50;
   TAXSET COIonly=1-33;
   TAXSET beetles=22 25 27 33 35 40;
END;
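
When editing the character set ranges to match your own data, it can help to confirm that the codon-position sets cover every site of the gene exactly once. The following Python sketch expands ranges such as 1-688\3 into explicit positions. The helper name expand_charset is hypothetical, and it handles only the simple "start-stop" and "start-stop\step" forms used in the sample block, not the full NEXUS syntax.

```python
import re

def expand_charset(spec):
    # Hypothetical helper: expand 'start-stop' or 'start-stop\step' into positions.
    match = re.fullmatch(r"(\d+)-(\d+)(?:\\(\d+))?", spec.strip())
    if match is None:
        raise ValueError(f"unsupported range: {spec!r}")
    start, stop = int(match.group(1)), int(match.group(2))
    step = int(match.group(3)) if match.group(3) else 1
    return list(range(start, stop + 1, step))

# Ranges taken from the sample sets block above.
codon_sets = {
    "COIpos1": expand_charset(r"1-688\3"),
    "COIpos2": expand_charset(r"2-688\3"),
    "COIpos3": expand_charset(r"3-688\3"),
}

# Every COI site (1-688) should fall in exactly one codon-position set.
all_sites = sorted(site for sites in codon_sets.values() for site in sites)
assert all_sites == list(range(1, 689))

for name, sites in codon_sets.items():
    print(name, len(sites), sites[:3])
```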

 

Command blocks for PAUP*

Simple Parsimony analysis

begin paup;

    log start replace=yes file=FILENAME_log.txt;
    set autoclose=yes criterion=parsimony root=outgroup storebrlens=yes increase=auto;

    outgroup MyOutgroup;

    hsearch addseq=random nreps=1000 swap=tbr hold=1;

    savetrees file=mytrees.tre format=altnex brlens=yes;

    log stop;

END;

Parsimony Bootstrap analysis

begin paup;

    log start replace=yes file=FILENAME_log.txt;

    set autoclose=yes criterion=parsimony root=outgroup storebrlens=yes increase=auto;

    outgroup MyOutgroup;

    bootstrap nreps=1000 search=heuristic/ addseq=random nreps=10 swap=tbr hold=1;

    savetrees from=1 to=1 file=MyBootTree.tre format=altnex brlens=yes savebootp=NodeLabels MaxDecimals=0;

   log stop;

END;

Simple Maximum Likelihood analysis

begin paup;

    log start file=MyML_log.txt;   

    set criterion=distance autoclose=yes storebrlens=yes increase=auto root=outgroup;   

    outgroup MyOutgroup;

    DSet distance=JC objective=ME base=equal rates=equal pinv=0 subst=all negbrlen=setzero;   

    NJ showtree=no breakties=random;

    set criterion=like;

   Lset Base=(0.2892 0.2928 0.1309) Nst=6 Rmat=(3.7285 46.5293 1.3888 2.3793 16.4374)

   Rates=gamma Shape=0.9350 Pinvar=0.5691;

   hsearch addseq=random nreps=5 swap=tbr;

   savetrees file=MyML_tree.tre format=altnex brlens=yes maxdecimals=6;

   log stop;

END;

Maximum Likelihood Bootstrap analysis

begin paup;
   log start file=MyML_log.txt;
   set criterion=likelihood autoclose=yes storebrlens=yes increase=auto root=outgroup;
   outgroup myOutgroup;
   Lset Base=(0.2892 0.2928 0.1309) Nst=6 Rmat=(3.7285 46.5293 1.3888 2.3793 16.4374)
        Rates=gamma Shape=0.9350 Pinvar=0.5691;
   bootstrap nreps=1000 search=heuristic/ addseq=random swap=tbr hold=1;

   savetrees from=1 to=1 file=MyMLboot_tree.tre format=altnex brlens=yes savebootp=NodeLabels
        MaxDecimals=0;
   log stop;
END;

Command Block for MrBayes

MrBayes analysis using GTR+I+Γ

begin mrbayes;

    log start filename=FILENAME_log.txt replace;

   set autoclose=yes;

    lset nst=6 rates=invgamma;    

   mcmc ngen=50000000 samplefreq=1000 printfreq=1000 nchains=4 savebrlens=yes;

    sump burnin=12500;

    sumt burnin=12500;

    log stop;

END;
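
The burnin values for sump and sumt are counted in samples, not generations, so they should be chosen relative to ngen and samplefreq. As a rough check of the numbers in the block above, here is a small Python sketch; it assumes the burn-in is an absolute number of samples, as written in these commands, and ignores the extra sample of the starting state:

```python
# Values copied from the mcmc, sump, and sumt commands above.
ngen = 50_000_000
samplefreq = 1000
burnin = 12_500

# Approximate number of samples per run (ignores the sample of the starting state).
samples = ngen // samplefreq
burnin_fraction = burnin / samples

print(f"{samples} samples, burn-in discards {burnin_fraction:.0%}")
```

If you change ngen or samplefreq for your own data, rescale burnin so that it still discards the fraction of samples you intend.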

Partitioned MrBayes analysis using mixed models

begin mrbayes;

    log start filename=FILENAME_log.txt replace;

    set autoclose=yes;

   charset 1stpos = 1-720\3;

   charset 2ndpos = 2-720\3;

   charset 3rdpos = 3-720\3;

   partition bycodon = 3:1stpos,2ndpos,3rdpos;

   set partition=bycodon;

   unlink shape=(all) pinvar=(all) statefreq=(all) revmat=(all);

   lset applyto=(1,3) nst=6 rates=invgamma;

   lset applyto=(2) nst=2 rates=gamma;   

    mcmc ngen=50000000 samplefreq=1000 printfreq=1000 nchains=4 savebrlens=yes;   

    sump burnin=12500;

    sumt burnin=12500;   

   log stop;

END;

RAxML allows you to specify the regions of your alignment for which an individual model of nucleotide substitution should be estimated.

This will typically be useful for inferring trees from long (in terms of base pairs) multigene alignments. If, e.g., -m GTRGAMMA is used, individual α-shape parameters, GTR rates, and empirical base frequencies will be estimated and optimized for each partition.

To run multiple model analyses you will need to create a simple text file with a name such as "COI_18Sraxml_part.txt".

If you have a pure DNA alignment with 1,000bp from two genes, gene1 (positions 1–500) and gene2 (positions 501–1,000), the information in the multiple model text file should look as follows:

      DNA, gene1 = 1-500

      DNA, gene2 = 501-1000

If gene1 is scattered through the alignment, e.g. positions 1–200, and 800–1,000 you specify this with:

      DNA, gene1 = 1-200, 800-1000

      DNA, gene2 = 201-799

You can also assign distinct models to the codon positions, i.e. if you want a distinct model to be estimated for each codon position in gene1 you can specify:

      DNA, gene1codon1 = 1-500\3

      DNA, gene1codon2 = 2-500\3

      DNA, gene1codon3 = 3-500\3

      DNA, gene2 = 501-1000

If you only need a distinct model for the 3rd codon position you can write:

      DNA, gene1codon1andcodon2 = 1-500\3, 2-500\3

      DNA, gene1codon3 = 3-500\3

      DNA, gene2 = 501-1000

As already mentioned, for AA data you must specify the transition matrices for each partition:

      JTT, gene1 = 1-500

      WAGF, gene2 = 501-800

      WAG, gene3 = 801-1000
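
Because a mistyped range in the partition file is easy to miss, a short script can build the lines and confirm that every alignment position is assigned to exactly one partition. The Python sketch below uses the codon-position example above for a 1,000bp alignment. The function names partition_lines and covered_sites are hypothetical helpers, not part of RAxML, and only the simple range forms shown in this section are handled.

```python
def partition_lines(partitions):
    # Format (model, name, ranges) tuples as partition-file lines.
    return [f"{model}, {name} = {', '.join(ranges)}"
            for model, name, ranges in partitions]

def covered_sites(ranges):
    # Expand ranges like '501-1000' or '1-500\3' into explicit site numbers.
    sites = []
    for r in ranges:
        span, _, step = r.partition("\\")
        start, stop = (int(x) for x in span.split("-"))
        sites.extend(range(start, stop + 1, int(step) if step else 1))
    return sites

# Example from the text: a separate model for the 3rd codon position of gene1.
partitions = [
    ("DNA", "gene1codon1andcodon2", [r"1-500\3", r"2-500\3"]),
    ("DNA", "gene1codon3", [r"3-500\3"]),
    ("DNA", "gene2", ["501-1000"]),
]

lines = partition_lines(partitions)
print("\n".join(lines))

# Each of the 1,000 alignment positions should appear exactly once.
all_sites = sorted(s for _, _, ranges in partitions for s in covered_sites(ranges))
assert all_sites == list(range(1, 1001)), "gap or overlap in partition ranges"
```

The printed lines can be saved to the partition text file (for example "COI_18Sraxml_part.txt") once the check passes.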

 

BioTeam's iNquiry web interface provides easy access to command-line based programs using pull-down menus and text fields. iNquiry is the primary way CCG users may submit analyses to the cluster.

To gain access to iNquiry and the PhyloCluster contact the CCG Lab Manager, Anna Sellas.

Programs currently available to CCG PhyloCluster users:

Phylogenetics: PAUP*, MrBayes, PHYLIP, BEAST, GARLI, RAxML
Population Genetics: Haploview, BEAST
Align/Assemble: ClustalW, MAFFT, TIGR Assembler
Primer Design: Primer 3
BLAST: batchblast, Hmmer

Submitting jobs to the PhyloCluster via iNquiry

PAUP*

The only way to submit a PAUP* analysis to the PhyloCluster is to include a command block in your NEXUS file. For information on creating a PAUP* command block, go to the NEXUS Commands tab above.*

*For likelihood analyses it is recommended that you estimate a substitution model for your data first. Modeltest and MrModeltest are the most common applications used.

MrBayes

There are two options for submitting MrBayes analyses (for information on creating MrBayes command blocks, go to the NEXUS Commands tab above):

  1. Use a command block in your NEXUS file.
  2. Use the iNquiry "Advanced" interface for MrBayes. This interface has most MrBayes options available in pull-down menus and text fields.*

*For Bayesian analyses it is recommended that you estimate a substitution model for your data first. MrModeltest is the most common application used.

RAxML

There are two options for submitting RAxML analyses:

  1. A simple ML & Bootstrap interface with minimal options.
  2. An "Advanced" interface with all RAxML options.

BEAST

We are currently testing BEAST. Check back in November for updates.

GARLI

We are currently testing GARLI. Check back in November for updates.

News

The CCG is adding more cores (48), RAM, and functionality to the PhyloCluster.

The CCG submitted a proposal for an NSF Biological Research Collections grant at the end of July 2009 ($196,000).