Computing Resources

The computing resources of the Center for Comparative Genomics (CCG) include a computer lab with 4 powerful desktop computers on which many molecular and phylogenetic analysis applications are installed. Available programs include MacClade, Geneious, Sequencher, PAUP*, RAxML, MrBayes, TreeView, FigTree, Clustal, MAFFT, MUSCLE, MS Office, and Adobe CS3.

 

The CCG PhyloCluster is an Apple Xserve High Performance Computing Cluster with 23 nodes and 280 cores, with 8-12 GB RAM per node (232 GB total). The cluster runs the Apple Snow Leopard server OS, and its applications may be accessed via a convenient web interface.

Access to the cluster is via the BioTeam iNquiry web interface, which permits submission of various analyses to the cluster through a web browser. Currently available applications include PAUP*, MrBayes, RAxML, GARLI, BEAST, Clustal, and MAFFT. Due to license agreements with software publishers, access to the PhyloCluster is restricted to employees of the California Academy of Sciences (CAS). If you are a CAS employee and want access, contact Brian Simison.

 

http://phylocluster.calacademy.org/inquiry

 

Nerdy Specifications

Processors
11 of the 23 nodes have dual quad-core Intel Xeon 2.8GHz processors with 8MB L3 cache per processor. The remaining 12 nodes have dual quad-core 2.26GHz Intel Xeon 5500 series “Nehalem” processors with 8MB L3 cache per processor.
Memory
11 nodes have 8 GB of 1066MHz DDR3 ECC memory each and 12 nodes have 12 GB each, for a total of 232 GB of RAM.
Storage
The head node has 1x3TB 7200 rpm SATA drives with 32MB disk cache configured as RAID 5. The computing nodes each have 1TB SATA drives. All data is read from and written to an NFS-mounted NetApp FAS 3140 file server with more than 100TB of disk space and a maximum capacity of 420TB.
Power
APC Smart-UPS RT 8000 208V - 6400 watts
OS
MacOS X Server v10.6 Snow Leopard
iNquiry XML web interface by BioTeam
Managed by Sun Grid Engine 6.2 
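
The totals quoted in these specifications can be checked with a little arithmetic. The sketch below uses only the node counts and per-node figures listed above. The Hyper-Threading line is an assumption that is not stated on this page: it is one possible reading of how 23 dual quad-core nodes could be counted as 280 cores.

```python
# Node counts and per-node figures are taken from the specifications above.
older_nodes, nehalem_nodes = 11, 12
cores_per_node = 2 * 4  # dual quad-core processors

total_ram_gb = older_nodes * 8 + nehalem_nodes * 12
physical_cores = (older_nodes + nehalem_nodes) * cores_per_node

# Assumption (not stated in the source): Hyper-Threading doubles the logical
# core count on the Nehalem nodes only.
logical_cores = older_nodes * cores_per_node + nehalem_nodes * cores_per_node * 2

print(total_ram_gb, physical_cores, logical_cores)
```

Under that assumption the script prints 232 GB of RAM, 184 physical cores, and 280 logical cores.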

The CCG Computing Lab has numerous computers available with the latest in molecular and phylogenetic analysis applications. There is a computer room with 3 powerful desktop computers (2 PCs and a Mac Pro) and a printer.

Values such as file names, taxon names, character ranges, and model parameters in the examples below should be edited to match YOUR DATA!

Sample NEXUS Sets Block

  
begin sets;
  CHARSET COI=1-688;
  CHARSET 16S=689-1250;
  CHARSET morph=1251-1322;
  CHARSET COIpos1=1-688\3;
  CHARSET COIpos2=2-688\3;
  CHARSET COIpos3=3-688\3;
  TAXSET outgroup=taxon1 taxon2 taxon3;
  TAXSET NoMorph=taxon33 taxon38 taxon50;
  TAXSET COIonly=1-33;
  TAXSET beetles=22 25 27 33 35 40;
END;
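
When several genes are concatenated, the CHARSET ranges follow from the length of each block. The Python sketch below is only an illustration: the helper name charset_lines is hypothetical, and the lengths 562 and 72 are derived from the ranges in the sample block above rather than given by the source.

```python
def charset_lines(blocks):
    """Turn ordered (name, length) pairs into consecutive CHARSET lines."""
    lines, start = [], 1
    for name, length in blocks:
        end = start + length - 1
        lines.append(f"CHARSET {name}={start}-{end};")
        start = end + 1
    return lines

# Block lengths inferred from the sample sets block (assumption).
for line in charset_lines([("COI", 688), ("16S", 562), ("morph", 72)]):
    print("  " + line)
```

The printed lines can be pasted into a sets block after checking them against your own alignment.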

 

Command blocks for PAUP*

Simple Parsimony analysis

begin paup;
  log start replace=yes file=FILENAME_log.txt;
  set autoclose=yes criterion=parsimony root=outgroup storebrlens=yes increase=auto;
  outgroup MyOutgroup;
  hsearch addseq=random nreps=1000 swap=tbr hold=1;
  savetrees file=mytrees.tre format=altnex brlens=yes;
  contree all / majrule=yes strict=no treefile=myConsensustree.tre;
  log stop;
END;

 

Parsimony Bootstrap analysis

begin paup;
  log start replace=yes file=FILENAME_log.txt;
  set autoclose=yes criterion=parsimony root=outgroup storebrlens=yes increase=auto;
  outgroup MyOutgroup;
  bootstrap nreps=1000 search=heuristic/ addseq=random nreps=10 swap=tbr hold=1;
  savetrees from=1 to=1 file=MyBootTree.tre format=altnex brlens=yes savebootp=NodeLabels MaxDecimals=0;
  log stop;
END;

 

Simple Maximum Likelihood analysis

begin paup;
  log start file=MyML_log.txt;   
  set criterion=distance autoclose=yes storebrlens=yes increase=auto root=outgroup;
  outgroup MyOutgroup;
  DSet distance=JC objective=ME base=equal rates=equal pinv=0 subst=all negbrlen=setzero;
  NJ showtree=no breakties=random;
  set criterion=like;
  Lset Base=(0.2892 0.2928 0.1309) Nst=6 Rmat=(3.7285 46.5293 1.3888 2.3793 16.4374)
  Rates=gamma Shape=0.9350 Pinvar=0.5691;
  hsearch addseq=random nreps=5 swap=tbr;
  savetrees file=MyML_tree.tre format=altnex brlens=yes maxdecimals=6;
  log stop;
END;
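
In the Lset command, Base lists only three frequencies (A, C, G) and Rmat lists only five rates, because the fourth frequency and the sixth rate are implied. A small Python sketch of that bookkeeping, using the example values above (the variable names are illustrative only):

```python
# Values copied from the Lset command above (A, C, G frequencies).
base_acg = (0.2892, 0.2928, 0.1309)
freq_t = round(1.0 - sum(base_acg), 4)  # frequency of T is whatever remains

# Five relative rates from Rmat; the sixth (G-T) is the reference rate, 1.0.
rmat = (3.7285, 46.5293, 1.3888, 2.3793, 16.4374)
names = ("A-C", "A-G", "A-T", "C-G", "C-T", "G-T")
rates = dict(zip(names, rmat + (1.0,)))

print("freq(T) =", freq_t)
for pair, rate in rates.items():
    print(pair, rate)
```

These numbers are placeholders from one example data set. Replace them with the estimates reported for your own data, for example by Modeltest.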

 

Maximum Likelihood Bootstrap analysis

begin paup;
  log start file=MyML_log.txt;
  set criterion=like autoclose=yes storebrlens=yes increase=auto root=outgroup;
  outgroup myOutgroup;
  Lset Base=(0.2892 0.2928 0.1309) Nst=6 Rmat=(3.7285 46.5293 1.3888 2.3793 16.4374)
  Rates=gamma Shape=0.9350 Pinvar=0.5691;
  bootstrap nreps=1000 search=heuristic/ addseq=random swap=tbr hold=1;
  savetrees from=1 to=1 file=MyMLboot_tree.tre format=altnex brlens=yes savebootp=NodeLabels MaxDecimals=0;
  log stop;
END;

  

Partition Homogeneity test (a.k.a. Incongruence Length Difference test)

begin sets;
  charset Molecular = 1-650;
  charset Morphological = 651-.;
END;

begin paup;
  set increase=auto;
  log file=LetsLogResults.log append;
  charpartition P1 = Molecular:Molecular, Morphological:Morphological;
  [!Molecules vs Morphology]
  [Exclude uninformative characters because invariant and autapomorphic characters increase the probability of false negatives in the ILD test.]
  exclude uninf;
  [Do ILD test for datasets of partition P1: 1000 replicates, 3 random additions per replicate, TBR swapping; move on to the next if a random addition replicate takes longer than 10 minutes (timelimit=600 seconds).]
  hompart partition=P1 nreps=1000 / start=stepwise addseq=random nreps=3 savereps=no randomize=addseq rstatus=no hold=1 swap=tbr multrees=yes timelimit=600;
  include all;
  log stop;
END;

 


Command Block for MrBayes

MrBayes analysis using GTR+I+Γ

begin mrbayes;
  set autoclose=yes nowarn=yes;
  lset nst=6 rates=invgamma;
  mcmc ngen=10000000 relburnin=yes burninfrac=0.25 samplefreq=1000 printfreq=10000 nchains=4 savebrlens=yes;
  sump burnin=2500;
  sumt burnin=2500;
END;
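
The burnin=2500 passed to sump and sumt is a number of samples, not generations, and it matches the 25% burn-in fraction used in the mcmc command. A minimal Python sketch of the arithmetic (the function name is hypothetical, and the extra sample MrBayes records at generation 0 is ignored):

```python
def burnin_samples(ngen, samplefreq, burninfrac):
    """Number of sampled trees/parameter rows to discard as burn-in."""
    total_samples = ngen // samplefreq
    return int(total_samples * burninfrac)

# Values from the mcmc command above.
print(burnin_samples(10_000_000, 1000, 0.25))
```

If you change ngen or samplefreq, recompute the burnin value so that sump and sumt still discard the intended fraction of samples.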

 

Partitioned MrBayes analysis using mixed models

begin mrbayes;
  set autoclose=yes nowarn=yes;
  charset 1stpos = 1-720\3;
  charset 2ndpos = 2-720\3;
  charset 3rdpos = 3-720\3;
  partition bycodon = 3:1stpos,2ndpos,3rdpos;
  set partition = bycodon;
  unlink shape=(all) pinvar=(all) statefreq=(all) revmat=(all);
  prset applyto=(all) ratepr=variable;

  lset applyto=(1,3) nst=6 rates=invgamma;
  lset applyto=(2) nst=2 rates=gamma;
  mcmc ngen=10000000 relburnin=yes burninfrac=0.25 samplefreq=1000 printfreq=10000 nchains=4 savebrlens=yes;
  sump burnin=2500;
  sumt burnin=2500;
END;

RAxML allows you to specify the regions of your alignment for which an individual model of nucleotide substitution should be estimated.

This will typically be useful to infer trees for long (in terms of base pairs) multigene alignments. If, e.g., -m GTRGAMMA is used, individual α-shape parameters, GTR rates, and empirical base frequencies will be estimated and optimized for each partition.

To run multiple model analyses you will need to create a simple text file with a name such as "COI_18Sraxml_part.txt".

If you have a pure DNA alignment with 1,000 bp from two genes, gene1 (positions 1–500) and gene2 (positions 501–1,000), the information in the multiple model text file should look as follows:

      DNA, gene1 = 1-500

      DNA, gene2 = 501-1000
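
The two-gene file above can be typed by hand in any text editor, but it can also be written by a short script. The Python sketch below is one way to do it. The helper name write_partition_file is hypothetical, and the file is written to the system temporary directory purely for illustration.

```python
import tempfile
from pathlib import Path

def write_partition_file(path, partitions):
    """Write one 'TYPE, name = ranges' line per partition and return the lines."""
    lines = [f"{kind}, {name} = {ranges}" for kind, name, ranges in partitions]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines

# Output location is an assumption; use whatever folder holds your alignment.
out = Path(tempfile.gettempdir()) / "COI_18Sraxml_part.txt"
lines = write_partition_file(
    out, [("DNA", "gene1", "1-500"), ("DNA", "gene2", "501-1000")]
)
print(out.read_text(), end="")
```

When RAxML is run from the command line, a file like this is passed with the -q option. On the PhyloCluster it is presumably supplied through the iNquiry form instead.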

 

 

 

 

If gene1 is scattered through the alignment, e.g. positions 1–200, and 800–1,000 you specify this with:

      DNA, gene1 = 1-200, 800-1000

      DNA, gene2 = 201-799

 

 

 

 

You can also assign distinct models to the codon positions, i.e. if you want a distinct model to be estimated for each codon position in gene1 you can specify:

      DNA, gene1codon1 = 1-500\3

      DNA, gene1codon2 = 2-500\3

      DNA, gene1codon3 = 3-500\3

      DNA, gene2 = 501-1000
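
The codon-position lines follow a regular pattern: position n starts n-1 columns after the start of the gene and takes every third column. A minimal Python sketch of that pattern, assuming the gene begins at a first codon position (the function name codon_partitions is hypothetical):

```python
def codon_partitions(gene, start, end):
    """Return one RAxML partition line per codon position of a gene."""
    return [
        f"DNA, {gene}codon{pos} = {start + pos - 1}-{end}\\3"
        for pos in (1, 2, 3)
    ]

for line in codon_partitions("gene1", 1, 500) + ["DNA, gene2 = 501-1000"]:
    print(line)
```

If your gene does not start in frame, shift the start values accordingly before using the output.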

 

 

 

 

 

 

If you only need a distinct model for the 3rd codon position you can write:

      DNA, gene1codon1andcodon2 = 1-500\3, 2-500\3

      DNA, gene1codon3 = 3-500\3

      DNA, gene2 = 501-1000
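
Typing slips in a partition file are easy to make, so it may be worth checking that every alignment column is assigned exactly once before submitting. The Python sketch below is a sanity check written for this page and is not part of RAxML. It only understands start-end ranges with an optional \N stride, and the function names are hypothetical.

```python
import re

def columns(ranges):
    """Expand '1-500\\3, 2-500\\3' style ranges into a list of columns."""
    cols = []
    for part in ranges.split(","):
        m = re.fullmatch(r"\s*(\d+)\s*-\s*(\d+)\s*(?:\\(\d+))?\s*", part)
        if m is None:
            raise ValueError(f"cannot parse range: {part.strip()!r}")
        start, end, step = int(m[1]), int(m[2]), int(m[3] or 1)
        if end < start:
            raise ValueError(f"empty range: {part.strip()!r}")
        cols.extend(range(start, end + 1, step))
    return cols

def check_partitions(lines, alignment_length):
    """Return (columns assigned more than once, columns never assigned)."""
    seen, duplicated = set(), set()
    for line in lines:
        _model_and_name, ranges = line.split("=", 1)
        for col in columns(ranges):
            (duplicated if col in seen else seen).add(col)
    missing = set(range(1, alignment_length + 1)) - seen
    return sorted(duplicated), sorted(missing)

partition_lines = [
    r"DNA, gene1codon1andcodon2 = 1-500\3, 2-500\3",
    r"DNA, gene1codon3 = 3-500\3",
    r"DNA, gene2 = 501-1000",
]
print(check_partitions(partition_lines, 1000))
```

Two empty lists mean that no column is duplicated and none is missing. A range that cannot be parsed, such as one written with a forward slash instead of a backslash, raises a ValueError.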

 

 

 

 

 

For amino acid (AA) data you must specify the substitution matrix for each partition:

      JTT, gene1 = 1-500

      WAGF, gene2 = 501-800

      WAG, gene3 = 801-1000

 

BioTeam's iNquiry web interface provides easy access to command-line programs using pull-down menus and text fields. iNquiry is the primary way CCG users may submit analyses to the cluster (phylocluster.calacademy.org/inquiry). Due to license agreements with software publishers, access to the PhyloCluster is restricted to employees of the California Academy of Sciences (CAS). If you are a CAS employee and want access, contact Brian Simison.

 

Programs currently available to CCG PhyloCluster users:

Phylogenetics: PAUP* 4.0b10, MrBayes-MPI 3.1.2, PHYLIP, BEAST 1.5.4, GARLI, RAxML-HPC-PTHREADS 7.2.6
Population Genetics: Haploview, BEAST
Align/Assemble: ClustalW, MAFFT, TIGR Assembler
Primer Design: Primer 3
BLAST: batchblast, Hmmer

Submitting jobs to the PhyloCluster via iNquiry

PAUP*

The only way to submit a PAUP* analysis to the PhyloCluster is to include a command block in your NEXUS file. For information on creating a PAUP* command block, go to the NEXUS Commands tab above.*

*For likelihood analyses it is recommended that you estimate a substitution model for your data first. Modeltest and MrModeltest are the most common applications used.

 

MrBayes

There are two options for submitting MrBayes analyses (for information on creating MrBayes command blocks, go to the NEXUS Commands tab above):

    1. use a command block in your NEXUS file

    2. use the iNquiry "Advanced" interface for MrBayes. This interface has most MrBayes options available in pull-down menus and text fields.*

 

*For Bayesian analyses it is recommended that you estimate a substitution model for your data first. MrModeltest is the most common application used.

 

RAxML

There are two options for submitting RAxML analyses:

    1. A simple ML & Bootstrap interface with minimal options.

    2. An "Advanced" interface with all RAxML options.

 

BEAST

We have successfully tested BEAST with simple XML input files, but have had reports of problems with more complex XML files. Please report all errors to Brian Simison.

 

GARLI

We have successfully tested GARLI with simple input files. Please report all errors to Brian Simison.

News


The Importance of Natural History Collections

...


CAS Special Publication: The Coral Triangle - The 2011 Hearst Philippine Biodiversity Expedition

...

CCG adds six new Xserve nodes to the PhyloCluster.

The CCG PhyloCluster is now connected to the CABI NetApp Network File Server.

A new APC Smart-UPS RT 8000VA uninterruptible power supply (UPS) has been installed for PhyloCluster.

CCG PhyloCluster has been successfully upgraded from 88 cores to 184 cores with the addition of 12 Intel Xeon Nehalem processors.

 

Video Tutorials

The CCG has begun creating video tutorials for some of the applications used for phylogenetic analyses.
