WHAT THE PROGRAMS DO
Here is a short description of each of the programs. For a more detailed
discussion, read the documentation file for the individual program and the
documentation file for the group of programs it belongs to.
- PROTPARS
- Estimates phylogenies from protein sequences (input using the
standard one-letter code for amino acids) using the parsimony method, in
a variant which counts only those nucleotide changes that change the amino
acid, on the assumption that silent changes are more easily accomplished.
- DNAPARS
- Estimates phylogenies by the parsimony method using nucleic acid
sequences. Allows use of the full IUB ambiguity codes, and estimates
ancestral nucleotide states. Gaps are treated as a fifth nucleotide state.
- DNAMOVE
- Interactive construction of phylogenies from nucleic acid sequences,
with their evaluation by parsimony and compatibility and the display of
reconstructed ancestral bases. This can be used to find parsimony or
compatibility estimates by hand.
- DNAPENNY
- Finds all most parsimonious phylogenies for nucleic acid sequences
by branch-and-bound search. This may not be practical (depending on the
data) for more than 10 or 11 species.
- DNACOMP
- Estimates phylogenies from nucleic acid sequence data using the
compatibility criterion, which searches for the largest number of sites
which could have all states (nucleotides) uniquely evolved on the same
tree. Compatibility is particularly appropriate when sites vary greatly in
their rates of evolution, but we do not know in advance which are the less
reliable ones.
- DNAINVAR
- For nucleic acid sequence data on four species, computes Lake's and
Cavender's phylogenetic invariants, which test alternative tree topologies.
The program also tabulates the frequencies of occurrence of the different
nucleotide patterns. Lake's invariants are the method which he calls
"evolutionary parsimony".
- DNAML
- Estimates phylogenies from nucleotide sequences by maximum
likelihood. The model employed allows for unequal expected frequencies of
the four nucleotides, for unequal rates of transitions and transversions,
and for different (prespecified) rates of change in different categories of
sites, with the program inferring which sites have which rates.
- DNAMLK
- Same as DNAML but assumes a molecular clock. The use of the
two programs together permits a likelihood ratio test of the
molecular clock hypothesis to be made.
- DNADIST
- Computes four different distances between species from nucleic acid
sequences. The distances can then be used in the distance matrix programs.
The distances are the Jukes-Cantor formula, one based on Kimura's
2-parameter method, Jin and Nei's distance which allows for rate variation
from site to site, and a maximum likelihood method using the model employed
in DNAML. The last of these methods of computing distances can be very slow.
- PROTDIST
- Computes a distance measure for protein sequences, using maximum
likelihood estimates based on the Dayhoff PAM matrix, Kimura's 1983
approximation to it, or a model based on the genetic code plus a
constraint on changing to a different category of amino acid. The
distances can then be used in the distance matrix programs.
- RESTML
- Estimation of phylogenies by maximum likelihood using restriction
sites data (not restriction fragments but presence/absence of individual
sites). It employs the Jukes-Cantor symmetrical model of nucleotide
change, which does not allow for differences of rate between transitions
and transversions. This program is VERY slow.
- SEQBOOT
- Reads in a data set, and produces multiple data sets from
it by bootstrap resampling. Since most programs in the current version of
the package allow processing of multiple data sets, this can be used
together with the consensus tree program CONSENSE to do bootstrap (or
delete-half-jackknife) analyses with most of the methods in this package.
This program also allows the Archie/Faith technique of permutation of
species within characters, as well as block bootstrap resampling.
- COALLIKE
- May be used, after using SEQBOOT and DNAMLK, to take a treefile
that they produce, and make an estimate of the likelihood curve for
the parameter 4Nu (4 times the product of effective population size and
mutation rate) when the sequences are a sample from a population
and the tree is assumed to be produced by the "coalescent" process.
- FITCH
- Estimates phylogenies from distance matrix data under the "additive
tree model" according to which the distances are expected to equal the sums
of branch lengths between the species. Uses the Fitch-Margoliash criterion
and some related least squares criteria. Does not assume an evolutionary
clock. This program will be useful with distances computed from DNA
sequences, with DNA hybridization measurements, and with genetic distances
computed from gene frequencies.
- KITSCH
- Estimates phylogenies from distance matrix data under the
"ultrametric" model which is the same as the additive tree model except
that an evolutionary clock is assumed. The Fitch-Margoliash criterion and
other least squares criteria are assumed. This program will be useful with
distances computed from DNA sequences, with DNA hybridization measurements,
and with genetic distances computed from gene frequencies.
- NEIGHBOR
- An implementation by Mary Kuhner and John Yamato of Saitou and
Nei's "Neighbor Joining Method," and of the UPGMA (Average Linkage
clustering) method. Neighbor Joining is a distance matrix method producing
an unrooted tree without the assumption of a clock. UPGMA does assume a
clock. The branch lengths are not optimized by the least squares criterion
but the methods are very fast and thus can handle much larger data sets.
- CONTML
- Estimates phylogenies from gene frequency data by maximum likelihood
under a model in which all divergence is due to genetic drift in the
absence of new mutations. Does not assume a molecular clock. An
alternative method of analyzing this data is to compute Nei's genetic
distance and use one of the distance matrix programs.
- GENDIST
- Computes one of three different genetic distance formulas from gene
frequency data. The formulas are Nei's genetic distance, the
Cavalli-Sforza chord measure, and the genetic distance of Reynolds et al.
The first is appropriate for data in which new mutations occur in an
infinite isoalleles neutral mutation model, the latter two for a model
without mutation and with pure genetic drift. The distances are written to a file
in a format appropriate for input to the distance matrix programs.
- CONTRAST
- Reads a tree from a tree file, and a data set with continuous
characters data, and produces the independent contrasts for those
characters, for use in any multivariate statistics package. Will also
produce covariances, regressions and correlations between characters for
those contrasts.
- MIX
- Estimates phylogenies by some parsimony methods for discrete character
data with two states (0 and 1). Allows use of the Wagner parsimony method,
the Camin-Sokal parsimony method, or arbitrary mixtures of these. Also
reconstructs ancestral states and allows weighting of characters.
- MOVE
- Interactive construction of phylogenies from discrete character data
with two states (0 and 1). Evaluates parsimony and compatibility criteria
for those phylogenies and displays reconstructed states throughout the
tree. This can be used to find parsimony or compatibility estimates by
hand.
- PENNY
- Finds all most parsimonious phylogenies for discrete-character data
with two states, for the Wagner, Camin-Sokal, and mixed parsimony criteria
using the branch-and-bound method of exact search. May be impractical
(depending on the data) for more than 10-11 species.
- DOLLOP
- Estimates phylogenies by the Dollo or polymorphism parsimony criteria
for discrete character data with two states (0 and 1). Also reconstructs
ancestral states and allows weighting of characters. Dollo parsimony is
particularly appropriate for restriction sites data; with ancestor states
specified as unknown it may be appropriate for restriction fragments data.
- DOLMOVE
- Interactive construction of phylogenies from discrete character data
with two states (0 and 1) using the Dollo or polymorphism parsimony
criteria. Evaluates parsimony and compatibility criteria for those
phylogenies and displays reconstructed states throughout the tree. This
can be used to find parsimony or compatibility estimates by hand.
- DOLPENNY
- Finds all most parsimonious phylogenies for discrete-character data
with two states, for the Dollo or polymorphism parsimony criteria using the
branch-and-bound method of exact search. May be impractical (depending on
the data) for more than 10-11 species.
- CLIQUE
- Finds the largest clique of mutually compatible characters, and the
phylogeny which they recommend, for discrete character data with two
states. The largest clique (or all cliques within a given size range of
the largest one) is found by a very fast branch and bound search method.
The method does not allow for missing data. For such cases the T
(Threshold) option of MIX may be a useful alternative. Compatibility
methods are particularly useful when some characters are of poor quality and
the rest of good quality, but when it is not known in advance which ones
are which.
- FACTOR
- Takes discrete multistate data with character state trees and
produces the corresponding data set with two states (0 and 1). Written by
Christopher Meacham.
- DRAWGRAM
- Plots rooted phylogenies, cladograms, and phenograms in a
wide variety of user-controllable formats. The program is
interactive and allows previewing of the tree on PC graphics screens,
and Tektronix or DEC graphics terminals. Final output can be on
a laser printer (such as the Apple Laserwriter or HP Laserjet),
on graphics screens or terminals, on pen plotters (Hewlett-Packard or
Houston Instruments) or on dot matrix printers capable of graphics
(Epson, Okidata, Imagewriter, or Toshiba).
- DRAWTREE
- Similar to DRAWGRAM but plots unrooted phylogenies.
- CONSENSE
- Computes consensus trees by the majority-rule consensus tree
method, which also allows one to easily find the strict consensus tree.
Does NOT compute the Adams consensus tree. Trees are input in a tree file
in standard nested-parenthesis notation, which is produced by many of the
tree estimation programs in the package when the Y option is invoked.
This program can be used as the final step in doing bootstrap analyses for
many of the methods in the package.
- RETREE
- Reads in a tree (with branch lengths if necessary) and allows
you to reroot the tree, to flip branches, to change species names and
branch lengths, and then write the result out. Can be used to convert
between rooted and unrooted trees.
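
As an illustration of the simplest of the DNADIST distances described above,
here is a minimal Python sketch of the Jukes-Cantor formula,
d = -(3/4) ln(1 - (4/3) p), where p is the fraction of compared sites at which
two aligned sequences differ. It is not taken from the DNADIST source: the
function names p_distance and jukes_cantor_distance are hypothetical, and
skipping gaps and ambiguity codes is an assumption made here for brevity.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Assumption of this sketch: sites with a gap or an ambiguity
        # code in either sequence are skipped, not modelled.
        if a not in valid or b not in valid:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    # The logarithm is undefined once p reaches 3/4 (saturation).
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

Applying jukes_cantor_distance to every pair of species gives a distance
matrix of the kind that the distance matrix programs (FITCH, KITSCH,
NEIGHBOR) take as input, although the sketch does not write PHYLIP's file
format.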
Programs in the Unsupported Division
The Unsupported Division of PHYLIP consists of two programs that may be useful
to you and have kindly been contributed by their authors. Those authors
retain full copyright to their programs and
documentation files. They are provided in the PHYLIP source code distribution
but have not been provided as executables in the executables distribution. All
questions about these programs should be directed to their authors, whose
electronic mail addresses and regular mail addresses are given in their
documentation files.
- MAKEINF
- This program by Arend Sidow can be used to translate the output files
from Jotun Hein's popular multiple-sequence alignment program into PHYLIP input
files. It also allows you to selectively analyze different codon positions and
different organisms. The output from other alignment programs can rather
easily be edited into a form that it will read.
- PROTML
- This large Pascal program from Jun Adachi and Masami Hasegawa carries
out maximum likelihood estimation of phylogenies from protein sequence data.
It is quite analogous to DNAML, but uses instead of a model for DNA evolution
the PAM matrix model of Margaret Dayhoff. Because of the larger number of
states (20 instead of 4) it is necessarily slower than DNAML by a large factor.
However, the authors have adopted a different, and faster, rearrangement
strategy to search among tree topologies for the best one. ProtML does not yet
incorporate the Categories feature of DNAML and DNAMLK which allows different
rates of evolution at different sites, without the user specifying in advance
which site has which rate of evolution. For support, contact them at the
Internet addresses hasegawa@ism.ac.jp and adachi@sunmh.ism.ac.jp at the
Institute of Statistical Mathematics, Tokyo, Japan.
Maintained 15 Jul 1996 by Martin Hilbers (e-mail: M.P.Hilbers@dl.ac.uk)