"If I copied PHYLIP from a friend without you knowing, should I try to keep you from finding out?"
No. It is to your advantage and mine for you to
let me know. If you did not get PHYLIP "officially" from me or from someone
authorized by me, but copied a friend's version, you are not in my database of
users. You probably also have an old version which has since been
substantially improved (see the beginning of this main document file for the
date on which this version was released). I don't mind you "bootlegging"
PHYLIP (it's free anyway, and that saves me the work of writing diskettes), but
you should realize that you may have an outdated version. You may be able to
get the latest version just as quickly over Internet. You can read about
subsequent bug fixes in the electronic news bulletins the person you got it
from may (or may not) have subscribed to. It will help both of us if you get
onto my mailing list. If you are on it, then I will give your name to other
nearby users when they get a new copy, and they are urged to contact you and
update your copy. (I benefit by getting a better feel for how many
distributions there have been, and having a better mailing list to use to give
other users local people to contact). Send me your name and address (five
lines maximum), and your phone number, with the number of the version that you
have, plus the type of your computer, operating system, and C compiler, so that
I can add you to the address list. Note also the listserver information which
you can get, which provides news about PHYLIP by electronic mail. This is
described in the next to last section of this document.
"How do I make a citation to the PHYLIP package in the paper I am writing?"
One way is like this:
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c.
Distributed by the author. Department of Genetics, University of
Washington, Seattle.
or if the editor for whom you are writing insists that the citation must be to
a printed publication, you could cite a notice for version 3.2 published in
Cladistics:
Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2).
Cladistics 5: 164-166.
For a while a printed version of the PHYLIP documentation was available and one
could cite that. This is no longer true. Other than that, this is difficult,
because I have never written a paper announcing PHYLIP!
My 1985b paper in
Evolution on the bootstrap method contains a
one-paragraph Appendix describing the availability of this package, and that
can also be cited as a reference for the package, although it has been
distributed since 1980 while the bootstrap paper is 1985. A paper on PHYLIP
is needed mostly to give people something to cite, as word-of-mouth, references
in other people's papers, and electronic newsgroup postings have spread the
word about PHYLIP's existence quite effectively.
"How do I bootstrap? Why has DNABOOT disappeared?"
DNABOOT, BOOT, and
DOLBOOT, the previous parsimony-based bootstrap programs, have been removed
from the package as there is now a more general way of bootstrapping. It
involves running SEQBOOT to make multiple bootstrapped data sets out of your
one data set, then running one of the tree-making programs with the Multiple
data sets option to analyze them all, then running CONSENSE to make a majority
rule consensus tree from the resulting tree file. Read the documentation of
SEQBOOT to get further information. Before, only parsimony methods could be
bootstrapped. With this new system almost any of the tree-making methods in
the package can be bootstrapped. It is somewhat more tedious but you will find
it much more rewarding.
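As a rough illustration of the resampling step only (this is not SEQBOOT's code, and the species names and sequences are made up), a bootstrap replicate is built by drawing sites with replacement until the replicate has as many sites as the original data set. A minimal Python sketch, assuming the alignment is held as a dictionary of equal-length strings:

```python
import random

def bootstrap_alignment(seqs, rng):
    """Return one bootstrap replicate: sites (columns) drawn with replacement."""
    n_sites = len(next(iter(seqs.values())))
    # Draw n_sites column indices with replacement. The same indices are
    # used for every species, so each sampled site stays intact.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in seqs.items()}

# Hypothetical three-species alignment, for illustration only.
alignment = {
    "Alpha": "AACGTGGCCA",
    "Beta":  "AAGGTCGCCA",
    "Gamma": "CATTTCGTCA",
}
rng = random.Random(1)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
```

Each replicate would then be analyzed by a tree-making program, and the resulting trees summarized by a majority rule consensus, as described above.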
"How do I specify a multi-species outgroup with your parsimony programs?"
It's not a feature but is not too hard to do in many of the programs. In
parsimony programs like MIX, for which the W (Weights) and A (Ancestral states)
options are available, and weights can be larger than 1, all you need to do is:
In programs like DNAPARS, you cannot use this method as weights of sites
cannot be greater than 1. But you can do an analogous trick, by adding a
largish number of extra sites to the data, with one nucleotide state ("A")
for the ingroup and another ("G") for the outgroup. You will then have to
use RETREE to manually reroot the tree in the desired place.
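The extra-sites trick for programs like DNAPARS can be sketched as follows. This is a hedged Python illustration, not part of PHYLIP: the function name, the species names, and the default of 20 extra sites are all assumptions chosen for the example.

```python
def add_outgroup_sites(seqs, outgroup, n_extra=20):
    """Append n_extra fake sites: 'G' for outgroup species, 'A' for the ingroup."""
    return {
        name: seq + ("G" if name in outgroup else "A") * n_extra
        for name, seq in seqs.items()
    }

# Hypothetical data: Frog and Fish are to be treated as the outgroup.
data = {"Human": "ACGT", "Mouse": "ACGA", "Frog": "TCGA", "Fish": "TCTA"}
padded = add_outgroup_sites(data, outgroup={"Frog", "Fish"}, n_extra=5)
```

Here `padded["Human"]` ends in five A's and `padded["Frog"]` in five G's, so any tree that splits the outgroup costs extra changes at every added site.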
"How do I force certain groups to remain monophyletic in your parsimony programs?"
By the same method, using multiple fake characters, any number of
groups of species can be forced to be monophyletic. In MOVE, DOLMOVE, and
DNAMOVE you can specify whatever outgroups you want without going to this
trouble.
"How can I reroot one of the trees written out by PHYLIP?"
Use the program
RETREE. But keep in mind whether the tree inferred by the original program was
already rooted, or whether you are free to reroot it.
"Why doesn't NEIGHBOR read my DNA sequences correctly?"
Because it wants
to have as input a distance matrix, not sequences. You have to use DNADIST to
make the distance matrix first.
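To show what kind of input NEIGHBOR expects, here is a small Python sketch that turns aligned sequences into a square matrix of pairwise distances. It uses the Jukes-Cantor correction purely as an example; it is not DNADIST's code, the function names are hypothetical, and it does not write PHYLIP's file format.

```python
import math

def jukes_cantor(a, b):
    """Jukes-Cantor distance between two aligned sequences of equal length."""
    # Compare only sites where both sequences have an unambiguous base.
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)  # fraction of differing sites
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(seqs):
    """Return species names and the symmetric matrix of pairwise distances."""
    names = list(seqs)
    return names, [[jukes_cantor(seqs[i], seqs[j]) for j in names] for i in names]

names, matrix = distance_matrix({
    "Alpha": "AAAAAAAAAA",
    "Beta":  "AAAAAAAAAC",
    "Gamma": "AAAAAAAACC",
})
```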
"What do I do about deletions and insertions in my sequences?"
The
molecular sequence programs will accept sequences that have gaps (the "-"
character). They do various things with them, mostly not optimal. DNAPARS
counts "gap" as if it were a fifth nucleotide state (in addition to A, C, G,
and T). Each site counts one change when a gap arises or disappears. The
disadvantage of this treatment is that a long gap will be overweighted, with
one event per gapped site. So a gap of 10 nucleotides will count as being as
much evidence as 10 single site nucleotide substitutions. If there are not
overlapping gaps, one way to correct this is to recode the first site in the
gap as "-" but make all the others be "?" so the gap only counts as one event.
Other programs such as DNAML and DNADIST count gaps as equivalent to unknown
nucleotides (or unknown amino acids) on the grounds that we don't know what
would be there if something were there. This completely leaves out the
information from the presence or absence of the gap itself, but does not bias
the gapped sequence to be close to or far from other gapped or ungapped
sequences.
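The recoding suggested above for DNAPARS (first site of a gap stays "-", the rest become "?") can be sketched in Python as below. The function name is hypothetical, and as noted the recoding only makes sense when gaps in different sequences do not overlap.

```python
import re

def recode_gaps(seq):
    """Keep the first '-' of each run of gaps and turn the rest into '?'."""
    return re.sub(r"-+", lambda m: "-" + "?" * (len(m.group()) - 1), seq)

recoded = recode_gaps("ACG-----TTA-C")  # 'ACG-????TTA-C'
```

The sequence keeps its length, so the alignment is undisturbed; a five-site gap now counts as one event instead of five.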
"Why don't your parsimony programs print out branch lengths?"
Because
there are problems defining the branch lengths. If you look closely at the
reconstructions of the states of the hypothetical ancestral nodes for almost
any data set and almost any parsimony method you will find some ambiguous
states on those nodes. There is then usually an ambiguity as to which branch
the change is actually on. Other parsimony programs resolve this in one or
another arbitrary fashion, sometimes with the user specifying how (for example,
methods that push the changes up the tree as far as possible or down it as far
as possible). I have preferred to leave it to the user to do this. Few
programs available from others currently correct the branch lengths for
multiple changes of state that may have overlain each other. One possible way
to get branch lengths with nucleotide sequence data is to take the tree
topology that you got, use RETREE to convert it to be unrooted, prepare a
distance matrix from your data using DNADIST, and then use FITCH with that tree
as User Tree and see what branch lengths it estimates.
"Why can't your programs handle unordered multistate characters?"
Well,
they can if they are 4-state characters whose states are A, C, G, and T (or U)
because then one can use the DNA sequence parsimony programs. But in general
the discrete characters parsimony programs can only handle two states, 0 and 1.
This is mostly because I have not yet had time to modify them to do so -- the
modifications would have to be extensive. Ultimately I hope to get these done,
but in the meantime the best I can do is suggest that you either use one of the
excellent parsimony programs produced by others (PAUP or Hennig86, for example)
or if you have four or fewer states recode your states to look like nucleotides
and use the parsimony programs in the molecular sequence section of PHYLIP.
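Recoding a character with four or fewer unordered states so that it looks like nucleotide data is a simple substitution. A minimal Python sketch follows; the particular assignment of states to bases is arbitrary, and the assumption that the input uses the symbols 0-3 with "?" for unknown is made only for this example.

```python
# Arbitrary one-to-one assignment of state symbols to nucleotides.
STATE_TO_BASE = {"0": "A", "1": "C", "2": "G", "3": "T", "?": "?"}

def recode_as_nucleotides(states):
    """Translate a string of state symbols (0-3 or ?) into nucleotide letters."""
    return "".join(STATE_TO_BASE[s] for s in states)

recoded_states = recode_as_nucleotides("0132?")  # 'ACTG?'
```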
"Where can I get a printed version of the PHYLIP documents?"
For the
moment, you can only get a printed version by printing it yourself. For
versions 3.1 to 3.3 a printed version was sold by Christopher Meacham and Tom
Duncan, then at the University Herbarium of the University of California at
Berkeley. But they have had to discontinue this as it was too much work. You
should be able to print out the documentation files on almost any printer and
make yourself a printed version of whichever of them you need.
"Why have I been dropped from your newsletter mailing list?"
You haven't.
The newsletter was dropped. It simply was too hard to mail it out to such a
large mailing list. The last issue of the newsletter was Number 9 in May,
1987. I am hoping that the Listserver News Bulletins will replace the old
PHYLIP Newsletter. If you have electronic mail access you should definitely
sign up for these bulletins. For details see the section on the Listserver
News Bulletins below.
"How many copies of PHYLIP have been distributed?"
Currently (January,
1993) I have a bit over 1970 registered installations worldwide. Of course
there are many more people who have got copies from friends. PHYLIP is the
most widely distributed phylogeny package. PAUP is catching up in terms of
official registrations, but PHYLIP is probably far ahead in terms of numbers of
actual copies out there. In terms of phylogenies published, however, PAUP is
ahead, but PHYLIP is gaining on it. In recent years magnetic tape distribution
of PHYLIP has declined precipitously, electronic mail distribution is
decreasing, and there has been a slow decrease of diskette distributions. But
all this has been more than offset by a huge explosion of distributions by
anonymous ftp over Internet (a rate of about 3 ftp sessions per day, at the
moment). Because many people who get the package by anonymous ftp forget to
register their copies, it is hard to estimate how many people have got it this
way.
ADDITIONAL FREQUENTLY ASKED QUESTIONS, OR: "Why didn't it occur to you to ...
... write these programs in Pascal?"
These programs started out in
Pascal in 1980. In 1993 we have released both Pascal and C versions. All
future versions will be C-only. I make fewer mistakes in Pascal and do like
the language better than C, but C has overtaken Pascal and Pascal compilers are
starting to be hard to find on some machines. Also C is a bit better
standardized, which makes the number of modifications a user has to make to
adapt the programs to their system much smaller.
... forget about all those inferior systems and just develop PHYLIP for Unix?"
This is self-answering, since the same people first said I should
just develop it for Apple IIs, then for CP/M Z-80s, then for IBM PCDOS, and now
they're starting to tell me to just develop it for Macintoshes or for Sun
workstations. If I had listened to them and done any one of these, I would
have had a very hard time adapting the package to any of the other ones once
these folks changed their mind! However, I am keeping an eye on X-windows and
Unix, as this looks like it will be a very widespread combination in the future
and may become a de facto standard for user interface and operating system.
But then we haven't yet seen Windows NT or the Apple/IBM Pink operating system!
... write these programs in PROLOG (or Ada, or Modula-2, or SIMULA, or BCPL, or PL/I, or APL, or LISP)?"
These are all languages I have considered.
All have advantages, but they are not really spreading (C is).
... include in the package a program to do the Distance Wagner method (or successive approximations character weighting, or transformation series analysis)?"
In most cases where I have not included other methods, it is
because I decided that they had no substantial advantages over methods that
were included (such as the programs FITCH, KITSCH, NEIGHBOR, the T option of
MIX and DOLLOP, and the "?" ancestral states option of the discrete characters
parsimony programs).
... include in the package ordination methods and more clustering algorithms?"
Because this is NOT a clustering package, it's a package for
phylogeny estimation. Those are different tasks with different objectives and
mostly different methods. Mary Kuhner has, however, included in NEIGHBOR an
option for UPGMA clustering, which will be very similar to KITSCH in results.
... include in the package a program to do nucleotide sequence alignment?"
Well, yes, I should have, and this is scheduled to be in future
releases. But multiple sequence alignment programs, in the era after Sankoff,
Morel, and Cedergren's 1973 classic paper, need to use substantial computer
horsepower to estimate the alignment and the tree together. So I will be slow
getting this into the package and in the meantime you may want to investigate
ClustalV or TreeAlign.
... send me the programs over the electronic mail network I use, BUTTERFLYNET?"
Well, I am trying to. Maybe there is a BUTTERFLYNET gateway
hanging off FISHNET, which hangs off HAIRNET, which ... I am connected to
NSFNET (the former ARPANET), which is part of Internet and connects to Bitnet.
I can mail to Bitnet (EARN, NetNorth) and to UUCP networks. Keep in mind that
the resulting files take up about 2.2 Megabytes and that if you are not going
to use them on the machine I send them to, you will have to download the files
to your other machine. Also in some cases networks and gateways lose or
truncate files (these can be up to about 60K long). So sometimes diskette or
tape are a better medium. I hope to continually expand and solidify network
distribution. For a couple of years, PHYLIP has been available over Internet
by "anonymous ftp" from my machine, evolution.genetics.washington.edu
(128.95.12.41). You can start by fetching file "README" from directory
pub/phylip. My electronic mail addresses are given at the end of this
document. Contact me by electronic mail if you are interested in getting
PHYLIP over your network but cannot get ftp to work.
... let me log in to your computer in Seattle and copy the files out over a phone line?"
No thanks. It would cost you for over two hours of long-distance
telephone time, plus a half hour of my time and yours in which I had
to explain to you how to log in and do the copying.
... send me a listing of your program?"
Damn it, it's not "a
program", it's 31 programs, in a total of 89 files. What were you thinking of
doing, having 1800-line programs typed in by slaves at your end? If you were
going to go to all that trouble why not try magnetic tape or diskettes? If you
have these then you can print out all the listings you want to and add them to
the huge stack of printed output in the corner of your office. (This and the
following two questions, once common, are finally disappearing, I am pleased to
report).
... write a magnetic tape in our computer center's favorite format (inverted Lithuanian EBCDIC at 998 bpi)?"
Because the ANSI standard format is
the most widely used one, and even though your computer center may pretend it
can't read a tape written this way, if you sniff around you will find a utility
to read it. It's just a LOT easier for me to let you do that work. If I tried
to put the tape into your format, I would probably get it wrong anyway.
... give us a version of these in FORTRAN?"
Because the programs are
FAR easier to write and debug in Pascal, and cannot easily be rewritten into
FORTRAN (they make extensive use of recursive calls and of records and
pointers). In any case, C is widely available. If you don't have a C compiler
or don't know how to use it, you are going to have to learn a language like C
or Pascal sooner or later, and the sooner the better.
Maintained 15 Jul 1996 -- by Martin Hilbers(e-mail:M.P.Hilbers@dl.ac.uk)