FREQUENTLY ASKED QUESTIONS

"If I copied PHYLIP from a friend without you knowing, should I try to keep you from finding out?"

No. It is to your advantage and mine for you to let me know. If you did not get PHYLIP "officially" from me or from someone authorized by me, but copied a friend's version, you are not in my database of users. You probably also have an old version which has since been substantially improved (see the beginning of this main document file for the date on which this version was released). I don't mind you "bootlegging" PHYLIP (it's free anyway, and that saves me the work of writing diskettes), but you should realize that you may have an outdated version. You may be able to get the latest version just as quickly over Internet. You can read about subsequent bug fixes in the electronic news bulletins the person you got it from may (or may not) have subscribed to. It will help both of us if you get onto my mailing list. If you are on it, then I will give your name to other nearby users when they get a new copy, and they are urged to contact you and update your copy. (I benefit by getting a better feel for how many distributions there have been, and having a better mailing list to use to give other users local people to contact). Send me your name and address (five lines maximum), and your phone number, with the number of the version that you have, plus the type of your computer, operating system, and C compiler, so that I can add you to the address list. Note also the listserver information which you can get, which provides news about PHYLIP by electronic mail. This is described in the next to last section of this document.

"How do I make a citation to the PHYLIP package in the paper I am writing?"

One way is like this:
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
or if the editor for whom you are writing insists that the citation must be to a printed publication, you could cite a notice for version 3.2 published in Cladistics:
Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.
For a while a printed version of the PHYLIP documentation was available and one could cite that. This is no longer true. Other than that, this is difficult, because I have never written a paper announcing PHYLIP!
My 1985b paper in Evolution on the bootstrap method contains a one-paragraph Appendix describing the availability of this package, and that can also be cited as a reference for the package, although it has been distributed since 1980 while the bootstrap paper is 1985. A paper on PHYLIP is needed mostly to give people something to cite, as word-of-mouth, references in other people's papers, and electronic newsgroup postings have spread the word about PHYLIP's existence quite effectively.

"How do I bootstrap? Why has DNABOOT disappeared?"

DNABOOT, BOOT, and DOLBOOT, the previous parsimony-based bootstrap programs, have been removed from the package as there is now a more general way of bootstrapping. It involves running SEQBOOT to make multiple bootstrapped data sets out of your one data set, then running one of the tree-making programs with the Multiple data sets option to analyze them all, then running CONSENSE to make a majority rule consensus tree from the resulting tree file. Read the documentation of SEQBOOT to get further information. Before, only parsimony methods could be bootstrapped. With this new system almost any of the tree-making methods in the package can be bootstrapped. It is somewhat more tedious but you will find it much more rewarding.
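To make the resampling step concrete, here is a minimal Python sketch of what bootstrapping sites means: columns of the data set are drawn with replacement until the replicate has as many sites as the original. It is an illustration only and is not SEQBOOT's code; the function name bootstrap_sites and the dictionary layout are assumptions made for the example.

```python
import random

def bootstrap_sites(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping species name -> sequence string (equal lengths).
    rng: a random.Random instance, passed in so results are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    # Draw site indices with replacement. The same indices are used for
    # every species, so each sampled column stays intact.
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}

if __name__ == "__main__":
    data = {"Alpha": "ACGTACGT", "Beta": "ACGTTCGA", "Gamma": "ACCTACGA"}
    rng = random.Random(1)
    for _ in range(3):
        print(bootstrap_sites(data, rng))
```

Each replicate would then be analyzed by a tree-making program, and the resulting trees summarized by a consensus, as described above.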

"How do I specify a multi-species outgroup with your parsimony programs?"

It's not a feature but is not too hard to do in many of the programs. In parsimony programs like MIX, for which the W (Weights) and A (Ancestral states) options are available, and weights can be larger than 1, all you need to do is:
(a) In MIX, make up an extra character with states 0 for all the outgroups and 1 for all the ingroups. If using DNAPARS the ingroup can have (say) "G" and the outgroup "A".
(b) Assign this character an enormous weight (such as Z for 35) using the W option, all other characters getting weight 1, or whatever weight they had before.
(c) If it is available, use the A (Ancestral states) option to designate that, for that new character, the state found in the outgroup is the ancestral state.
(d) In MIX do not use the O (Outgroup) option.
(e) After the tree is found, the designated ingroup should have been held together by the fake character. The tree will be rooted somewhere in the outgroup (the program may or may not have a preference for one place in the outgroup over another). Make sure that you subtract from the total number of steps on the tree all steps in the new character.
In programs like DNAPARS, you cannot use this method as weights of sites cannot be greater than 1. But you do an analogous trick, by adding a largish number of extra sites to the data, with one nucleotide state ("A") for the ingroup and another ("G") for the outgroup. You will then have to use RETREE to manually reroot the tree in the desired place.
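The fake-character trick for MIX can be sketched in a few lines of Python. This only builds the extended 0/1 data and a matching string of weight symbols in memory; it does not write PHYLIP input files. The function name add_outgroup_character is hypothetical, and the sketch assumes every original character had weight 1.

```python
def add_outgroup_character(matrix, outgroup, heavy_weight="Z"):
    """Append a fake 0/1 character separating outgroup from ingroup.

    matrix: dict mapping species name -> string of '0'/'1' states.
    outgroup: set of species names forming the outgroup.
    Returns (new_matrix, weights). weights has one symbol per character:
    '1' for each original character (an assumption of this sketch) and
    heavy_weight for the fake one ('Z' stands for 35, as in step (b)).
    """
    if not matrix:
        raise ValueError("matrix is empty")
    unknown = set(outgroup) - set(matrix)
    if unknown:
        raise ValueError(f"outgroup species not in matrix: {sorted(unknown)}")
    n_chars = len(next(iter(matrix.values())))
    # Outgroup species get state 0, ingroup species get state 1.
    new_matrix = {
        name: states + ("0" if name in outgroup else "1")
        for name, states in matrix.items()
    }
    weights = "1" * n_chars + heavy_weight
    return new_matrix, weights

if __name__ == "__main__":
    data = {"Alpha": "0101", "Beta": "0111", "Gamma": "1100"}
    extended, weights = add_outgroup_character(data, {"Gamma"})
    for name, states in extended.items():
        print(name, states)
    print("weights:", weights)
```

Remember that any steps counted in the fake character still have to be subtracted from the total reported for the tree.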

"How do I force certain groups to remain monophyletic in your parsimony programs?"

By the same method, using multiple fake characters, any number of groups of species can be forced to be monophyletic. In MOVE, DOLMOVE, and DNAMOVE you can specify whatever outgroups you want without going to this trouble.

"How can I reroot one of the trees written out by PHYLIP?"

Use the program RETREE. But keep in mind whether the tree inferred by the original program was already rooted, or whether you are free to reroot it.

"Why doesn't NEIGHBOR read my DNA sequences correctly?"

Because it wants to have as input a distance matrix, not sequences. You have to use DNADIST to make the distance matrix first.
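To show the difference between the two kinds of input, here is a rough Python sketch that turns aligned sequences into a symmetric matrix of pairwise distances using the Jukes-Cantor correction. It is not how DNADIST is implemented, and DNADIST offers its own models and handles details (such as ambiguity codes) that this sketch simply skips; the function names are hypothetical.

```python
import math

def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences.

    Sites where either sequence is not A, C, G, or T are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    if compared == 0:
        raise ValueError("no sites could be compared")
    if differing == 0:
        return 0.0
    p = differing / compared  # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor formula")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment):
    """Return (names, matrix) with matrix[i][j] the distance between i and j."""
    names = list(alignment)
    matrix = [[0.0] * len(names) for _ in names]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            d = jukes_cantor(alignment[names[i]], alignment[names[j]])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix

if __name__ == "__main__":
    names, matrix = distance_matrix(
        {"Alpha": "ACGTACGT", "Beta": "ACGTTCGA", "Gamma": "ACCTACGA"})
    for name, row in zip(names, matrix):
        print(name, " ".join(f"{d:.4f}" for d in row))
```

A table of this kind, one row and one column per species, is what a distance-matrix program works from; the sequences themselves are no longer present in it.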

"What do I do about deletions and insertions in my sequences?"

The molecular sequence programs will accept sequences that have gaps (the "-" character). They do various things with them, mostly not optimal. DNAPARS counts "gap" as if it were a fifth nucleotide state (in addition to A, C, G, and T). Each site counts one change when a gap arises or disappears. The disadvantage of this treatment is that a long gap will be overweighted, with one event per gapped site. So a gap of 10 nucleotides will count as being as much evidence as 10 single site nucleotide substitutions. If there are not overlapping gaps, one way to correct this is to recode the first site in the gap as "-" but make all the others be "?" so the gap only counts as one event. Other programs such as DNAML and DNADIST count gaps as equivalent to unknown nucleotides (or unknown amino acids) on the grounds that we don't know what would be there if something were there. This completely leaves out the information from the presence or absence of the gap itself, but does not bias the gapped sequence to be close to or far from other gapped or ungapped sequences.
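The recoding suggested for DNAPARS is mechanical enough to script. The following Python sketch keeps the first "-" of each run of gaps in a sequence and turns the remaining positions of that run into "?". The function name recode_gaps is hypothetical, and the sketch works on one sequence at a time without checking whether gaps in different sequences overlap, so it is only appropriate in the non-overlapping case described above.

```python
import re

def recode_gaps(seq):
    """Keep the first '-' of each run of gaps; turn the rest into '?'."""
    return re.sub(r"-+", lambda m: "-" + "?" * (len(m.group()) - 1), seq)

if __name__ == "__main__":
    print(recode_gaps("AC----GT-A"))  # prints AC-???GT-A
```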

"Why don't your parsimony programs print out branch lengths?"

Because there are problems defining the branch lengths. If you look closely at the reconstructions of the states of the hypothetical ancestral nodes for almost any data set and almost any parsimony method you will find some ambiguous states on those nodes. There is then usually an ambiguity as to which branch the change is actually on. Other parsimony programs resolve this in one or another arbitrary fashion, sometimes with the user specifying how (for example, methods that push the changes up the tree as far as possible or down it as far as possible). I have preferred to leave it to the user to do this. Few programs available from others currently correct the branch lengths for multiple changes of state that may have overlain each other. One possible way to get branch lengths with nucleotide sequence data is to take the tree topology that you got, use RETREE to convert it to be unrooted, prepare a distance matrix from your data using DNADIST, and then use FITCH with that tree as User Tree and see what branch lengths it estimates.

"Why can't your programs handle unordered multistate characters?"

Well, they can if they are 4-state characters whose states are A, C, G, and T (or U) because then one can use the DNA sequence parsimony programs. But in general the discrete characters parsimony programs can only handle two states, 0 and 1. This is mostly because I have not yet had time to modify them to do so -- the modifications would have to be extensive. Ultimately I hope to get these done, but in the meantime the best I can do is suggest that you either use one of the excellent parsimony programs produced by others (PAUP or Hennig86, for example) or if you have four or fewer states recode your states to look like nucleotides and use the parsimony programs in the molecular sequence section of PHYLIP.
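Recoding a character with four or fewer unordered states so that it looks like a nucleotide site can be done with a small lookup. Here is a minimal Python sketch; the function name recode_as_nucleotides is hypothetical, and treating "?" as an unknown state that passes through unchanged is an assumption of the example.

```python
def recode_as_nucleotides(column):
    """Map the states of one unordered character onto A, C, G, T.

    column: list of state labels, one per species.
    Returns (site, mapping), where site is a string with one base per
    species and mapping records which state became which base.
    """
    bases = "ACGT"
    mapping = {}
    out = []
    for state in column:
        if state == "?":  # assumed to mean "unknown"; left as is
            out.append("?")
            continue
        if state not in mapping:
            if len(mapping) == len(bases):
                raise ValueError("more than four states; cannot recode")
            # Bases are assigned in order of first appearance.
            mapping[state] = bases[len(mapping)]
        out.append(mapping[state])
    return "".join(out), mapping

if __name__ == "__main__":
    site, mapping = recode_as_nucleotides(["red", "blue", "red", "?", "green"])
    print(site)     # prints ACA?G
    print(mapping)
```

Because the states are unordered, which base stands for which state is arbitrary; only the pattern of sameness and difference among species matters.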

"Where can I get a printed version of the PHYLIP documents?"

For the moment, you can only get a printed version by printing it yourself. For versions 3.1 to 3.3 a printed version was sold by Christopher Meacham and Tom Duncan, then at the University Herbarium of the University of California at Berkeley. But they have had to discontinue this as it was too much work. You should be able to print out the documentation file on almost any printer and make yourself a printed version of whichever of them you need.

"Why have I been dropped from your newsletter mailing list?"

You haven't. The newsletter was dropped. It simply was too hard to mail it out to such a large mailing list. The last issue of the newsletter was Number 9 in May, 1987. I am hoping that the Listserver News Bulletins will replace the old PHYLIP Newsletter. If you have electronic mail access you should definitely sign up for these bulletins. For details see the section on the Listserver News Bulletins below.

"How many copies of PHYLIP have been distributed?"

Currently (January, 1993) I have a bit over 1970 registered installations worldwide. Of course there are many more people who have got copies from friends. PHYLIP is the most widely distributed phylogeny package. PAUP is catching up in terms of official registrations, but PHYLIP is probably far ahead in terms of numbers of actual copies out there. In terms of phylogenies published, however, PAUP is ahead, but PHYLIP is gaining on it. In recent years magnetic tape distribution of PHYLIP has declined precipitously, electronic mail distribution is decreasing, and there has been a slow decrease of diskette distributions. But all this has been more than offset by a huge explosion of distributions by anonymous ftp over Internet (a rate of about 3 ftp sessions per day, at the moment). Because many people who get the package by anonymous ftp forget to register their copies, it is hard to estimate how many people have got it this way.

ADDITIONAL FREQUENTLY ASKED QUESTIONS, OR: "Why didn't it occur to you to ...

... write these programs in Pascal?"

These programs started out in Pascal in 1980. In 1993 we have released both Pascal and C versions. All future versions will be C-only. I make fewer mistakes in Pascal and do like the language better than C, but C has overtaken Pascal and Pascal compilers are starting to be hard to find on some machines. Also, C is a bit better standardized, which greatly reduces the number of modifications a user has to make to adapt the programs to their system.

... forget about all those inferior systems and just develop PHYLIP for Unix?"

This is self-answering, since the same people first said I should just develop it for Apple IIs, then for CP/M Z-80s, then for IBM PCDOS, and now they're starting to tell me to just develop it for Macintoshes or for Sun workstations. If I had listened to them and done any one of these, I would have had a very hard time adapting the package to any of the other ones once these folks changed their minds! However, I am keeping an eye on X-windows and Unix, as this looks like it will be a very widespread combination in the future and may become a de facto standard for user interface and operating system. But then we haven't yet seen Windows NT or the Apple/IBM Pink operating system!

... write these programs in PROLOG (or Ada, or Modula-2, or SIMULA, or BCPL, or PL/I, or APL, or LISP)?"

These are all languages I have considered. All have advantages, but they are not really spreading (C is).

... include in the package a program to do the Distance Wagner method (or successive approximations character weighting, or transformation series analysis)?"

In most cases where I have not included other methods, it is because I decided that they had no substantial advantages over methods that were included (such as the programs FITCH, KITSCH, NEIGHBOR, the T option of MIX and DOLLOP, and the "?" ancestral states option of the discrete characters parsimony programs).

... include in the package ordination methods and more clustering algorithms?"

Because this is NOT a clustering package, it's a package for phylogeny estimation. Those are different tasks with different objectives and mostly different methods. Mary Kuhner has, however, included in NEIGHBOR an option for UPGMA clustering, which will be very similar to KITSCH in results.

... include in the package a program to do nucleotide sequence alignment?"

Well, yes, I should have, and this is scheduled to be in future releases. But multiple sequence alignment programs, in the era after Sankoff, Morel, and Cedergren's 1973 classic paper, need to use substantial computer horsepower to estimate the alignment and the tree together. So I will be slow getting this into the package and in the meantime you may want to investigate ClustalV or TreeAlign.

... send me the programs over the electronic mail network I use, BUTTERFLYNET?"

Well, I am trying to. Maybe there is a BUTTERFLYNET gateway hanging off FISHNET, which hangs off HAIRNET, which ... I am connected to NSFNET (the former ARPANET), which is part of Internet and connects to Bitnet. I can mail to Bitnet (EARN, NetNorth) and to UUCP networks. Keep in mind that the resulting files take up about 2.2 Megabytes and that if you are not going to use them on the machine I send them to, you will have to download the files to your other machine. Also in some cases networks and gateways lose or truncate files (these can be up to about 60K long). So sometimes diskette or tape are a better medium. I hope to continually expand and solidify network distribution. For a couple of years, PHYLIP has been available over Internet by "anonymous ftp" from my machine, evolution.genetics.washington.edu (128.95.12.41). You can start by fetching file "README" from directory pub/phylip. My electronic mail addresses are given at the end of this document. Contact me by electronic mail if you are interested in getting PHYLIP over your network but cannot get ftp to work.

... let me log in to your computer in Seattle and copy the files out over a phone line?"

No thanks. It would cost you for over two hours of long-distance telephone time, plus a half hour of my time and yours in which I had to explain to you how to log in and do the copying.

... send me a listing of your program?"

Damn it, it's not "a program", it's 31 programs, in a total of 89 files. What were you thinking of doing, having 1800-line programs typed in by slaves at your end? If you were going to go to all that trouble why not try magnetic tape or diskettes? If you have these then you can print out all the listings you want to and add them to the huge stack of printed output in the corner of your office. (This and the following two questions, once common, are finally disappearing, I am pleased to report).

... write a magnetic tape in our computer center's favorite format (inverted Lithuanian EBCDIC at 998 bpi)?"

Because the ANSI standard format is the most widely used one, and even though your computer center may pretend it can't read a tape written this way, if you sniff around you will find a utility to read it. It's just a LOT easier for me to let you do that work. If I tried to put the tape into your format, I would probably get it wrong anyway.

... give us a version of these in FORTRAN?"

Because the programs are FAR easier to write and debug in Pascal, and cannot easily be rewritten into FORTRAN (they make extensive use of recursive calls and of records and pointers). In any case, C is widely available. If you don't have a C compiler or don't know how to use it, you are going to have to learn a language like C or Pascal sooner or later, and the sooner the better.


Maintained 15 Jul 1996 -- by Martin Hilbers (e-mail: M.P.Hilbers@dl.ac.uk)