FREQUENTLY ASKED QUESTIONS

"If I copied PHYLIP from a friend without you knowing, should I try to keep you from finding out?"

No. It is to your advantage and mine for you to let me know. If you did not get PHYLIP "officially" from me or from someone authorized by me, but copied a friend's version, you are not in my database of users. You probably also have an old version which has since been substantially improved (see the beginning of this main document file for the date on which this version was released). I don't mind you "bootlegging" PHYLIP (it's free anyway, and that saves me the work of writing diskettes), but you should realize that you may have an outdated version. You may be able to get the latest version just as quickly over the Internet. You can read about subsequent bug fixes in the electronic news bulletins, to which the person you got it from may (or may not) have subscribed. It will help both of us if you get onto my mailing list. If you are on it, then I will give your name to other nearby users when they get a new copy, and they will be urged to contact you and update your copy. (I benefit by getting a better feel for how many distributions there have been, and by having a better mailing list to use to give other users local people to contact.) Send me your name and address (five lines maximum) and your phone number, along with the number of the version that you have and the type of your computer, operating system, and C compiler, so that I can add you to the address list. Note also the listserver, which provides news about PHYLIP by electronic mail. It is described in the next-to-last section of this document.

"How do I make a citation to the PHYLIP package in the paper I am writing?"

One way is like this:

Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.

or if the editor for whom you are writing insists that the citation must be to a printed publication, you could cite a notice for version 3.2 published in Cladistics:

Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.

For a while a printed version of the PHYLIP documentation was available and one could cite that. This is no longer true. Other than that, citing PHYLIP is difficult, because I have never written a paper announcing it! My 1985b paper in Evolution on the bootstrap method contains a one-paragraph Appendix describing the availability of this package, and that can also be cited as a reference for the package, although the package has been distributed since 1980 while the bootstrap paper dates from 1985. A paper on PHYLIP is needed mostly to give people something to cite, as word-of-mouth, references in other people's papers, and electronic newsgroup postings have spread the word about PHYLIP's existence quite effectively.

"How do I bootstrap? Why has DNABOOT disappeared?"

DNABOOT, BOOT, and DOLBOOT, the previous parsimony-based bootstrap programs, have been removed from the package as there is now a more general way of bootstrapping. It involves running SEQBOOT to make multiple bootstrapped data sets out of your one data set, then running one of the tree-making programs with the Multiple data sets option to analyze them all, then running CONSENSE to make a majority rule consensus tree from the resulting tree file. Read the documentation of SEQBOOT to get further information. Before, only parsimony methods could be bootstrapped. With this new system almost any of the tree-making methods in the package can be bootstrapped. It is somewhat more tedious but you will find it much more rewarding.
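
As a rough illustration of what "making bootstrapped data sets" means, the sketch below resamples the sites (columns) of an alignment with replacement. It is a conceptual Python sketch only, not SEQBOOT's code and not its file format; the function name bootstrap_replicates and the dictionary representation of the alignment are assumptions made for this example.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=None):
    """Resample alignment columns with replacement.

    alignment: dict mapping species name -> sequence string (equal lengths).
    Returns a list of n_replicates dicts with the same names and length.
    (Hypothetical helper for illustration; not part of PHYLIP.)
    """
    rng = random.Random(seed)
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    if any(len(alignment[name]) != n_sites for name in names):
        raise ValueError("all sequences must have the same length")

    replicates = []
    for _ in range(n_replicates):
        # Draw n_sites column indices with replacement. The same columns are
        # used for every species, so each resampled site stays aligned.
        columns = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicates.append(
            {name: "".join(alignment[name][c] for c in columns) for name in names}
        )
    return replicates
```

Each replicate has as many sites as the original data, but some original sites appear several times and others not at all. Every replicate would then be analyzed by the tree-making program, and the resulting trees summarized by a consensus.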

"How do I specify a multi-species outgroup with your parsimony programs?"

It's not a feature but is not too hard to do in many of the programs. In parsimony programs like MIX, for which the W (Weights) and A (Ancestral states) options are available, and weights can be larger than 1, all you need to do is:

(a): In MIX, make up an extra character with states 0 for all the outgroups and 1 for all the ingroups. If using DNAPARS the ingroup can have (say) "G" and the outgroup "A".
(b): Assign this character an enormous weight (such as Z for 35) using the W option, all other characters getting weight 1, or whatever weight they had before.
(c): If it is available, use the A (Ancestral states) option to designate that, for the new character, the state found in the outgroup is the ancestral state.
(d): In MIX do not use the O (Outgroup) option.
(e): After the tree is found, the designated ingroup should have been held together by the fake character. The tree will be rooted somewhere in the outgroup (the program may or may not have a preference for one place in the outgroup over another). Make sure that you subtract from the total number of steps on the tree all steps in the new character.

In programs like DNAPARS, you cannot use this method as weights of sites cannot be greater than 1. But you can do an analogous trick by adding a largish number of extra sites to the data, with one nucleotide state ("A") for the ingroup and another ("G") for the outgroup. You will then have to use RETREE to manually reroot the tree in the desired place.
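
Steps (a) and (b) for 0/1 character data can be sketched as follows. This is a minimal Python sketch under stated assumptions: the data are held as a dictionary of species name to state string, and the function name add_outgroup_character is hypothetical. How the weights and ancestral states are actually supplied to MIX is described in that program's documentation and is not shown here.

```python
def add_outgroup_character(characters, outgroup, weights=None, big_weight="Z"):
    """Append a fake 0/1 character that separates outgroup from ingroup.

    characters: dict mapping species name -> string of 0/1 states.
    outgroup:   names of the species that form the outgroup.
    weights:    existing weight symbols, one per character (default: all "1").
    big_weight: weight symbol for the fake character ("Z" stands for 35).
    Returns (extended_characters, extended_weights).
    (Hypothetical helper for illustration; not part of PHYLIP.)
    """
    outgroup = set(outgroup)
    missing = outgroup - set(characters)
    if missing:
        raise ValueError(f"outgroup species not in data: {sorted(missing)}")

    n_chars = len(next(iter(characters.values())))
    if weights is None:
        weights = "1" * n_chars
    if len(weights) != n_chars:
        raise ValueError("need exactly one weight per character")

    # Step (a): state 0 for every outgroup species, 1 for every ingroup species.
    extended = {
        name: states + ("0" if name in outgroup else "1")
        for name, states in characters.items()
    }
    # Step (b): the fake character gets the enormous weight; the others keep theirs.
    return extended, weights + big_weight
```

When counting steps afterwards, remember that the steps in the appended character are not part of the real data.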

"How do I force certain groups to remain monophyletic in your parsimony programs?"

By the same method, using multiple fake characters, any number of groups of species can be forced to be monophyletic. In MOVE, DOLMOVE, and DNAMOVE you can specify whatever outgroups you want without going to this trouble.

"How can I reroot one of the trees written out by PHYLIP?"

Use the program RETREE. But keep in mind whether the tree inferred by the original program was already rooted, or whether you are free to reroot it.

"Why doesn't NEIGHBOR read my DNA sequences correctly?"

Because it wants to have as input a distance matrix, not sequences. You have to use DNADIST to make the distance matrix first.
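
To make the difference between sequences and a distance matrix concrete, the sketch below turns a small alignment into a matrix of pairwise distances. It is an illustrative Python sketch only: it uses the textbook Jukes-Cantor correction, skips sites with "-" or "?", and ignores other ambiguity codes. The function names are assumptions made for this example, and the distance models and file formats actually used by DNADIST are described in its own documentation.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences.

    Sites where either sequence has "-" or "?" are left out of the comparison.
    (Hypothetical helper for illustration; not DNADIST's implementation.)
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-?" or b in "-?":
            continue
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = different / compared  # observed fraction of differing sites
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Return (names, square matrix of pairwise distances) for an alignment dict."""
    names = list(alignment)
    matrix = [
        [
            0.0 if i == j else jukes_cantor_distance(alignment[a], alignment[b])
            for j, b in enumerate(names)
        ]
        for i, a in enumerate(names)
    ]
    return names, matrix
```

A distance-matrix program works from a square table of this kind, one row and one column per species, and never sees the individual nucleotides.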

"What do I do about deletions and insertions in my sequences?"

The molecular sequence programs will accept sequences that have gaps (the "-" character). They do various things with them, mostly not optimal. DNAPARS counts "gap" as if it were a fifth nucleotide state (in addition to A, C, G, and T). Each site counts one change when a gap arises or disappears. The disadvantage of this treatment is that a long gap will be overweighted, with one event per gapped site. So a gap of 10 nucleotides will count as being as much evidence as 10 single-site nucleotide substitutions. If there are no overlapping gaps, one way to correct this is to recode the first site in the gap as "-" but make all the others "?", so that the gap counts as only one event. Other programs such as DNAML and DNADIST count gaps as equivalent to unknown nucleotides (or unknown amino acids) on the grounds that we don't know what would be there if something were there. This completely leaves out the information from the presence or absence of the gap itself, but does not bias the gapped sequence to be close to or far from other gapped or ungapped sequences.
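
The recoding of long gaps described above can be sketched as a small function applied to each sequence in turn. This is a minimal Python sketch, and the function name recode_gaps is an assumption made for this example. It treats every run of "-" within one sequence as a single gap, so it is appropriate only when gaps in different sequences do not overlap.

```python
def recode_gaps(sequence):
    """Keep the first "-" of each run of gaps and turn the rest into "?".

    (Hypothetical helper for illustration; not part of PHYLIP.)
    """
    recoded = []
    previous_was_gap = False
    for symbol in sequence:
        if symbol == "-":
            # First site of a gap stays "-"; later sites become unknown.
            recoded.append("?" if previous_was_gap else "-")
            previous_was_gap = True
        else:
            recoded.append(symbol)
            previous_was_gap = False
    return "".join(recoded)
```

For example, recode_gaps("AC----GT-A") returns "AC-???GT-A", so that the four-site gap contributes one gap site instead of four.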