DNAMOVE - Interactive DNA parsimony

version 3.5c

CONTENTS:

(c) Copyright 1986-1993 by Joseph Felsenstein and by the University of Washington. Written by Joseph Felsenstein. Permission is granted to copy this document provided that no fee is charged for it and that this copyright notice is not removed.

DESCRIPTION

DNAMOVE is an interactive DNA parsimony program, inspired by Wayne Maddison and David Maddison's marvellous program MacClade, which is written for Apple MacIntosh computers. DNAMOVE reads in a data set which is prepared in almost the same format as one for the DNA parsimony program
DNAPARS. It allows the user to choose an initial tree, and displays this tree on the screen. The user can look at different sites and the way the nucleotide states are distributed on that tree, given the most parsimonious reconstruction of state changes for that particular tree. The user then can specify how the tree is to be rearraranged, rerooted or written out to a file. By looking at different rearrangements of the tree the user can manually search for the most parsimonious tree, and can get a feel for how different sites are affected by changes in the tree topology.

This program is compatible with fewer computer systems than the other programs in PHYLIP. It can be adapted to PCDOS systems or to any system whose screen or terminals emulate DEC VT52 or VT100 terminals (such as, for example, Zenith Z19, Z29, and Z49 terminals Telnet programs for logging in to remote computers over a TCP/IP network, VT100-compatible windows in the X windowing system, and any terminal compatible with ANSI standard terminals). For any other screen types, there is a generic option which does not make use of screen graphics characters to display the nucleotide states. This will be less effective, as the nucleotide states will be less easy to see when displayed.

The input data file is set up almost identically to the data files for DNAPARS. The code for nucleotide sequences is the standard one, as described in the molecular sequence programs document. As in DNAPARS, the only option whose presence needs to be signalled in the input file is the W (Weights) option, which functions as described in the main documentation file. If it is used then there must be a W on the first line of the input file. The user- defined trees, as described below, are not specified in the input file but in a separate tree file.

The user interaction starts with the program presenting a menu. The menu looks like this:

Interactive DNA parsimony, version 3.5c

Settings for this run:
  O                           Outgroup root?  No, use as outgroup species  1
  T                 Use Threshold parsimony?  No, use ordinary parsimony
  I             Input sequences interleaved?  Yes
  U Initial tree (arbitrary, user, specify)?  Arbitrary
  0      Graphics type (IBM PC, VT52, ANSI)?  ANSI
  L               Number of lines on screen?    24

Are these settings correct? (type Y or the letter for one to change)
The O (Outgroup), T (Threshold), and 0 (Graphics type) options are the usual ones and are described in the main documentation file. The I (Interleaved) option is the usual one and is described in the main documentation file and the molecular sequence programs documentation file. The U (initial tree) option allows the user to choose whether the initial tree is to be arbitrary, interactively specified by the user, or read from a tree file. Typing U causes the program to change among the three possibilities in turn. I would recommend that for a first run, you allow the tree to be set up arbitrarily (the default), as the "specify" choice is difficult to use and the "user tree" choice requires that you have available a tree file with the tree topology of the initial tree, which must be a rooted tree. If you wish to set up some particular tree you can also do that by the rearrangement commands specified below.

The T (threshold) option allows a continuum of methods between parsimony and compatibility. Thresholds less than or equal to 1.0 do not have any meaning and should not be used: they will result in a tree dependent only on the input order of species and not at all on the data!

The L (screen Lines) option allows the user to change the height of the screen (in lines of characters) that is assumed to be available on the display. This may be particularly helpful when displaying large trees on terminals that have more than 24 lines per screen, or on workstation or X-terminal screens that can emulate the ANSI terminals with more than 24 lines.

After the initial menu is displayed and the choices are made, the program then sets up an initial tree and displays it. Below it will be a one-line menu of possible commands, which looks like this:

NEXT? (Options: R # + - S . T U W O F C H ? X Q) (H or ? for Help)
If you type H or ? you will get a single screen showing a description of each of these commands in a few words. Here are slightly more detailed descriptions:

R
("Rearrange"). This command asks for the number of a node which is to be removed from the tree. It and everything to the right of it on the tree is to be removed (by breaking the branch immediately below it). The command also asks for the number of a node below which that group is to be inserted. If an impossible number is given, the program refuses to carry out the rearrangement and asks for a new command. The rearranged tree is displayed: it will often have a different number of steps than the original. If you wish to undo a rearrangement, use the Undo command, for which see below.
#
This command, and the +, - and S commands described below, determine which site has its states displayed on the branches of the trees. The initial tree displayed by the program does not show states of sites. When # is typed, the program does not ask the user which site is to be shown but automatically shows the states of the next site that is not compatible with the tree (the next site that does not perfectly fit the current tree). The search for this site "wraps around" so that if it reaches the last site without finding one that is not compatible with the tree, the search continues at the first site; if no incompatible site is found the current site is shown again, and if no current site is being shown then the first site is shown. The display takes the form of different symbols or textures on the branches of the tree. The state of each branch is actually the state of the node above it. A key of the symbols or shadings used for states A, C, G, T (U) and ? are shown next to the tree. State ? means that more than one possible nucleotide could exist at that point on the tree, and that the user may want to consider the different possibilities, which are usually apparent by inspection.
+
This command is the same as # except that it goes forward one site, showing the states of the next site. If no site has been shown, using + will cause the first site to be shown. Once the last site has been reached, using + again will show the first site.
-
This command is the same as + except that it goes backwards, showing the states of the previous site. If no site has been shown, using - will cause the last site to be shown. Once site number 1 has been reached, using - again will show the last site.
S
("Show"). This command is the same as + and - except that it causes the program to ask you for the number of a site. That site is the one whose states will be displayed. If you give the site number as 0, the program will go back to not showing the states of the sites.
.
This command simply causes the current tree to be redisplayed. It is of use when the tree has partly disappeared off of the top of the screen owing to too many responses to commands being printed out at the bottom of the screen.
T
("Try rearrangements"). This command asks for the name of a node. The part of the tree at and above that node is removed from the tree. The program tries to re-insert it in each possible location on the tree (this may take some time, and the program reminds you to wait). Then it prints out a summary. For each possible location the program prints out the number of the node to the right of the place of insertion and the number of steps required in each case. These are divided into those that are better then or tied with the current tree. Once this summary is printed out, the group that was removed is reinserted into its original position. It is up to you to use the R command to actually carry out any of the arrangements that have been tried.
U
("Undo"). This command reverses the effect of the most recent rearrangement, outgroup re-rooting, or flipping of branches. It returns to the previous tree topology. It will be of great use when rearranging the tree and when a rearrangement proves worse than the preceding one -- it permits you to abandon the new one and return to the previous one without remembering its topology in detail.
W
("Write"). This command writes out the current tree onto a tree output file. If the file already has been written to by this run of DNAMOVE, it will ask you whether you want to replace the contents of the file, add the tree to the end of the file, or not write out the tree to the file. The tree is written in the standard format used by PHYLIP (a subset of the New Hampshire standard). It is in the proper format to serve as the User- Defined Tree for setting up the initial tree in a subsequent run of the program. Note that if you provided the initial tree topology in a tree file and replace its contents, that initial tree will be lost.
O
("Outgroup"). This asks for the number of a node which is to be the outgroup. The tree will be redisplayed with that node as the left descendant of the bottom fork. Note that it is possible to use this to make a multi-species group the outgroup (i.e., you can give the number of an interior node of the tree as the outgroup, and the program will re-root the tree properly with that on the left of the bottom fork.
F
("Flip"). This asks for a node number and then flips the two branches at that node, so that the left-right order of branches at that node is changed. This does not actually change the tree topology (or the number of steps on that tree) but it does change the appearance of the tree.
C
("Clade"). When the data consist of more than 12 species (or more than half the number of lines on the screen if this is not 24), it may be difficult to display the tree on one screen. In that case the tree will be squeezed down to one line per species. This is too small to see all the interior states of the tree. The C command instructs the program to print out only that part of the tree (the "clade") from a certain node on up. The program will prompt you for the number of this node. Remember that thereafter you are not looking at the whole tree. To go back to looking at the whole tree give the C command again and enter "0" for the node number when asked. Most users will not want to use this option unless forced to.
H
("Help"). Prints a one-screen summary of what the commands do, a few words for each command.
?
("?"). A synonym for H. Same as Help command.
X
("Exit"). Exit from program. If the current tree has not yet been saved into a file, the program will first ask you whether it should be saved.
Q
("Quit"). A synonym for X. Same as the eXit command.

ADAPTING THE PROGRAM TO YOUR COMPUTER AND TO YOUR TERMINAL

As we have seen, the initial menu of the program allows you to choose among four screen types (PCDOS, Ansi, VT52 and none). If you want to avoid having to make this choice every time, you can change some of the constants at the beginning of the program to have it initialize itself in the proper way. At the beginning of the program are three constants that determine which kind of screen graphics the program will use. They are ibmpc0, vt520, and ansi0. In the distribution version of the programs, ansi0 is set to "true" and the others to "false", in the statements:

#define ibmpc0          false
#define ansi0           true
#define vt520           false
so that the version will work with ANSI compatible terminals.

On the other hand if you have a terminal compatible with DEC's VT52, but not with the ANSI terminal, you should change the constant ansi0 to "false" and vt520 to "true". If you have instead a terminal which is compatible with IBM PC graphics, you should set the constant "ibmpc0" to "true" and the others to "false". If your terminal is compatible with none of these, you will have to set the constants all "false", in which case special graphics characters will not be used to indicate nucleotide states, but only letters will be used for the four nucleotides. This is less easy to look at.

MORE ABOUT THE PARSIMONY CRITERION

This program carries out unrooted parsimony (analogous to Wagner trees) (Eck and Dayhoff, 1966; Kluge and Farris, 1969) on DNA sequences. The method of Fitch (1971) is used to count the number of changes of base needed on a given tree. The assumptions of this method are exactly analogous to those of MIX:

  1. Each site evolves independently.
  2. Different lineages evolve independently.
  3. The probability of a base substitution at a given site is small over the lengths of time involved in a branch of the phylogeny.
  4. The expected amounts of change in different branches of the phylogeny do not vary by so much that two changes in a high-rate branch are more probable than one change in a low-rate branch.
  5. The expected amounts of change do not vary enough among sites that two changes in one site are more probable than one change in another.
That these are the assumptions of parsimony methods has been documented in a series of papers of mine: ( 1973a, 1978b, 1979, 1981b, 1983b, 1988b). For an opposing view arguing that the parsimony methods make no substantive assumptions such as these, see the papers by Farris (1983) and Sober (1983a, 1983b), but also read the exchange between Felsenstein and Sober (1986).

Change from an occupied site to a deletion is counted as one change. Reversion from a deletion to an occupied site is allowed and is also counted as one change.

Below is a test data set, but we cannot show the output it generates because of the interactive nature of the program.

TEST DATA SET


   5   13
Alpha     AACGUGGCCA AAU
Beta      AAGGUCGCCA AAC
Gamma     CAUUUCGUCA CAA
Delta     GGUAUUUCGG CCU
Epsilon   GGGAUCUCGG CCC


Back to the main PHYLIP page
Back to the SEQNET home page
Maintained 15 Jul 1996 -- by Martin Hilbers(e-mail:M.P.Hilbers@dl.ac.uk)