GENERAL COMMENTS ON ADAPTING THE PACKAGE TO DIFFERENT COMPUTER SYSTEMS

In the following sections you will find instructions on how to adapt the programs to different computers and compilers. The programs should compile without alteration on most versions of C. They use the "malloc" or "calloc" library functions to allocate memory, so that the upper limit on how many species, sites, or characters they can handle is set by the system memory available to those memory-allocation functions.
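
As a rough illustration of what run-time allocation means in practice, here is a minimal sketch of a table of characters sized by the data rather than by a compiled-in limit. It is not taken from the PHYLIP sources: the names alloc_matrix and free_matrix, and the one-row-per-species layout, are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not from the PHYLIP sources: allocate an
   nspecies x nsites table of characters, one row per species.
   calloc zero-fills the rows. Returns NULL if memory runs out. */
char **alloc_matrix(long nspecies, long nsites)
{
    char **rows;
    long i;

    assert(nspecies > 0 && nsites > 0);
    rows = (char **)calloc((size_t)nspecies, sizeof(char *));
    if (rows == NULL)
        return NULL;
    for (i = 0; i < nspecies; i++) {
        rows[i] = (char *)calloc((size_t)nsites, sizeof(char));
        if (rows[i] == NULL) {
            /* Out of memory part-way through: release the rows so far. */
            while (i > 0)
                free(rows[--i]);
            free(rows);
            return NULL;
        }
    }
    return rows;
}

/* Release a table obtained from alloc_matrix. */
void free_matrix(char **rows, long nspecies)
{
    long i;

    for (i = 0; i < nspecies; i++)
        free(rows[i]);
    free(rows);
}
```

With this style the only ceiling on the number of species or sites is what calloc can obtain from the system, so a NULL return is the signal that the data set is too large for the machine.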

In the document file for each program, I have supplied a small input example, and the output it produces, to help you check whether the programs are running properly.

Most of the programs read their data from a file called "infile" and write their output to a file called "outfile" and a tree file to a file "treefile". If "infile" does not exist the program will prompt you for its name.
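
The open-or-prompt behaviour described above could be sketched roughly as follows. This is an illustration only, not the routine the programs actually use: the function name open_or_prompt, the wording of the prompt, and the 256-character name buffer are all assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not from the PHYLIP sources: try to open the
   default file name for reading. If that fails, print a prompt on
   "prompts" and read replacement names from "answers" until one of
   them opens. Returns NULL if the answers run out first. */
FILE *open_or_prompt(const char *defname, FILE *answers, FILE *prompts)
{
    char name[256];
    const char *trying = defname;
    FILE *fp;

    assert(defname != NULL && answers != NULL && prompts != NULL);
    fp = fopen(trying, "r");
    while (fp == NULL) {
        fprintf(prompts, "Cannot find \"%s\". Please enter a new file name> ",
                trying);
        fflush(prompts);
        if (fgets(name, sizeof name, answers) == NULL)
            return NULL;                    /* no more input: give up */
        name[strcspn(name, "\r\n")] = '\0'; /* strip the line ending */
        trying = name;
        fp = fopen(trying, "r");
    }
    return fp;
}
```

An interactive program would call it as open_or_prompt("infile", stdin, stdout). Passing the two streams as arguments also lets the replies come from a file, which is how the background runs described in the Unix section supply the file name.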

Compiling the programs

Many machines that have C compilers, particularly Unix systems, have a utility called "make" available that considerably simplifies the process of compiling these programs. I will first discuss how to compile these programs with "make" and then, after a digression on how to move PHYLIP to a microcomputer, discuss for different individual systems how to compile the programs. As we shall see below, for some DOS and Macintosh compilers one cannot simply use "make" and the standard Makefile.

Using "make"

If your machine has "make" you can place all the programs for the package, together with the file "Makefile" and the header files "phylip.h" and "drawgraphics.h", in one directory. The Makefile and header files are constructed to detect, for many varieties of C, which compiler they are dealing with, and to inform the programs accordingly so that they can (by using "#ifdef") adapt to the idiosyncrasies of the compiler.
     To compile all the programs just type:    make all

     To compile just one program, such as DNAML, type:    make dnaml
After a time the compiler will finish compiling. The names of the executables will be the same as the names of the C programs, but without the ".c" suffix. Thus dnaml.c compiles to make an executable called "dnaml". If object modules ending in ".o" are found in the directory after compilation they can be removed if you need space.
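
To give a feeling for the "#ifdef" mechanism mentioned above, here is a small sketch of compiler detection. It is not copied from "phylip.h": the macros tested are ones these compilers are generally understood to predefine, while the names COMPILER_NAME, EXE_SUFFIX, compiler_name, and exe_name are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: NOT the tests used in phylip.h. Each branch keys
   on a macro that the compiler in question predefines for itself. */
#if defined(__TURBOC__)
#define COMPILER_NAME "Turbo C"
#define EXE_SUFFIX ".exe"
#elif defined(__WATCOMC__)
#define COMPILER_NAME "Watcom C"
#define EXE_SUFFIX ".exe"
#elif defined(_MSC_VER)
#define COMPILER_NAME "Microsoft C"
#define EXE_SUFFIX ".exe"
#elif defined(THINK_C)
#define COMPILER_NAME "Think C"
#define EXE_SUFFIX ""
#elif defined(VMS) || defined(__VMS)
#define COMPILER_NAME "VMS C"
#define EXE_SUFFIX ".EXE"
#elif defined(unix) || defined(__unix) || defined(__unix__)
#define COMPILER_NAME "Unix cc"
#define EXE_SUFFIX ""
#else
#define COMPILER_NAME "generic C"
#define EXE_SUFFIX ""
#endif

/* Report which branch of the #if chain was selected. */
const char *compiler_name(void)
{
    return COMPILER_NAME;
}

/* Build the executable's file name, e.g. "dnaml" or "dnaml.exe". */
void exe_name(const char *prog, char *out, size_t outsize)
{
    assert(strlen(prog) + strlen(EXE_SUFFIX) < outsize);
    strcpy(out, prog);
    strcat(out, EXE_SUFFIX);
}
```

Because the choice is made by the preprocessor, the same source file can be compiled unchanged on each system, and only the selected branch ever reaches the compiler proper.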

Getting PHYLIP onto your microcomputer

C is widely available on microcomputers, and in any case we also distribute executable versions for PCDOS, 386 PCDOS, and Macintosh systems. Your institution may have an Internet connection, and if so there is probably a PCDOS system or a Macintosh somewhere connected directly to it. Using that machine you could download the executables and copy them directly onto diskettes for transfer to your own machine. You can also get the source code, documentation, and executables by sending me the appropriate number of diskettes (see the general information at the start of this document).

If you cannot do this, you may be able to transfer the entire package, in the form of self-extracting archives (which is one of the ways we distribute it for microcomputers), to your system using a terminal program with file transfer capabilities. Some users are sufficiently terrified of this prospect that they prefer to mail us diskettes and wait for several weeks. But if your institution has an Internet connection it is much faster to do it that way. If you have a serial port to which a modem can be hooked, you can get a terminal program and do the transfers yourself. For most microcomputer systems, public-domain or shareware terminal programs are available, such as the widely-distributed KERMIT and MODEM families of programs. Most university computer centers have communications programs (KERMIT or XMODEM) to "talk" to KERMIT, MODEM, or PC-TALK and transfer files to and from them.

Thus, if you cannot get from me a disk format readable by your machine, you can:

  1. Get an account on your mainframe and learn to use its facilities for "anonymous ftp" (transfer of files over Internet) or electronic mail.

  2. If you are on Internet (or NSFNET) use the "anonymous ftp" method to receive the self-extracting archive files (start by downloading and reading the file "pub/phylip/Read.Me" from my system, whose Internet address is evolution.genetics.washington.edu (128.95.12.41)), or

  3. if your institution is not on Internet but does have BITNET electronic mail, you can request that I send you the PHYLIP source code files and documentation as e-mail messages over BITNET/EARN (not the executables, however).

  4. Make sure the files are saved on your mainframe account (you will need about 2.2 Megabytes of space) under appropriate names.

  5. Use the file transfer provisions of your terminal program to transfer the archives to your microcomputer, or if they came as many e-mail messages, to transfer these to your machine individually (most file transfer programs can transfer many files with one command) for later compilation of the C source.
If you cannot read the diskette formats that I can write, and if you absolutely INSIST that I distribute the package in this format, please send me the computer and thirteen diskettes. I will promptly write the diskettes and return them (but of course I will keep your computer).

Now we turn to particular C compilers and describe particular problems that may be encountered.

Microsoft Quick C and Microsoft C

These comments apply to Microsoft Quick C but may also work with Microsoft C. A Makefile for Microsoft Quick C is included with the source code. It is called "Makefile.qc". If you copy it and call the copy "Makefile" (making sure to first save the generic Makefile that comes with this package under some name such as Makefile.old), you should be able to use "make" as described above, except that it is called "nmake". Note that the command you must use to compile (for example) DNAPARS is "nmake dnapars.exe", not "nmake dnapars", as the program that results is to be called "dnapars.exe" and the Quick C Makefile is set up that way.

To compile individual programs without using the makefile, you need to do the following. For a non-graphics program use the following command (DOS> is the PCDOS prompt, so you do not type it):

DOS> qcl /AH /F 4000 /FPi [source files]
If the program you are trying to compile is a 1-part source (for example, neighbor only has one part, neighbor.c) you should replace "[source files]" with "neighbor.c". So the command would be:
DOS> qcl /AH /F 4000 /FPi neighbor.c
If the program you are trying to compile is a 2-part source (for example, mix has two parts, mix.c and mix2.c) you can replace [source files] with both of the source files. Make sure that the first source file in the list has the same name as the executable file you want. i.e. use mix.c mix2.c and not the other way around. If you reorder them, the executable file will be called "MIX2.EXE". For mix, the command would be:
DOS> qcl /AH /F 4000 /FPi mix.c mix2.c
To compile a graphics program (i.e. DRAWGRAM or DRAWTREE) under Quick C without using the makefile, use one of the following commands. For DRAWGRAM:
DOS> qcl /AH /F 4000 /FPi drawgram.c drawgraphics.c graphics.lib
For DRAWTREE:
DOS> qcl /AH /F 4000 /FPi drawtree.c drawgraphics.c graphics.lib

Turbo C++ for PCDOS

The following instructions are for Turbo C++ but may also work for Turbo C and for Borland C, perhaps with slight modifications. Under normal circumstances you can use the makefile. The makefile for Turbo C++ is included in the package as "Makefile.tc". Copy it and call the copy "Makefile" (it would be wise to first rename the original "Makefile" to "Makefile.old"). Then to compile, say, DNAPARS, just type:
make dnapars.exe
However, if for some reason you want to do it by hand, use the following steps:

For the non-graphical programs (all those other than DRAWGRAM and DRAWTREE):

To compile dnapars.c type the following (DOS> is the PCDOS prompt):

 DOS> tcc -mh dnapars.c
If the source file is sufficiently large to require two sources (for example, dnaml.c and dnaml2.c), you will need to use both dnaml.c and dnaml2.c.

Examples:

 DOS> tcc -mh dnaml.c dnaml2.c
 DOS> tcc -mh neighbor.c
If you would like to use the program under the TD debugger, you should add a "-v" flag as a compiler option:
 DOS> tcc -mh -v restml.c restml2.c
For the graphical programs (DRAWGRAM and DRAWTREE):

First you need to build the "BGI" drivers. The BGI drivers are included with your Turbo C compiler, and should be in the "BGI" directory (this is a subdirectory of the main Turbo C directory). To build them you need to use the "bgiobj" program, also in the BGI directory. The current version of PHYLIP supports the EGA/VGA, CGA, and Hercules drivers. If you have modified the sources to take advantage of other drivers, you will have to include those as well.

To build the BGI drivers:

   DOS> cd \tc\bgi [this should be replaced with whatever your turboc dir is]

   DOS> BGIOBJ EGAVGA
   DOS> BGIOBJ CGA
   DOS> BGIOBJ HERC
This generates the files "EGAVGA.OBJ", "CGA.OBJ", and "HERC.OBJ" in the current directory. Copy these into your main source directory (assumed here to be \phylip):
   DOS> COPY EGAVGA.OBJ \phylip [replace this with your source directory]
   DOS> COPY CGA.OBJ \phylip
   DOS> COPY HERC.OBJ \phylip
To compile the program, cd back to your source directory. You want to compile each source file, plus a shared graphics file called "drawgraphics.c". You also want to link it to the newly created BGI object files and to the graphics library.

Examples:

DOS> tcc -mh drawgram.c drawgraphics.c herc.obj egavga.obj cga.obj graphics.lib
DOS> tcc -mh drawtree.c drawgraphics.c herc.obj egavga.obj cga.obj graphics.lib
(to compile drawgram and drawtree, respectively)

If you want to compile for the TD debugger, add the -v flag as above.

Watcom C/386

Watcom C/386 is the compiler we use to create the 386 PCDOS and 386 Windows versions of the executables. It has a "make" capability called "wmake". We have had problems using this, so the instructions here are for individually compiling programs without wmake.

Watcom C/386 is a very flexible compiler which can generate executable programs for many different environments. Following are instructions for using Watcom C/386 to compile for DOS using the DOS/4GW DOS extender (included with the Watcom distribution) and for Microsoft Windows.

DOS/4GW:

To compile a program under Watcom C/386 for the DOS/4GW DOS extender use the following (the "DOS>" is the PCDOS prompt, not something you type):

DOS> wcl386 /l=dos4gw [source files]
If the program you are trying to compile is a 1-part source (for example, neighbor only has one part, neighbor.c) you can replace [source files] with "neighbor.c". So the command would be:
DOS> wcl386 /l=dos4gw neighbor.c
If the program you are trying to compile is a 2-part source (for example, mix has two parts, mix.c and mix2.c) you can replace [source files] with both of the source files. Make sure that the first source file in the list has the same name as the executable file you want. i.e. use mix.c mix2.c and not the other way around. If you reorder them, the executable file will be called "MIX2.EXE". For mix, the command would be:
DOS> wcl386 /l=dos4gw mix.c mix2.c
The resultant executable file will take advantage of your system's extended memory and will not be limited to using only the first 640K. However, it needs the file "dos4gw.exe" in order to run. If you want to be able to use the program generated, make sure that this program is somewhere in your path. (To ensure this you can copy the program into the directory where the compiled program resides). This "dos extender" is bundled with the Watcom C/386 compiler and is freely redistributable.

For Windows:

To compile a program under Watcom C/386 for Windows use the following:

DOS> wcl386 /l=win386 /zw [source files]
Again, replace [source files] with either the single source file (e.g. neighbor.c) or both parts of the program (e.g. mix.c mix2.c).

Once you have compiled the Windows program you are not quite ready to run it under Windows. The final step is to link it with the "windows supervisor". To do this, use the following:

DOS> wbind [program] -n
i.e.:
DOS> wbind mix -n
This will generate [program].exe, an application that can be run under Windows.

CAVEATS:

1. Make sure that when you use wbind, \watcom\binw is somewhere in your path. If it is not, you may have to tell wbind explicitly where the windows supervisor file is, as in the following example:

   DOS> wbind mix -n  -s c:\watcom\binw\win386.ext
Replace c:\watcom\binw\win386.ext with the full path of win386.ext on your system.

2. The draw programs (DRAWGRAM, DRAWTREE) currently do not compile under Windows. Compile them for DOS/4GW and run them in a DOS shell under Windows.

Think C for Macintosh

For Symantec's Think C compiler (formerly called Lightspeed C) a "make" utility is not available. Thus you cannot use the Makefile but must compile the programs individually. Here are the steps you should follow to compile a typical program.
  1. Start up Think-C.
  2. Click on "New project" in the Think C project menu. You will be asked to enter the name of the project.
  3. Add the source code for the program to the project. To add sources to the project, you need to click on "add" from the source menu. You will need to add the sources for the main program (i.e. "neighbor.c" in the case of a program in 1 part or "dnaml.c" and "dnaml2.c" in the case of a 2-part program). You also need to add "interface.c" (included with the distribution) and two things which are included with the Think C compiler. The first one is "MacTraps", and is contained within the Think C folder under a directory called "MacLibraries". The second one is "ANSI", and is contained within the Think C folder under a directory called "C Libraries".
  4. Segment the project: After adding each of the sources to the project, you need to segment the project. This means that every source file is contained within its own 32K segment. In order to do this within Think C, you can click on a source file name in the Think C project window (the window that lists each of the sources) and drag it down to the bottom of the source list. After you have done this for each of the source files, a dotted line should appear around each source file in the project window.
  5. Set up compile options: The first thing you need to do is set up what sort of project you're compiling, and some of the characteristics of how the memory is set up. To do this, select "Set project type" in the "Project" menu, and make sure it's set up to be an Application with far code and far data. Depending on the hardware you will be running on, you may want to select different compilation options. Most notably, if your machine has a 68881 math coprocessor, enable the use of the coprocessor by selecting "Options" under the "Edit" window, selecting "Compiler settings" through the list at the upper left corner of the display, and then checking the box next to "Generate 68881 instructions".
  6. Compile the project: select "Make" under the source window. After this has completed (assuming that there were no compile errors), you need to generate a Macintosh application. To do this, select "Build Application" under the project menu. Select a name for the application, and Think C will create a Macintosh application.
Although this is more tedious than using a Makefile, Think C works very well with the PHYLIP programs and is the compiler we use for creating the Macintosh executables.

Unix

I have already mentioned that under Unix you can use the "make" command to compile programs. This works on all Unix systems. To compile an individual program like dnapars.c you can give the command "make dnapars" or alternatively "cc dnapars.c -lm". When compiling programs that come in two parts, such as dnaml.c and dnaml2.c, you will have to issue three commands, two compile commands and one link command:
cc -c dnaml.c
cc -c dnaml2.c
cc dnaml.o dnaml2.o -lm -o dnaml
where the first two commands produce the object modules dnaml.o and dnaml2.o and the third command links them together into an executable that is called dnaml.

In running the programs, you may sometimes want to put them in background so you can proceed with other work. On systems with a windowing environment they can be put in their own window, and commands like "nice" used to make them have lower priority so that they do not interfere with interactive applications in other windows. If there is no windowing environment, you will want to use an ampersand ("&") after the command file name when invoking it to put the job in the background. You will have to put all the responses to the interactive menu of the program into a file and tell the background job to take its input from that file.

For example: suppose you want to run DNAPARS in the background, taking its input data from a file called "sequences.dat", putting its interactive output into a file called "screenout", and using a file called "input" as the place to store the interactive input. The file "input" need only contain two lines:

sequences.dat
Y
which is what you would have typed to run the program interactively, in response to the program's request for an input file name if it did not find a file named "infile", and in response to the menu.

To run the program in background, you would simply give the command:

dnapars < input > screenout &
which runs the program with input responses coming from "input" and interactive output being put into the file "screenout". The usual output file and tree file will also be created by this run (keep that in mind: if you run any other PHYLIP program from the same directory while this one is running in background, you may overwrite the output file from one program with that from the other!).

If you wanted to give the program lower priority, so that it would not interfere with other work, and you have Berkeley Unix type job control facilities in your Unix, you can use the "nice" command:

nice +10 dnapars < input > screenout &
which lowers the priority of the run. To also time the run and put the timing at the end of "screenout", you can do this:
nice +10 ( time dnapars < input ) >& screenout &
which I will not attempt to explain.

You may also want to explore putting the interactive output into the null file "/dev/null" so as to not be bothered with it (but then you cannot look at it to see why something went wrong). If you have problems with creating output files that are too large, you may want to explore carefully the turning off of options in the programs you run.

If you are doing several runs in one, as for example when you do a bootstrap analysis using SEQBOOT, DNAPARS (say), and CONSENSE, you can use an editor to create a "batch file" with these commands:

seqboot < input1 > screenout
mv outfile infile
dnapars < input2 >> screenout
mv treefile infile
consense < input3 >> screenout
and then take the file (say "foofile") containing these commands and give it execute permission by using the command "chmod +x foofile" followed by the command "rehash". Then the job that foofile describes can be run as a single job in background by giving the command "foofile &". Note that you must also have the interactive input commands for SEQBOOT (including the random number seed), DNAPARS, and CONSENSE in the separate files "input1", "input2", and "input3". With Berkeley-style job control the "nice" command can be used within the batch file "foofile" before each program name to reduce the priority with which the programs run.

VMS VAX systems

On the VMS operating system with DEC VAX VMS C the programs will compile without alteration, except that we have to add some extra routines because the "%hd" format in printf and fprintf does not work. These extra routines are in the file VAXFIX.C. The commands for compiling a typical program (DNAPARS) are:
$ DEFINE LNK$LIBRARY SYS$LIBRARY:VAXCRTL
$ CC DNAPARS.C
$ CC VAXFIX.C
$ LINK DNAPARS,VAXFIX
Once you use this "$ DEFINE" statement during a given interactive session, you need not repeat it again as the symbol "LNK$LIBRARY" is thereafter properly defined. The compilation process leaves a file DNAPARS.OBJ in your directory: this can be discarded. The executable program is named DNAPARS.EXE. To run the program one then uses the command:
$ R DNAPARS
The compiler defaults to the filenames "INFILE.", "OUTFILE.", and "TREEFILE.". If the input file "INFILE." does not exist the program will prompt you to type in its name. Note that some commands on VMS such as "TYPE OUTFILE" will fail because the name of the file that it will attempt to type out will be not "OUTFILE." but "OUTFILE.LIS". To get it to type the right file you would have to instead issue the command "TYPE OUTFILE.".

Some of the programs come in several pieces that have to be compiled and linked together. For example, DNAML comes in two pieces, dnaml.c and dnaml2.c. To compile them and link the resulting object files together into one executable, use the commands:

$ DEFINE LNK$LIBRARY SYS$LIBRARY:VAXCRTL
$ CC DNAML.C
$ CC DNAML2.C
$ CC VAXFIX.C
$ LINK DNAML,DNAML2,VAXFIX
This will make an executable called DNAML.EXE plus two ".OBJ" files that can be discarded. Note that when a LINK command is issued the name of the first file (in this case DNAML) becomes the name of the ".EXE" file that is produced by the linker.

To make it easier to compile all of the programs on VMS systems, we have supplied a command file, "compile.com" that will do this. If you install that file and issue the command "@compile" it will compile all of the programs. However it is recommended that you also know how to recompile individual programs so that they can be altered to your purposes.

The programs DRAWGRAM and DRAWTREE both use routines in drawgraphics.c. To compile (for example) DRAWGRAM, use:

$ DEFINE LNK$LIBRARY SYS$LIBRARY:VAXCRTL
$ CC DRAWGRAPHICS.C
$ CC DRAWGRAM.C
$ CC VAXFIX.C
$ LINK DRAWGRAM,DRAWGRAPHICS,VAXFIX
which will create a file called DRAWGRAM.EXE, plus two ".OBJ" files. When you run DRAWGRAM you must have a font file present in your directory, as well as the tree file. If they are not found under their default names the program will prompt you for these. When you are using the interactive previewing feature of DRAWGRAM (or DRAWTREE) on a Tektronix or DEC ReGIS compatible terminal, you will want to issue the following command before running the program:
$ SET TERM/NOWRAP/ESCAPE
so that you do not run into trouble from the VMS line length limit of 255 characters or the filtering of escape characters.

Cray

A number of people (F. James Rohlf, Kent Fiala, Shan Duncan, and Ron DeBry) succeeded in various ways in adapting the Pascal version of PHYLIP to several models of Crays. Recently Cray has been adopting Unicos, a Unix clone, as the operating system for its machines, and this means the Unix instructions should work for compiling the programs on Crays.

However, although the underlying algorithms of most programs, which treat sites independently, should be amenable to vector processors, there are details of the code which might best be changed. In particular within the innermost loops of the programs there are often scalar quantities that are used for temporary bookkeeping. These quantities, such as sum1, sum2, zz, z1, yy, y1, aa, bb, cc, sum, and denom in procedure makenewv of DNAML (and similar quantities in procedure nuview) are there to minimize the number of array references. For vectorizing compilers such as the Cray compilers it will be better to replace them by arrays so that processing can occur simultaneously.
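
The kind of change suggested here can be sketched on a toy loop. This is not the code of makenewv or nuview: the function names and the quantity computed are invented for illustration, and whether the second form is actually faster depends on the compiler and machine.

```c
#include <assert.h>

/* Scalar style: sum1 and sum2 are single temporaries that are reused
   for every site, as in the bookkeeping variables described above. */
void site_terms_scalar(const double *a, const double *b, double z,
                       double *out, long nsites)
{
    double sum1, sum2;
    long i;

    assert(nsites >= 0);
    for (i = 0; i < nsites; i++) {
        sum1 = a[i] * z;
        sum2 = b[i] * (1.0 - z);
        out[i] = sum1 + sum2;
    }
}

/* Array style: each temporary becomes an array with one element per
   site, so that each loop is a plain element-by-element operation
   with no scalar carried from one iteration to the next. */
void site_terms_array(const double *a, const double *b, double z,
                      double *out, double *sum1, double *sum2,
                      long nsites)
{
    long i;

    assert(nsites >= 0);
    for (i = 0; i < nsites; i++)
        sum1[i] = a[i] * z;
    for (i = 0; i < nsites; i++)
        sum2[i] = b[i] * (1.0 - z);
    for (i = 0; i < nsites; i++)
        out[i] = sum1[i] + sum2[i];
}
```

The two functions compute the same values; the array form trades extra memory (the caller supplies sum1 and sum2, each nsites long) for loops whose iterations are visibly independent of one another.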

IBM Mainframes running CMS

The following information applies not only to IBM mainframes, but to IBM-compatible mainframes such as Amdahls, Fujitsus, Hitachis, and ICLs when they run IBM operating systems or IBM-compatible operating systems. It does not apply to IBM mainframes running AIX (IBM's version of Unix), as for those one can simply use the Unix instructions above without modification.

Because IBM is IBM, it tried to impose the EBCDIC character code on the world. There are good arguments for and against EBCDIC; in any case, the ASCII (or ISO) code is winning out. I have chosen to distribute PHYLIP in the ASCII character code, as more likely to be readable on more machines. Some characters in ASCII have no equivalent in EBCDIC and get arbitrarily changed when my ASCII files are read into an EBCDIC machine. You may find some characters which look strange when viewed on a 3270 terminal on a CMS system, but we have found none that cause trouble for the compiler.

Andrew Keeffe was asked to investigate how to compile the C version of PHYLIP on our IBM 3090 system, and here is what he has found.

These are the procedures for compiling the phylip package in C on an IBM mainframe.

These instructions were developed using IBM C/370 on an IBM 3090 running VM/XA CMS 5.6 Service Level 201.

If you fetch PHYLIP directly as an ftp binary transfer, getting a compressed tar archive file, as available from our machine, we do not know whether there is an "uncompress" and a "tar" utility available on CMS to extract the files from the archive and translate them from ASCII to EBCDIC. You should ask your computer consultants about that. Alternatively, you could fetch the files to a PCDOS or Unix machine, extract the archives there, and then move the resulting text files for the source code and documentation to the CMS system. If you do that, after establishing the connection between the IBM and the other host, type will translate the text files properly.

CMS prefers the names of files to have a minimum of two parts, called the filename (abbreviated fn) and the filetype (abbreviated ft), separated by a space. We have chosen "data" as the filetype, so that "infile" becomes "infile data", "outfile" becomes "outfile data" and so forth.

All commands that you give to the host are shown in UPPER CASE. You can type them in upper or lower case; CMS does not care.

Before compiling, give these commands to CMS:

        SETUP C370
        GLOBAL TXTLIB EDCBASE IBMLIB
It would make sense to put these commands in your profile exec until the compiling and linking is complete.

To compile a single program, such as dnapars.c:

        CC DNAPARS
If there are no errors, the compiler will produce a file with the same filename and a filetype of 'text', DNAPARS TEXT in this case. Now give these commands:
        LOAD DNAPARS
        GENMOD DNAPARS
The genmod command generates an executable module file (DNAPARS MODULE) which may be invoked by typing its name on the command line. Use this procedure to compile all of the phylip programs except dnaml, dnamlk, restml, drawgram, and drawtree.

The source files for dnaml, dnamlk, and restml have been split into two parts. To compile one of these programs, give these commands:

        CC DNAML
        CC DNAML2
        LOAD DNAML DNAML2
        GENMOD DNAML
Proceed similarly for dnamlk and restml.

The draw programs, drawgram and drawtree, both depend on common code which is stored in drawgraphics.c and drawgraphics.h. These names will be truncated to DRAWGRAP C and DRAWGRAP H on the CMS system. The contents of the files are not affected.

Compile the drawgraphics code:

        CC DRAWGRAP
Compile and link the draw programs:
        CC DRAWGRAM
        LOAD DRAWGRAM DRAWGRAP
        GENMOD DRAWGRAM

        CC DRAWTREE
        LOAD DRAWTREE DRAWGRAP
        GENMOD DRAWTREE
If you are having trouble getting the programs running on your machine, contact me. If I can't help, I can at least find out whether there is anyone else who has adapted them to the same machine and put you in touch with them.

Other Computer Systems

As you can see from the variety of different systems on which these programs have been successfully run, there are no serious incompatibility problems with most computer systems. PHYLIP in various past Pascal versions has also been compiled on 8080 and Z80 CP/M systems, Apple II systems running UCSD Pascal, a variety of minicomputer systems such as DEC PDP-11's and HP 1000's, CDC Cyber systems, and so on. We hope gradually to accumulate experience on a wider variety of C compilers. If you succeed in compiling the C version of PHYLIP on a different machine or a different compiler, I would like to hear the details so that I can include the instructions in a future version of this manual.


Maintained 15 Jul 1996 -- by Martin Hilbers(e-mail:M.P.Hilbers@dl.ac.uk)