If you don't use ALIGN, you can edit your alignment so that it can be used with MAKEINF. This section tells you how to do that.
The following lines represent all the features in the alignment file which are necessary for application of MAKEINF:
0 xen
1 wnt1
2 wnt2
3 wnt3
4 wnt4
ALIGNMENT
SGSCEVKTCWWAQPDFRAIGDFLKDKYDSASEMVVEKH---R-ESRGWVETLRAKYALFKPPTERDLVYY 66 3
SGSCSLRTCWMRLPPFRSVGDALKDRFDGASKVTYSNNGSNRWGSRSDPPHLEPENPTHALPSSQDLVYF 70 0
SGSCTVRTCWMRLPTLRAVGDVLRDRFDGASRVLYGNRGSNR-ASRAELLRLEPEDPAHKPPSPHDLVYF 69 1
SGSCTLRTCWLAMADFRKTGDYLWRKYNGAIQVVMNQD---G-TGFTVA------NERFKKPTKNDLVYF 60 2
SGSCEVKTCWRAVPPFRQVGHALKEKFDGATEVEPRRV---G-SSRALVPR----NAQFKPHTDEDLVYL 62 4
**** *** * * * * ****

ENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCVF 123 3
EKSPNFCSPSEKNGTPGTTGRICNSTSLGLDGCELLCCGRGYRSLAEKVTERCHCTF 127 0
EKSPNFCTYSGRLGTAGTAGRACNSSSPALDGCELLCCGRGHRTRTQRVTERCNCTF 126 1
ENSPDYCIRDREAGSLGTAGRVCNLTSRGMDSCEVMCCGRGYDTSHVTRMTKCGCKF 117 2
EPSPDFCEQDIRSGVLGTRGRTCNKTSKAIDGCELLCCGRGFHTAQVELAERCHCRF 119 4
* ** * * ** * ** * * * ***** * * *

The features of the alignment format that are used as landmarks by MAKEINF are the following:
fscanf( sourceali, "%d", &taxnum ); /* dumps the length number */
in function 'findname'.
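To make the row layout concrete, here is a minimal sketch of reading one alignment row: the residues, then the length number, then the taxon number. The function name parse_row is hypothetical and does not come from the MAKEINF source, and it uses sscanf on a line of text rather than fscanf on the open file. It handles plain rows only, not rows carrying { } or [ ] marks.

```c
#include <stdio.h>

/* Hypothetical helper, not part of MAKEINF.
   Parses a row of the form "<residues> <length> <taxon number>".
   seq must have room for at least 256 characters.
   Returns 1 if all three fields were read, 0 otherwise. */
int parse_row(const char *line, char *seq, int *length, int *taxnum)
{
    /* %255s reads the residues (and gaps) up to the first blank;
       the two %d conversions read the length and the taxon number. */
    return sscanf(line, "%255s %d %d", seq, length, taxnum) == 3;
}
```

A row of asterisks fails the numeric conversions, so the function returns 0 for the conservation line.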
SGSCEVKTCWWAQPDFRAIGDFLKDKYDSASEMVVEKH---R-ESRGWVETLRAKYALFKPPTERDLVYY 66 3
{SGSCSLRTCWMRLPPFRSVGDALKDRFDGASKVTYSNNGSNRWGSRSDPPHLEPENPTHALPSSQDLVYF 70 0}
SGSCTVRTCWMRLPTLRAVGDVLRDRFDGASRVLYGNRGSNR-ASRAELLRLEPEDPAHKPPSPHDLVYF 69 1
SGSCTLRTCWLAMADFRKTGDYLWRKYNGAIQVVMNQD---G-TGFTVA------NERFKKPTKNDLVYF 60 2
SGSCEVKTCWRAVPPFRQVGHALKEKFDGATEVEPRRV---G-SSRALVPR----NAQFKPHTDEDLVYL 62 4
**** *** * * * * ****

ENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCVF 123 3
{EKSPNFCSPSEKNGTPGTTGRICNSTSLGLDGCELLCCGRGYRSLAEKVTERCHCTF 127 0}
EKSPNFCTYSGRLGTAGTAGRACNSSSPALDGCELLCCGRGHRTRTQRVTERCNCTF 126 1
ENSPDYCIRDREAGSLGTAGRVCNLTSRGMDSCEVMCCGRGYDTSHVTRMTKCGCKF 117 2
EPSPDFCEQDIRSGVLGTRGRTCNKTSKAIDGCELLCCGRGFHTAQVELAERCHCRF 119 4
* ** * * ** * ** * * * ***** * * *

In this example, sequence number 0 will not be written to the output file. Several sequences can be excluded according to the following format:
SGSCEVKTCWWAQPDFRAIGDFLKDKYDSASEMVVEKH---R-ESRGWVETLRAKYALFKPPTERDLVYY 66 3
{SGSCSLRTCWMRLPPFRSVGDALKDRFDGASKVTYSNNGSNRWGSRSDPPHLEPENPTHALPSSQDLVYF 70 0
SGSCTVRTCWMRLPTLRAVGDVLRDRFDGASRVLYGNRGSNR-ASRAELLRLEPEDPAHKPPSPHDLVYF 69 1}
SGSCTLRTCWLAMADFRKTGDYLWRKYNGAIQVVMNQD---G-TGFTVA------NERFKKPTKNDLVYF 60 2
SGSCEVKTCWRAVPPFRQVGHALKEKFDGATEVEPRRV---G-SSRALVPR----NAQFKPHTDEDLVYL 62 4
**** *** * * * * ****

ENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCVF 123 3
{EKSPNFCSPSEKNGTPGTTGRICNSTSLGLDGCELLCCGRGYRSLAEKVTERCHCTF 127 0
EKSPNFCTYSGRLGTAGTAGRACNSSSPALDGCELLCCGRGHRTRTQRVTERCNCTF 126 1}
ENSPDYCIRDREAGSLGTAGRVCNLTSRGMDSCEVMCCGRGYDTSHVTRMTKCGCKF 117 2
EPSPDFCEQDIRSGVLGTRGRTCNKTSKAIDGCELLCCGRGFHTAQVELAERCHCRF 119 4
* ** * * ** * ** * * * ***** * * *

or, if they are interspersed, like this:
SGSCEVKTCWWAQPDFRAIGDFLKDKYDSASEMVVEKH---R-ESRGWVETLRAKYALFKPPTERDLVYY 66 3
{SGSCSLRTCWMRLPPFRSVGDALKDRFDGASKVTYSNNGSNRWGSRSDPPHLEPENPTHALPSSQDLVYF 70 0}
SGSCTVRTCWMRLPTLRAVGDVLRDRFDGASRVLYGNRGSNR-ASRAELLRLEPEDPAHKPPSPHDLVYF 69 1
SGSCTLRTCWLAMADFRKTGDYLWRKYNGAIQVVMNQD---G-TGFTVA------NERFKKPTKNDLVYF 60 2
{SGSCEVKTCWRAVPPFRQVGHALKEKFDGATEVEPRRV---G-SSRALVPR----NAQFKPHTDEDLVYL 62 4}
**** *** * * * * ****

ENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCVF 123 3
{EKSPNFCSPSEKNGTPGTTGRICNSTSLGLDGCELLCCGRGYRSLAEKVTERCHCTF 127 0}
EKSPNFCTYSGRLGTAGTAGRACNSSSPALDGCELLCCGRGHRTRTQRVTERCNCTF 126 1
ENSPDYCIRDREAGSLGTAGRVCNLTSRGMDSCEVMCCGRGYDTSHVTRMTKCGCKF 117 2
{EPSPDFCEQDIRSGVLGTRGRTCNKTSKAIDGCELLCCGRGFHTAQVELAERCHCRF 119 4}
* ** * * ** * ** * * * ***** * * *

KEEP TRACK OF HOW MANY SEQUENCES ARE COMMENTED OUT, SINCE THE PROGRAM WILL ASK YOU FOR THE NUMBER OF SEQUENCES TO BE ALIGNED.
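If you would rather not count by hand, the bookkeeping can be sketched as follows. This is an illustration only, not code from MAKEINF: the function name count_excluded is hypothetical, and it assumes you pass it the rows of a single alignment block (without the asterisk line), in file order. A row counts as excluded if it contains a '{' or lies between a '{' and its closing '}'.

```c
/* Hypothetical helper, not part of MAKEINF.
   rows[] holds the n sequence rows of ONE alignment block.
   Returns how many of them are commented out with { }. */
int count_excluded(const char *rows[], int n)
{
    int in_brace = 0;   /* 1 while an opening { has not been closed */
    int excluded = 0;

    for (int i = 0; i < n; i++) {
        int row_excluded = in_brace;   /* row starts inside a comment */
        for (const char *p = rows[i]; *p != '\0'; p++) {
            if (*p == '{') {
                in_brace = 1;
                row_excluded = 1;
            } else if (*p == '}') {
                in_brace = 0;
            }
        }
        if (row_excluded)
            excluded++;
    }
    return excluded;
}
```

The number of sequences to be used is then the total number of sequences in the alignment minus the returned count; for the two five-sequence examples above with two rows commented out, that is 5 - 2 = 3.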
SGSCEVKTCWWAQPDFRAIGDFLKDKYDSASEMVVEKH[---R-ESRGWVETLRAK]YALFKPPTERDLVYY 66 3
{SGSCSLRTCWMRLPPFRSVGDALKDRFDGASKVTYSNNGSNRWGSRSDPPHLEPENPTHALPSSQDLVYF 70 0}
SGSCTVRTCWMRLPTLRAVGDVLRDRFDGASRVLYGNR[GSNR-ASRAELLRLEPE]DPAHKPPSPHDLVYF 69 1
SGSCTLRTCWLAMADFRKTGDYLWRKYNGAIQVVMNQD[---G-TGFTVA------]NERFKKPTKNDLVYF 60 2
SGSCEVKTCWRAVPPFRQVGHALKEKFDGATEVEPRRV[---G-SSRALVPR----]NAQFKPHTDEDLVYL 62 4
**** *** * * * * ****

ENSPNFCEPNPETGSFGTRDRTCNVTSHGIDGCDLLCCGRGHNTRTEKRKEKCHCVF 123 3
{EKSPNFCSPSEKNGTPGTTGRICNSTSLGLDGCELLCCGRGYRSLAEKVTERCHCTF 127 0}
EKSPNFCTYSGRLGTAGTAGRACNSSSPALDGCELLCCGRGHRTRTQRVTERCNCTF 126 1
ENSPDYCIRDREAGSLGTAGRVCNLTSRGMDSCEVMCCGRGYDTSHVTRMTKCGCKF 117 2
EPSPDFCEQDIRSGVLGTRGRTCNKTSKAIDGCELLCCGRGFHTAQVELAERCHCRF 119 4
* ** * * ** * ** * * * ***** * * *

Note that a sequence that is excluded need not get square brackets.
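The effect of the square brackets on a single row can be sketched like this. It rests on an assumption: that a bracketed stretch marks columns to be left out of the analysis, which the example above suggests but this text does not state outright. The function name strip_bracketed is hypothetical and is not taken from MAKEINF.

```c
/* Hypothetical helper, not part of MAKEINF.
   ASSUMPTION: residues between [ and ] are to be dropped.
   Copies src to dst without the brackets and without anything
   between them. dst must be at least as large as src. */
void strip_bracketed(const char *src, char *dst)
{
    int skip = 0;   /* 1 while inside [ ... ] */

    for (; *src != '\0'; src++) {
        if (*src == '[')
            skip = 1;
        else if (*src == ']')
            skip = 0;
        else if (!skip)
            *dst++ = *src;
    }
    *dst = '\0';
}
```

Under that assumption, the first row of the example would keep "...VVEKH" immediately followed by "YALFKPP...".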
> xen
TCAGGATCCTGCTCCCTCAGGACGTGCTGGATGCGGCTTCCCCCCTTCCGTTCAGTTGGG
GATGCTTTGAAGGATCGTTTTGATGGAGCCTCTAAAGTGACCTACAGCAACAATGGCAGC
AATCGATGGGGTTCTCGCAGTGACCCACCTCACCTAGAACCTGAAAACCCCACACATGCT
CTGCCATCATCCCAGGATCTTGTCTATTTTGAGAAGTCTCCTAACTTCTGCAGCCCTAGT
GAAAAGAATGGAACTCCTGGAACCACAGGGCGAATATGTAACAGCACTTCATTGGGACTA
GATGGATGTGAACTCTTGTGCTGTGGTAGAGGATACCGGAGTCTGGCTGAAAAAGTCACT
GAACGGTGCCATTGCACATTT*

The salient features of this format are the 'GREATER THAN' symbol > , the NAME ON THE LINE FOLLOWING IT, the SEQUENCE (in capital or lowercase characters), and the TERMINATION-SYMBOL ('*'). THESE FEATURES ARE ESSENTIAL.
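As a rough check that a nucleotide file has these features, one entry can be read with a sketch like the one below. It is not the parser MAKEINF uses: the function name read_entry and its buffer-size parameters are hypothetical. It looks for '>', takes the next blank-delimited word as the name, collects the sequence with whitespace removed and letters uppercased, and insists on the terminating '*'.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of MAKEINF.
   Reads one "> name SEQUENCE *" entry starting at p.
   name_sz and seq_sz are the buffer sizes and must be at least 1.
   Returns a pointer just past the '*', or NULL if the '>' symbol,
   the name, or the '*' is missing. */
const char *read_entry(const char *p, char *name, size_t name_sz,
                       char *seq, size_t seq_sz)
{
    size_t n = 0;

    while (*p != '\0' && *p != '>')          /* find the '>' symbol */
        p++;
    if (*p != '>')
        return NULL;
    p++;

    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                                 /* skip blanks before the name */
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (n + 1 < name_sz)
            name[n++] = *p;
        p++;
    }
    name[n] = '\0';
    if (n == 0)
        return NULL;

    n = 0;
    while (*p != '\0' && *p != '*') {        /* sequence runs up to '*' */
        if (!isspace((unsigned char)*p) && n + 1 < seq_sz)
            seq[n++] = (char)toupper((unsigned char)*p);
        p++;
    }
    seq[n] = '\0';

    return (*p == '*') ? p + 1 : NULL;
}
```

Calling it repeatedly on the returned pointer walks through a file holding several entries; a NULL return on an entry you expected to be valid usually means the '*' was forgotten.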
Alignment file to be read: ex.ali
Nucleotide file to be read: ex.nuc
Destination file to be written: infile
Total number of sequences in alignment: 5
Number of sequences to be used: 4
Nucleic acid or Protein coding sequence? (n/p): p
Nuclear or mitochondrial genetic code? (n/m): n
Enter a number between 1 and 5, for the codon position you wish to analyze:
1 for first, 2 for second, 3 for third
4 for first plus second, 5 for all. (1-5): 4
Conversion of first positions to degenerate base? (y/n): y
Use nAmes or nUmbers as identifiers? (a/u): a

Let's go through this one by one:
The program starts by writing 'Alignment file to be read: ', i.e. it asks
you for the name of the file which holds the amino acid alignment. The user,
in this case, specified the name 'ex.ali', and hit return.
Since we are dealing with protein coding sequences, the next question is
about which positions we want to use. Option 4 means: first plus second
positions. As we want to eliminate silent changes in first positions of
arginine and leucine codons, we answer yes ('y'); and we want to have the
program use the names ('a') in the output file.
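What option 4 combined with the degenerate-base conversion amounts to can be sketched as follows. This is an illustration under stated assumptions, not the MAKEINF code: the function name first_plus_second is hypothetical, the protein and nucleotide strings are assumed to be ungapped and in register (three nucleotides per amino acid), and the nuclear genetic code is assumed. The replacement letters follow the IUPAC ambiguity codes: Y (C or T) for the first position of leucine codons, M (A or C) for the first position of arginine codons.

```c
#include <string.h>

/* Hypothetical helper, not part of MAKEINF.
   prot: ungapped amino acids (one-letter code).
   nuc:  the coding sequence, 3 * strlen(prot) nucleotides.
   out:  receives first plus second codon positions;
         needs room for 2 * strlen(prot) + 1 characters.
   If degenerate is nonzero, the first position of every leucine
   codon becomes Y and that of every arginine codon becomes M. */
void first_plus_second(const char *prot, const char *nuc,
                       int degenerate, char *out)
{
    size_t n = strlen(prot);

    for (size_t i = 0; i < n; i++) {
        char first = nuc[3 * i];

        if (degenerate && prot[i] == 'L')
            first = 'Y';            /* Leu codons start with T or C */
        else if (degenerate && prot[i] == 'R')
            first = 'M';            /* Arg codons start with C or A */

        out[2 * i] = first;
        out[2 * i + 1] = nuc[3 * i + 1];
    }
    out[2 * n] = '\0';
}
```

For the first eight residues of the xen example, SGSCSLRT with nucleotides TCAGGATCCTGCTCCCTCAGGACG, the result is TCGGTCTGTCYTMGAC: the leucine codon CTC contributes YT and the arginine codon AGG contributes MG, so a silent change in either first position no longer shows up as a difference.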
When you've entered all these options, the program bounces the following
back at you:
Nucleotide sequences in: ex.nuc
Amino acid alignment source: ex.ali
Nucleotide alignment destination: infile
First plus second codon positions will be used.
L and R 1st positions will be converted to Y and M.
Names will be used to identify sequences.
Frequencies of A, C, G, T:
0.26529 0.24032 0.27189 0.22250

The last two lines appear if you are using first positions of protein
coding sequences, and you requested that first positions of leucine
and/or arginine codons be converted to their degenerate base. When you use
PHYLIP, you should input these numbers, rather than use empirical base
frequencies, especially if silent positions in your sequences are not at
compositional equilibrium.

That's it. If you have questions that this manual does not answer, send
e-mail to
arend@mendel.berkeley.edu

Yours,
Arend Sidow
Maintained 15 Jul 1996 -- by Martin Hilbers (e-mail: M.P.Hilbers@dl.ac.uk)