Bioinformatics - Multiple sequence comparisons

Alignments of more than two sequences are called "multiple sequence
alignments". The technique is used to determine the relationships among several
sequences and to construct phylogenetic trees.

Determining family history is a favorite pastime that produces trees of the
relationships among family members. In science and medicine, researchers often
have to determine the relationships and descent of DNA sequences and proteins
to hunt down diseases or to determine the level of relatedness among organisms,
proteins, genes, or plain sequences. Bioinformatically, this problem is
solved by first identifying the two sequences that are most closely related
and aligning them. Then, increasingly divergent sequences are added to the
alignment to generate a multiple sequence alignment of all sequences. This
process can take quite some time, depending on the number, length, and degree
of similarity of the sequences.
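The progressive strategy described above can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not ClustalW's
actual algorithm: it only works out the order in which sequences would be added
to the alignment, it uses plain edit distance as a stand-in for a real pairwise
alignment score, and the function names (`edit_distance`, `addition_order`) and
the toy sequences are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def addition_order(seqs):
    """Return sequence names in the order a progressive aligner might add them.

    Assumption: "closeness" is measured by edit distance; real tools use
    alignment scores and a guide tree instead.
    """
    names = list(seqs)
    dist = {frozenset(pair): edit_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(names, 2)}

    # Start with the two most closely related sequences.
    first_pair = min(combinations(names, 2), key=lambda p: dist[frozenset(p)])
    order = list(first_pair)
    remaining = [n for n in names if n not in order]

    # Then add increasingly divergent sequences, one at a time.
    while remaining:
        nxt = min(remaining,
                  key=lambda n: min(dist[frozenset((n, m))] for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical toy sequences, not real HIV/SIV data.
toy = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TTTTGGGG"}
print(addition_order(toy))
```

Computing every pairwise distance is what makes the process slow as the number
and length of the sequences grow: the number of pairs rises with the square of
the number of sequences.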
ClustalW is one of the better-known tools for multiple sequence alignments. It
has been around for many years and remains widely used.
In this exercise, perform a multiple sequence alignment to determine the
relationships among the env genes from HIV and SIV DNA isolated from
a variety of primates.
- Open the DNALC BioServer tool suite
- Under 'Sequence Server' select either 'Enter' or register and log in as a registered user (recommended)
- Open 'Manage Groups'
- Wait until the 'Classes' window has loaded. Then, in the upper right-hand
corner, find 'Sequence sources:' and click the arrowhead directly
underneath it (to the right of the word 'Classes')
- Select 'Public'
- Find 'HIV/SIV env', check the check box to the left of it, and select
'OK' on the bottom of this window
- On the workspace you will now find one sequence displayed. View it
by clicking 'OPEN'. (You will not be able to edit the sequence unless
you entered it yourself as a registered user.)
- Select 'DONE' when you are
done viewing the sequence
- To pull more sequences onto the workspace, move your cursor
to the arrowhead to the right of the word 'None', then click it.
- Select another sequence, and repeat until you have pulled all sequences onto the workspace
- Check the sequences you wish to align (all), make sure that 'Align:
CLUSTALW' shows in the window next to 'COMPARE', and click 'COMPARE'
- Wait for the alignment to be displayed. Please be patient as this may take several minutes.
- View the alignment and determine where the sequences differ from each other.
- How are differences indicated in the output?
- How many differences can you identify?
- Are the differences distributed evenly among the sequences, or are some sequences more similar to each other
than to the rest?
- Try to identify the sequences which deviate a lot from the others and redo the ClustalW alignment after unchecking those.
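For larger alignments, the differing positions can also be counted
programmatically rather than by eye. The sketch below assumes the aligned
sequences have been copied out as equal-length strings with '-' for gaps. It
marks fully conserved columns with '*', which is the convention of ClustalW's
plain-text output; the BioServer display may indicate differences in another
way. The function names and the toy alignment are hypothetical.

```python
def conservation_line(aligned):
    """Return '*' for columns where all sequences agree, ' ' elsewhere."""
    rows = list(aligned.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*rows):
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


def variable_columns(aligned):
    """1-based positions of columns that are not fully conserved."""
    line = conservation_line(aligned)
    return [i for i, mark in enumerate(line, start=1) if mark == " "]


def mismatch_totals(aligned):
    """For each sequence, total mismatches against every other sequence.

    A sequence with a much higher total than the rest deviates strongly
    from the group.
    """
    totals = {}
    for name, row in aligned.items():
        totals[name] = sum(
            a != b
            for other, other_row in aligned.items() if other != name
            for a, b in zip(row, other_row)
        )
    return totals


# Hypothetical toy alignment, not real HIV/SIV data.
toy_alignment = {"s1": "ACGT-A", "s2": "ACCT-A", "s3": "ACGTTA"}
for name, row in toy_alignment.items():
    print(f"{name}  {row}")
print(f"    {conservation_line(toy_alignment)}")
print("variable columns:", variable_columns(toy_alignment))
print("mismatch totals:", mismatch_totals(toy_alignment))
```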
Phylogenetic relationships are inferred from sequence similarity. Most
phylogenetic trees are built from a multiple sequence alignment, the result
of which is then displayed as a tree instead of an alignment and scores.
Many phylogenetic tools are available on the Internet; however, all need
to be viewed with healthy skepticism: the mathematical determination of
similarities and differences between sequences does not necessarily reflect
true phylogenetic relationships. In this exercise, use BioServer to draw a
phylogenetic tree of the env genes from HIV and SIV DNA isolated from a
variety of primates.
- Repeat the steps of the previous exercise up to the point where all sequences are on the workspace
- Check the sequences you wish to align (all), make sure that 'Phylogenetic Tree' shows in the
window next to 'COMPARE' and click 'COMPARE'
- Wait for the tree to be displayed. Please be patient as this may take several minutes.
- Change the settings to 'Phenogram' and 'Yes'.
- View the tree and determine the relationships between the different genes.
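To make the step from sequence differences to a tree more concrete, here is a
minimal sketch of one simple way a tree can be derived from pairwise distances.
It uses average-linkage clustering (UPGMA-style) as a stand-in; the text does
not say which method BioServer uses, so this is an assumption for illustration
only. The function name `upgma_newick` and the toy distances are hypothetical,
and branch lengths are omitted for brevity.

```python
from itertools import combinations


def upgma_newick(names, distances):
    """Cluster by average linkage and return the topology as a Newick string.

    `distances` maps pairs of names, e.g. ("A", "B"), to a number.
    """
    d = {frozenset(pair): value for pair, value in distances.items()}
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves

    while len(clusters) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)

        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b

    return next(iter(clusters)) + ";"


# Hypothetical toy distances (e.g. counts of differing positions).
toy_names = ["A", "B", "C", "D"]
toy_distances = {
    ("A", "B"): 1, ("A", "C"): 3, ("B", "C"): 2,
    ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 7,
}
print(upgma_newick(toy_names, toy_distances))
```

A tree built this way groups sequences purely by overall similarity, which is
exactly why such trees call for the skepticism mentioned above: similar
sequences are not always each other's closest relatives.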