Bioinformatics - Pairwise alignments in searches
|
|
Searching databases
for sequences is an extremely important bioinformatics routine. Learn how
alignment programs are used to query databases for sequences.
|
|
Searching
the Internet for information usually means using a so-called
search engine. Bioinformatics has its own search engines: specialized
tools that search databases containing nucleotide and amino acid sequence
data. These databases are maintained at several sites around
the world, most notably in the US at the National Center for Biotechnology
Information (NCBI), in England at the European Bioinformatics Institute (EMBL-EBI),
an outstation of the European Molecular Biology Laboratory (EMBL), in Switzerland
(Swiss-Prot), and in Japan at the DNA Data Bank of Japan (DDBJ).
The databases hold so many sequences that a full alignment of the query against every
entry is impractical, so search tools have to be quite sophisticated in order to complete
searches in a reasonable amount of time.
|
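One common shortcut in heuristic search tools is to index short "words" and fully align the query only against database entries that share at least one word with it. The sketch below illustrates that idea only. The function names, the toy database and the word size of 4 are assumptions made for this example and are not the settings of any particular program.

```python
from collections import defaultdict

def build_word_index(database, word_size):
    """Map every word of length word_size to the names of the sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - word_size + 1):
            index[seq[i:i + word_size]].add(name)
    return index

def candidate_hits(query, index, word_size):
    """Count how many query words each database sequence shares with the query."""
    hits = defaultdict(int)
    for i in range(len(query) - word_size + 1):
        for name in index.get(query[i:i + word_size], ()):
            hits[name] += 1
    return dict(hits)

# Hypothetical toy database; real databases hold many millions of entries.
toy_db = {
    "seqA": "ACGTACGTGG",
    "seqB": "TTTTTTTTTT",
    "seqC": "GGACGTAC",
}
index = build_word_index(toy_db, word_size=4)
print(candidate_hits("ACGTAC", index, word_size=4))
```

Only the sequences returned by `candidate_hits` would then need a full alignment, which is what keeps the search time manageable.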
BLAST
is one of the best-known sequence search engines. BLAST stands for Basic
Local Alignment Search Tool. BLAST finds sequences in databases that
are similar to subsequences of a query sequence. For each search hit it returns a brief title
line describing the hit, a link to the database entry for the hit,
the actual sequence alignment between the query and hit sequences, and two quality
measures: a 'score' and a so-called 'E-value'.
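To make the idea of a score concrete, here is a minimal sketch that adds up matches, mismatches and gaps for an alignment that has already been computed. The function name and the reward and penalty values are illustrative assumptions, not the settings of any particular BLAST program.

```python
def raw_score(query_row, hit_row, match=1, mismatch=-3, gap_open=-5, gap_extend=-2):
    """Score two rows of an existing alignment; '-' marks a gap position.

    Assumed scheme: the first position of a gap costs gap_open and each
    further position of the same gap costs gap_extend.
    """
    if len(query_row) != len(hit_row):
        raise ValueError("aligned rows must have the same length")
    score = 0
    previous_gap = None  # which row the previous column had a gap in, if any
    for q, h in zip(query_row, hit_row):
        if q == "-" and h == "-":
            raise ValueError("a column cannot be a gap in both rows")
        current_gap = "query" if q == "-" else "hit" if h == "-" else None
        if current_gap is None:
            score += match if q == h else mismatch
        elif current_gap == previous_gap:
            score += gap_extend
        else:
            score += gap_open
        previous_gap = current_gap
    return score

print(raw_score("ACGT", "ACGT"))          # short perfect match
print(raw_score("ACGTACGT", "ACGTACGT"))  # longer perfect match scores higher
print(raw_score("AC--GT", "ACTTGT"))      # a two-position gap lowers the score
```

The first two calls show why the score alone can mislead: both alignments are perfect, yet the longer one scores twice as high.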
|
Perform a BLAST search with this sequence:
- Highlight and copy the sequence
- Open Gene Boy
- Click 'Your Sequence', paste the sequence into the central window,
change 'Your Sequence' at the top to a name of your choosing, and select
'Save'
- Open 'WWW Tools', select 'Sequence Search'
- In the next window select 'Format', wait for the results to come up
- Examine the different parts of the results page. Try to resolve any questions
first by consulting the help provided under 'BLAST FAQs'.
- Clicking on a score jumps further down the page to the alignment between the match and the query.
- Clicking on a 'gi|.....' number will link you to the database entry for the respective match. Use the browser's
'Back' button to return to the results page.
- Which organisms are the matching DNA sequences from?
- What scores and E-values can you find for different matches? Read
the definition for score and for E-value under 'BLAST FAQs'.
- Check Genetic Origins - mtDNA - Recipes. What sequence was the primer derived from?
|
The two scoring parameters 'score' and 'E-value' are provided to judge the quality of a
search result. The score is determined by the number of matches, mismatches and gaps in the sequence
alignment. However, the score can be somewhat
misleading because it depends strongly on the length of the query sequence: a longer alignment can score
higher simply because it contains more matches.
The E-value, on the other hand, is seen as a more informative parameter for judging the validity of a search result.
It estimates how many matches with a score at least this good would be expected to turn up just by chance
in a database of this size.
The higher the E-value, the more likely it is that the result matches the query just by chance. The lower the E-value,
the more significant the search result. (As if this weren't confusing enough, very small E-values are written in scientific
notation, so 'e-56' stands for 10 to the power of -56 and not for a power of the mathematical constant e.
While the exponent can look quite big, e.g. 56, e-56 is an extremely small number and
the kind of E-value that you would want to see for meaningful search results.)
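The relationship between score and E-value can be sketched with the standard Karlin-Altschul statistics, in which E = K * m * n * e^(-lambda * S) for a raw score S, query length m and database length n. The values of `lam` and `k` below are illustrative placeholders (real values depend on the scoring scheme), and the function names, query length and database size are assumptions made for this example.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)

def parse_e_value(text):
    """Read an E-value in scientific notation, including the short form 'e-56'."""
    text = text.strip().lower()
    if text.startswith("e"):
        text = "1" + text  # 'e-56' is shorthand for '1e-56', i.e. 10^-56
    return float(text)

# Placeholder statistical parameters, query length and database size.
lam, k = 1.37, 0.71
for raw in (20, 40, 80):
    bits = bit_score(raw, lam, k)
    print(raw, round(bits, 1), e_value(bits, query_len=300, db_len=10**9))

print(parse_e_value("e-56") < parse_e_value("2e-10"))
```

Because the database size enters the formula directly, the same alignment gets a larger (less significant) E-value when it is found in a larger database, while each additional bit of score halves the E-value.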
|
|
|