Bioinformatics - Pairwise alignments in searches

Searching databases for sequences is an extremely important bioinformatics routine. Learn how alignment programs are used to query databases for sequences.
Searching the Internet for information usually requires a search engine. Bioinformatics has its own search engines: specialized tools that search databases of nucleotide and amino acid sequence data. These databases are maintained at several sites around the world, most notably in the US at the National Center for Biotechnology Information (NCBI), in England at the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI), in Switzerland (SwissProt), and in Japan at the DNA Data Bank of Japan (DDBJ). These databases hold far too many sequences for a tool to align the query in full against every entry, so search tools have to be quite sophisticated in order to complete searches in a reasonable amount of time.
BLAST, the Basic Local Alignment Search Tool, is one of the best-known sequence search engines. It finds database sequences that are similar to subsequences of a query sequence. For each search hit it returns a brief title line describing the hit, a link to the database entry for the hit, the actual sequence alignment between query and hit sequence, and a 'score' and a so-called 'E-value'.
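One reason a tool like BLAST can cope with large databases is that it first looks for short exact word matches ("seeds") and only then examines the surrounding region more closely. The following Python sketch illustrates just that seeding idea. It is not how BLAST is actually implemented: the function names, the word size of 4, and the two-entry toy database are all assumptions made up for this example.

```python
from collections import defaultdict


def build_word_index(database, word_size=4):
    """Map every word of length word_size to the places where it occurs.

    database is a dict of {name: sequence}; the toy data below is hypothetical.
    """
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - word_size + 1):
            index[seq[pos:pos + word_size]].append((name, pos))
    return index


def find_seeds(query, index, word_size=4):
    """Return (query_pos, entry_name, entry_pos) for every exact word match."""
    seeds = []
    for qpos in range(len(query) - word_size + 1):
        word = query[qpos:qpos + word_size]
        for name, dpos in index.get(word, []):
            seeds.append((qpos, name, dpos))
    return seeds


# Hypothetical toy database; real databases hold millions of entries.
toy_db = {"seq_a": "GGGTTAACTCCAGG", "seq_b": "ACGTACGTACGT"}
index = build_word_index(toy_db)
seeds = find_seeds("TTAACTCC", index)
for seed in seeds:
    print(seed)
```

The index is built once and can be reused for many queries, so only database entries that share at least one word with the query need to be looked at in detail.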
The two scoring parameters, 'score' and 'E-value', indicate the quality of the search result. The score is calculated from the number of matches, mismatches and gaps in the sequence alignment. It can be somewhat misleading, however, because it depends strongly on the length of the query sequence.
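To see how matches, mismatches and gaps add up to a score, and why the score grows with the length of the query, consider the following minimal Python sketch. The reward and penalty values (+1 per match, -3 per mismatch, -5 per gap position) and the function name are assumptions chosen for illustration, and the simple per-position gap penalty is a simplification. The values BLAST uses depend on the program and its settings.

```python
def alignment_score(query, hit, match=1, mismatch=-3, gap=-5):
    """Score two already-aligned strings of equal length.

    A '-' in either string marks a gap. The default reward and
    penalties are illustrative assumptions, not fixed BLAST values.
    """
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for q, h in zip(query, hit):
        if q == "-" or h == "-":
            score += gap
        elif q == h:
            score += match
        else:
            score += mismatch
    return score


primer = "TTAACTCCACCATTAGCACC"

# A perfect match over all 20 bases versus a perfect match over only 10.
full_score = alignment_score(primer, primer)
half_score = alignment_score(primer[:10], primer[:10])
print(full_score, half_score)
```

Both alignments are perfect, yet the longer one scores twice as high. This is why a raw score alone says little when queries of different lengths are compared.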
Perform a BLAST search with this sequence:
TTAACTCCACCATTAGCACC
  1. Highlight and copy the sequence
  2. Open Gene Boy
  3. Click 'Your Sequence', paste the sequence into the central window, change Your Sequence on top into a name of your choosing, select 'Save'
  4. Open 'WWW Tools', select 'Sequence Search'
  5. In the next window select 'Format', wait for the results to come up
  6. Examine the different parts of the results page. Try to resolve questions first by consulting with the help provided under 'BLAST FAQs'.
  7. Clicking on a score moves further down the page to a view of the alignment between the match and the query.
  8. Clicking on a 'gi|.....' number will link you to the database entry for the respective match. Use the browser's 'Back'-button to move back to the result page.
  9. Which organisms are the matching DNA sequences from?
  10. What scores and E-values can you find for different matches? Read the definition for score and for E-value under 'BLAST FAQs'.
  11. Check Genetic Origins - mtDNA - Recipes. What sequence was the primer derived from?
As noted before the exercise, the score depends strongly on the length of the query sequence, which can make it a misleading measure of the quality of a search result.
The E-value, on the other hand, is regarded as a more informative parameter for judging the validity of a search result. It estimates how many matches with a score at least this good would be expected in the database purely by chance. The higher the E-value, the more likely it is that the result matches the query just by chance; the lower the E-value, the more significant the search result. (As if this weren't confusing enough, E-values are often written in exponent notation such as e-56, which stands for 10 to the power of -56. While the exponent, e.g. 56, can be quite big, e-56 is a very small number and the kind of E-value you would want to see for a meaningful search result.)
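The relationship between score, sequence lengths and E-value can be sketched with the textbook formula E = m × n × 2^(-S'), where S' is the normalized "bit score", m the query length and n the total database length. The Python sketch below uses this simplified relation; BLAST's own calculation additionally corrects the lengths, so its reported values will differ. The function name and the database size of one billion bases are assumptions for illustration only.

```python
def expected_hits(bit_score, query_length, database_length):
    """Textbook E-value estimate: E = m * n * 2**(-bit_score).

    This omits the length corrections BLAST applies, so treat the
    result as an order-of-magnitude illustration only.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: a 20-base query against one billion bases.
e_weak = expected_hits(30.0, 20, 1_000_000_000)
e_strong = expected_hits(40.0, 20, 1_000_000_000)
print(f"bit score 30: E = {e_weak:.2e}")
print(f"bit score 40: E = {e_strong:.2e}")

# E-values written as 'e-56' are powers of ten and parse as ordinary floats.
reported = float("1e-56")
print(reported < 1e-5)
```

Each additional bit of score halves the E-value, while a larger database or a longer query raises it. This is why the E-value is easier to compare across searches than the raw score.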