Introduction to Bioinformatics
The Human Genome Project
- What is it?
- Gene vs. Genes vs. Genome vs. Genomes
- Find
- Genes
- Controls
- Genetic Variation
- How has it been done/is it being done?
- It has been sequenced (What's a sequence?
What does it consist of? Junk DNA, genes, regulatory
regions, ...)
- Estimate of ~30,000 genes (Surprise? Not?
What's a gene?)
- Surprise?
- 3,000,000,000 bp and less than 5% dedicated
to coding?
- Affymetrix claims 60,000 genes on their GeneChips;
- Incyte claims 120,000;
- GenBank claims 49,000 genes;
- UniGene has 89,000 clusters of unique ESTs
- Not?
- No reason to assume humans should have more or
fewer genes than other organisms
- What's a gene, anyway?
- The "one gene : one protein" hypothesis
predates the description of the chemical structure
of DNA by Watson and Crick in 1953. Since then,
the theory has been patched up so often with information
about multi-subunit proteins, introns/splicing,
alternative splicing, pseudogenes, gene duplication,
etc., that the original concept is almost
useless.
- Rationale for uncoupling "gene" and
"protein"
- Challenges
- Gaps are us
- Still hard to find a complete and uninterrupted genomic sequence for any gene
- Still can't identify pseudogenes with certainty
- Sequences of many individuals are needed to identify commonalities
- All current gene-finding tools are based on patterns
extracted from known genes and, therefore, biased
against finding "unusual" or rarely expressed genes
Genomics
- Whole new perspective: genes vs. genomes
- Identification
- Susceptibility to disease
- Relationships
- Phylogenetics
- Genomics Technologies
- Automated DNA sequencing
- Automated annotation of sequences
- Storage and retrieval of genetic information
- DNA microarrays
- gene expression (measures RNA level)
- single nucleotide polymorphisms
- Protein chips (SELDI, etc.)
- Protein-protein interactions
- Microarray Data Analysis
- Clustering and pattern detection
- Data mining and visualization
- Controls and normalization of results
- Statistical validation
- Linkage between gene expression data and gene sequence/function/metabolic pathways databases
- Discovery of common sequences in co-regulated genes
- Meta-studies using data from multiple experiments
- Impact on Bioinformatics
- Genomics produces high-throughput, high-quality data, and bioinformatics provides the analysis and interpretation of these data sets
- It is impossible to separate genomics laboratory technologies from the computational tools required for data analysis
- Pharmacogenomics
- The use of DNA sequence information to measure and predict the reaction of individuals to drugs
- Personalized drugs
- Faster clinical trials through use of selected trial populations
- Fewer side effects (Toxicogenomics)
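The microarray analysis steps listed above (normalization of results, then pattern detection among expression profiles) can be illustrated with a minimal sketch. It assumes two-channel intensity data, uses per-array median centering of log2 ratios as one simple normalization choice, and uses Pearson correlation as the similarity measure that a clustering step might start from. The function names and the sample numbers are hypothetical and are not taken from any particular microarray package.

```python
import math
from statistics import mean, median


def log2_ratios(sample, reference):
    """Convert paired intensities into log2(sample / reference) ratios."""
    return [math.log2(s / r) for s, r in zip(sample, reference)]


def median_center(values):
    """Subtract the array median so a typical gene sits at zero.

    Assumption: most genes on the array are not differentially expressed.
    """
    m = median(values)
    return [v - m for v in values]


def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


if __name__ == "__main__":
    # Hypothetical intensities for three genes on one array.
    ratios = median_center(log2_ratios([200, 100, 50], [100, 100, 100]))
    print(ratios)
    # Hypothetical profiles of two genes across three experiments.
    print(pearson([1, 2, 3], [2, 4, 6]))
```

Genes whose profiles correlate strongly across experiments are candidates for the same cluster, and, as noted above, for a search for common sequences in co-regulated genes.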
Sequencing and Annotation of Genomes
- What is it?
From raw genome data to an annotated genome.
- How is it being done?
Search for open reading frames; homology with expressed
sequence tags; homology with known genes from other
organisms; searching for known patterns (TATA, GC-rich,
exon-intron splice sites, ...)
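Two of the annotation steps named above, searching for open reading frames and searching for known patterns, can be sketched in a few lines. This is a simplified illustration only: it scans just the forward strand, treats ATG as the only start codon, and uses TATA[AT]A[AT] as an assumed, simplified TATA-box consensus. The function names and the minimum-length parameter are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    min_codons counts the codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


def find_tata_boxes(seq):
    """Return start positions of matches to the simplified TATA-box pattern."""
    # A lookahead is used so that overlapping matches are all reported.
    return [m.start() for m in re.finditer(r"(?=TATA[AT]A[AT])", seq.upper())]


if __name__ == "__main__":
    print(find_orfs("CCATGAAACCCGGGTAGTT"))
    print(find_tata_boxes("GGTATAAAAGG"))
```

Real gene finders combine many such signals with statistical models trained on known genes, which is the source of the bias against "unusual" genes mentioned earlier.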
Bioinformatics as
Pattern Recognition: "We've done it all along!"
- DNA: CG-rich; AT-rich; TATA boxes; RFLPs; SNPs; transcription factor binding sites
- RNA: poly-A tails; hairpins
- Proteins: transmembrane regions; alpha-helices; beta-sheets; domains; motifs
Start out with a paper-and-pencil exercise (a and b = three vs. ten); then switch to computers
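Once the exercise moves to computers, a few of the patterns listed above reduce to very short programs. The sketch below computes GC content over sliding windows (to flag CG-rich or AT-rich stretches) and checks an RNA for a poly-A tail. The window size, step, and minimum tail length are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=10, step=5):
    """Return (start, GC fraction) for each full window along the sequence."""
    return [
        (i, gc_content(seq[i:i + window]))
        for i in range(0, len(seq) - window + 1, step)
    ]


def has_poly_a_tail(rna, min_len=10):
    """True if the sequence ends in at least min_len consecutive A residues."""
    return rna.upper().endswith("A" * min_len)


if __name__ == "__main__":
    # Hypothetical sequence: a GC-rich half followed by an AT-rich half.
    print(gc_windows("GGGGGCCCCCAAAAATTTTT", window=10, step=10))
    print(has_poly_a_tail("AUGGCC" + "A" * 12))
```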
Challenges of Molecular Biology Computing
- The huge data set
- Lots of new sequences being added (automated sequencers, Human Genome Project, EST sequencing)
- GenBank has over 10 billion bases and is doubling every year (this raises hardware issues of storage and access as well as software issues of sorting and searching)
- How to have computers keep up?
- New types of biological data
- Microarrays
- Multi-level maps: genetic, physical, sequence, annotation
- Networks of protein-protein interactions
- Homologous genes (cross-species relationships)
- Chromosome organization (cross-species relationships)
- Polymorphisms (genetic variation)
- Define human genetic variations (SNPs; between
any two people there is about one SNP every 1,250 bases)
- All biologists will use gene sequence information in their daily work
- Similarity searching of databanks
- Alignments
- Structure-Function relationships
- Sequencing
- Annotation
- Phylogenetics
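Similarity searching and alignment, listed above as daily tasks, both rest on scoring how well two sequences line up. A minimal sketch of a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence with a linear gap penalty) is shown below. The scoring values are arbitrary assumptions; database search tools use more elaborate scoring schemes and heuristics to cope with the data volumes described above.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b.

    Uses the Needleman-Wunsch recurrence with a linear gap penalty and
    keeps only two rows of the scoring matrix in memory.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "ACG"))
```

The work grows with the product of the two sequence lengths, which is one reason exhaustive alignment against a databank that doubles every year is impractical.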
Implications for Biomedicine, Law Enforcement,
Information Technology, Health Care, Family Planning,
Economy
- Physicians will use genetic information to diagnose
and treat disease
- Virtually all medical conditions (other than trauma)
have a genetic component
- Drugs will be individualized
- Virtually all treatments have genetic components
- Efficacy of drugs depends on genetic constitution
- Appearance and severity of side effects depend
on genetic constitution
- Development of new treatments
- Gene therapy
- New drugs that are targeted more specifically
- Development of new methods to identify suspects
- Development of pattern recognition tools, databases,
query tools, and algorithms will serve other fields
- Marrying someone "with the gene for"
Huntington disease
- Employment vs. occupational hazards; new technologies
and businesses, new careers
- Statistics and risk analysis; development of skills
to critically evaluate products and procedures, claims,
possibilities and risks
The Changing Role of the Biologist in the Genomics
Age
- All biologists will use computers
- In computational biology
- Internet provides wealth of biological information
- Generation of data (automated lab equipment
and storage devices)
- Analysis of data (search, sorting, and modeling
software)
- Biologists will need new skills
- To find required information efficiently
- To know the right tool for a given problem/question
- To get a job done with tools available
- To be flexible and adaptable (jobs, projects, computers, hardware, software, change)
- Biologists will need adequate support
- Network connection (lifeline of the scientist)
- Advice regarding hardware and software
- Adequate manuals and trouble-shooting information
- Technical support
- The job of the biologist is changing
- As more biological information becomes available …
- Biologists will spend more time using computers
- Biologists will spend more time on data analysis than on conducting biochemical laboratory experiments
- Biology will become a more quantitative science (think of how the periodic table and atomic theory affected chemistry)
Bioinformatics Software
- Supercomputers and workstations (Unix; GCG)
- The Web (see links.htm)
- Programs for PCs (Windows, Mac)