Bioinformatics in the Classroom

Introduction to Bioinformatics

  • The Human Genome Project
    • What is it?
      • Gene vs. Genes vs. Genome vs. Genomes
    • Find
      • Genes
      • Controls
      • Genetic Variation
    • How has it been done/is it being done?
      • It has been sequenced (What's a sequence? What does it consist of? Junk DNA, genes, regulatory regions, …)
      • Estimate of ~30,000 genes (Surprise? Not? What's a gene?)
        • Surprise?
          • 3,000,000,000 bp and less than 5% dedicated to coding?
          • Affymetrix claims 60,000 genes on their GeneChips
          • Incyte claims 120,000
          • GenBank claims 49,000 genes
          • UniGene has 89,000 clusters of unique ESTs
        • Not?
          • No reason to assume humans should have more or fewer genes than other organisms
        • What's a gene, anyway?
          • The "one gene : one protein" hypothesis predates Watson and Crick's 1953 description of the chemical structure of DNA. It has since been patched up so often to accommodate multi-subunit proteins, introns and splicing, alternative splicing, pseudogenes, gene duplication, etc., that the original concept is almost useless.
          • Rationale for uncoupling "gene" and "protein"
    • Challenges
      • Gaps are us
      • Still hard to find a complete and uninterrupted genomic sequence for any gene
      • Still can't identify pseudogenes with certainty
      • Sequences of many individuals are needed to identify commonalities
      • All current gene-finding tools are based on patterns extracted from known genes and, therefore, biased against finding "unusual" or rarely expressed genes
  • Genomics
    • Whole new perspective: genes vs. genomes
      • Identification
      • Susceptibility to disease
      • Relationships
      • Phylogenetics
    • Genomics Technologies
      • Automated DNA sequencing
      • Automated annotation of sequences
      • Storage and retrieval of genetic information
      • DNA microarrays
        • gene expression (measures RNA level)
        • single nucleotide polymorphisms
      • Protein chips (SELDI, etc.)
      • Protein-protein interactions
    • Microarray Data Analysis
      • Clustering and pattern detection
      • Data mining and visualization
      • Controls and normalization of results
      • Statistical validation
      • Linkage between gene expression data and gene sequence/function/metabolic pathways databases
      • Discovery of common sequences in co-regulated genes
      • Meta-studies using data from multiple experiments
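The normalization and pattern-detection steps listed above can be sketched in a few lines of Python. This is only a minimal illustration, assuming two-channel intensities that are turned into log2 ratios, median-centered per array, and compared by Pearson correlation; the gene names and numbers are hypothetical, and real analyses use dedicated statistics packages.

```python
import math
from statistics import median

def log2_ratios(sample, control):
    """Convert paired intensities into log2(sample / control) ratios."""
    return [math.log2(s / c) for s, c in zip(sample, control)]

def median_center(values):
    """Normalize one array by subtracting its median log ratio."""
    m = median(values)
    return [v - m for v in values]

def pearson(x, y):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical log ratios for three genes measured on four arrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}

# Genes whose profiles correlate strongly are candidates for co-regulation.
for other in ("geneB", "geneC"):
    r = pearson(profiles["geneA"], profiles[other])
    print(f"geneA vs {other}: r = {r:+.2f}")
```

Correlation is only one possible similarity measure; clustering methods then group genes by such pairwise scores.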
    • Impact on Bioinformatics
      • Genomics produces high-throughput, high-quality data, and bioinformatics provides the analysis and interpretation of these data sets
      • It is impossible to separate genomics laboratory technologies from the computational tools required for data analysis
    • Pharmacogenomics
      • The use of DNA sequence information to measure and predict the reaction of individuals to drugs
      • Personalized drugs
      • Faster clinical trials through use of selected trial populations
      • Fewer side effects (Toxicogenomics)
  • Sequencing and Annotation of Genomes
    • What is it?
      From raw genome data to an annotated genome.
    • How is it being done?
      Search for open reading frames; homology with expressed sequence tags (ESTs); homology with known genes from other organisms; searching for known patterns (TATA boxes, GC-rich regions, exon-intron splice sites, …)
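The first step named above, searching for open reading frames, can be illustrated with a short Python sketch. It makes simplifying assumptions that are not part of the outline: only the forward strand is scanned, an ORF is taken to run from an ATG to the first in-frame stop codon, and the `min_codons` cutoff is an arbitrary parameter. Real gene finders also handle the reverse strand, introns, and statistical scoring.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Toy sequence (hypothetical) with one short ORF in reading frame 2.
sequence = "CCATGAAATTTTAGGG"
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```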
  • Bioinformatics as Pattern Recognition: "We've done it all along!"
    • DNA: GC-rich regions; AT-rich regions; TATA boxes; RFLPs; SNPs; transcription factor binding sites
    • RNA: poly-A tails; hairpins
    • Proteins: transmembrane regions; alpha-helices; beta-sheets; domains; motifs
      Start out with paper-and-pencil exercises (a and b = three vs. ten), then switch to computers
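Once the exercise moves from paper to the computer, a few of the patterns listed above can be found with simple string tools. The Python sketch below is illustrative only: the TATA-box pattern is a simplified consensus chosen for this example, and the 10-base poly-A threshold is an arbitrary assumption.

```python
import re

def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def find_motif(seq, pattern):
    """0-based start positions of every match, including overlapping ones."""
    # The lookahead lets matches overlap, which plain finditer would skip.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq.upper())]

# Simplified TATA-box consensus (an assumption for illustration).
TATA_BOX = "TATA[AT]A[AT]"

def has_poly_a_tail(rna, min_length=10):
    """True if the sequence ends in at least `min_length` consecutive A's."""
    return re.search(r"A{%d,}$" % min_length, rna.upper()) is not None

print(gc_content("GGCCAATT"))
print(find_motif("GGTATAAAAGG", TATA_BOX))
print(has_poly_a_tail("AUGGCC" + "A" * 12))
```

Counting and matching of this kind is exactly what is done by eye on three short sequences; the computer makes the same approach feasible on thousands.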
  • Challenges of Molecular Biology Computing
    • The huge data set
      • Lots of new sequences being added (automated sequencers, Human Genome Project, EST sequencing)
      • GenBank has over 10 billion bases and is doubling every year (a technology issue: hardware, for storage and access, as well as software, for sorting and searching)
      • How to have computers keep up?
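To see why keeping up is hard, the doubling figure quoted above can be projected forward. The Python sketch below simply assumes that the stated rate (10 billion bases, doubling every year) continues unchanged, which is an extrapolation rather than a measured trend.

```python
def projected_bases(current_bases, years, doubling_time_years=1.0):
    """Database size after `years` if it keeps doubling at a fixed rate."""
    return current_bases * 2 ** (years / doubling_time_years)

# Starting point taken from the figure quoted in this outline.
START = 10e9

for years in (1, 5, 10):
    print(f"after {years:2d} years: {projected_bases(START, years):.3g} bases")
```

Ten doublings multiply the data by 1,024, so storage and search software must scale by roughly three orders of magnitude per decade under this assumption.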
    • New types of biological data
      • Microarrays
      • Multi-level maps: genetic, physical, sequence, annotation
      • Networks of protein-protein interactions
      • Homologous genes (cross-species relationships)
      • Chromosome organization (cross-species relationships)
      • Polymorphisms (genetic variation)
      • Define human genetic variations (SNPs; between any two people there is one SNP every 1,250 bases)
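The SNP density quoted above implies a large absolute number of differences between any two people. The Python sketch below works through that arithmetic using the 3,000,000,000 bp genome size given earlier in this outline, and adds a hypothetical helper that counts single-base differences between two sequences that are already aligned.

```python
GENOME_SIZE_BP = 3_000_000_000  # genome size quoted earlier in this outline
BASES_PER_SNP = 1_250           # one SNP every 1,250 bases, as quoted above

def expected_snp_differences(genome_size_bp=GENOME_SIZE_BP,
                             bases_per_snp=BASES_PER_SNP):
    """Expected number of single-base differences between two genomes."""
    return genome_size_bp // bases_per_snp

def count_differences(seq_a, seq_b):
    """Count mismatching positions in two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

print(expected_snp_differences())                 # 2400000
print(count_differences("ACGTACGT", "ACGTACCT"))  # 1
```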
    • All biologists will use gene sequence information in their daily work
      • Similarity searching of databanks
      • Alignments
      • Structure-function relationships
      • Sequencing
      • Annotation
      • Phylogenetics
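Similarity searching and alignment, the first two items above, rest on the same idea: score how well two sequences line up. The Python sketch below computes a global alignment score by dynamic programming (the Needleman-Wunsch approach) and uses it to rank a tiny, hypothetical databank. The scoring values are arbitrary assumptions, and production search tools rely on faster heuristics rather than aligning the query against every entry in full.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of sequences a and b."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def best_hit(query, databank):
    """Name of the databank entry with the highest alignment score."""
    return max(databank,
               key=lambda name: global_alignment_score(query, databank[name]))

# Hypothetical two-entry databank.
databank = {"seq1": "GATTACA", "seq2": "GCATGCT"}
print(best_hit("GATTACA", databank))
```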
  • Implications for Biomedicine, Law Enforcement, Information Technology, Health Care, Family Planning, Economy
    • Physicians will use genetic information to diagnose and treat disease
      • Virtually all medical conditions (other than trauma) have a genetic component
    • Drugs will be individualized
      • Virtually all treatments have genetic components
      • Efficacy of drugs depends on genetic constitution
      • Appearance and severity of side effects depend on genetic constitution
    • Development of new treatments
      • Gene therapy
      • New drugs that are targeted more specifically
    • Development of new methods to identify suspects
    • Development of pattern recognition tools, databases, query tools, and algorithms will serve other fields
    • Marrying someone "with the gene for" Huntington's disease
    • Employment vs. occupational hazards; new technologies and businesses, new careers
    • Statistics and risk analysis; development of skills to critically evaluate products and procedures, claims, possibilities and risks
  • The Changing Role of the Biologist in the Genomics Age
    • All biologists will use computers
      • In computational biology
        • Internet provides wealth of biological information
        • Generation of data (automated lab equipment and storage devices)
        • Analysis of data (search, sorting, and modeling software)
    • Biologists will need new skills
      • To find required information efficiently
      • To know the right tool for a given problem/question
      • To get a job done with tools available
      • To be flexible and adaptable (jobs, projects, computers, hardware, software, change)
    • Biologists will need adequate support
      • Network connection (lifeline of the scientist)
      • Advice regarding hardware and software
      • Adequate manuals and trouble-shooting information
      • Technical support
    • The job of the biologist is changing
      • As more biological information becomes available …
        • Biologists will spend more time using computers
        • Biologists will spend more time on data analysis than on conducting biochemical laboratory experiments
        • Biology will become a more quantitative science (think of how the periodic table and atomic theory affected chemistry)
  • Bioinformatics Software
    • Supercomputers and workstations (Unix; GCG)
    • The Web (see links.htm)
    • Programs for PCs (Windows, Mac)