Bioinformatics: decoding the genome

Abstract

Extracting the fundamental genomic sequence from the DNA: From Genome to Sequence

Biology in the early 21st century has been radically transformed by the availability of full genome sequences for an ever-increasing number of life forms, from bacteria to major crop plants and humans. The lecture will concentrate on the computational challenges associated with the production, storage and analysis of genome sequence data, with an emphasis on mammalian genomes. The quality and usability of genome sequences increasingly depend on the careful integration of strategies for data collection and computational analysis, from the construction of maps and libraries to the assembly of raw data into sequence contigs and chromosome-sized scaffolds.

Once the sequence is assembled, a major challenge is mapping biologically relevant information onto it: promoters, introns and exons of protein-encoding genes, regulatory elements, functional RNAs, pseudogenes, transposons, etc. The methodological approaches and data requirements for genome annotation will be discussed, as well as user interfaces for exploring genomes.

Polymorphic variation in the human genome and susceptibility to disease

One of the main features revealed by the completion of the human genome is the large amount of polymorphic sequence variation present in human populations: on average, any two homologous chromosomes differ at about one position in every 600 to 800 base pairs. The majority of these sequence variants are single nucleotide polymorphisms (SNPs), although other types of polymorphism exist. So far, around 5 million SNPs have been validated, and an international consortium has been set up to characterize the main features of human variation in different populations (www.hapmap.org).
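
The spacing figure above can be turned into a rough genome-wide count. The sketch below is illustrative only: it assumes two sequences that are already aligned base for base (insertions and deletions are ignored), and it assumes a haploid human genome length of about 3.1 billion base pairs, a round figure that is not stated in this abstract. The function names `find_snps` and `expected_differences` are hypothetical.

```python
def find_snps(seq_a: str, seq_b: str) -> list[int]:
    """Return 0-based positions where two aligned, equal-length sequences differ.

    Assumes a gap-free alignment: insertions/deletions are not handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())) if a != b]


def expected_differences(genome_length_bp: float, spacing_bp: float) -> float:
    """Expected number of differing sites if one difference occurs every spacing_bp bases."""
    return genome_length_bp / spacing_bp


if __name__ == "__main__":
    # Toy aligned sequences with two single-base differences
    print(find_snps("ACGTACGTAC", "ACGAACGTTC"))  # -> [3, 8]
    genome = 3.1e9  # ASSUMPTION: approximate haploid human genome length in bp
    for spacing in (600, 800):
        print(spacing, round(expected_differences(genome, spacing)))
```

Under these assumptions, two copies of the genome would differ at roughly 3.9 to 5.2 million positions.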

Although most of the sequence variation in the human genome is thought to be neutral, a fraction of it is known to have functional consequences, for instance modifying the activity or function of a protein or affecting the spatio-temporal regulation of a gene. Functional sequence variants therefore underlie a substantial proportion of phenotypic variability, including quantitative traits, susceptibility to common disorders (for example diabetes and asthma), and differential response to drugs. One of the main challenges of modern genomics is to identify the specific SNPs associated with phenotypic states, whether discrete or continuous. Over the last two years, remarkable advances in genotyping technology and in conceptual frameworks have made it possible for the first time to perform truly genome-wide studies. However, substantial challenges remain in how best to extract the information, given problems such as multiple hypothesis testing and non-additive gene-gene and gene-environment interactions.

Finding the genes in the genome and associating them with a particular disease.

Building models of biological processes from the information in the data, and using simulation to make further predictions

In the post-genomic era, our attention is turning to how to assemble the "pieces of the jigsaw puzzle" into realistic, dynamic models of complex biological systems, and to trying to understand which fundamental principles may govern how cells, organs and organisms have come about and how they can evolve. One might say that this is a search for a biological "theory of everything"! In this talk, we examine some candidate principles and how they could be used to infer computational models from experimental data, a discipline now becoming known as "systems biology". Systems biology poses many interesting experimental and computational challenges.
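
As a very small illustration of inferring a computational model from experimental data and then simulating it, the sketch below fits a first-order decay model, x(t) = x0 * exp(-k * t), to a time course (for example, mRNA levels after transcription is blocked) by linear least squares on ln x, and then uses the fitted model to predict a later time point. The model, the synthetic data and the function names `fit_decay` and `predict` are assumptions for illustration; realistic systems-biology models are far larger and usually require nonlinear fitting.

```python
import math


def fit_decay(times: list[float], levels: list[float]) -> tuple[float, float]:
    """Fit x(t) = x0 * exp(-k * t) by ordinary least squares on ln(x).

    Returns (x0, k). All levels must be strictly positive.
    """
    if len(times) != len(levels) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(x) for x in levels]  # linearize: ln x = ln x0 - k * t
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx                      # regression slope equals -k
    intercept = y_mean - slope * t_mean    # regression intercept equals ln x0
    return math.exp(intercept), -slope


def predict(x0: float, k: float, t: float) -> float:
    """Run the fitted model forward to time t."""
    return x0 * math.exp(-k * t)


if __name__ == "__main__":
    # Hypothetical noise-free data generated with x0 = 100 and k = 0.3 per hour
    times = [0.0, 1.0, 2.0, 3.0, 4.0]
    levels = [100.0 * math.exp(-0.3 * t) for t in times]
    x0, k = fit_decay(times, levels)
    print(f"x0 = {x0:.2f}, k = {k:.3f}, half-life = {math.log(2) / k:.2f} h")
    print(f"predicted level at t = 10 h: {predict(x0, k, 10.0):.2f}")
```

Log-linearizing keeps the fit to closed-form sums; with noisy data it weights low-level measurements differently from a direct nonlinear fit, which is one reason larger models are usually fitted numerically.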

By examining several illustrative examples, we hope to show how it might be possible to predict the behaviour of complex biological systems. The examples we have chosen are: (a) genetic and protein interaction networks at the intracellular level; and (b) simulation studies of whole organs, which show how models at the cellular level can be integrated into complete and useful models of entire systems such as the heart. We also briefly examine some of the implications of systems biology for drug discovery, human health and the environment.

Measuring protein composition and protein 3-D structures: important information in the design of new drugs

Molecular dynamics can be used to simulate the time evolution of microscopic systems. Biological systems such as DNA, lipid membranes and, most importantly, proteins have been studied intensively with these techniques. The various steps involved in molecular dynamics simulations of proteins will be presented, together with their applications to biological phenomena. In particular, results will be given from simulations of important proteins of the immune system, and it will be shown how these data can be used to optimize cancer treatment.

Using DNA microarrays as powerful detectors of the "genes at work", and thereby determining the mechanisms that control our bodies and our health: From Gene Chips to Regulatory Networks

The completion of the draft sequence of the human genome has raised public awareness of "genomics" and of the ways in which the emerging technologies of the genomics "revolution" will have direct applications in research as well as in patient care. This information will be instrumental in deciphering the role and function of the various elements present on our chromosomes. Microarrays, and in particular Affymetrix GeneChips®, have emerged as a very powerful technology for investigating our genome. These small glass arrays contain millions of short oligonucleotides (short DNA strands) synthesized by photolithography.
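
The idea of an oligonucleotide probe can be sketched in a few lines. The example below is a simplification and not Affymetrix's actual probe-selection procedure: it tiles a target sequence into fixed-length windows and returns the reverse complement of each window, i.e. the strand that would base-pair with it. The 25-base default length and the mismatch control that complements the central (13th) base follow the commonly described GeneChip perfect-match/mismatch design; which strand is labeled depends on the protocol, so `target` here simply means whatever labeled sequence the probe must pair with. The names `tile_probes` and `mismatch_probe` are hypothetical.

```python
# Translation table for Watson-Crick complements of uppercase DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def tile_probes(target: str, probe_len: int = 25, step: int = 1) -> list[tuple[int, str]]:
    """Return (start, probe) pairs; each probe pairs with target[start:start + probe_len]."""
    target = target.upper()
    return [
        (start, reverse_complement(target[start:start + probe_len]))
        for start in range(0, len(target) - probe_len + 1, step)
    ]


def mismatch_probe(probe: str) -> str:
    """Complement the central base (0-based index len // 2, i.e. base 13 of a 25-mer)."""
    mid = len(probe) // 2
    return probe[:mid] + probe[mid].translate(COMPLEMENT) + probe[mid + 1:]


if __name__ == "__main__":
    # Short target and 4-base probes, purely for readability
    for start, probe in tile_probes("ACGTACGTAC", probe_len=4, step=3):
        print(start, probe, mismatch_probe(probe))
```

Comparing signal from a probe and its mismatch partner is meant to separate specific hybridization from background; real probe design also screens candidates for uniqueness and hybridization behaviour, which this sketch omits.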

These tools make it possible to query, in a highly parallel manner, quantities such as gene expression levels or the interactions of regulatory proteins with DNA. Cross-comparison and integration of the data using appropriate bioinformatics approaches lead to the elucidation of biological regulatory networks.
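
One of the simplest ways to cross-compare expression data is to link genes whose expression profiles rise and fall together across samples. The sketch below builds such a co-expression graph from Pearson correlations; the gene names, the toy profiles and the 0.9 threshold are assumptions for illustration, and `pearson` and `coexpression_edges` are hypothetical helper names. Correlation alone does not establish regulation, so a graph like this is at best a starting hypothesis for a regulatory network, which in practice is inferred with richer data (for example protein-DNA binding measurements) and more elaborate statistics.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles: dict[str, list[float]],
                       threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


if __name__ == "__main__":
    # Hypothetical expression levels of four genes across four samples
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],  # rises with geneA
        "geneC": [4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
        "geneD": [1.0, 3.0, 1.0, 3.0],  # weakly related
    }
    for edge in coexpression_edges(profiles):
        print(edge)
```

Keeping strongly negative correlations as edges reflects the possibility of repression as well as activation; dropping `abs` would keep only positively co-varying pairs.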
