
    Bioinformatics: decoding the genome

    Extracting the fundamental genomic sequence from the DNA. From Genome to Sequence: Biology in the early 21st century has been radically transformed by the availability of full genome sequences for an ever-increasing number of life forms, from bacteria to major crop plants and humans. The lecture will concentrate on the computational challenges associated with the production, storage and analysis of genome sequence data, with an emphasis on mammalian genomes. The quality and usability of genome sequences increasingly depend on the careful integration of strategies for data collection and computational analysis, from the construction of maps and libraries to the assembly of raw data into sequence contigs and chromosome-sized scaffolds. Once the sequence is assembled, a major challenge is mapping biologically relevant information onto it: promoters, introns and exons of protein-coding genes, regulatory elements, functional RNAs, pseudogenes, transposons, etc. The methodological approaches and data requirements for genome annotation will be discussed, as well as user interfaces for exploring genomes.
Polymorphic variation in the human genome and susceptibility to disease: One of the main features revealed by the completion of the human genome is the large amount of polymorphic sequence variation present in human populations: on average, any two chromosomes differ at one position every 600 to 800 base pairs. The majority of these sequence variants are single nucleotide polymorphisms (SNPs), although other types of polymorphism exist. So far, around 5 million SNPs have been validated, and an international consortium has been set up to characterize the main features of human variation in different populations (www.hapmap.org).
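To put the stated density in perspective: if two genome copies differ at roughly one position every 600 to 800 base pairs, the expected number of differing sites is simply the genome length divided by that spacing. The sketch below assumes a haploid human genome of about 3.1 billion base pairs (a figure not given in the text) and a hypothetical helper name; it yields roughly 3.9 to 5.2 million differences between two genome copies, the same order of magnitude as the 5 million validated SNPs.

```python
def expected_differences(genome_length_bp: float, bp_per_difference: float) -> float:
    """Expected number of differing positions, assuming one difference per `bp_per_difference` bases."""
    return genome_length_bp / bp_per_difference

# Assumption: approximate length of the haploid human genome in base pairs.
HAPLOID_GENOME_BP = 3.1e9

for spacing in (600, 800):
    millions = expected_differences(HAPLOID_GENOME_BP, spacing) / 1e6
    print(f"one difference every {spacing} bp -> about {millions:.1f} million differing sites")
```

This is a back-of-the-envelope average only; real variant density varies along the genome and between populations.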
Although most of the sequence variation in the human genome is thought to be neutral, a fraction of it is known to have functional consequences, for instance by modifying the activity or function of a protein or by affecting the spatio-temporal regulation of a gene. Functional sequence variants therefore underlie a substantial proportion of phenotypic variability, including quantitative traits, susceptibility to common disorders (for example diabetes and asthma), and differential response to drugs. One of the main challenges of modern genomics is to identify specific SNPs associated with phenotypic states (discrete or continuous). Over the last two years there have been remarkable advances in genotyping technology and conceptual frameworks that make it possible, for the first time, to perform truly genome-wide studies. However, substantial challenges remain concerning how best to extract the information, in view of problems such as multiple hypothesis testing and non-additive gene-gene and gene-environment interactions. Finding the genes in the genome and associating them with a particular disease.
Building models of biological processes from the information in the data, and using simulation to make further predictions: In the post-genomic era, our attention is turning to how to assemble the "pieces of the jigsaw puzzle" into realistic and dynamic models of complex biological systems, and to understanding what fundamental principles may govern how cells, organs and organisms have come about and how they can evolve. One might say that this is a search for a biological "theory of everything"! In this talk, we examine some possible such principles, and how they could be used to infer computational models from experimental data, a discipline now becoming known as "systems biology." Systems biology poses many interesting experimental and computational challenges.
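As a deliberately tiny illustration of simulating a model to make a prediction, consider a single hypothetical gene whose mRNA is produced at a constant rate k and degraded in proportion to its abundance with rate constant gamma. The model dm/dt = k - gamma * m predicts a steady state of k / gamma. The sketch below integrates it with forward Euler and checks the result against the closed-form solution; the model, function names and parameter values are illustrative assumptions, far simpler than the interaction networks and organ models the talk describes.

```python
import math

def simulate_mrna(k: float, gamma: float, m0: float, t_end: float, dt: float = 0.01) -> float:
    """Forward-Euler integration of dm/dt = k - gamma * m, returning m(t_end)."""
    m = m0
    for _ in range(round(t_end / dt)):
        m += (k - gamma * m) * dt  # constant production minus first-order degradation
    return m

def exact_mrna(k: float, gamma: float, m0: float, t: float) -> float:
    """Closed-form solution m(t) = k/gamma + (m0 - k/gamma) * exp(-gamma * t)."""
    steady = k / gamma
    return steady + (m0 - steady) * math.exp(-gamma * t)

# Hypothetical parameters: k = 2 molecules/min, gamma = 0.5 per min, starting from zero.
print(simulate_mrna(2.0, 0.5, 0.0, 2.0), exact_mrna(2.0, 0.5, 0.0, 2.0))
print(simulate_mrna(2.0, 0.5, 0.0, 50.0))  # approaches the predicted steady state k/gamma = 4
```

Forward Euler is chosen only for transparency; its error shrinks in proportion to the step size `dt`, and stiffer biological models usually call for adaptive solvers.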
By examining several illustrative examples, we hope to show how it might be possible to predict the behaviour of complex biological systems. The examples we choose are: (a) genetic and protein interaction networks at the intracellular level; (b) simulation studies of whole organs, which show how models at the cellular level can be integrated into complete and useful models of entire systems such as the heart. We also briefly examine some of the implications of systems biology for drug discovery, human health and the environment.
Measuring protein composition and protein 3-D structures, important information in the design of new drugs: Molecular dynamics can be used to simulate the time evolution of microscopic systems. Biological systems such as DNA, lipid membranes and, most importantly, proteins have been intensively studied using these techniques. The various steps involved in molecular dynamics simulations of proteins will be presented, together with their applications to biological phenomena. In particular, results of simulations performed on important proteins of the immune system will be given, and it will be shown how these data can be used to optimize cancer treatment.
Using DNA microarrays as powerful detectors of the "genes at work", and thereby determining the mechanisms that control our bodies and our health. From Gene Chips to Regulatory Networks: The completion of the draft sequence of the human genome has raised public awareness of “genomics” and of the ways in which the emerging technologies of the genomics “revolution” will have direct applications to research as well as to patient care. This information will be instrumental in deciphering the role and function of the various elements present on our chromosomes. Microarrays, and in particular Affymetrix GeneChips®, have emerged as a very powerful technology for investigating our genome. These small glass arrays contain millions of short oligonucleotides (DNA strands) synthesized by photolithography.
These tools make it possible, for example, to query the level of gene expression or the interactions of regulatory proteins with DNA in a highly parallel manner. Cross-comparison and integration of the data using appropriate bioinformatics approaches lead to the elucidation of biological regulatory networks.
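One simple way to cross-compare expression profiles is to correlate them across arrays and link genes whose profiles move together, giving a crude co-expression network. This is a minimal sketch under stated assumptions: the gene names, expression values and the 0.9 cutoff are hypothetical, and correlation-based co-expression is only one of many possible bioinformatics approaches. Strong negative correlations are kept too, since a repressor and its target may be anticorrelated.

```python
import math
from itertools import combinations

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-variation information
    return cov / (sx * sy)

def coexpression_edges(expr: dict[str, list[float]], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Link every pair of genes whose |correlation| across samples reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

# Hypothetical log-scale expression values for four genes across four arrays.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
print(coexpression_edges(expression))
```

Correlation alone does not reveal which gene regulates which; inferring direction generally requires additional data, such as the protein-DNA binding measurements mentioned above.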

    ChimericSeq: An open-source, user-friendly interface for analyzing NGS data to identify and characterize viral-host chimeric sequences

    Identification of viral integration sites has been important in understanding the pathogenesis and progression of diseases associated with particular viral infections. The advent of next-generation sequencing (NGS) has enabled researchers to understand the impact that viral integration has on the host, such as tumorigenesis. Current computational methods for analyzing NGS data on virus-host junction sites have been limited in their accessibility to a broad user base. In this study, we developed a software application, ChimericSeq, that is the first program of its kind to offer a graphical user interface (GUI) and compatibility with both Windows and Mac operating systems, and that is optimized for identifying and annotating virus-host chimeric reads within NGS data. In addition, ChimericSeq’s pipeline implements custom filtering to remove artifacts, and it detects reads with quantitative analytical reporting that helps assign functional significance to discovered integration sites. The improved accessibility of ChimericSeq through a GUI on both Windows and Mac has the potential to extend NGS analytical support to a broader spectrum of the scientific community.

    Description of ChimericSeq’s interactive, graphical user interface (GUI).

    (A) Sequence data of host, virus, and sample NGS reads in FASTQ format are loaded into the program. (B) Reads containing integration sites are displayed in a column format. Analytical data associated with the selected read are displayed within the table. (C) The selected read is visualized to highlight its different segments and their overlap. (D) Interactive display that communicates questions to the user and also provides logistical information about the run.
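Each FASTQ record of the kind loaded in panel (A) spans four lines: an "@" header, the sequence, a "+" separator, and a quality string with one character per base. A minimal reader sketch follows, assuming unwrapped four-line records and Phred+33 quality encoding (the common Sanger/Illumina 1.8+ convention). It is not ChimericSeq's own loader, and the names `FastqRecord` and `read_fastq` are hypothetical.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO

class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: List[int]  # Phred scores decoded from the quality string

def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input (a blank line is also treated as the end)
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(qual) != len(seq):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Phred+33: quality score = ASCII code of the character minus 33
        yield FastqRecord(header[1:], seq, [ord(ch) - 33 for ch in qual])

example = io.StringIO("@read1\nACGT\n+\nII#I\n")
for rec in read_fastq(example):
    print(rec.name, rec.seq, rec.qual)  # prints: read1 ACGT [40, 40, 2, 40]
```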

    Schematic overview of the ChimericSeq workflow.

    Input NGS reads are manually loaded through a graphical interface, followed by user-determined 5’ and 3’ end trimming. Host and viral genomes and indices must be specified if they have not already been loaded. Next, the identification phase aligns each read to the specified viral genome, extracts these aligned reads, and then aligns them to the host genome. The identification phase is further broken down into potential scenarios, where 1) the read has no alignment to the viral genome and is thus discarded, as indicated by the “X”; 2) the read aligns to the viral genome, but the length of its unmapped region is below the threshold set by the program (or user), and it is thus discarded; and 3) the read aligns to the viral genome and has sufficient unmapped overhang for alignment to the host genome, and is extracted (as indicated by the checkmark). The extracted reads are then aligned to the host genome with Bowtie2, following similar scenarios as depicted. The identified chimeric reads are then passed to the post-processing phase, which includes steps to filter out artifacts and to annotate integration sites with functional information such as gene breakpoint location. Finally, reads are presented through the program interface and saved to accessible output files.
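The three identification scenarios can be sketched by reading the soft-clipped ends of a SAM CIGAR string, as reported by a local aligner such as Bowtie2 in `--local` mode. This is a minimal sketch, assuming that the unmapped overhang of a viral alignment appears as soft clipping. The function names and the 20-base default for `min_overhang` are hypothetical and are not taken from ChimericSeq, whose actual implementation and default threshold may differ.

```python
import re
from typing import Optional, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_ends(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clipped lengths, ignoring hard clips."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail

def classify_viral_alignment(cigar: str, seq: str, min_overhang: int = 20) -> Tuple[str, Optional[str]]:
    """Apply the three scenarios to one read's alignment against the viral genome."""
    # Scenario 1: "*" is the SAM placeholder for a read with no alignment.
    if cigar == "*":
        return "discard: no viral alignment", None
    lead, trail = soft_clipped_ends(cigar)
    # Scenario 2: aligned, but the unmapped overhang is below the threshold.
    if max(lead, trail) < min_overhang:
        return "discard: overhang too short", None
    # Scenario 3: keep the longer unmapped end as the candidate host segment.
    overhang = seq[:lead] if lead >= trail else seq[len(seq) - trail:]
    return "extract", overhang

read = "A" * 30 + "C" * 70
print(classify_viral_alignment("30S70M", read))
```

The extracted overhang would then be aligned to the host genome, where the same logic could be reused to check that the host alignment covers the segment.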

    Pleiotropic effects of the 8.1 HLA haplotype in patients with autoimmune myasthenia gravis and thymus hyperplasia

    The 8.1 haplotype of the HLA complex has been reproducibly associated with several autoimmune diseases and traits, notably with thymus hyperplasia in patients with acquired generalized myasthenia gravis, an autoantibody-mediated disease directed at the muscle acetylcholine receptor. However, the strong linkage disequilibrium across this haplotype has prevented the identification of the causative locus, termed MYAS1. Here, we localized MYAS1 to a 1.2-Mb genome segment by reconstructing haplotypes and assessing their transmission in 73 simplex families. This segment encompasses the class III and proximal class I regions, between the BAT3 and C3-2-11 markers, therefore unambiguously excluding the class II loci. In addition, a case-control study revealed a very strong association with a core haplotype in this same region following an additive model (P = 7 × 10^-11; odds ratio 6.5 for one copy and 42 for two copies of the core haplotype). Finally, we showed that this region is associated with a marked increase in serum titers of anti-acetylcholine receptor autoantibodies (P = 8 × 10^-6). Remarkably, this effect was suppressed by a second locus in cis on the 8.1 haplotype, located toward the class II region. Altogether, these data demonstrate the highly significant but complex effects of the 8.1 haplotype on the phenotype of myasthenia gravis patients and might shed light on its role in other autoimmune diseases.
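The two odds ratios reported above fit the additive model when it is read on the log-odds scale (a log-additive, or multiplicative, model): each copy of the core haplotype multiplies the odds by about the same factor, so two copies should give roughly 6.5 squared = 42.25, close to the reported 42. The sketch below assumes that reading, which the abstract does not spell out, and uses hypothetical case-control counts (not data from the study) to show how an odds ratio is computed from a 2 × 2 table.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio from a 2 x 2 table.

    a = carriers among cases,     b = carriers among controls,
    c = non-carriers among cases, d = non-carriers among controls.
    """
    return (a / c) / (b / d)

def log_additive_or(or_per_copy: float, copies: int) -> float:
    """Under a log-additive model, each extra copy multiplies the odds by the same factor."""
    return or_per_copy ** copies

# Hypothetical counts, not taken from the study.
print(odds_ratio(30, 10, 20, 40))  # (30/20) / (10/40) -> 6.0
print(log_additive_or(6.5, 2))     # two copies at 6.5 per copy -> 42.25
```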