6,416 research outputs found

    Hunter-Gatherer-Annotator science: characterizing regulatory elements in the genome of dog and zebrafish with public and not yet public data

    Studying gene regulation requires large amounts of sequencing data. We can either generate (hunt) these data ourselves or use (gather) publicly available data sets. To guarantee the reliability and reusability of the hunted and gathered data, we also need to annotate them with the correct metadata. In this thesis, I address all three of these aspects. I was part of two international consortia that applied these approaches to two different model organisms. The DANIO-CODE consortium was initiated to systematically annotate the zebrafish genome. Similarly, the Dog Genome Annotation (DoGA) project aims to improve the annotation of genomic elements in the dog genome. Both zebrafish and dogs are popular model organisms for studying biological processes and pathologies in humans. Despite their popularity, neither organism has a large-scale annotation of regulatory elements. Before analyzing any data, we designed an annotation structure that captures all aspects of a sequencing experiment that are essential for processing and analyzing the data. We implemented this structure in a web platform that allows easy upload, query, and download of the sequencing data and associated metadata. We present the structure and implementation in Study I, which also contains a comparison to similar, well-established annotation schemata. In Study II, we use this annotation structure and the web platform to collect sequencing data from 1,803 samples from 38 research groups studying different stages of zebrafish development from transcriptomic, epigenomic, and methylomic perspectives. We identified more than 140,000 new cis-regulatory elements active during development and provide them, together with the sequencing data and genome browser tracks, as a resource for the community. In Study III, we present a biobank of dog tissues established for the DoGA consortium.
For Studies III and IV, we used 88 and 37 tissues from the biobank, respectively, to catalog promoter regions and their activity across tissues using STRT and CAGE-seq. In Study III, we also present the web platform, based on the structure from Study I, through which we make the data and the corresponding metadata available. In Study IV, we also used the CAGE-seq data to identify active enhancer regions and their activity across tissues. We identify regulatory networks between enhancers and promoters and show their conservation in humans.

    High Throughput Detection of Pseudouridine: Caveats, Conundrums, and a Case for Open Science

    The isomerization of uridine to pseudouridine (Ψ), known as pseudouridylation, is the most abundant post-transcriptional modification of stable RNAs. Due to technical limitations in pseudouridine detection methods, studies on pseudouridylation have historically focused on ribosomal RNAs, transfer RNAs, and spliceosomal small nuclear RNAs, where Ψs play a critical role in RNA biogenesis and function. For decades, Ψ research was confined to this small subset of cellular RNAs. Interest in this modification was reinvigorated, however, by reports that Ψ is conditionally induced in different environmental contexts and that pseudouridylation of certain codons recoded amino acid incorporation. Pseudouridine has thus revealed itself as a dynamic modification capable of fine-tuning RNA function. In this thesis, I describe how I attempted to develop a high-throughput technique to identify novel sites of pseudouridylation throughout the whole transcriptome. By identifying which transcripts are subject to pseudouridylation, I hoped to better understand Ψ's functional role. While I was pursuing this work, a series of deep sequencing methods (Pseudo-seq, Ψ-seq, PSI-seq, and CeU-seq) were published that mapped Ψ positions across the entire transcriptome with single-nucleotide resolution. Collectively, these methods greatly expanded the catalogue of pseudouridylated transcripts and revealed condition-dependent sites of pseudouridylation in response to cellular stress. With four techniques available, I undertook a critical analysis of their results, uncovering a comparatively small subset of robustly detectable putative Ψ sites. This analysis underscored the merits and limitations of each approach. Having identified areas for improvement in the available Ψ-detection approaches, I adapted Ψ-seq to profile sites of pseudouridylation in the protozoan parasite Trypanosoma brucei.
My efforts at transcriptome-wide Ψ detection, however, were undercut by an inability to experimentally replicate Ψ-seq. As much as this thesis documents an endeavor to better understand the functional role of pseudouridylation, it also documents systematic and thorough experimental failure. In so doing, the work detailed in this thesis highlights a need within the sciences to foster increased transparency and reproducibility.

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is strongly motivated by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space built from the genomic data of Plasmodium falciparum, but also drawing on the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) versatile methods to integrate genomic data, biological representations, and functional profiling obtained from X-omic experiments after drug treatments, and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journal

    MIMAS: an innovative tool for network-based high density oligonucleotide microarray data management and annotation

    BACKGROUND: The high-density oligonucleotide microarray (GeneChip) is an important tool for molecular biological research aimed at large-scale detection of single nucleotide polymorphisms in DNA and genome-wide analysis of mRNA concentrations. Local array data management solutions are instrumental for efficient processing of the results and for subsequent uploading of data and annotations to a global certified data repository at the EBI (ArrayExpress) or the NCBI (Gene Expression Omnibus). DESCRIPTION: To facilitate and accelerate annotation of high-throughput expression profiling experiments, the Microarray Information Management and Annotation System (MIMAS) was developed. The system is fully compliant with the Minimum Information About a Microarray Experiment (MIAME) convention. MIMAS provides life scientists with a highly flexible and focused GeneChip data storage and annotation platform essential for subsequent analysis and interpretation of experimental results with clustering and mining tools. The system software can be downloaded for academic use upon request. CONCLUSION: MIMAS implements a novel concept for nationwide GeneChip data management whereby a network of facilities is centered on one data node directly connected to the European certified public microarray data repository located at the EBI. The proposed solution may serve as a prototype approach to array data management between research institutes organized in a consortium.

    Bioinformatics process management: information flow via a computational journal

    This paper presents the Bioinformatics Computational Journal (BCJ), a framework for conducting and managing computational experiments in bioinformatics and computational biology. These experiments often involve series of computations, data searches, filters, and annotations that can benefit from a structured environment. Systems to manage computational experiments exist, ranging from libraries with standard data models to elaborate schemes that chain together input and output between applications. Yet, although such frameworks are available, their use is not widespread; ad hoc scripts are often required to bind applications together. The BCJ explores another solution to this problem through a computer-based environment suitable for on-site use, which builds on the traditional laboratory notebook paradigm. It provides an intuitive, extensible paradigm designed for expressive composition of applications. Extensive features facilitate sharing data, computational methods, and entire experiments. By focusing on the bioinformatics and computational biology domain, we narrowed the scope of the computational framework, which permitted us to implement a capable set of features for this domain. This report discusses the features that our system and other projects have identified as critical, along with design issues. We illustrate the use of our implementation of the BCJ on two domain-specific examples.

    The Evolution of Diversity

    Since the beginning of time, the pre-biological and the biological world have seen a steady increase in complexity of form and function based on a process of combination and re-combination. The current modern synthesis of evolution, known as the neo-Darwinian theory, emphasises population genetics and does not satisfactorily explain all other occurrences of evolutionary novelty. The authors suggest that symbiosis and hybridisation, as well as more obscure processes such as polyploidy, chimerism, and lateral transfer, are mostly overlooked and not featured sufficiently within evolutionary theory. They therefore suggest a revision of the existing theory, including its language, to accommodate the scientific findings of recent decades.