17 research outputs found

    Detecting modules in multiplex networks – an application for integrating expression profiles across multiple species

    A multiplex network, a set of networks linked through interconnected layers, is a useful mathematical framework for data integration. Here, we present a general method to detect modules in multiplex networks and apply it in a specific biological context: to simultaneously cluster the genome-wide expression profiles of C. elegans and D. melanogaster generated by the ENCODE and modENCODE consortia. The method revealed modules that are fundamentally cross-species and can be either conserved or species-specific. More generally, the method could be applied in other contexts, such as the integration of different social networks.

    Enhanced Transcriptome Maps from Multiple Mouse Tissues Reveal Evolutionary Constraint in Gene Expression for Thousands of Genes

    We characterized by RNA-seq the transcriptional profiles of a large and heterogeneous collection of mouse tissues, augmenting the mouse transcriptome with thousands of novel transcript candidates. Comparison with transcriptome profiles obtained in human cell lines reveals substantial conservation of transcriptional programs, and uncovers a distinct class of genes with levels of expression across cell types and species that have been constrained early in vertebrate evolution. This core set of genes captures a substantial and constant fraction of the transcriptional output of mammalian cells, and participates in basic functional and structural housekeeping processes common to all cell types. Perturbation of these constrained genes is associated with significant phenotypes including embryonic lethality and cancer. Evolutionary constraint in gene expression levels is not reflected in the conservation of the genomic sequences, but is associated with strong and conserved epigenetic marking, as well as with a characteristic post-transcriptional regulatory program in which sub-cellular localization and alternative splicing play comparatively large roles.

    GENCODE reference annotation for the human and mouse genomes

    The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature, to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org. National Human Genome Research Institute of the National Institutes of Healt

    Comparative analysis of the transcriptome across distant species

    The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters.

    Reconstruction of genetic network by Bayesian network model with integration of various prior knowledge

    The Bayesian network model is widely used for reverse engineering of gene regulatory network structure. An advantage of this model lies in its capability to integrate prior knowledge into the model learning process, which can lead to improvement in the quality of the analysis outcome. Some previous works have explored this area. Unfortunately, most of the existing works focus only on prior knowledge of the direct variable links. Here we propose a set of methods designed to integrate other types of prior knowledge in model learning, namely, the semantic variable relations and indirect variable relations. We show in this work how this knowledge can be formalized and integrated into the model definition, and how the resulting models are evaluated with simulated data and real biological data. We show that the integration of prior knowledge results in a significant improvement in model performance. To address the issue of prior knowledge generation, we also propose an approach to automatically extract indirect variable links from knowledge databases such as KEGG and GO.

    Analysis of variable retroduplications in human populations suggests coupling of retrotransposition to cell division.

    In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either retrogenes coding for functioning proteins, or expressed processed pseudogenes, which can function as noncoding RNAs. To date, little is known about the variation in retroduplications in terms of their presence or absence across individuals in the human population. We have developed new methodologies that allow us to identify novel retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans, which were sequenced as part of Phase 1 of The 1000 Genomes Project Consortium. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle and that M-to-G1 expressed genes have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it. Genome Res 2013 Dec; 23(12):2042-52