1,654 research outputs found

    Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models

    Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself and its application to analysis tasks such as automated annotation can provide insight into biological questions.

    Prediction of gene expression in embryonic structures of Drosophila melanogaster.

    Understanding how sets of genes are coordinately regulated in space and time to generate the diversity of cell types that characterise complex metazoans is a major challenge in modern biology. The use of high-throughput approaches, such as large-scale in situ hybridisation and genome-wide expression profiling via DNA microarrays, is beginning to provide insights into the complexities of development. However, in many organisms the collection and annotation of comprehensive in situ localisation data is a difficult and time-consuming task. Here, we present a widely applicable computational approach, integrating developmental time-course microarray data with annotated in situ hybridisation studies, that facilitates the de novo prediction of tissue-specific expression for genes that have no in vivo gene expression localisation data available. Using a classification approach, trained with data from microarray and in situ hybridisation studies of gene expression during Drosophila embryonic development, we made a set of predictions on the tissue-specific expression of Drosophila genes that have not been systematically characterised by in situ hybridisation experiments. The reliability of our predictions is confirmed by literature-derived annotations in FlyBase, by overrepresentation of Gene Ontology biological process annotations, and, in a selected set, by detailed gene-specific studies from the literature. Our novel organism-independent method will be of considerable utility in enriching the annotation of gene function and expression in complex multicellular organisms.

    Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

    Background: Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns. Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints in the clustering and discuss the biological relevance of our results. Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

    Assembling models of embryo development: Image analysis and the construction of digital atlases

    Digital atlases of animal development provide a quantitative description of morphogenesis, opening the path toward process modeling. Prototypic atlases offer a data integration framework in which to gather information from cohorts of individuals with phenotypic variability. Relevant information for further theoretical reconstruction includes measurements in time and space for cell behaviors and gene expression. The latter, as well as data integration in a prototypic model, relies on image processing strategies. Developing the tools to integrate and analyze biological multidimensional data is highly relevant for assessing chemical toxicity or performing preclinical drug testing. This article surveys some of the most prominent efforts to assemble these prototypes, categorizes them according to salient criteria and discusses the key questions in the field and the future challenges toward the reconstruction of multiscale dynamics in model organisms.

    Developing a workflow for the multi-omics analysis of Daphnia

    In the era of multi-omics, making reasonable statistical inferences through data integration is challenged by data heterogeneity, dimensionality constraints, and data harmonization. The biological system is presumed to function as a network where the physical relationships between genes (nodes) are represented by links (edges) connecting genes that interact. This thesis aims to develop a new and efficient workflow, built on readily available software tools, that lets researchers focused on biological questions analyse multi-omics data from non-model organisms. The proposed approach was applied to the transcriptome and metabolome data of Daphnia magna under various dose rates of gamma radiation. The first part of this workflow compares and contrasts the transcriptional regulation of short- and long-term gamma radiation exposure. Groups of genes that share similar expression across different samples under the same conditions are known as modules, because they are likely to be functionally relevant. Modules were identified using WGCNA, but biologically meaningful modules (significant modules) were selected through a novel approach that associates genes with significantly altered expression levels as a result of radiation (i.e. differentially expressed genes) with these candidate modules. Dynamic transcriptional regulation was modelled using transcription factor (TF) DNA binding patterns to associate TFs with expression responses captured by the modules. The biological functions of significant modules and their TF regulators were verified with functional annotations and mapped into the proposed Adverse Outcome Pathways (AOP) of D. magna, which describe the key events that contribute to fecundity reduction. The findings demonstrate that short-term radiation impacts are entirely different from long-term impacts and cannot be used for long-term prediction.
The second part investigates the coordination of gene expression and metabolites with differential abundances induced by different gamma dose rates and the underlying mechanisms contributing to the varying extent of the reduction in fecundity. Significant modules which belong to the same design model of dose rates were combined and annotated with new functionality. The abundance of metabolites was also modelled with the same design model. Integrated pathway enrichment analysis was performed to discover and create pathway diagrams for visualising the multi-omics output. Finally, the performance of this workflow in explaining the reduction of fecundity of D. magna, which has not been described in previous studies, has been evaluated. Combining the information from the metabolome and transcriptome data, new insights suggest that the alteration to the cell cycle is the underlying mechanism contributing to the varying reduction of fecundity under the effect of different dose rates of radiation.

    Preimplantation development regulatory pathway construction through a text-mining approach

    BACKGROUND: The integration of sequencing and gene interaction data and subsequent generation of pathways and networks contained in databases such as KEGG Pathway is essential for the comprehension of complex biological processes. We noticed the absence of a chart or pathway describing the well-studied preimplantation development stages; furthermore, not all genes involved in the process have entries in KEGG Orthology, which is important information for applying this knowledge to other organisms. RESULTS: In this work we sought to develop the regulatory pathway for the preimplantation development stage using text-mining tools such as Medline Ranker and PESCADOR to reveal biointeractions among the genes involved in this process. The genes present in the resulting pathway were also used as seeds for software developed by our group called SeedServer to create clusters of homologous genes. These homologues allowed the determination of the last common ancestor for each gene and revealed that the preimplantation development pathway consists of a conserved ancient core of genes with the addition of modern elements. CONCLUSIONS: The generation of regulatory pathways through text-mining tools allows the integration of data generated by several studies for a more complete visualization of complex biological processes. Using the genes in this pathway as "seeds" for the generation of clusters of homologues, the pathway can be visualized for other organisms. The clustering of homologous genes together with determination of the ancestry leads to a better understanding of the evolution of such processes.

    A genomic comparison of two termites with different social complexity

    The termites evolved eusociality and complex societies before the ants, but have been studied much less. The recent publication of the first two termite genomes provides a unique comparative opportunity, particularly because the sequenced termites represent opposite ends of the social complexity spectrum. Zootermopsis nevadensis has simple colonies with totipotent workers that can develop into all castes (dispersing reproductives, nest-inheriting replacement reproductives, and soldiers). In contrast, the fungus-growing termite Macrotermes natalensis belongs to the higher termites and has very large and complex societies with morphologically distinct castes that are life-time sterile. Here we compare key characteristics of genomic architecture, focusing on genes involved in communication, immune defenses, mating biology and symbiosis that were likely important in termite social evolution. We discuss these in relation to what is known about these genes in the ants and outline hypotheses for further testing.

    A genomic comparison of two termites with different social complexity

    The termites evolved eusociality and complex societies before the ants, but have been studied much less. The recent publication of the first two termite genomes provides a unique comparative opportunity, particularly because the sequenced termites represent opposite ends of the social complexity spectrum. Zootermopsis nevadensis has simple colonies with totipotent workers that can develop into all castes (dispersing reproductives, nest-inheriting replacement reproductives, and soldiers). In contrast, the fungus-growing termite Macrotermes natalensis belongs to the higher termites and has very large and complex societies with morphologically distinct castes that are life-time sterile. Here we compare key characteristics of genomic architecture, focusing on genes involved in communication, immune defenses, mating biology and symbiosis that were likely important in termite social evolution. We discuss these in relation to what is known about these genes in the ants and outline hypotheses for further testing. View the article as published at http://journal.frontiersin.org/article/10.3389/fgene.2015.00009/ful

    Deploying Big Data To Crack The Genotype To Phenotype Code

    Mechanistically connecting genotypes to phenotypes is a longstanding and central mission of biology. Deciphering these connections will unite questions and datasets across all scales from molecules to ecosystems. Although high-throughput sequencing has provided a rich platform on which to launch this effort, tools for deciphering mechanisms further along the genome-to-phenome pipeline remain limited. Machine learning approaches and other emerging computational tools hold the promise of augmenting human efforts to overcome these obstacles. This vision paper is the result of a Reintegrating Biology Workshop, bringing together the perspectives of integrative and comparative biologists to survey challenges and opportunities in cracking the genotype-to-phenotype code and thereby generating predictive frameworks across biological scales. Key recommendations include: promoting the development of minimum “best practices” for the experimental design and collection of data; fostering sustained and long-term data repositories; promoting programs that recruit, train, and retain a diversity of talent; and providing funding to effectively support these highly cross-disciplinary efforts. We follow this discussion by highlighting a few specific transformative research opportunities that will be advanced by these efforts.