
    Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

    Background: Gene expression measurements taken during the development of the fruit fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns. Results: Using a semi-supervised approach, constrained clustering with mixture models, we find clusters of genes that exhibit spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements serve as primary data, for which pairwise constraints are computed automatically from raw in situ images, without the need for manual annotation. We investigate the influence of these pairwise constraints on the clustering and discuss the biological relevance of our results. Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.
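
The abstract does not give the model details. As an illustration only, the sketch below handles the simplest kind of pairwise constraint, a hard must-link, by grouping linked genes into "chunklets" that must share one mixture label, so each chunklet's posterior uses the product of its members' likelihoods. The function names, the 1D Gaussian components, and the two-component setup are assumptions; the authors' method (which derives constraints from in situ images and studies their influence) is likely different and also covers other constraint types.

```python
import math

def log_normal(x, mu, var):
    """Log density of a 1D Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def must_link_em(chunklets, n_iter=50, min_var=1e-3):
    """Hypothetical sketch: EM for a two-component 1D Gaussian mixture in which
    every chunklet (values joined by must-link constraints) shares one label."""
    points = [x for c in chunklets for x in c]
    mu = [min(points), max(points)]  # deterministic start at the data extremes
    mean = sum(points) / len(points)
    var0 = max(sum((x - mean) ** 2 for x in points) / len(points), min_var)
    var = [var0, var0]
    pi = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: a chunklet's posterior uses the product of its members' likelihoods.
        resp = []
        for c in chunklets:
            logs = [math.log(pi[k]) + sum(log_normal(x, mu[k], var[k]) for x in c)
                    for k in range(2)]
            top = max(logs)  # log-sum-exp shift for numerical stability
            w = [math.exp(l - top) for l in logs]
            resp.append([wk / sum(w) for wk in w])
        # M-step: each chunklet's responsibility weights all of its member points.
        for k in range(2):
            nk = max(sum(r[k] * len(c) for r, c in zip(resp, chunklets)), 1e-12)
            # Floor the mixing weight so log(pi) stays defined if a component empties.
            pi[k] = max(sum(r[k] for r in resp) / len(chunklets), 1e-12)
            mu[k] = sum(r[k] * sum(c) for r, c in zip(resp, chunklets)) / nk
            ss = sum(r[k] * sum((x - mu[k]) ** 2 for x in c) for r, c in zip(resp, chunklets))
            var[k] = max(ss / nk, min_var)
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return labels, mu
```

For example, `must_link_em([[0.0, 0.1], [-0.1], [5.0, 5.2], [4.9]])` returns one label per chunklet rather than per gene, which is how the must-link constraint is enforced.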

    J Comput Biol

    Gene expression measurements make it possible to determine sets of up-regulated, down-regulated, or unchanged genes in a particular experimental condition. Additional biological knowledge can suggest examples of genes from one of these sets. For instance, known target genes of a transcriptional activator are expected, but not certain, to go down after the activator is knocked out. Available differential expression analysis tools do not take such imprecise examples into account. Here we put forward a novel partially supervised mixture modeling methodology for differential expression analysis. Our approach, guided by imprecise examples, clusters expression data into differentially expressed and unchanged genes. The partially supervised methodology is implemented by two methods: a newly introduced belief-based mixture modeling, and soft-label mixture modeling, a method that has proved efficient in other applications. On synthetic data, we investigate which input example settings favor each method. In our tests, both the belief-based and the soft-label methods outperform semi-supervised mixture modeling in correcting for erroneous examples. We also compare them to alternative differential expression analysis approaches, showing that incorporating knowledge yields better performance. We present a broad range of knowledge sources and data to which our partially supervised methodology can be applied. First, we determine targets of Ste12 from yeast knockout data, guided by a Ste12 DNA-binding experiment. Second, we distinguish miR-1 from miR-124 targets in human by clustering expression data from transfection experiments with both microRNAs, using their computationally predicted targets as examples. Finally, we use literature knowledge to improve the clustering of time-course expression profiles.
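
To make the soft-label idea concrete, the sketch below assumes one common formulation: each gene carries a positive weight per component (its soft label), and the E-step multiplies the component's mixing proportion by that weight, so imprecise examples tilt but do not fix a gene's assignment. Genes without prior knowledge get equal weights. The function name, the 1D Gaussian components (e.g., log fold changes for down, unchanged, and up), and the initialization are hypothetical choices for illustration, not the paper's implementation, and the belief-based variant is not shown.

```python
import math

def soft_label_em(x, soft_labels, n_iter=100, min_var=1e-3):
    """Hypothetical sketch: EM for a 1D Gaussian mixture (K >= 2 components) in
    which each gene's mixing weights are multiplied by its soft label."""
    K = len(soft_labels[0])
    srt = sorted(x)
    # Assumed initialization: spread the K means evenly over the sorted data.
    mu = [srt[round(k * (len(x) - 1) / (K - 1))] for k in range(K)]
    mean = sum(x) / len(x)
    var = [max(sum((v - mean) ** 2 for v in x) / len(x), min_var)] * K
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        # E-step: r_ik is proportional to w_ik * pi_k * N(x_i | mu_k, var_k).
        resp = []
        for xi, w in zip(x, soft_labels):
            logs = [math.log(w[k] * pi[k])
                    - 0.5 * (math.log(2 * math.pi * var[k]) + (xi - mu[k]) ** 2 / var[k])
                    for k in range(K)]
            top = max(logs)  # log-sum-exp shift for numerical stability
            e = [math.exp(l - top) for l in logs]
            resp.append([ek / sum(e) for ek in e])
        # M-step: standard weighted updates of proportions, means, and variances.
        for k in range(K):
            nk = max(sum(r[k] for r in resp), 1e-12)
            pi[k] = max(nk / len(x), 1e-12)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk, min_var)
    # Hard assignment: the component with the largest posterior for each gene.
    return [max(range(K), key=lambda k: r[k]) for r in resp]
```

Soft labels must be strictly positive here, since the E-step takes their logarithm; a weight of zero would turn the example into a hard constraint, which is exactly what the partially supervised setting avoids.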

    Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models

    Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, given that the amount of available data is rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database and show that the automatically inferred factors provide a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach achieves similar or better classification of expression patterns at different developmental stages than other automatic image annotation methods that use thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself and its application to analysis tasks such as automated annotation can provide insight into biological questions.
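
The paper's model is a sparse Bayesian factor model, which is beyond a short example. As a much simpler, non-Bayesian stand-in, the sketch below shows how sparse mixing weights can be computed for one flattened image once the factors are fixed: it solves the L1-penalized (lasso) problem with ISTA (iterative soft-thresholding). The factor matrix `W`, the penalty `lam`, and the function names are hypothetical; the resulting weight vector plays the role of the low-dimensional features passed to a classifier.

```python
import math

def soft_threshold(v, thr):
    """Elementwise soft-thresholding, the proximal operator of thr * ||.||_1."""
    return [math.copysign(max(abs(a) - thr, 0.0), a) for a in v]

def sparse_mixing_weights(x, W, lam, n_iter=500):
    """ISTA for min_h 0.5 * ||x - W h||^2 + lam * ||h||_1.

    x: flattened image (length n_pixels).
    W: hypothetical, already-learned factor matrix given as n_pixels rows of
       n_factors entries (must not be all zeros)."""
    n_pix, n_fac = len(W), len(W[0])
    # Step size 1/L: the squared Frobenius norm upper-bounds the largest
    # eigenvalue of W^T W, so this step is safe (if conservative).
    L = sum(w * w for row in W for w in row)
    h = [0.0] * n_fac
    for _ in range(n_iter):
        # Gradient of the squared-error term: W^T (W h - x).
        resid = [sum(W[i][j] * h[j] for j in range(n_fac)) - x[i] for i in range(n_pix)]
        grad = [sum(W[i][j] * resid[i] for i in range(n_pix)) for j in range(n_fac)]
        # Gradient step followed by the L1 proximal step.
        h = soft_threshold([h[j] - grad[j] / L for j in range(n_fac)], lam / L)
    return h
```

With orthonormal factors the lasso solution is simply the soft-thresholded projection, which makes a convenient check: for `W` equal to the 3×3 identity, `x = [2.0, 0.05, -1.0]`, and `lam = 0.1`, the weights converge to roughly `[1.9, 0.0, -0.9]`, with the weak middle factor switched off entirely.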

    A bag-of-words approach for Drosophila gene expression pattern annotation

    Background: Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way to study the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were manually annotated with a variable number of anatomical terms from a controlled vocabulary. Because the number of available images is rapidly increasing, it is imperative to design computational methods that automate this task. Results: We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to utilize the existing information on pattern annotation and annotates images using a model that exploits correlations among terms. It can annotate images individually or in groups (e.g., according to the developmental stage), and it can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods. Conclusion: The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. Integrating existing annotation information from multiple embryonic views improves annotation performance. The electronic version of this article is the complete one and can be found online at: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-11
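
The abstract does not spell out how the words are built. The sketch below shows the common bag-of-visual-words construction as an assumption about the general scheme, not the paper's exact pipeline: each local image descriptor is mapped to its nearest codeword, and the counts over all images in a group (for example, several views of one gene at one stage) are pooled into a single normalized histogram. The codebook, the descriptor dimension, and the function names are hypothetical; in practice the codebook is usually learned by clustering descriptors.

```python
from collections import Counter

def nearest_word(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[j])))

def group_histogram(image_group, codebook):
    """Pool the local descriptors of every image in a (non-empty) group into
    one normalized histogram of visual-word counts."""
    counts = Counter(nearest_word(d, codebook) for image in image_group for d in image)
    total = sum(counts.values())
    return [counts[j] / total for j in range(len(codebook))]
```

Because the histogram is computed over the pooled descriptors, a group of any size, with any number of views, maps to a fixed-length vector, which is what lets a single model annotate images individually or in groups.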

    Development of web-based image annotation tool and application of machine learning methods

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 91-92). Large-scale in situ hybridization screens are providing an abundance of spatio-temporal gene expression patterns that are valuable for understanding the mechanisms of gene regulation. The Berkeley Drosophila Genome Project (BDGP) has generated Drosophila gene expression pattern images for over 7,000 genes in over 90,000 digital images. These images are currently curated by hand by field experts, who assign developmental and anatomical terms based on the stained regions. These annotations enable the integration of spatial expression patterns with other genomic data sets that link regulators with their downstream targets. However, manual curation has become a bottleneck in analyzing the rapidly generated data, so it is necessary to explore computational methods for curating gene expression pattern images. This thesis addresses the problem by improving the manual annotation process with a web-based image annotation tool and by enabling automation of the process using machine learning methods. First, a tool called LabelLife was developed to provide a systematic and flexible way of annotating images, groups of images, and shapes within images using terms from a controlled vocabulary. Second, machine learning methods for automatically predicting vocabulary terms for a given image from image feature data were explored and implemented. The results of the applied machine learning methods are promising in terms of predictive ability, which could simplify and expedite the curation process, thereby increasing the rate at which biologically significant data can be evaluated and new insights gained. by Anna Maria E. Ayuso. M.Eng.

    Modeling signal transduction pathways and their transcriptional response

    This thesis is concerned with revealing the regulation of gene expression. The basic motivation behind our work is that gene regulation can be better resolved when analyzed in the cellular context of the upstream signaling pathway and known regulatory targets. Our data come from perturbation experiments, which are performed on pathway components and induce changes in gene expression. In this way, they connect the signaling pathway to its downstream target genes. This chapter starts with an introduction to the cellular context considered in the thesis (section 1.1) and the principles of perturbation experiments (section 1.2). It ends with a concise summary of the three approaches that comprise this thesis, which tackle various problems in revealing context-specific regulatory networks (section 1.3). We deal with differential expression analysis of the perturbation data, enhanced with known transcription factor targets serving as examples of differentially expressed genes (chapter 2), pathway model-based planning of informative perturbation experiments (chapter 3), and finally, deregulation analysis, i.e., comparing changes in gene regulation between two different cell populations (chapter 4).