
    SPEX2: automated concise extraction of spatial gene expression patterns from Fly embryo ISH images

    Motivation: Microarray profiling of mRNA abundance is often ill suited for temporal–spatial analysis of gene expression in multicellular organisms such as Drosophila. Recent progress in image-based, genome-scale profiling of whole-body mRNA patterns via in situ hybridization (ISH) calls for the development of accurate, automatic image analysis systems that support efficient mining of complex temporal–spatial mRNA patterns, which will be essential for functional genomics and network inference in higher organisms.

    Joint stage recognition and anatomical annotation of drosophila gene expression patterns

    Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo reveals the detailed spatio-temporal pattern of that gene's expression. Many related biological problems, such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs, rely heavily on the analysis of these image patterns. To enable text-based pattern search in support of such studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are manually annotated by domain experts with a developmental stage term and anatomical ontology terms. Because the number of such images is increasing rapidly and human curators inevitably introduce bias into their annotations, an automatic method is needed to recognize the developmental stage and annotate anatomical terms.

    A Computational Framework for Learning from Complex Data: Formulations, Algorithms, and Applications

    Many real-world processes change dynamically over time. As a consequence, the complex data they generate also evolve smoothly. For example, in computational biology, gene expression data matrices evolve because, in many biological processes, gene expression controls are deployed sequentially during development. Investigating spatial and temporal gene expression dynamics is essential for understanding the regulatory biology governing development. In this dissertation, I focus mainly on two types of complex data: genome-wide spatial gene expression patterns in the model organism fruit fly, and Allen Brain Atlas mouse brain data. I provide a framework to explore spatiotemporal regulation of gene expression during development. I develop an evolutionary co-clustering formulation that uses a mesh-generation pipeline to identify co-expressed domains and their associated genes simultaneously across different temporal stages. I also propose employing deep convolutional neural networks as multi-layer feature extractors to generate generic representations of in situ hybridization (ISH) images of gene expression patterns. Furthermore, I use multi-task learning to fine-tune the pre-trained models with labeled ISH images. The proposed computational methods are evaluated on synthetic data sets and on real biological data sets, including gene expression data from the fruit fly BDGP data sets and the Allen Developing Mouse Brain Atlas, and are compared with existing baseline methods. Experimental results indicate that the proposed representations, formulations and methods are efficient and effective for annotating and analyzing large-scale biological data sets.

    Systematic image-driven analysis of the spatial Drosophila embryonic expression landscape

    We created an innovative virtual representation for our large-scale Drosophila in situ expression dataset. We aligned an elliptical mesh composed of small triangular regions to the outline of each embryo. Each triangle defines a unique location in the embryo, so comparing corresponding triangles makes it easy to identify similar expression patterns. We used this virtual representation to organize the expression landscape at stages 4-6. We identified regions with similar expression in the embryo and clustered genes with similar expression patterns. We also created algorithms to mine the dataset for adjacent non-overlapping patterns and for anti-correlated patterns, which allowed us to identify co-expressed and putatively interacting genes. Using co-expression, we were able to assign putative functions to uncharacterized genes.
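    The mesh-based comparison described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each embryo has already been registered to a shared triangular mesh and summarized as one expression value per triangle. It uses Pearson correlation as an assumed similarity measure and a hypothetical 0.5 "expressed" threshold. A toy six-triangle strip stands in for the real elliptical mesh. The helper names `pearson` and `adjacent_non_overlapping` are illustrative only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two per-triangle expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a uniform pattern gives no correlation signal
    return cov / (sx * sy)


def adjacent_non_overlapping(a, b, neighbors, threshold=0.5):
    """True if a and b share no 'on' triangle but at least one pair of them touches.

    The threshold is a hypothetical cut-off for calling a triangle expressed.
    """
    on_a = {i for i, v in enumerate(a) if v >= threshold}
    on_b = {i for i, v in enumerate(b) if v >= threshold}
    if on_a & on_b:
        return False  # the patterns overlap
    return any(j in on_b for i in on_a for j in neighbors[i])


# Toy "mesh": six triangles in a strip, each adjacent to its left/right neighbour.
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
gene_a = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # anterior domain
gene_b = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # posterior domain abutting gene_a
gene_c = [0.9, 1.0, 0.8, 0.1, 0.0, 0.0]  # noisy copy of gene_a

print(round(pearson(gene_a, gene_c), 3))                    # strongly positive: co-expressed
print(round(pearson(gene_a, gene_b), 3))                    # strongly negative: anti-correlated
print(adjacent_non_overlapping(gene_a, gene_b, neighbors))  # True
print(adjacent_non_overlapping(gene_a, gene_c, neighbors))  # False
```

    Because every embryo is mapped onto the same triangles, pattern comparison reduces to comparing fixed-length vectors. Both "similar pattern" and "abutting pattern" queries become simple per-triangle operations.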

    Characterization of the Glia in the adult Drosophila central nervous system


    An objective comparison of cell-tracking algorithms

    We present a combined report on the results of three editions of the Cell Tracking Challenge, an ongoing initiative aimed at promoting the development and objective evaluation of cell segmentation and tracking algorithms. With 21 participating algorithms and a data repository of 13 data sets from various microscopy modalities, the challenge showcases today's state-of-the-art methodology in the field. We analyzed the challenge results using performance measures for segmentation and tracking that rank all participating methods. We also analyzed the performance of all algorithms in terms of biological measures and practical usability. Although some methods scored high in all technical aspects, none obtained fully correct solutions. We found that methods that either take prior information into account using learning strategies or analyze cells in a global spatiotemporal video context performed better than other methods under the segmentation and tracking scenarios included in the challenge.

    Transcriptome Analysis during Human Trophectoderm Specification Suggests New Roles of Metabolic and Epigenetic Genes

    In humans, successful pregnancy depends on a cascade of dynamic events during early embryonic development. Unfortunately, molecular data on these critical events are scarce. To improve our understanding of the molecular mechanisms that govern the specification and development of the trophoblast cell lineage, we compared the transcriptome of human trophectoderm (TE) cells from day 5 blastocysts with that of single day 3 embryos from our in vitro fertilization program, using Human Genome U133 Plus 2.0 microarrays. Some of the microarray data were validated by quantitative RT-PCR. The TE molecular signature included 2,196 transcripts. Among them were genes already known to be TE-specific (GATA2, GATA3 and GCM1), as well as genes involved in trophoblast invasion (MUC15), chromatin remodeling (specifically the DNA methyltransferase DNMT3L) and steroid metabolism (HSD3B1, HSD17B1 and FDX1). In day 3 human embryos, 1,714 transcripts were specifically up-regulated. Besides stemness genes such as NANOG and DPPA2, this signature included genes belonging to the NLR family (NALP4, 5, 9, 11 and 13), the Ret finger protein-like family (RFPL1, 2 and 3) and the Melanoma Antigen family (MAGEA1, 2, 3, 5, 6 and 12), as well as previously unreported transcripts such as MBD3L2 and ZSCAN4. This study provides a comprehensive overview of the genes expressed during the initial embryo-trophectoderm transition in humans. Further understanding of the biological functions of the key steroidogenesis and epigenetic-regulation genes that are up-regulated in TE cells may clarify their contribution to TE specification and might also provide new biomarkers for selecting viable and competent blastocysts.

    Biochemical and mass spectrometric analysis of interactions in Drosophila mRNA localization

    mRNA localization is a common mechanism of gene regulation, involved in a broad range of biological processes including embryonic patterning, asymmetric cell division and cell migration. In Drosophila oocytes, asymmetric deposition of maternal oskar (osk), nanos, gurken and bicoid mRNAs defines the future embryonic axes. Targeting these mRNAs differentially, in a defined spatio-temporal manner, requires several trans-acting factors that assemble with the mRNA into messenger ribonucleoprotein particles (mRNPs). Many trans-acting factors are RNA-binding proteins (RBPs) that recognize specific cis-acting elements in the RNA, and they often function in both the localization and the translational regulation of the transcript. Although individual RBPs have been identified and extensively studied for their role in mRNA localization, less is known about their interaction network. The same RBPs often bind to differentially localized transcripts, and it is unclear how transcript specificity and differential targeting are achieved. One possibility is that, while these RBPs form the core of the mRNP, a higher level of transcript-specific regulation comes from regulatory partners that interact directly with them. To gain further insight into the functional components of an mRNP, and possibly to understand how RBPs are regulated, I performed co-purification studies of both a localizing mRNP and the RBPs associated with localizing transcripts in Drosophila. In the first part of the project, I used a tandem affinity purification approach to establish a protocol for biochemically purifying the osk mRNP with the MS2 system. In the second part, I immunoprecipitated six tagged RBPs that are known to regulate the localization of one or more maternal mRNAs at different developmental stages. Using mass spectrometry and subsequent statistical analysis, I identified proteins significantly enriched with each tagged RBP and constructed an interactome. Using co-immunoprecipitation assays in cultured HEK cells, I validated several interactions identified in the mass spectrometric data, including 26 novel interactions of potential RBP regulators. This work lays the foundation for in vivo functional and co-localization studies, as well as in vitro structural characterization of the identified interactants, to fully understand the relevance of these interactions in the regulation of mRNA localization.

    Image-level and group-level models for Drosophila gene expression pattern annotation

    Background: Drosophila melanogaster has been established as a model organism for investigating developmental gene interactions. The spatio-temporal gene expression patterns of Drosophila melanogaster can be visualized by in situ hybridization and documented as digital images. Automated and efficient tools for analyzing these expression images will provide biological insights into gene functions, interactions and networks. To facilitate pattern recognition and comparison, many web-based resources have been created for comparative analysis based on body-part keywords and the associated images. With images accumulating rapidly from high-throughput techniques, manual inspection has become a serious impediment to the pace of biological discovery. It is thus imperative to design an automated system for efficient image annotation and comparison. Results: We present a computational framework for annotating Drosophila gene expression images with anatomical keywords. A spatial sparse coding approach is used to represent local image patches and is compared with the well-known bag-of-words (BoW) method. Three pooling functions (max pooling, average pooling and Sqrt pooling, i.e. the square root of the mean squared statistics) are employed to transform the sparse codes into image features. Based on the constructed features, we develop both an image-level scheme and a group-level scheme to tackle the key challenges in annotating Drosophila gene expression pattern images automatically. To deal with the imbalanced data distribution inherent in image annotation tasks, undersampling is applied together with majority vote. Results on Drosophila embryonic expression pattern images verify the efficacy of our approach. Conclusion: In our experiments, the three pooling functions perform comparably well in feature dimension reduction. Undersampling with majority vote is shown to be effective in tackling imbalanced data. Moreover, combining sparse coding with the image-level scheme leads to consistent performance improvement in keyword annotation. The electronic version of this article is the complete one and can be found online at: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-35
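    The three pooling functions named above can be illustrated as follows. This is a sketch under stated assumptions, not the paper's code. The sparse codes are taken as given, so dictionary learning and coding are not shown. Max and average pooling are applied to absolute code values, as is common in sparse-coding pipelines. The spatial sub-region pooling implied by "spatial sparse coding" is omitted. The function name `pool` and the toy code matrix are hypothetical.

```python
from math import sqrt


def pool(codes, method):
    """Collapse per-patch sparse codes into one feature value per dictionary atom.

    codes: list of rows, one row per image patch, one column per dictionary atom.
    """
    n_patches = len(codes)
    feature = []
    for j in range(len(codes[0])):
        column = [row[j] for row in codes]
        if method == "max":        # strongest response of atom j over all patches
            feature.append(max(abs(c) for c in column))
        elif method == "average":  # mean absolute response of atom j
            feature.append(sum(abs(c) for c in column) / n_patches)
        elif method == "sqrt":     # square root of the mean squared response
            feature.append(sqrt(sum(c * c for c in column) / n_patches))
        else:
            raise ValueError(f"unknown pooling method: {method!r}")
    return feature


# Hypothetical sparse codes for four patches over a three-atom dictionary.
codes = [
    [0.0, 0.6, -0.2],
    [0.8, 0.0, 0.0],
    [0.0, 0.3, 0.0],
    [0.4, 0.0, 0.0],
]
for method in ("max", "average", "sqrt"):
    print(method, [round(v, 3) for v in pool(codes, method)])
```

    Each function maps a variable number of patches to a fixed-length vector whose length equals the dictionary size. This is what allows images with different numbers of patches to share a single classifier.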

    A piRNA regulation landscape in C. elegans and a computational model to predict gene functions

    Investigating the mechanisms that regulate genes, and the functions of those genes, is essential to understanding a biological system. This dissertation consists of two research projects serving these aims: understanding the regulatory mechanism of piRNAs, and predicting gene function computationally. The first project presents a piRNA regulation landscape in C. elegans. piRNAs (Piwi-interacting small RNAs) form a complex with Piwi Argonautes to maintain fertility and silence transposons in animal germlines. In C. elegans, previous studies have suggested that piRNAs tolerate mismatched pairing and could, in principle, target all transcripts. In this project, computational analysis of chimeric reads, captured directly by cross-linking piRNAs to their targets in vivo, shows that piRNAs target all germline mRNAs with microRNA-like pairing rules. The number of targeting chimeric reads correlates better with binding energy than with piRNA abundance, suggesting that piRNA concentration does not limit targeting. Furthermore, in mRNAs silenced by piRNAs, secondary small RNAs are found to accumulate at the center and ends of piRNA binding sites. In germline-expressed mRNAs, by contrast, reduced piRNA binding density and suppressed targeting by piRNA-associated secondary small RNAs correlate with the presence of the CSR-1 Argonaute. These findings reveal physiologically important and nuanced regulation of piRNA targets and provide evidence for a comprehensive post-transcriptional regulatory step in germline gene expression. The second project develops a computational model to predict gene function. Predicting the genes involved in a biological function facilitates many kinds of research, such as prioritizing candidates in a screening project. Following the “guilt by association” principle, multiple datasets are treated as biological networks and integrated under a multi-label learning framework to predict gene functions. Specifically, the functional labels are propagated and smoothed over the networks using a label propagation method and are then integrated using an “error-correcting output codes” multi-label learning framework, in which a “codeword” encodes all the labels annotated to a specific gene. The model is then trained by finding the optimal projections between the code matrix and the biological datasets using canonical correlation analysis. Its performance is benchmarked against a state-of-the-art algorithm and against the results of a large-scale screen for piRNA pathway genes in D. melanogaster. Finally, the roles of piRNA targeting in epigenetics and physiology, and its cross-talk with the CSR-1 pathway, are discussed, together with a survey of additional biological datasets and a discussion of benchmarking methods for gene function prediction.
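    The label-propagation step can be sketched as follows. The dissertation does not say which variant it uses. This sketch assumes the common normalized-graph iteration F ← αSF + (1 − α)Y with S = D^(-1/2) W D^(-1/2), the scheme from Zhou et al., "Learning with Local and Global Consistency". It also assumes a single network and hypothetical values α = 0.8 and 100 iterations. The later error-correcting-code integration and CCA training stages are not shown. The names `normalize` and `propagate` are illustrative.

```python
from math import sqrt


def normalize(W):
    """Symmetric normalization S = D^-1/2 W D^-1/2 of a weighted, undirected gene network."""
    deg = [sum(row) for row in W]
    n = len(W)
    return [
        [W[i][j] / sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def propagate(W, Y, alpha=0.8, iters=100):
    """Spread known labels over the network: F <- alpha * S F + (1 - alpha) * Y.

    Y[i][k] is 1.0 if gene i is already annotated with function k, else 0.0.
    alpha and iters are hypothetical defaults. Returns smoothed scores F shaped like Y.
    """
    S = normalize(W)
    n, k = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        # The comprehension reads the previous F, so every gene updates simultaneously.
        F = [
            [alpha * sum(S[i][m] * F[m][c] for m in range(n)) + (1 - alpha) * Y[i][c]
             for c in range(k)]
            for i in range(n)
        ]
    return F


# Toy chain network 0 - 1 - 2 - 3; gene 0 has function A, gene 3 has function B.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
F = propagate(W, Y)
for gene, scores in enumerate(F):
    print(gene, [round(s, 3) for s in scores])
```

    The unlabeled genes 1 and 2 each end up scoring higher for the function of their nearer annotated neighbour, which is the guilt-by-association intuition described above. The iteration converges because α < 1 and the eigenvalues of S lie in [−1, 1].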