
    Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models

    Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach achieves similar or better classification of expression patterns at different developmental stages than other automatic image annotation methods that use thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself and its application to analysis tasks such as automated annotation can provide insight into biological questions.
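The factor decomposition described above can be illustrated with a small sketch. This is not the authors' sparse Bayesian model: as an assumption, it swaps in a point-estimate analogue that factorizes an image matrix `X` (one flattened expression image per row) as `W @ F`, with an L1 penalty on the factors `F` and a ridge penalty on the mixing weights `W`. The function names and the parameters `lam` and `mu` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def objective(X, W, F, lam, mu):
    """0.5*||X - W F||_F^2 + lam*||F||_1 + 0.5*mu*||W||_F^2."""
    resid = X - W @ F
    return 0.5 * np.sum(resid ** 2) + lam * np.abs(F).sum() + 0.5 * mu * np.sum(W ** 2)


def sparse_factorize(X, k, lam=0.1, mu=0.1, n_iter=200, seed=0):
    """Approximate X (n_images x n_pixels) by W @ F with sparse factors F.

    Block coordinate descent: an exact ridge update for the mixing weights W,
    then one proximal-gradient (ISTA) step for the factors F. The ridge term
    on W removes the scale ambiguity between W and F.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    W = rng.standard_normal((n, k))
    F = 0.01 * rng.standard_normal((k, p))
    history = [objective(X, W, F, lam, mu)]
    for _ in range(n_iter):
        # W-step: exact minimizer over W given F (ridge regression).
        W = np.linalg.solve(F @ F.T + mu * np.eye(k), F @ X.T).T
        # F-step: gradient step on the smooth term with step 1/L, then shrink.
        L = np.linalg.norm(W, 2) ** 2 + 1e-12
        grad = W.T @ (W @ F - X)
        F = soft_threshold(F - grad / L, lam / L)
        history.append(objective(X, W, F, lam, mu))
    return W, F, history
```

The rows of `W` play the role of the low-dimensional mixing weights that the abstract feeds to a classifier (the classifier is not shown). Because each W-step is an exact minimizer and each F-step is a proximal-gradient step with step size at most 1/L, the recorded objective should never increase.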

    Multi-Label Dimensionality Reduction

    Multi-label learning, which deals with data associated with multiple labels simultaneously, is ubiquitous in real-world applications. To overcome the curse of dimensionality in multi-label learning, in this thesis I study multi-label dimensionality reduction, which extracts a small number of features by removing irrelevant, redundant, and noisy information while considering the correlations among different labels. Specifically, I propose Hypergraph Spectral Learning (HSL) to perform dimensionality reduction for multi-label data by exploiting correlations among labels through a hypergraph. I also elucidate the effect of regularization on Canonical Correlation Analysis (CCA), a classical dimensionality reduction algorithm, and investigate the relationship between CCA and Orthonormalized Partial Least Squares (OPLS). To perform dimensionality reduction efficiently for large-scale problems, I propose two efficient implementations for a class of dimensionality reduction algorithms that includes canonical correlation analysis, orthonormalized partial least squares, linear discriminant analysis, and hypergraph spectral learning. The first is a direct least squares approach that allows the use of different regularization penalties but is applicable only under a certain assumption; the second is a two-stage approach that can be applied in the regularized setting without any assumption. Furthermore, I propose an online implementation of the same class of algorithms for data that arrive sequentially. A Matlab toolbox for multi-label dimensionality reduction has been developed and released. The proposed algorithms have been applied successfully to Drosophila gene expression pattern image annotation, and experimental results on several benchmark multi-label data sets also demonstrate their effectiveness and efficiency.
    Dissertation/Thesis, Ph.D. Computer Science 201
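As a rough illustration of the CCA-based reduction discussed in this thesis, the sketch below computes regularized CCA projections of a feature matrix `X` against a multi-label indicator matrix `Y` using only NumPy. It is plain regularized CCA, not Hypergraph Spectral Learning or the least-squares formulations the thesis proposes; the function name and the ridge parameter `gamma` are illustrative assumptions.

```python
import numpy as np


def regularized_cca(X, Y, n_components, gamma=1e-2):
    """Project features X (n x d) onto directions maximally correlated with labels Y (n x m).

    Solves  Cxy Cyy^+ Cyx w = rho^2 (Cxx + gamma I) w  by whitening with
    (Cxx + gamma I)^(-1/2) and taking a symmetric eigendecomposition.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    d = X.shape[1]
    Cxx = Xc.T @ Xc + gamma * np.eye(d)  # regularized feature covariance
    Cxy = Xc.T @ Yc
    # pinv tolerates collinear or constant label columns.
    A = Cxy @ np.linalg.pinv(Yc.T @ Yc) @ Cxy.T
    s, U = np.linalg.eigh(Cxx)
    Cxx_inv_sqrt = U @ np.diag(1.0 / np.sqrt(s)) @ U.T
    evals, V = np.linalg.eigh(Cxx_inv_sqrt @ A @ Cxx_inv_sqrt)
    order = np.argsort(evals)[::-1][:n_components]
    W = Cxx_inv_sqrt @ V[:, order]  # columns satisfy W.T @ Cxx @ W = I
    return W, evals[order]  # evals are regularized squared canonical correlations
```

Projected features are `(X - X.mean(0)) @ W`. Since the matrix being diagonalized has rank at most the number of labels, requesting more components than labels only adds directions with zero correlation.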

    A bag-of-words approach for Drosophila gene expression pattern annotation

    Background: Drosophila gene expression pattern images document the spatiotemporal dynamics of gene expression during embryogenesis. A comparative analysis of these images could provide a fundamentally important way to study the regulatory networks governing development. To facilitate pattern comparison and searching, groups of images in the Berkeley Drosophila Genome Project (BDGP) high-throughput study were manually annotated with a variable number of anatomical terms from a controlled vocabulary. Because the number of available images is rapidly increasing, it is imperative to design computational methods that automate this task. Results: We present a computational method to annotate gene expression pattern images automatically. The proposed method uses the bag-of-words scheme to exploit existing pattern-annotation information and annotates images using a model that captures correlations among terms. The method can annotate images individually or in groups (e.g., according to developmental stage), and it can integrate information from different two-dimensional views of embryos. Results on embryonic patterns from BDGP data demonstrate that our method significantly outperforms other methods. Conclusion: The proposed bag-of-words scheme is effective in representing a set of annotations assigned to a group of images, and the model employed to annotate images successfully captures the correlations among different controlled vocabulary terms. Integrating existing annotation information from multiple embryonic views improves annotation performance. The electronic version of this article is the complete one and can be found online at: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-10-11
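The bag-of-words representation can be sketched as follows, assuming local descriptors (for example, SIFT-like vectors) have already been extracted from each image; the descriptor extraction, the codebook size, and all function names here are hypothetical. A codebook is learned with plain k-means, and a group of images, such as several images of one stage range or several 2D views of the same embryo, is summarized by a single normalized histogram of nearest-codebook words. The paper's term-correlation annotation model is not reproduced.

```python
import numpy as np


def assign_words(D, centers):
    """Index of the nearest codebook center (visual word) for each descriptor."""
    dists = ((D[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(D, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on local descriptors D (n x d); returns k centers."""
    rng = np.random.default_rng(seed)
    centers = D[rng.choice(len(D), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_words(D, centers)
        for j in range(k):
            members = D[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def group_histogram(descriptor_sets, centers):
    """Normalized visual-word histogram for a group of images.

    descriptor_sets holds one (n_i x d) array per image, so several images
    or several views of one embryo are pooled into a single bag of words.
    """
    words = assign_words(np.vstack(descriptor_sets), centers)
    hist = np.bincount(words, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

Pooling descriptors before counting is one simple way to make the representation independent of how many images a group contains; a multi-label classifier over the anatomical terms would then consume these histograms.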

    Learning Sparse Representations for Fruit-Fly Gene Expression Pattern Image Annotation and Retrieval

    Background: Fruit fly embryogenesis is one of the best understood animal development systems, and the spatiotemporal gene expression dynamics in this process are captured by digital images. Analysis of these high-throughput images will provide novel insights into the functions, interactions, and networks of animal genes governing development. To facilitate comparative analysis, web-based interfaces have been developed to retrieve images based on body-part keywords and on images. Currently, keyword annotation of spatiotemporal gene expression patterns is conducted manually, a practice that does not scale with the continuously expanding collection of images. In addition, existing image retrieval systems based on expression patterns could be made more accurate by using keywords. Results: In this article, we adapt advanced data mining and computer vision techniques to address the key challenges in annotating and retrieving fruit fly gene expression pattern images. To boost the performance of image annotation and retrieval, we propose representations that integrate spatial information and sparse features, overcoming the limitations of prior schemes. Conclusions: We perform systematic experimental studies to evaluate the proposed schemes in comparison with current methods. Experimental results indicate that integrating spatial information and sparse features leads to consistent performance improvements in image annotation, while for retrieval, sparse features alone yield better results. The electronic version of this article is the complete one and can be found online at: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-10
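One way to combine sparse features with spatial information is sketched below, under stated assumptions: local descriptors are sparse-coded against a fixed dictionary `D` by ISTA, and the absolute codes are max-pooled both over the whole image and within a coarse grid of cells. The dictionary, the 2x2 grid, the pooling choice, and the function names are illustrative and are not taken from the paper.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_code(x, D, lam=0.1, n_iter=100):
    """ISTA for  min_a 0.5*||x - D a||^2 + lam*||a||_1; columns of D are atoms."""
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - D.T @ (D @ a - x) / L, lam / L)
    return a


def spatial_sparse_feature(descriptors, positions, D, image_shape, grid=(2, 2), lam=0.1):
    """Max-pool |sparse codes| over the whole image and over a grid of cells.

    descriptors: (n x d) local descriptors; positions: (n x 2) (row, col) centers.
    Returns the global pooled code followed by one pooled code per grid cell.
    """
    codes = np.abs(np.array([sparse_code(x, D, lam) for x in descriptors]))
    H, W = image_shape
    gr, gc = grid
    rows = np.minimum((positions[:, 0] * gr / H).astype(int), gr - 1)
    cols = np.minimum((positions[:, 1] * gc / W).astype(int), gc - 1)
    pooled = np.zeros((gr, gc, D.shape[1]))
    for code, r, c in zip(codes, rows, cols):
        pooled[r, c] = np.maximum(pooled[r, c], code)
    return np.concatenate([codes.max(axis=0), pooled.ravel()])
```

With `K` dictionary atoms, the first `K` entries are the location-free sparse feature and the remaining entries add spatial layout; this split loosely mirrors the abstract's finding that spatial information helped annotation while sparse features alone worked better for retrieval.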

    Appearance Based Stage Recognition of Drosophila Embryos

    Stages in Drosophila development denote the time after fertilization at which specific events occur in the developmental cycle. The stage of a host embryo, as well as spatial information about a gene expression region, is indispensable input for discovering patterns of gene-gene interaction. Manual stage labeling is becoming a bottleneck as high-throughput imaging produces ever more embryo images, which makes automatic recognition based on embryo appearance a more desirable approach. This problem, however, is very challenging due to severe variations in illumination and gene expression. In this thesis, we propose an appearance-based recognition method using orientation histograms and Gabor filters. Furthermore, we apply Principal Component Analysis (PCA) to reduce the dimensionality of the low-level features and thereby speed up recognition. Experiments on BDGP images show the promise of the proposed method.
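A minimal sketch of the feature pipeline, under assumptions: each embryo image is summarized by a gradient-magnitude-weighted histogram of edge orientations, and PCA (via SVD) reduces the stacked feature vectors before classification. The Gabor filter bank and the classifier are omitted, and the bin count and function names are hypothetical.

```python
import numpy as np


def orientation_histogram(img, n_bins=8):
    """Gradient-magnitude-weighted histogram of unsigned edge orientations."""
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # fold orientations into [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist


def pca_fit(F, n_components):
    """Mean and top principal axes of feature matrix F (n_samples x n_features)."""
    mean = F.mean(axis=0)
    _, _, Vt = np.linalg.svd(F - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(F, mean, components):
    """Project features onto the retained principal axes."""
    return (F - mean) @ components.T
```

Normalizing each histogram by its total gradient magnitude is one simple way to reduce sensitivity to global illumination changes, which the abstract names as a major source of variation.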

    Sparse reduced-rank regression for imaging genetics studies: models and applications

    We present a novel statistical technique, the sparse reduced-rank regression (sRRR) model, a strategy for multivariate modelling of high-dimensional imaging responses and genetic predictors. By adopting penalisation techniques, the model enforces sparsity in the regression coefficients, identifying subsets of genetic markers that best explain the variability observed in subsets of the phenotypes. To properly exploit the rich structure present in both the imaging and genetics domains, we additionally propose several structured penalties within the sRRR model. Using simulations that accurately reflect realistic imaging genetics data, we present detailed evaluations of the sRRR method in comparison with the more traditional univariate linear modelling approach. In all settings considered, we show that sRRR has greater power to detect deleterious genetic variants. Moreover, using a simple genetic model, we demonstrate the potential benefits, in terms of statistical power, of carrying out voxel-wise searches as opposed to extracting averages over regions of interest in the brain. Since this entails phenotypic vectors of enormous dimensionality, we suggest using a sparse classification model as a de-noising step prior to the imaging genetics study. We also present the application of a data re-sampling technique within the sRRR model for model selection. Using this approach, we are able to rank the genetic markers in order of importance of association with the phenotypes, and similarly rank the phenotypes in order of importance to the genetic markers. Finally, we illustrate the practical application of the proposed statistical models on three real imaging genetics datasets and highlight some potential associations.
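To make the sRRR idea concrete, here is a rank-1 sketch in NumPy, assuming plain L1 penalties on both the genetic coefficients `b` and the phenotype loadings `a`; the structured penalties and resampling-based model selection from the abstract are not shown. It relies on the identity noted in the docstring: with a unit-norm `a`, the b-update reduces to a lasso problem. Penalty values and function names are illustrative.

```python
import numpy as np


def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def srrr_rank1(X, Y, lam_b=0.1, lam_a=0.1, n_iter=50, inner=50, seed=0):
    """Rank-1 sparse reduced-rank regression  Y ~ X b a^T  with ||a||_2 = 1.

    X: (n x p) genetic predictors, Y: (n x q) imaging phenotypes.
    b-step: with ||a|| = 1, ||Y - X b a^T||^2 = ||Y a - X b||^2 + (terms not
            involving b), so b solves a lasso on the response Y a (here by ISTA).
    a-step: soft-threshold Y^T X b, then rescale to unit length.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(Y.shape[1])
    a /= np.linalg.norm(a)
    b = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant for the lasso gradient
    for _ in range(n_iter):
        r = Y @ a
        for _ in range(inner):
            b = soft_threshold(b - X.T @ (X @ b - r) / L, lam_b / L)
        z = soft_threshold(Y.T @ (X @ b), lam_a)
        norm_z = np.linalg.norm(z)
        if norm_z == 0:  # penalties removed everything; nothing left to fit
            break
        a = z / norm_z
    return a, b
```

Nonzero entries of `b` select genetic markers and nonzero entries of `a` select phenotypes, mirroring the subsets of markers and phenotypes described above. Higher-rank fits would typically deflate `Y` and repeat, which this sketch does not do.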

    Interactions between the gut symbiont Frischella perrara and its host the honey bee (Apis mellifera)

    Not all bacterial gut symbionts are necessarily beneficial to the host: some may be neutral, while others can even have detrimental effects. Determining the impact of individual gut symbionts can be challenging because the boundary between beneficial and detrimental is often fuzzy, and gut bacteria typically live in complex and highly variable multispecies communities. The honey bee possesses a relatively simple gut microbiota, providing a tractable model to study the effects of individual species. Among the few members of the honey bee gut microbiota, Frischella perrara is a gammaproteobacterium that colonizes a specific gut region, where it causes the so-called “scab” phenotype, a dark band that appears on the luminal side of the epithelial surface. The scab has been hypothesized to result from melanization, a common insect immune response typically elicited after wounding or pathogen exposure. Despite inducing this putative immune response, there is currently no evidence that F. perrara is pathogenic to bees. In fact, F. perrara is highly prevalent among adult worker bees in healthy colonies across the world. This raises a number of interesting questions about the symbiosis between F. perrara and its host. Is the scab really a melanization response? Does F. perrara affect bee health? Which F. perrara genes are responsible for gut colonization and scab formation? Are there seasonal patterns in F. perrara prevalence over the year, or interactions with other microbiota members or pathogens? This thesis tackles these questions by investigating the symbiosis between F. perrara and the honey bee from three perspectives: the host side (chapter 1), the symbiont side (chapter 2), and the context of the hive across seasons (chapter 3). To understand how F. perrara affects the gut homeostasis and immune status of the host, I used RNA-Seq to determine changes in host gene expression in the gut in response to experimental colonization with F. perrara. This showed that colonization with F. perrara led to the specific upregulation of many genes involved in the host immune response. In particular, multiple genes of the melanization cascade were upregulated by F. perrara, supporting the idea that the scab corresponds to a host melanization response. Despite this strong immune response, experimental colonization with F. perrara did not reduce the lifespan of bees relative to non-colonized bees or to bees colonized with another symbiont that does not cause the scab. To identify F. perrara genes involved in colonization, persistence, or scab formation, I used RNA-Seq, in collaboration with another PhD student, to investigate gene expression changes in F. perrara during host colonization relative to growth on agar plates. We found a number of interesting differentially expressed genes: many genes upregulated in vivo are involved in tryptophan biosynthesis and carbohydrate or ion transport, and some are involved in tolerance to oxidative stress. Downregulated genes included genes coding for cell motility and sulfur metabolism. Finally, to identify specific conditions in the bee gut that affect colonization by F. perrara, we monitored the microbiota of individual bees from a hive over time. While we did not find significant correlations between F. perrara and other gut microbiota members or pathogens, we found that winter bees had a microbiota structure distinct from that of foragers, which may be dictated at least in part by diet. In particular, F. perrara was the only species present at significantly lower levels in winter bees than in foragers. Overall, we conclude from this PhD thesis that the scab phenotype is very likely the result of a melanization response upon F. perrara colonization.
The absence of any detectable detrimental effect of F. perrara on the host is consistent with its wide distribution across space and time. However, other pathogens are also highly prevalent in thriving honey bee colonies. Hence, it is possible that any negative effect of F. perrara is small enough for this gut symbiont to be tolerated in the bee gut. The immune response mounted by the host may play an important role in this tolerance: rather than eliminating F. perrara, the specific immune response may keep the bacterium in check. However, further experiments are needed to test this hypothesis. Conversely, we cannot exclude that F. perrara has a beneficial role for the host. In particular, host immune activation by F. perrara may protect against subsequent pathogen assaults, and the essential amino acid tryptophan or other chemical compounds synthesized by F. perrara may be used by the host. In summary, F. perrara is a clear example of a gut symbiont that cannot easily be classified into the three classical categories of mutualists, pathogens, and commensals. This highlights the need to think of symbiosis as a continuum between pathogenicity and mutualism, and to find precise measures to quantify the costs and benefits for the partners involved.