12 research outputs found

    The Yeast Resource Center Public Image Repository: A large database of fluorescence microscopy images

    Background: There is increasing interest in the development of computational methods to analyze fluorescent microscopy images and enable automated large-scale analysis of the subcellular localization of proteins. Determining the subcellular localization is an integral part of identifying a protein's function, and the application of bioinformatics to this problem provides a valuable tool for the annotation of proteomes. Training and validating algorithms used in image analysis research typically rely on large sets of image data, and would benefit from a large, well-annotated and highly-available database of images and associated metadata. Description: The Yeast Resource Center Public Image Repository (YRC PIR) is a large database of images depicting the subcellular localization and colocalization of proteins. Designed especially for computational biologists who need large numbers of images, the YRC PIR contains 532,182 TIFF images from nearly 85,000 separate experiments and their associated experimental data. All images and associated data are searchable, and the results browsable, through an intuitive web interface. Search results, experiments, individual images or the entire dataset may be downloaded as standards-compliant OME-TIFF data. Conclusions: The YRC PIR is a powerful resource for researchers to find, view, and download many images and associated metadata depicting the subcellular localization and colocalization of proteins, or classes of proteins, in a standards-compliant format. The YRC PIR is freely available at http://images.yeastrc.org/.

    Quantifying the distribution of probes between subcellular locations using unsupervised pattern unmixing

    Motivation: Proteins exhibit complex subcellular distributions, which may include localizing in more than one organelle and varying in location depending on the cell physiology. Estimating the amount of protein distributed in each subcellular location is essential for quantitative understanding and modeling of protein dynamics and how they affect cell behaviors. We have previously described automated methods using fluorescent microscope images to determine the fractions of protein fluorescence in various subcellular locations when the basic locations in which a protein can be present are known. As this set of basic locations may be unknown (especially for studies on a proteome-wide scale), we here describe unsupervised methods to identify the fundamental patterns from images of mixed patterns and estimate their fractional composition.

    Statistical and visual differentiation of subcellular imaging

    Background: Automated microscopy technologies have led to a rapid growth in imaging data on a scale comparable to that of the genomic revolution. High throughput screens are now being performed to determine the localisation of all of the proteins in a proteome. Closer to the bench, large image sets of proteins in treated and untreated cells are being captured on a daily basis to determine function and interactions. Hence there is a need for new methodologies and protocols to test for difference in subcellular imaging both to remove bias and enable throughput. Here we introduce a novel method of statistical testing, and supporting software, to give a rigorous test for difference in imaging. We also outline the key questions and steps in establishing an analysis pipeline. Results: The methodology is tested on a high throughput set of images of 10 subcellular localisations, and it is shown that the localisations may be distinguished to a statistically significant degree with as few as 12 images of each. Further, subtle changes in a protein's distribution between nocodazole treated and control experiments are shown to be detectable. The effect of outlier images is also examined and it is shown that while the significance of the test may be reduced by outliers, this may be compensated for by utilising more images. Finally, the test is compared to previous work and shown to be more sensitive in detecting difference. The methodology has been implemented within the iCluster system for visualising and clustering bio-image sets. Conclusion: The aim here is to establish a methodology and protocol for testing for difference in subcellular imaging, and to provide tools to do so. While iCluster is applicable to moderate (<1000) size image sets, the statistical test is simple to implement and will readily be adapted to high throughput pipelines to provide more sensitive discrimination of difference.

    Principles of Bioimage Informatics: Focus on Machine Learning of Cell Patterns

    Abstract. The field of bioimage informatics concerns the development and use of methods for computational analysis of biological images. Traditionally, analysis of such images has been done manually. Manual annotation is, however, slow, expensive, and often highly variable from one expert to another. Furthermore, with modern automated microscopes, hundreds to thousands of images can be collected per hour, making manual analysis infeasible. This field borrows from the pattern recognition and computer vision literature (which contain many techniques for image processing and recognition), but has its own unique challenges and tradeoffs. Fluorescence microscopy images represent perhaps the largest class of biological images for which automation is needed. For this modality, typical problems include cell segmentation, classification of phenotypical response, or decisions regarding differentiated responses (treatment vs. control setting). This overview focuses on the problem of subcellular location determination as a running example, but the techniques discussed are often applicable to other problems.

    An incremental approach to automated protein localisation

    Tscherepanow M, Jensen N, Kummert F. An incremental approach to automated protein localisation. BMC Bioinformatics. 2008;9(1):445. Background: The subcellular localisation of proteins in intact living cells is an important means for gaining information about protein functions. Even dynamic processes can be captured, which can barely be predicted based on amino acid sequences. Besides increasing our knowledge about intracellular processes, this information facilitates the development of innovative therapies and new diagnostic methods. In order to perform such a localisation, the proteins under analysis are usually fused with a fluorescent protein, so that they can be observed by means of a fluorescence microscope and analysed. In recent years, several automated methods have been proposed for performing such analyses. Here, two different types of approaches can be distinguished: techniques which enable the recognition of a fixed set of protein locations and methods that identify new ones. To our knowledge, a combination of both approaches, i.e. a technique which enables supervised learning using a known set of protein locations and is able to identify and incorporate new protein locations afterwards, has not been presented yet. Furthermore, associated problems, e.g. the recognition of cells to be analysed, have usually been neglected. Results: We introduce a novel approach to automated protein localisation in living cells. In contrast to well-known techniques, the protein localisation technique presented in this article aims at combining the two types of approaches described above: After an automatic identification of unknown protein locations, a potential user is enabled to incorporate them into the pre-trained system. An incremental neural network allows the classification of a fixed set of protein locations as well as the detection, clustering and incorporation of additional patterns that occur during an experiment. Here, the proposed technique achieves promising results with respect to both tasks. In addition, the protein localisation procedure has been adapted to an existing cell recognition approach. Therefore, it is especially well-suited for high-throughput investigations where user interactions have to be avoided. Conclusion: We have shown that several aspects required for developing an automatic protein localisation technique, namely the recognition of cells, the classification of protein distribution patterns into a set of learnt protein locations, and the detection and learning of new locations, can be combined successfully. So, the proposed method constitutes a crucial step to render image-based protein localisation techniques amenable to large-scale experiments.

    Automatic Annotation of Spatial Expression Patterns via Sparse Bayesian Factor Models

    Advances in reporters for gene expression have made it possible to document and quantify expression patterns in 2D–4D. In contrast to microarrays, which provide data for many genes but averaged and/or at low resolution, images reveal the high spatial dynamics of gene expression. Developing computational methods to compare, annotate, and model gene expression based on images is imperative, considering that available data are rapidly increasing. We have developed a sparse Bayesian factor analysis model in which the observed expression diversity among a large set of high-dimensional images is modeled by a small number of hidden common factors. We apply this approach to embryonic expression patterns from a Drosophila RNA in situ image database, and show that the automatically inferred factors provide for a meaningful decomposition and represent common co-regulation or biological functions. The low-dimensional set of factor mixing weights is further used as features by a classifier to annotate expression patterns with functional categories. On human-curated annotations, our sparse approach reaches similar or better classification of expression patterns at different developmental stages, when compared to other automatic image annotation methods using thousands of hard-to-interpret features. Our study therefore outlines a general framework for large microscopy data sets, in which both the generative model itself, as well as its application for analysis tasks such as automated annotation, can provide insight into biological questions.

    The biomedical discourse relation bank

    Background: Identification of discourse relations, such as causal and contrastive relations, between situations mentioned in text is an important task for biomedical text-mining. A biomedical text corpus annotated with discourse relations would be very useful for developing and evaluating methods for biomedical discourse processing. However, little effort has been made to develop such an annotated resource. Results: We have developed the Biomedical Discourse Relation Bank (BioDRB), in which we have annotated explicit and implicit discourse relations in 24 open-access full-text biomedical articles from the GENIA corpus. Guidelines for the annotation were adapted from the Penn Discourse TreeBank (PDTB), which has discourse relations annotated over open-domain news articles. We introduced new conventions and modifications to the sense classification. We report reliable inter-annotator agreement of over 80% for all sub-tasks. Experiments for identifying the sense of explicit discourse connectives show the connective itself as a highly reliable indicator for coarse sense classification (accuracy 90.9% and F1 score 0.89). These results are comparable to results obtained with the same classifier on the PDTB data. With more refined sense classification, there is degradation in performance (accuracy 69.2% and F1 score 0.28), mainly due to sparsity in the data. The size of the corpus was found to be sufficient for identifying the sense of explicit connectives, with classifier performance stabilizing at about 1900 training instances. Finally, the classifier performs poorly when trained on PDTB and tested on BioDRB (accuracy 54.5% and F1 score 0.57). Conclusion: Our work shows that discourse relations can be reliably annotated in biomedical text. Coarse sense disambiguation of explicit connectives can be done with high reliability by using just the connective as a feature, but more refined sense classification requires either richer features or more annotated data. The poor performance of a classifier trained in the open domain and tested in the biomedical domain suggests significant differences in the semantic usage of connectives across these domains, and provides robust evidence for a biomedical sublanguage for discourse and the need to develop a specialized biomedical discourse annotated corpus. The results of our cross-domain experiments are consistent with related work on identifying connectives in BioDRB.

    Contributions to Statistical Image Analysis for High Content Screening.

    Images of cells incubated with fluorescent small molecule probes can be used to infer where the compounds distribute within cells. Identifying the spatial pattern of compound localization within each cell is a very important problem for which adequate statistical methods do not yet exist. First, we asked whether a classifier for subcellular localization categories can be developed based on a training set of manually classified cells. Because the images pose challenges such as uneven field illumination, low resolution, high noise, variation in intensity and contrast, and cell-to-cell variability in probe distributions, we constructed texture features from contrast quantiles conditioned on intensities. Classification of artificial cells with the same marginal distribution but different conditional distributions supported the view that this conditioning approach helps to distinguish different localization distributions. Using these conditional features, we obtained satisfactory performance in image classification, and performed dimension reduction and data visualization. As high content images are subject to several major forms of artifacts, we are interested in the implications of measurement errors and artifacts on our ability to draw scientifically meaningful conclusions from high content images. Specifically, we considered three forms of artifacts: saturation, blurring and additive noise. For each type of artifact, we artificially introduced larger amounts, and aimed to understand the resulting bias using the 'Simulation Extrapolation' (SIMEX) method, applied to the measurement errors for pairwise centroid distances, the degree of eccentricity in the class-specific distributions, and the angles between the dominant axes of variability for different categories. Finally, we briefly considered the analysis of time-point images. Small molecule studies will be more focused. Specifically, we consider the evolving patterns of subcellular staining from the moment that a compound is introduced into the cell culture medium, to the point that steady state distribution is reached. We use the degree to which the subcellular staining pattern is concentrated in or near the nucleus as the feature of the time-course data set, and aim to determine whether different compounds accumulate in different regions at different times, as characterized in terms of their position in the cell relative to the nucleus. Ph.D., Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91460/1/liufy_1.pd

    Quantitative single-cell analysis of S. cerevisiae using a microfluidic live-cell imaging platform

    Genome-wide manipulations and measurements have made huge progress over the last decades. In Saccharomyces cerevisiae, a well-studied eukaryotic model organism, homologous recombination allows for systematic deletion or alteration of a majority of its genes. Important products of these manipulation techniques are two libraries of modified strains: a deletion library consisting of all viable knockout mutants, and a GFP library in which 4159 proteins are successfully tagged with GFP. In addition, the development of a method that allows for the systematic construction of double mutants led to a virtually infinite number of potential strains of interest. These advancements in combinatorial biology need to be matched by methods of data measurement and analysis. In order to simultaneously observe the spatio-temporal dynamics of thousands of strains from the GFP library, Dénervaud et al. developed a microfluidic platform that allows for parallel imaging of 1152 strains in a single experiment. On this platform, strains can be grown and monitored in a controllable environment for several days, which results in the imaging of several million cells during one experiment. To objectively and quantitatively analyze this immense amount of information, we implemented an image analysis pipeline, which can extract experiment-wide information on single-cell protein abundance and subcellular localization. The construction of a supervised classifier to quantify localization information on a single-cell level is a new approach and was invaluable to detect dynamic localization changes within the proteome. Using five different stress conditions, we gained insight into temporal changes of abundance and localization of multiple proteins. For example, we found that while localization changes can often be fast and transient, long-term response of a cell is usually enabled by changes in abundance. This shows a well-orchestrated response of a cell to external stimuli. To extend knowledge about cellular mechanisms, we used our microfluidic platform for two separate screens, combining GFP reporters with additional deletion mutants. The advantage of our platform in comparison to more common approaches lies in its simultaneous measurement of fluorescence and phenotypic information on cell size and growth. For each deletion, we can quantify not only its influence on the respective GFP reporter under changing conditions, but also its effect on cell growth and size. We showed that it is advantageous to combine this information, as it allows pointing out possible underlying mechanisms of gene network regulations. In a first screen we investigated the behavior of several gene networks upon UV irradiation damage. We were able to show that four gene deletions influenced the localization of ribonucleotide-diphosphate reductase (Rnr4p). A second screen was designed to find genes that influence the induction of the galactose network. This screen uses more than 500 deletions of genes mostly related to chromatin in combination with two different reporter strains. A main focus of this study was the inheritance of memory during galactose reinduction. We found several previously unknown genes that potentially influence either induction or reinduction, which were picked as candidates for further inheritance studies. Our microfluidic platform allows for unprecedented studies of proteomes in flux. [...