19 research outputs found

    Cis-regulatory modules clustering from sequence similarity

    I present a method that groups cis-regulatory modules by shared sequence motifs. The goal of this approach is to search for clusters of modules that may share some function, using only sequence similarity. The proposed similarity measure is based on variable-order Markov model likelihood scoring of sequences. I also introduce an extension of the variable-order Markov model that could perform the required task better. Results: I show that my method may recover subsets of sequences sharing a pattern in a set of generated sequences. I found that the proposed approach is successful in finding groups of modules that share a type of transcription factor binding site.

    Unsupervised Clustering of Subcellular Protein Expression Patterns in High-Throughput Microscopy Images Reveals Protein Complexes and Functional Relationships between Proteins

    Protein subcellular localization has been systematically characterized in budding yeast using fluorescently tagged proteins. Based on the fluorescence microscopy images, subcellular localization of many proteins can be classified automatically using supervised machine learning approaches that have been trained to recognize predefined image classes based on statistical features. Here, we present an unsupervised analysis of protein expression patterns in a set of high-resolution, high-throughput microscope images. Our analysis is based on 7 biologically interpretable features which are evaluated on automatically identified cells, and whose cell-stage dependency is captured by a continuous model for cell growth. We show that it is possible to identify most previously identified localization patterns in a cluster analysis based on these features and that similarities between the inferred expression patterns contain more information about protein function than can be explained by a previous manual categorization of subcellular localization. Furthermore, the inferred cell stage associated with each fluorescence measurement allows us to visualize large groups of proteins entering the bud at specific stages of bud growth. These correspond to proteins localized to organelles, revealing that the organelles must be entering the bud in a stereotypical order. We also identify and organize a smaller group of proteins that show subtle differences in the way they move around the bud during growth. Our results suggest that biologically interpretable features based on explicit models of cell morphology will yield unprecedented power for pattern discovery in high-resolution, high-throughput microscopy images.

    Time profile clustering result.

    A heatmap with 4004 GFP-tagged strains ordered using maximum likelihood agglomerative clustering based on the time profiles of protein abundance and 5 morphological measures. Within manually selected clusters (colored bars), the fraction of proteins in the cluster that have the same subcellular localization or GO annotation (the latter indicated with stars) is listed under Fraction. Log p-values were computed using the hypergeometric distribution to test against the null hypothesis that the cluster was drawn randomly from the protein annotations. Fold enrichment is the ratio of the fraction of proteins in the cluster with each annotation to the corresponding fraction in the whole protein collection. Nuclear proteins appear in the bud at a specific time (dashed line).
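
The enrichment statistics described in this caption can be illustrated with a minimal sketch. It assumes a one-sided (upper-tail) hypergeometric test and a base-10 logarithm; the caption specifies neither, and the function name `hypergeom_enrichment` and its parameter names are hypothetical, not taken from the paper.

```python
import math


def hypergeom_enrichment(n_total, n_annotated, n_cluster, n_hits):
    """Return (log10 p-value, fold enrichment) for one cluster/annotation pair.

    n_total     -- proteins in the whole collection
    n_annotated -- proteins in the collection carrying the annotation
    n_cluster   -- proteins in the cluster
    n_hits      -- proteins in the cluster carrying the annotation

    Assumption: the p-value is the upper tail P(X >= n_hits) when the
    cluster is drawn at random without replacement from the collection.
    """
    # Number of ways to draw a cluster with at least n_hits annotated proteins.
    upper = min(n_annotated, n_cluster)
    tail = sum(
        math.comb(n_annotated, i) * math.comb(n_total - n_annotated, n_cluster - i)
        for i in range(n_hits, upper + 1)
    )
    total = math.comb(n_total, n_cluster)
    # Work with logs of exact integers so tiny p-values do not underflow.
    log10_p = math.log10(tail) - math.log10(total) if tail > 0 else float("-inf")
    # Fraction annotated in the cluster relative to the whole collection.
    fold = (n_hits / n_cluster) / (n_annotated / n_total)
    return log10_p, fold


# Toy numbers for illustration only (not data from the paper).
log10_p, fold = hypergeom_enrichment(n_total=10, n_annotated=5, n_cluster=5, n_hits=5)
print(f"log10 p = {log10_p:.3f}, fold enrichment = {fold:.1f}")
```

Computing the tail from exact integer binomial coefficients keeps the sketch dependency-free; for large collections a statistics library's hypergeometric survival function would be the more usual choice.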

    Morphological distances.

    a) Heatmap of the mean morphological distance features for each of the 3 cell classes automatically labelled: ‘bud’, ‘mother’ and ‘lone’ (columns indicated by ‘B’, ‘M’ and ‘L’ respectively). The proteins at the two extremes are enriched in cell periphery and nucleolus proteins. b) Three examples of the morphological distances extracted from the heatmap. Although the heatmap only shows the mean, we also compute the standard deviation (error bars). c) Examples of cells from the strains indicated in b). The spread of GFP fluorescence is greater than the RFP for the first three proteins, and less than RFP for the last three.

    Pipeline of the methods used in this work.

    The identification of cells, the assignment of cell type and cell stage, and the estimation of cell confidence are based solely on the intensities of the RFP marker present in all strains. Please refer to the Results and Methods (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003085#s4) for descriptions of steps (i)–(vi). The cell type, stage and confidence are then used in conjunction with the GFP signal from tagged proteins in each strain in order to compute biologically interpretable features of protein expression.

    Yeast cell identification.

    a) The mother-bud assignment heuristic. Pairs of circular objects that reciprocally have the largest and smallest sizes among neighboring areas are said to be ‘mother’ cells (indicated by M) and ‘bud’ cells (indicated by B, mother-bud pairs indicated by bidirectional arrows), unless the potential ‘bud’ cell has a neighbor smaller than itself (indicated by a unidirectional arrow). Any other cells are labelled as ‘lone’ cells (L). b) Example of low and high confidence objects. The cyan lines in each image represent the cell contours produced, and the white dots indicate the predicted bud neck position. The dashed objects represent obvious artifacts that were filtered using thresholds (see text for details). Objects on the edge of images were not automatically filtered out, but are expected to have low confidence.
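
One possible reading of the mother-bud heuristic in panel a) is sketched below. This is an interpretation, not the authors' implementation: it assumes each object has a known area and a symmetric list of neighbors, and it reads "reciprocally" as "the bud is the mother's smallest neighbor and the mother is the bud's largest neighbor". The function name `assign_cell_types` and the input layout are hypothetical.

```python
def assign_cell_types(areas, neighbors):
    """Label each object as 'mother', 'bud' or 'lone'.

    areas     -- dict mapping object id to its area
    neighbors -- dict mapping object id to a list of neighboring object ids
                 (assumed symmetric: if b neighbors a, then a neighbors b)
    """
    # Every object starts as a lone cell and is relabelled only when paired.
    labels = {obj: "lone" for obj in areas}
    for mother, nbrs in neighbors.items():
        if not nbrs:
            continue
        # Candidate bud: the smallest neighbor of the candidate mother.
        bud = min(nbrs, key=areas.get)
        if areas[bud] >= areas[mother]:
            continue
        bud_nbrs = neighbors.get(bud, [])
        # Reciprocity: the candidate mother must be the bud's largest neighbor.
        if not bud_nbrs or max(bud_nbrs, key=areas.get) != mother:
            continue
        # Exception: the candidate bud has a neighbor smaller than itself.
        if any(areas[other] < areas[bud] for other in bud_nbrs):
            continue
        labels[mother], labels[bud] = "mother", "bud"
    return labels


# Toy example with made-up areas: 'a' and 'b' touch each other, 'c' is isolated.
areas = {"a": 100, "b": 20, "c": 90}
neighbors = {"a": ["b"], "b": ["a"], "c": []}
print(assign_cell_types(areas, neighbors))
```

Under this reading an object cannot be both a mother and a bud: a bud has no smaller neighbor, whereas a mother must have one.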

    Subcellular location class profiles.

    a) Time series for protein abundance in buds. Nuclear proteins are the last to appear in the bud (dashed line). b) The spatial distribution of protein expression is highly variable in the growing bud cell. Organelles appear to be pushed from the bud neck at the time of the nucleus inclusion (dashed line). Note that the absence of nuclear protein in the bud leads to irrelevant variations in the morphological distance features, perhaps due to auto-fluorescence captured in the GFP channel. Actin proteins migrate from bud tip to bud neck (black traces). c) In the mother cell, organelles appear to maintain a typical distance to the bud neck, except for the nucleus.

    Intensity and time profiles.

    a) GFP intensity heatmap for several proteins whose abundances are known to be cell-cycle dependent. b) Profiles for 3 proteins showing significantly higher expression levels in large buds. ‘n’ is the number of mother-bud pairs used to infer each time series. 26 out of the 60 time points (indicated with markers) show coherent cell-stage specific deviation (permutation test; see Suppl. Figure S4, http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003085#pcbi.1003085.s004). c) Examples of mother-bud pairs with the computed pixel size (pt) of the bud object (identical RFP/GFP intensity scale). The displayed cells were manually selected and then ordered by the computed bud size. Arrows indicate nuclear localization at lower intensity.

    A cluster of 91 proteins displaying time profiles with variable distances to the bud neck.

    a) Heatmap of the cluster displayed as in Figure 6 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003085#pcbi-1003085-g006). We observe several classes of dynamic patterns, which capture the localization to the bud neck and bud periphery. (*) 5 of the 8 subunits of the exocyst complex are found within 9 proteins. b) Examples of proteins with dynamic bud patterns. (**) The displayed GFP intensity was scaled down by 75%.

    Time profiles of morphological distances.

    a) Top panel shows a heatmap of the morphological distances in bud and mother cells, indicated as B and M, respectively. Bottom panel shows the data for two of these proteins as line graphs. The reported morphological distances are variance normalized. MCM complex subunits and Whi5 display a cell-cycle dependent subcellular location: cytoplasmic for small buds, nuclear for large buds. ‘n’ is the number of mother-bud pairs used to infer each time series. Out of the 80 time points for each protein, 34 for Whi5 (blue traces) and 72 for Mcm6 (red traces) show significant cell cycle variation (indicated as dark dots). b) Examples of mother-bud pairs that were ordered by the computed bud size (pt). The GFP channel was scaled between images to more clearly illustrate the change in subcellular location.