9 research outputs found

    Quantitative Protein Localization Signatures Reveal an Association between Spatial and Functional Divergences of Proteins

    No full text
    <div><p>Protein subcellular localization is a major determinant of protein function. However, this important protein feature is often described in terms of discrete and qualitative categories of subcellular compartments, and therefore it has limited applications in quantitative protein function analyses. Here, we present Protein Localization Analysis and Search Tools (PLAST), an automated analysis framework for constructing and comparing quantitative signatures of protein subcellular localization patterns based on microscopy images. PLAST produces human-interpretable protein localization maps that quantitatively describe the similarities in the localization patterns of proteins and major subcellular compartments, without requiring manual assignment or supervised learning of these compartments. Using the budding yeast <i>Saccharomyces cerevisiae</i> as a model system, we show that PLAST is more accurate than existing, qualitative protein localization annotations in identifying known co-localized proteins. Furthermore, we demonstrate that PLAST can reveal protein localization-function relationships that are not obvious from these annotations. First, we identified proteins that have similar localization patterns and participate in closely-related biological processes, but do not necessarily form stable complexes with each other or localize at the same organelles. Second, we found an association between spatial and functional divergences of proteins during evolution. Surprisingly, as proteins with common ancestors evolve, they tend to develop more diverged subcellular localization patterns, but still occupy similar numbers of compartments. This suggests that divergence of protein localization might be more frequently due to the development of more specific localization patterns over ancestral compartments than the occupation of new compartments. 
PLAST enables systematic and quantitative analyses of protein localization-function relationships, and will be useful to elucidate protein functions and how these functions were acquired in cells from different organisms or species. A public web interface of PLAST is available at <a href="http://plast.bii.a-star.edu.sg" target="_blank">http://plast.bii.a-star.edu.sg</a>.</p></div>

    Construction of quantitative protein subcellular localization profiles.

    No full text
    <p>(<b>A</b>) Schematic showing the major components of Protein Localization Analysis and Search Tools (PLAST). (<b>B</b>) Example images of GFP-tagged <i>Saccharomyces cerevisiae</i> strains from the UCSF dataset <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003504#pcbi.1003504-Huh1" target="_blank">[1]</a>. The intensity of each image has been scaled to the same range. (<b>C</b>) Multidimensional scaling plot based on the dissimilarity scores (<i>d<sub>p</sub></i>) among all the P-profiles<sub>SVM</sub> constructed for the UCSF dataset. ORFs manually assigned to “nucleus”, “cytoplasm”, or “mitochondrion” categories by UCSF are shown in purple, red, or green dots, respectively. (<b>D</b>) Multidimensional scaling plot of 20 representative protein localization patterns (dots) or “exemplars” identified using an affinity-propagation clustering algorithm. The radius of the circle around each dot is proportional to the number of ORFs assigned to the exemplar. Each exemplar is colored and named according to the most enriched UCSF category among its assigned ORFs (<b>Supplementary <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003504#pcbi.1003504.s007" target="_blank">Fig. S4A</a></b>). The exemplars of MC2 (Cox8), CP3 (Rbg1), and NC3 (Hda2) are shown in B. (<b>E</b>) Comparison of the performances of P-profiles and quantitative features extracted using two other previous analysis frameworks (“Chen07” and “Huh09”) <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003504#pcbi.1003504-Chen1" target="_blank">[11]</a>, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003504#pcbi.1003504-Huh2" target="_blank">[13]</a> in classifying ORFs according to UCSF categories. The accuracies shown were estimated using a multi-class SVM classifier and 5-fold cross validation, and averaged over all UCSF categories.</p>

    Large portion of WGD duplicates now have diverged localization patterns.

    No full text
    <p>Distributions of (<b>A</b>) P-profile dissimilarity scores (<i>d<sub>p</sub></i>), (<b>B</b>) ratios of shared compartments, and (<b>C</b>) numbers of compartments assigned to WGD duplicate (red) and random non-duplicate (black) pairs (M = medians, μ = means of the distributions; two-sided permutation tests for differences in medians or means.) Protein pairs with <i>d<sub>p</sub></i> ≥ 10th percentile of non-duplicate pairs are referred to as “dissimilarly localized” (DL) pairs, or otherwise as “similarly localized” (SL) pairs. (<b>D</b>) Ratios of DL duplicate pairs in the ten biological processes with the highest numbers of duplicate pairs (parentheses = numbers of duplicate pairs with P-profiles, red dashed line = DL-duplicate ratio for all duplicate pairs.) (<b>E</b>) Example images from the UCSF dataset <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003504#pcbi.1003504-Huh1" target="_blank">[1]</a> showing DL duplicate pairs with different <i>d<sub>p</sub></i> values. The intensity of each image has been scaled to the same range. The molecular functions of the duplicates are also shown if they are known.</p>
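The caption above relies on two-sided permutation tests for differences in means or medians. A minimal sketch of such a test for a difference in means follows; the function name, the number of permutations, and the add-one correction are illustrative assumptions, not details taken from the PLAST paper.

```python
import random
from statistics import mean


def permutation_test_mean_diff(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated P-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

For the comparison in the caption, `a` and `b` would hold the scores of duplicate and non-duplicate pairs; replacing `mean` with `statistics.median` gives the corresponding test for medians.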

    Spatial divergence level of WGD duplicates is significantly correlated to shared compartment ratio but not total number of occupied compartments.

    No full text
    <p>Scatter plots showing (<b>A</b>) shared compartment ratio, (<b>B</b>) total number of compartments, and (<b>C</b>) mean protein expression level of WGD duplicates with different <i>d<sub>p</sub></i> values (R = Pearson's correlation coefficient, dashed lines = best linear regression fits of the data.) (<b>D</b>) Results from linear regression modeling of shared compartment ratio using <i>d<sub>p</sub></i> and protein expression level as factors (T = T-statistics of the factors; F = F-statistic for the analysis of variance (ANOVA) between two regression models with different factors; R<sup>2</sup> = squared correlation coefficients between the actual and predicted values of shared compartment ratio.)</p>

    Most proteins are localized at multiple subcellular compartments.

    No full text
    <p>(<b>A</b>) The ten compartments with the highest numbers of assigned ORFs at a Bonferroni-adjusted threshold of <2.5×10<sup>−4</sup> (RNP = ribonucleoprotein). (<b>B</b>) Distributions of ORFs with different numbers of assigned subcellular compartments. The assignments were based on P-profile<sub>SVM</sub> with all 73 compartments, P-profile<sub>SVM</sub> with a reduced set of 22 compartments, UCSF, and SGD GoSlim cellular component annotations (M = medians, μ = means of the distributions). (<b>C</b>) Comparisons of the mean numbers of compartments assigned to an ORF by different profiling/annotation methods. (Cytoplasmic/Non-cytoplasmic ORFs = ORFs assigned or not assigned with cytoplasm, respectively; error bars = standard errors; *** = P<0.001, two-sided permutation test for the difference in means.)</p>

    Performance of PLAST in identifying known co-localized proteins.

    No full text
    <p>(<b>A</b>) Probability distributions of the P-profile dissimilarity scores (<i>d<sub>p</sub></i>) between interactors and between non-interactors detected by affinity-purification mass spectrometry (AP-MS) or yeast two-hybrid screening (Y2H). (M = medians, μ = means of the distributions; P-values from two-sided permutation tests for differences in means or medians.) (<b>B</b>) An example of PLAST search result obtained from using 19S proteasomal base subunits as query proteins. The mean <i>d<sub>p</sub></i> between the query proteins and all other proteins are shown as red (known subunits) or gray (other ORFs) vertical lines. Most of the red lines have low <i>d<sub>p</sub></i> values, indicating that they are placed at the top of the search result. (Black line graph = precisions, red line graph = recalls, black dashed line = decision threshold at optimum F1-score, black box = magnified region, parenthesis = number of known subunits.) (<b>C</b>) Performances of subunit searches obtained from using different numbers of query proteins randomly selected from known subunits of a proteasome (left) or cytosolic ribosome (right). For each query protein number, we tested max(100, number of all possible combinations) random combinations of query proteins, and computed the mean value of these tested combinations (parentheses = numbers of known subunits). (<b>D</b>) Normalized F1-score differences between P-profile<sub>SVM</sub> and UCSF annotation for a catalog of 197 protein complexes. Some of the complexes with the highest F1-score differences are highlighted. (Red line = the mean of the normalized differences, which is also the test statistic used in the paired t-test between the F1-scores of these two methods; gray areas = statistically insignificant differences with P>0.001; red text = unadjusted P-value obtained for the paired t-test.)
(<b>E</b>) Bonferroni-adjusted P-values obtained from one-sided, paired t-tests between the F1-scores of all the possible pairs of profiling/annotation methods (F1<sub>A</sub> vs. F1<sub>B</sub>).</p>
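Panel B of the caption above describes a ranked search result with precision, recall, and a decision threshold at the optimum F1-score. The sketch below shows one way such a threshold could be located, assuming items are ranked by ascending dissimilarity and every score is treated as a candidate cutoff; the function name and inputs are hypothetical and tied scores are not handled specially.

```python
def best_f1_threshold(scores, is_known):
    """Return (best F1, score cutoff) for a list ranked by ascending dissimilarity."""
    ranked = sorted(zip(scores, is_known))
    total_known = sum(is_known)
    best_f1, best_cutoff = 0.0, None
    hits = 0
    for rank, (score, known) in enumerate(ranked, start=1):
        hits += known
        if hits == 0 or total_known == 0:
            continue  # F1 is zero until the first known item is retrieved
        precision = hits / rank          # known items among the top `rank`
        recall = hits / total_known      # fraction of all known items retrieved
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_cutoff = f1, score
    return best_f1, best_cutoff
```

Here `scores` would be the mean dissimilarity of each ORF to the query proteins and `is_known` would flag the known subunits of the complex.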

    Duplicates with different divergence times have significantly different spatial divergence levels but similar numbers of occupied subcellular compartments.

    No full text
    <p>(<b>A</b>) We used a phylogeny of orthologous gene groups estimated for Ascomycota fungi <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003504#pcbi.1003504-Wapinski1" target="_blank">[37]</a> (inset = number of <i>S. cerevisiae</i> duplicates that we could trace to their originating ancestors without any loss event; blue = “old” duplicates; red = “young” duplicates; WGD = whole genome duplication.) The (<b>B</b>) mean P-profile dissimilarity scores (<i>d<sub>p</sub></i>), (<b>C</b>) shared compartment ratios, (<b>D</b>) mean protein expression levels, and (<b>E</b>) total numbers of occupied compartments for duplicates with different divergence times (dashed black vertical line = division between “young” and “old” duplicates; error bars = standard errors; red/blue line = mean values for the young or old duplicates, respectively; P-values from two-sided t-tests between young and old duplicates.) (<b>F</b>) Results from linear regression modeling of <i>d<sub>p</sub></i> and shared compartment ratio using divergence age and protein expression level as factors (T = T-statistics of the factors; F = F-statistics for the analyses of variance (ANOVA) between two regression models with different factors; R<sup>2</sup> = squared correlation coefficients between the actual and predicted values of <i>d<sub>p</sub></i> or shared compartment ratio.)</p>

    A subcellular localization map for the yeast proteome.

    No full text
    <p>(<b>A</b>) An example of how PLAST assigns compartments to an ORF, YDR110W (black curve = estimated probability distribution of the <i>d<sub>p</sub></i> scores between the ORF and a catalog of 73 major subcellular compartments; dashed red vertical line = local maximum of the distribution with the highest <i>d<sub>p</sub></i> value; red curve = estimated “null” distribution of the <i>d<sub>p</sub></i> scores between the ORF and non-specifically localized compartments; blue vertical line = a threshold for compartments with <i>d<sub>p</sub></i> significantly less than the null distribution at Bonferroni-adjusted P̃<2.5×10<sup>−4</sup>.) The estimated mean and standard deviation of the null distribution are used to standardize the <i>d<sub>p</sub></i> scores between the ORF and all compartments. (<b>B</b>) A subcellular localization map showing the standardized P-profile dissimilarity scores () between 4066 ORFs (x-axis) and the 73 major subcellular compartments (y-axis) in a budding yeast cell. The compartments (rows) were ordered using a hierarchical clustering algorithm with cosine dissimilarity scores, and labeled with color codes according to their known functions or localizations (“common” compartments = compartments assigned to large numbers of ORFs.) A fully annotated map is shown in <b>Supplementary <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003504#pcbi.1003504.s010" target="_blank">Fig. S7</a></b>. (<b>C</b>) Using a Bonferroni-adjusted threshold of P̃<1.0×10<sup>−12</sup>, we assigned compartments to each and every ORF. Among the 73 compartments, we found 22 compartments whose known components and “non-components” assigned by PLAST share at least one common, significantly-enriched GO biological process (P̃<0.05 with false-discovery-rate adjustment, hypergeometric test). Shown are the percentages of known- and non-components in all the ORFs assigned with these compartments by PLAST.
The list of (up to three) common enriched GO biological processes for each compartment is also shown (pol. = polymerase, reg. = regulation, RNP = ribonucleoprotein).</p>
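Panel A of the caption above standardizes each dissimilarity score with the mean and standard deviation of a null distribution and keeps compartments whose scores fall significantly below it after Bonferroni adjustment. A rough sketch of that idea follows. It assumes a normal null distribution and a one-sided test, neither of which is stated in the caption, and the function and argument names are hypothetical.

```python
from statistics import NormalDist


def assign_compartments(dp_scores, null_mean, null_sd, alpha=2.5e-4):
    """Keep compartments whose d_p is significantly below the null (sketch).

    dp_scores maps compartment name -> dissimilarity score for one ORF.
    Assumes a normal null distribution, which the source does not specify.
    """
    n_tests = len(dp_scores)
    standard_normal = NormalDist()
    assigned = {}
    for name, dp in dp_scores.items():
        z = (dp - null_mean) / null_sd      # standardized dissimilarity score
        p = standard_normal.cdf(z)          # one-sided: unusually small d_p
        p_adjusted = min(1.0, p * n_tests)  # Bonferroni adjustment
        if p_adjusted < alpha:
            assigned[name] = z
    return assigned
```

The returned dictionary maps each assigned compartment to its standardized score, which is the kind of quantity a localization map like the one in panel B would display.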

    US EPA-A*STAR Partnership: Accelerating Acceptance of Next-Generation Sciences and Application to Regulatory Risk Assessment

    No full text
    Presentation on US EPA – A*STAR Partnership at international symposium on accelerating the acceptance of next-generation sciences and their application to regulatory risk assessment in Singapore February 201