10 research outputs found
Knowledge-Based Analysis for Detecting Key Signaling Events from Time-Series Phosphoproteomics Data
<div><p>Cell signaling underlies the transcriptional and epigenetic control of the vast majority of cell-fate decisions. A key goal in cell signaling studies is to identify the set of kinases that underlie key signaling events. In a typical phosphoproteomics study, phosphorylation sites (substrates) of active kinases are quantified proteome-wide. By analyzing the activities of phosphorylation sites over a time course, the temporal dynamics of signaling cascades can be elucidated. Since many substrates of a given kinase have similar temporal kinetics, grouping phosphorylation sites into distinct clusters can facilitate identification of their respective kinases. Here we present a knowledge-based CLUster Evaluation (CLUE) approach for identifying the most informative partitioning of a given temporal phosphoproteomics dataset. Our approach uses prior knowledge, namely annotated kinase-substrate relationships mined from the literature and curated databases, first to generate a biologically meaningful partitioning of the phosphorylation sites and then to determine the key kinases associated with each cluster. We demonstrate the utility of the proposed approach on two time-series phosphoproteomics datasets and identify key kinases associated with human embryonic stem cell differentiation and with the insulin signaling pathway. The proposed approach will be a valuable resource for identifying and characterizing signaling networks from phosphoproteomics data.</p></div>
Optimal clustering and analysis of hES cell phosphoproteomics data.
<p>CLUE's estimation of the number of clusters. The number of clusters evaluated ranges from 2 to 20, and the optimal number of clusters estimated by CLUE is highlighted in red. Visual representation of the temporal profiles of phosphorylation sites within each cluster. The membership scores of the phosphorylation sites within a cluster are used to create a color gradient from green to red, corresponding to lower to higher clustering confidence. Size: number of phosphorylation sites that have membership in that cluster. Bar plot showing kinases whose substrates are enriched within each cluster (<i>p</i>-value < 0.05; Fisher’s exact test). Principal component analysis of the temporal profiles of phosphorylation sites within clusters 3, 6, and 7. Known substrates of p70S6K and ERK are marked with x and *, respectively. Motif enrichment analysis. Phosphorylation sites from each cluster are scored against the position-specific scoring matrices (PSSMs) of p70S6K and ERK1, respectively. The cluster with the highest (median) motif enrichment score is highlighted in yellow.</p>
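The motif enrichment step in this caption (score every site in a cluster against a kinase PSSM, then compare clusters by their median score) can be sketched as follows. This is a hypothetical illustration, not the PSSMs used in the study: the ERK-like matrix keeps only two made-up log-odds weights, loosely modelled on the proline-directed P-x-S/T-P pattern, and the peptide windows and cluster names are invented.

```python
from statistics import median

# Hypothetical ERK-like PSSM: keys are offsets from the phosphosite, values map
# residues to log-odds weights; residues not listed contribute 0.0.
ERK_LIKE_PSSM = {-2: {"P": 1.0}, +1: {"P": 2.0}}

def pssm_score(window, pssm):
    """Score a peptide window whose middle residue is the phosphosite."""
    centre = len(window) // 2
    return sum(weights.get(window[centre + offset], 0.0) for offset, weights in pssm.items())

def best_cluster(clusters, pssm):
    """Return the cluster with the highest median PSSM score, plus all medians."""
    medians = {name: median(pssm_score(w, pssm) for w in windows)
               for name, windows in clusters.items()}
    return max(medians, key=medians.get), medians

clusters = {  # toy 7-residue windows (-3..+3), phosphosite at the centre
    "cluster_1": ["GPLSPRK", "APTSPAK", "RRASVAG"],
    "cluster_2": ["RRLSLGE", "KRASFEA", "GELSPKA"],
}
best, medians = best_cluster(clusters, ERK_LIKE_PSSM)
print(best, medians)  # cluster_1 {'cluster_1': 3.0, 'cluster_2': 0.0}
```

Using the median rather than the mean keeps one strongly matching site from making a whole cluster look enriched.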
Comparison of CLUE with alternative approaches on the two phosphoproteomics datasets.
<p>ns, not significant; -, not applicable</p>
Schematic overview of CLUE.
<p>The level of phosphorylation of each phosphorylation site in the proteome is quantified over a time course by mass spectrometry. First, the time-course profiles of phosphorylation sites are partitioned into clusters using a <i>k</i>-means clustering-based algorithm for a range of values of <i>k</i>. Next, the clustering result for each <i>k</i> is evaluated by how well the known substrates of kinases, as annotated in the PhosphoSitePlus database [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1004403#pcbi.1004403.ref053" target="_blank">53</a>], are clustered together, and an enrichment score is computed. The clustering with the highest enrichment score is reported as the optimal clustering, along with the kinases whose substrates are enriched within each cluster.</p>
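The loop described in this schematic (cluster for each <i>k</i>, score each clustering with kinase-substrate annotations, keep the best <i>k</i>) can be sketched in plain Python. This is a minimal, hypothetical illustration rather than CLUE's implementation. It uses a basic k-means with deterministic farthest-first initialization, a one-sided Fisher's exact test computed from the hypergeometric tail, and an assumed enrichment score: the sum over kinases of −log10 of each kinase's smallest cluster p-value. The kinase names, profiles and substrate sets are toy data standing in for PhosphoSitePlus annotations.

```python
import math
from math import comb

def fisher_upper_p(a, n, K, N):
    """One-sided Fisher's exact test: P(X >= a) with X ~ Hypergeometric(N, K, n).

    N sites in total, K known substrates of a kinase, n sites in the cluster,
    a known substrates observed inside the cluster.
    """
    hits = sum(comb(K, x) * comb(N - K, n - x) for x in range(a, min(n, K) + 1))
    return hits / comb(N, n)

def sqdist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(profiles, k, iters=50):
    """Plain Lloyd's k-means; farthest-first initialization keeps the toy deterministic."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(range(len(profiles)),
                  key=lambda i: min(sqdist(profiles[i], c) for c in centroids))
        centroids.append(list(profiles[far]))
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assign each profile to its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda j: sqdist(p, centroids[j])) for p in profiles]
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def enrichment_score(labels, k, substrates):
    """Assumed score: sum over kinases of -log10(smallest cluster p-value)."""
    N = len(labels)
    score = 0.0
    for sites in substrates.values():
        best_p = 1.0
        for j in range(k):
            members = {i for i, lab in enumerate(labels) if lab == j}
            best_p = min(best_p, fisher_upper_p(len(members & sites), len(members), len(sites), N))
        score += -math.log10(best_p)
    return score

def select_k(profiles, substrates, k_range):
    scores = {k: enrichment_score(kmeans(profiles, k), k, substrates) for k in k_range}
    best_k = max(k_range, key=lambda k: scores[k])  # first maximum, i.e. smallest k on ties
    return best_k, scores

# Toy data: three groups of 10 identical time profiles; the first 5 sites of each
# group are "known substrates" of a hypothetical kinase.
rising, falling, flat = [0, 1, 2, 3], [3, 2, 1, 0], [1.5] * 4
profiles = [rising] * 10 + [falling] * 10 + [flat] * 10
substrates = {"KinA": set(range(0, 5)), "KinB": set(range(10, 15)), "KinC": set(range(20, 25))}
best_k, scores = select_k(profiles, substrates, range(2, 6))
print(best_k)  # 3
```

With this toy data, k = 2 merges two groups and lowers the score, while k = 4 or 5 only adds empty clusters (each group consists of identical profiles), so the smallest top-scoring k is selected. The score function is the part that would have to follow CLUE's own definition on real data.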
Optimal clustering and analysis of adipocyte phosphoproteomics data.
<p>CLUE's estimation of the number of clusters. The number of clusters evaluated ranges from 2 to 36, and the optimal number of clusters estimated by CLUE is highlighted in red. Visual representation of the temporal profiles of phosphorylation sites within each cluster. The membership scores of the phosphorylation sites within a cluster are used to create a color gradient from green to red, corresponding to lower to higher clustering confidence. Size: number of phosphorylation sites that have membership in that cluster. Bar plot showing kinases whose substrates are enriched within each cluster (<i>p</i>-value < 0.05; Fisher’s exact test). Principal component analysis of the temporal profiles of phosphorylation sites within clusters 2, 7, 9, and 17. Known substrates of Akt1 and mTOR are marked with x and *, respectively. Motif enrichment analysis. Phosphorylation sites from each cluster are scored against the position-specific scoring matrices (PSSMs) of Akt1 and mTOR, respectively. The cluster with the highest (median) motif enrichment score is highlighted in yellow.</p>
Comparison of CLUE with alternative approaches.
<p>For each method, the raw scores representing the quality of the clustering result for each <i>k</i> were normalized to lie between 0 and 1 (y-axis). The higher the score, the more informative the resulting clustering. The methods were evaluated on how accurately they recovered the true number of clusters in a simulated dataset. The yellow line represents the true number of clusters in the simulated dataset, and the red dot denotes the number of clusters predicted in each case.</p>
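A minimal sketch of the normalization in this caption, assuming simple min-max scaling (the caption does not say which scaling was used) and taking the predicted number of clusters to be the <i>k</i> with the highest normalized score. The raw scores and the true <i>k</i> below are invented for illustration.

```python
def min_max_normalize(raw_scores):
    """Rescale a {k: raw score} mapping to [0, 1] (assumed min-max scaling)."""
    lo, hi = min(raw_scores.values()), max(raw_scores.values())
    span = hi - lo
    # If every k scores the same, no k is preferred; map all scores to 0.0.
    return {k: (s - lo) / span if span else 0.0 for k, s in raw_scores.items()}

def predicted_k(raw_scores):
    normalized = min_max_normalize(raw_scores)
    return max(normalized, key=normalized.get)

raw = {2: 10.0, 3: 40.0, 4: 25.0, 5: 20.0}  # hypothetical raw scores for one method
true_k = 3  # hypothetical number of clusters built into the simulated data
print(min_max_normalize(raw)[4])  # 0.5
print(predicted_k(raw) == true_k)  # True
```

Min-max scaling does not change which <i>k</i> scores highest; it only puts methods with differently scaled raw scores on a common y-axis.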
Re-Fraction: A Machine Learning Approach for Deterministic Identification of Protein Homologues and Splice Variants in Large-scale MS-based Proteomics
A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction uses several physical properties of proteins to assign them to the expected protein fractions that make up large-scale MS-based proteomics data. This information is then used to assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines in which data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach by reanalyzing data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously identified only as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.
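The abstract does not describe Re-Fraction's model, so the following is only a hypothetical sketch of the underlying idea for protein-level fractionation such as gel slicing: predict the fraction in which each member of a protein group should appear from a physical property, then assign the group's peptides to the member whose expected fraction matches the fraction where they were observed. It replaces the machine learning model with a single hand-written rule (log molecular weight mapped linearly to slice index). The function names, the 10-slice layout, the 10 to 250 kDa range and the isoform names are all assumptions, and no MaxQuant table is read.

```python
import math

def expected_fraction(mw_kda, n_fractions=10, mw_min=10.0, mw_max=250.0):
    """Hypothetical stand-in for Re-Fraction's trained model.

    Maps molecular weight (kDa) to a gel-slice index on a log scale, with the
    heaviest proteins in slice 1 and the lightest in slice n_fractions.
    """
    x = (math.log(mw_max) - math.log(mw_kda)) / (math.log(mw_max) - math.log(mw_min))
    x = min(max(x, 0.0), 1.0)  # clamp proteins outside the assumed range
    return 1 + round(x * (n_fractions - 1))

def resolve_group(members, observed_fraction):
    """Pick the protein-group member expected closest to where its peptides were seen.

    members: {protein id: molecular weight in kDa}. Returns None when the closest
    match is not unique, i.e. the group stays ambiguous.
    """
    distance = {p: abs(expected_fraction(mw) - observed_fraction) for p, mw in members.items()}
    best = min(distance.values())
    winners = [p for p, d in distance.items() if d == best]
    return winners[0] if len(winners) == 1 else None

group = {"ISO_LONG": 120.0, "ISO_SHORT": 45.0}  # hypothetical splice variants
print(expected_fraction(120.0), expected_fraction(45.0))  # 3 6
print(resolve_group(group, observed_fraction=6))  # ISO_SHORT
```

Returning None for ties keeps the assignment deterministic: a group is resolved only when one member is clearly the better fit.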
Scaled log fold change over time of kinase (shown in blue) and the corresponding CTA (shown in red, mean ± SD) for multiple kinases.
Validation of IRS1 S265 as an AKT substrate.
<p>A) Comparison of the AKT and RPS6KB1 consensus motifs with the IRS1 S265 site. B) CTA of AKT (green) and RPS6KB1 (purple) and the time profile of IRS1 S265 (blue). CTA is depicted as mean ± SD. C) Scatter plot of RPS6KB1 prediction scores (y-axis) against the difference RPS6KB1 prediction score − AKT prediction score (x-axis). AKT training substrates are shown in red and RPS6KB1 training substrates in blue. IRS1 S265 is shown in green. D) Insulin signaling via AKT and RPS6KB1. See main text for details. E) 3T3-L1 adipocytes were stimulated with insulin alone or in the presence of inhibitors of AKT (MK, GDC) or mTORC1 (Rapa), after which AKT and RPS6KB1 signaling were assessed by Western blotting. Blots shown are representative of 3 separate experiments. F) Quantification of IRS1 S265 phosphorylation from (E), depicted as mean ± SEM.</p>
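This caption uses two different error bars: the CTA in panel B is shown as mean ± SD, while the quantification in panel F is shown as mean ± SEM, where SEM = SD/√n and n is the number of independent experiments quantified. A small sketch of that arithmetic, using made-up intensities rather than values from the figure:

```python
import math
from statistics import mean, stdev

def mean_sd_sem(values):
    """Return (mean, sample SD, SEM = SD / sqrt(n)) for replicate measurements."""
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean(values), sd, sd / math.sqrt(len(values))

# Hypothetical normalized IRS1 S265 band intensities from three experiments.
m, sd, sem = mean_sd_sem([1.0, 1.4, 1.2])
print(f"{m:.2f} ± {sem:.3f} (SD {sd:.2f})")  # 1.20 ± 0.115 (SD 0.20)
```

The SD describes the spread of individual profiles or replicates; the SEM describes the uncertainty of the mean and shrinks as more experiments are added.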
Overview of KSR-LIVE.
<p>A) Flowchart of the clustering procedure. Substrates of a kinase (for example, Akt) are extracted from the KSR knowledgebase and are either exclusive (blue) or not (pink). In the first step, tight clustering is performed on the exclusive substrates and core substrates (purple) are identified. In the second step, tight clustering is performed using all substrates, and the characteristic temporal activity of the kinase is identified. B) Heatmap of the scaled log fold change of the characteristic temporal activity of 9 kinases over time. High log fold change is shown in red and low log fold change in blue. C) Table showing the time points included in the accuracy analysis and the accuracy of using a database or KSR-LIVE for Akt and mTOR.</p>
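The second step of this flowchart yields a kinase's characteristic temporal activity (CTA), which other captions plot as mean ± SD. The sketch below illustrates only that summary step and is not KSR-LIVE's method. It replaces tight clustering with a much simpler, hypothetical filter that keeps substrates whose profiles correlate with the mean profile at Pearson r ≥ 0.8. The threshold and the profiles are invented toy values.

```python
from statistics import mean, stdev

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0  # flat profiles get r = 0

def characteristic_temporal_activity(profiles, min_r=0.8):
    """Hypothetical CTA: mean and SD per time point over 'core' substrate profiles.

    Core substrates are those correlating with the mean profile at r >= min_r
    (a stand-in for tight clustering); at least two must pass for the SD.
    """
    centroid = [mean(col) for col in zip(*profiles)]
    core = [p for p in profiles if pearson(p, centroid) >= min_r]
    cta_mean = [mean(col) for col in zip(*core)]
    cta_sd = [stdev(col) for col in zip(*core)]
    return core, cta_mean, cta_sd

profiles = [  # toy scaled log fold changes at four time points
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.2, 2.1, 2.9],
    [0.1, 0.9, 1.9, 3.1],
    [3.0, 2.0, 1.0, 0.0],  # opposing trend, expected to be excluded
]
core, cta_mean, cta_sd = characteristic_temporal_activity(profiles)
print(len(core))  # 3
```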