Knowledge-Based Analysis for Detecting Key Signaling Events from Time-Series Phosphoproteomics Data

Author: Guang Hu (140331)
Jean Yee Hwa Yang (781173)
Pengyi Yang (781172)
Raja Jothi (31820)
Vivek Jayaswal (188318)
Xiaofeng Zheng (36701)
Publication date: 01/08/2015

Cell signaling underlies the transcriptional and epigenetic control of the vast majority of cell-fate decisions. A key goal in cell signaling studies is to identify the set of kinases that underlie key signaling events. In a typical phosphoproteomics study, phosphorylation sites (substrates) of active kinases are quantified proteome-wide. By analyzing the activities of phosphorylation sites over a time course, the temporal dynamics of signaling cascades can be elucidated. Since many substrates of a given kinase have similar temporal kinetics, clustering phosphorylation sites into distinct clusters can facilitate identification of their respective kinases. Here we present a knowledge-based CLUster Evaluation (CLUE) approach for identifying the most informative partitioning of a given temporal phosphoproteomics dataset. Our approach utilizes prior knowledge, in the form of annotated kinase-substrate relationships mined from the literature and curated databases, to first generate a biologically meaningful partitioning of the phosphorylation sites and then determine the key kinases associated with each cluster. We demonstrate the utility of the proposed approach on two time-series phosphoproteomics datasets and identify key kinases associated with human embryonic stem cell differentiation and the insulin signaling pathway. The proposed approach will be a valuable resource for the identification and characterization of signaling networks from phosphoproteomics data.

Directory of Open Access Journals

PubMed Central

FigShare

Optimal clustering and analysis of hES cell phosphoproteomics data.

Author: Guang Hu (140331)
Jean Yee Hwa Yang (781173)
Pengyi Yang (781172)
Raja Jothi (31820)
Vivek Jayaswal (188318)
Xiaofeng Zheng (36701)

CLUE's estimation of the number of clusters. The number of clusters evaluated ranges from 2 to 20, and the optimal number of clusters, as estimated by CLUE, is highlighted in red. Visual representation of the temporal profiles of phosphorylation sites within each cluster. The membership scores of all phosphorylation sites within a cluster are used to create a color gradient from green to red, corresponding to lower to higher clustering confidence. Size: the number of phosphorylation sites that have membership in that cluster. Bar plot showing kinases whose substrates are enriched within each cluster (p-value < 0.05; Fisher’s exact test). Principal component analysis of the temporal profiles of phosphorylation sites within clusters 3, 6, and 7. Known substrates of the p70S6K and ERK kinases are highlighted as x and *, respectively. Motif enrichment analysis. Phosphorylation sites from each cluster are scored against the PSSMs of p70S6K and ERK1, respectively. The clusters with the highest median motif enrichment scores are highlighted in yellow.
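The enrichment test named in the caption can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test, computed as the upper tail of the hypergeometric distribution; the caption does not say whether the test is one- or two-sided. The function name and the counts in the usage lines are hypothetical and not taken from the paper.

```python
from math import comb


def fisher_enrichment_p(hits, cluster_size, n_substrates, n_sites):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of finding at least `hits` known substrates of a kinase in a
    cluster of `cluster_size` sites, when `n_substrates` of the `n_sites`
    quantified sites are annotated to that kinase.
    ASSUMPTION: one-sided test; the source only says "Fisher's exact test".
    """
    max_overlap = min(cluster_size, n_substrates)
    tail = sum(
        comb(n_substrates, x) * comb(n_sites - n_substrates, cluster_size - x)
        for x in range(hits, max_overlap + 1)
    )
    return tail / comb(n_sites, cluster_size)


# Hypothetical counts: 4 of a kinase's 5 annotated substrates fall in a
# 5-site cluster, out of 20 quantified sites in total.
p_value = fisher_enrichment_p(hits=4, cluster_size=5, n_substrates=5, n_sites=20)
is_enriched = p_value < 0.05  # threshold quoted in the caption
```

Under this reading, a kinase would be drawn in the bar plot for a cluster whenever `is_enriched` is true for that kinase-cluster pair.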

FigShare

Comparison of CLUE with alternative approaches on the two phosphoproteomics datasets.

Author: Guang Hu (140331)
Jean Yee Hwa Yang (781173)
Pengyi Yang (781172)
Raja Jothi (31820)
Vivek Jayaswal (188318)
Xiaofeng Zheng (36701)

ns, not significant; -, not applicable.

FigShare

Schematic overview of CLUE.

Author: Guang Hu (140331)
Jean Yee Hwa Yang (781173)
Pengyi Yang (781172)
Raja Jothi (31820)
Vivek Jayaswal (188318)
Xiaofeng Zheng (36701)

The level of phosphorylation of each phosphorylation site in the proteome is quantified over a time course by mass spectrometry. First, the time-course profiles of phosphorylation sites are partitioned into clusters using a k-means clustering-based algorithm for a range of values of k. Next, the clustering result for each k is evaluated based on the correct clustering of known kinase substrates, as annotated in the PhosphoSitePlus database [53], and an enrichment score is computed. The clustering with the highest enrichment score is reported as the optimal clustering, along with the kinases whose substrates are enriched within each cluster.
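The loop described in this caption (cluster for each k, score each clustering against known substrates, keep the best) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: plain Lloyd's k-means with evenly spaced seeding stands in for the "k-means clustering-based algorithm", and the enrichment score (sum over kinases of the negative log10 of the best per-cluster hypergeometric p-value) is an assumed stand-in for CLUE's actual score. All function names and the toy data are hypothetical.

```python
import math
from math import comb


def kmeans(profiles, k, iters=50):
    """Plain Lloyd's k-means with evenly spaced seeding (a simplification)."""
    centroids = [list(profiles[i * len(profiles) // k]) for i in range(k)]
    labels = [0] * len(profiles)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in profiles]
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def upper_tail_p(hits, cluster_size, n_substrates, n_sites):
    """Hypergeometric probability of at least `hits` substrates in a cluster."""
    tail = sum(comb(n_substrates, x) * comb(n_sites - n_substrates, cluster_size - x)
               for x in range(hits, min(cluster_size, n_substrates) + 1))
    return tail / comb(n_sites, cluster_size)


def enrichment_score(labels, kinase_substrates):
    """ASSUMED score: sum over kinases of -log10(best per-cluster p-value)."""
    n_sites = len(labels)
    score = 0.0
    for substrates in kinase_substrates.values():  # set of site indices
        best_p = 1.0
        for c in set(labels):
            members = {i for i, lab in enumerate(labels) if lab == c}
            p = upper_tail_p(len(members & substrates), len(members),
                             len(substrates), n_sites)
            best_p = min(best_p, p)
        score += -math.log10(best_p)
    return score


def choose_k(profiles, kinase_substrates, k_values):
    """Return the k with the highest enrichment score, plus all scores."""
    scores = {k: enrichment_score(kmeans(profiles, k), kinase_substrates)
              for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data: six rising and six falling profiles; each hypothetical kinase
# is annotated to exactly one of the two groups.
rising = [(0.0, 1.0 + 0.01 * i, 2.0) for i in range(6)]
falling = [(2.0, 1.0 + 0.01 * i, 0.0) for i in range(6)]
toy_substrates = {"kinaseA": set(range(6)), "kinaseB": set(range(6, 12))}
best_k, all_scores = choose_k(rising + falling, toy_substrates, [2, 3, 4])
```

On the toy data, splitting either group across clusters weakens the enrichment of its kinase, so the score favors the partition that keeps known substrates together. That is the intuition behind evaluating clusterings with prior knowledge.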

FigShare

Optimal clustering and analysis of adipocytes phosphoproteomics data.

Author: Guang Hu (140331)
Jean Yee Hwa Yang (781173)
Pengyi Yang (781172)
Raja Jothi (31820)
Vivek Jayaswal (188318)
Xiaofeng Zheng (36701)

CLUE's estimation of the number of clusters. The number of clusters evaluated ranges from 2 to 36, and the optimal number of clusters, as estimated by CLUE, is highlighted in red. Visual representation of the temporal profiles of phosphorylation sites within each cluster. The membership scores of all phosphorylation sites within a cluster are used to create a color gradient from green to red, corresponding to lower to higher clustering confidence. Size: the number of phosphorylation sites that have membership in that cluster. Bar plot showing kinases whose substrates are enriched within each cluster (p-value < 0.05; Fisher’s exact test). Principal component analysis of the temporal profiles of phosphorylation sites within clusters 2, 7, 9 and 17. Known substrates of the Akt1 and mTOR kinases are highlighted as x and *, respectively. Motif enrichment analysis. Phosphorylation sites from each cluster are scored against the PSSMs of Akt1 and mTOR, respectively. The clusters with the highest median motif enrichment scores are highlighted in yellow.
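The motif enrichment step can be illustrated with a small sketch: each sequence window around a phosphorylation site is scored against a position-specific scoring matrix (PSSM), and clusters are compared by their median score. The additive scoring rule, the function names, and the three-position toy matrix are assumptions for illustration only; they are not the Akt1 or mTOR matrices used in the study.

```python
from statistics import median


def pssm_score(window, pssm):
    """Sum per-position scores of a sequence window.

    `pssm` holds one {residue: score} dict per window position; residues
    absent from a column contribute 0.0 (an assumed default).
    """
    if len(window) != len(pssm):
        raise ValueError("window length must match the PSSM width")
    return sum(column.get(residue, 0.0) for residue, column in zip(window, pssm))


def best_cluster_by_median(cluster_windows, pssm):
    """Return the cluster with the highest median PSSM score, plus all medians."""
    medians = {cluster: median(pssm_score(w, pssm) for w in windows)
               for cluster, windows in cluster_windows.items()}
    return max(medians, key=medians.get), medians


# Hypothetical 3-position matrix and sequence windows, for illustration only.
toy_pssm = [{"R": 1.0}, {"S": 2.0}, {"P": 0.5}]
toy_clusters = {
    1: ["RSP", "RSA", "ASP"],
    2: ["AAA", "ASA", "RAA"],
}
top_cluster, cluster_medians = best_cluster_by_median(toy_clusters, toy_pssm)
```

Using the median rather than the mean keeps a single strongly matching site from dominating a cluster's score, which fits the caption's choice of the median as the summary statistic.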

FigShare

Comparison of CLUE with alternative approaches.

Author: Guang Hu (140331)
Jean Yee Hwa Yang (781173)
Pengyi Yang (781172)
Raja Jothi (31820)
Vivek Jayaswal (188318)
Xiaofeng Zheng (36701)

Raw scores for each method, representing the quality of the clustering result for each k, were normalized to lie between 0 and 1 (y-axis). The higher the score, the more informative the resulting clustering. The methods were evaluated based on how accurately they recover the true number of clusters within a simulated dataset. The yellow line represents the true number of clusters in the simulated dataset, and the red dot denotes the predicted number of clusters in each case.
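The normalization described in this caption can be read as a min-max rescaling of each method's raw scores, with the predicted number of clusters taken at the highest normalized score. The sketch below assumes that reading; the caption does not name the exact formula, and the function names and raw scores are made up for illustration.

```python
def min_max_normalize(raw_scores):
    """Rescale a {k: raw score} mapping to the range [0, 1]."""
    lowest, highest = min(raw_scores.values()), max(raw_scores.values())
    if highest == lowest:  # all scores equal: nothing to rank
        return {k: 0.0 for k in raw_scores}
    return {k: (s - lowest) / (highest - lowest) for k, s in raw_scores.items()}


def predicted_k(raw_scores):
    """Pick the k whose normalized score is highest (higher = more informative)."""
    normalized = min_max_normalize(raw_scores)
    return max(normalized, key=normalized.get)


# Hypothetical raw scores for one method over k = 2, 3, 4.
toy_raw = {2: 1.0, 3: 5.0, 4: 3.0}
toy_normalized = min_max_normalize(toy_raw)
toy_prediction = predicted_k(toy_raw)
```

Rescaling each method separately puts scores with different units on a common y-axis without changing which k each method prefers.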

FigShare

Re-Fraction: A Machine Learning Approach for Deterministic Identification of Protein Homologues and Splice Variants in Large-scale MS-based Proteomics

Author: Daniel J. Fazakerley (2087365)
David E. James (139275)
Guang Yang (154978)
Jean Yee-Hwa Yang (2087359)
Matthew J. Prior (2087362)
Pengyi Yang (781172)
Sean J. Humphrey (2087368)

A key step in the analysis of mass spectrometry (MS)-based proteomics data is the inference of proteins from identified peptide sequences. Here we describe Re-Fraction, a novel machine learning algorithm that enhances deterministic protein identification. Re-Fraction utilizes several protein physical properties to assign proteins to expected protein fractions that comprise large-scale MS-based proteomics data. This information is then used to appropriately assign peptides to specific proteins. This approach is sensitive, highly specific, and computationally efficient. We provide algorithms and source code for the current version of Re-Fraction, which accepts output tables from the MaxQuant environment. Nevertheless, the principles behind Re-Fraction can be applied to other protein identification pipelines where data are generated from samples fractionated at the protein level. We demonstrate the utility of this approach through reanalysis of data from a previously published study and generate lists of proteins deterministically identified by Re-Fraction that were previously only identified as members of a protein group. We find that this approach is particularly useful in resolving protein groups composed of splice variants and homologues, which are frequently expressed in a cell- or tissue-specific manner and may have important biological consequences.

FigShare

Scaled log fold change over time of kinase (shown in blue) and the corresponding CTA (shown in red, mean ± SD) for multiple kinases.

Author: Daniel Fazakerley (2851943)
David James (114768)
Fatemeh Vafaee (2851946)
James Krycer (2851949)
Pengyi Yang (781172)
Rima Chaudhuri (2066485)
Sean Humphrey (2151385)
Westa Domanova (2851952)
Zdenka Kuncic (832583)

FigShare

Validation of IRS1 S265 as an AKT substrate.

Author: Daniel Fazakerley (2851943)
David James (114768)
Fatemeh Vafaee (2851946)
James Krycer (2851949)
Pengyi Yang (781172)
Rima Chaudhuri (2066485)
Sean Humphrey (2151385)
Westa Domanova (2851952)
Zdenka Kuncic (832583)

A) Comparison of the AKT and RPS6KB1 consensus motifs and the IRS1 S265 site. B) CTA of AKT (green) and RPS6KB1 (purple) and the time profile of IRS1 S265 (blue). CTA is depicted as mean ± SD. C) Scatter plot of RPS6KB1 prediction scores (y-axis) against RPS6KB1 prediction score − AKT prediction score (x-axis). AKT training substrates are shown in red and RPS6KB1 training substrates are shown in blue. IRS1 S265 is shown in green. D) Insulin signaling via AKT and RPS6KB1. See main text for details. E) 3T3-L1 adipocytes were stimulated with insulin alone or in the presence of inhibitors of AKT (MK, GDC) or mTORC1 (Rapa), after which AKT and RPS6KB1 signaling were assessed by Western blotting. Blots shown are representative of 3 separate experiments. F) Quantification of IRS1 S265 phosphorylation from (E), depicted as mean ± SEM.

FigShare

Overview of KSR-LIVE.

Author: Daniel Fazakerley (2851943)
David James (114768)
Fatemeh Vafaee (2851946)
James Krycer (2851949)
Pengyi Yang (781172)
Rima Chaudhuri (2066485)
Sean Humphrey (2151385)
Westa Domanova (2851952)
Zdenka Kuncic (832583)

A) Flowchart of the clustering procedure. Substrates for a kinase (for example, Akt) are extracted from the KSR knowledgebase and can either be exclusive (blue) or not (pink). In the first step, tight clustering is performed on exclusive substrates and core substrates (purple) are identified. In the second step, tight clustering is performed using all substrates and the characteristic temporal activity of a kinase is identified. B) Heatmap of the scaled log fold change of the characteristic temporal activity of 9 kinases over time. High log fold change is shown in red and low log fold change in blue. C) Table showing the time points included in the accuracy analysis and the accuracy of using a database or KSR-LIVE for Akt and mTOR.

FigShare