    Integration of pan-cancer transcriptomics with RPPA proteomics reveals mechanisms of epithelial-mesenchymal transition

    <div><p>Integrating data from multiple regulatory layers across cancer types could elucidate additional mechanisms of oncogenesis. Using antibody-based protein profiling of 736 cancer cell lines, along with matching transcriptomic data, we show that pan-cancer bimodality in the amounts of mRNA, protein, and protein phosphorylation reveals mechanisms related to the epithelial-mesenchymal transition (EMT). Based on the bimodal expression of E-cadherin, we define an EMT signature consisting of 239 genes, many of which were not previously associated with EMT. By querying gene expression signatures collected from cancer cell lines after small-molecule perturbations, we identify enrichment for histone deacetylase (HDAC) inhibitors as inducers of EMT, and kinase inhibitors as mesenchymal-to-epithelial transition (MET) promoters. Causal modeling of protein-based signaling identifies putative drivers of EMT. In conclusion, integrative analysis of pan-cancer proteomic and transcriptomic data reveals key regulatory mechanisms of oncogenic transformation.</p></div>

    Bimodal protein expression and phosphorylation detected across cancer types associate with known oncogenic processes including EMT.

    <p>(<b>A</b>) Two-component Gaussian mixture model fit to E-cadherin protein expression. The lines indicate the probability density contributions from the low (-) and high (+) expression components, and the histogram shows the RPPA measurements for the cell lines. (<b>B</b>) A comparison of two- versus one-component fits using the Bayesian Information Criterion (BIC) showed that 260 of 450 RPPA measurements supported bimodal expression. (<b>C</b>) Heat map of the posterior probabilities of each cell line belonging to the low (-, blue) or high (+, red) mixture component for the 20 most bimodal proteins. The posterior probabilities can be thought of as soft assignments of the cell lines to low or high expression. The Shannon entropy of the tissues assigned to low and high expression quantifies the tissue diversity underlying each bimodal fit. (<b>D</b>) Overview of the approach used to classify proteins by bimodality, tissue diversity (Shannon entropy), and the frequency of cell lines assigned to each fitted component. (<b>E</b>) GO terms significant (p < 0.05, Benjamini-Hochberg) for common bimodal proteins but not for non-bimodal proteins.</p>
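
    A minimal sketch of the model comparison in panels A-C, in plain Python: a two-component Gaussian mixture is fitted by expectation-maximization and compared with a single Gaussian using BIC, and the posterior probabilities of the high (+) component serve as the soft assignments. The quartile initialisation, the fixed 200 EM iterations, the variance floor, the simulated data, and all function names are our own assumptions for illustration; the original analysis may have used a different library and settings. A Shannon-entropy helper for the tissue-diversity measure is included.

```python
import math
import random
from collections import Counter


def normal_pdf(x, mu, var):
    # Gaussian density N(x | mu, var)
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posteriors(xs, pi, mu, var):
    # Soft assignment: probability that each point comes from the high (+) component
    out = []
    for x in xs:
        lo = (1 - pi) * normal_pdf(x, mu[0], var[0])
        hi = pi * normal_pdf(x, mu[1], var[1])
        out.append(hi / (lo + hi))
    return out


def fit_gmm2(xs, n_iter=200, min_var=1e-6):
    # EM for a two-component 1D Gaussian mixture (hypothetical helper), started at the quartiles
    n = len(xs)
    s = sorted(xs)
    mean = sum(xs) / n
    v = sum((x - mean) ** 2 for x in xs) / n
    mu, var, pi = [s[n // 4], s[(3 * n) // 4]], [v, v], 0.5
    for _ in range(n_iter):
        post = posteriors(xs, pi, mu, var)  # E-step
        w_hi = sum(post)                     # M-step: weights, means, variances
        w_lo = n - w_hi
        pi = w_hi / n
        mu = [sum((1 - p) * x for p, x in zip(post, xs)) / w_lo,
              sum(p * x for p, x in zip(post, xs)) / w_hi]
        var = [max(min_var, sum((1 - p) * (x - mu[0]) ** 2 for p, x in zip(post, xs)) / w_lo),
               max(min_var, sum(p * (x - mu[1]) ** 2 for p, x in zip(post, xs)) / w_hi)]
    if mu[0] > mu[1]:  # keep component 0 as the low (-) component
        mu, var, pi = mu[::-1], var[::-1], 1 - pi
    loglik = sum(math.log((1 - pi) * normal_pdf(x, mu[0], var[0])
                          + pi * normal_pdf(x, mu[1], var[1])) for x in xs)
    return loglik, posteriors(xs, pi, mu, var)


def is_bimodal(xs):
    # BIC = k*ln(n) - 2*ln(L); one component has k=2, two components have k=5
    n = len(xs)
    mean = sum(xs) / n
    v = sum((x - mean) ** 2 for x in xs) / n
    ll1 = sum(math.log(normal_pdf(x, mean, v)) for x in xs)
    ll2, post = fit_gmm2(xs)
    bic1 = 2 * math.log(n) - 2 * ll1
    bic2 = 5 * math.log(n) - 2 * ll2
    return bic2 < bic1, post


def tissue_entropy(tissues):
    # Shannon entropy (bits) of the tissue labels of cell lines assigned to one component
    n = len(tissues)
    return -sum(c / n * math.log2(c / n) for c in Counter(tissues).values())


rng = random.Random(0)  # simulated, hypothetical data: two well-separated groups
xs = [rng.gauss(0.0, 0.5) for _ in range(100)] + [rng.gauss(4.0, 0.5) for _ in range(100)]
bimodal, post = is_bimodal(xs)
```

    Thresholding the returned posteriors at 0.5 gives hard low/high assignments, whose tissue labels can then be passed to `tissue_entropy`.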

    Bayesian networks of proteins and phosphosites inferred from pan-cancer cell lines identify drivers of EMT and correlate with tumor networks.

    <p>All Bayesian network structures were inferred with a Fast Greedy Search algorithm. (<b>A</b>) Network centrality statistics of the directed causal graph over all measured proteins, describing the influence of each protein on cancer signaling. (<b>B</b>) In- and out-degree distributions of proteins and phosphosites in the inferred network. (<b>C</b>) Causal neighborhood (first neighbors) of the EMT markers E-cadherin, Rab25, and Claudin7. Tissue-specific correlations supporting each edge are shown as bars along the edges. The layout was determined with the hierarchical Sugiyama algorithm, with all edges oriented downwards. (<b>D</b>) Network comparisons between cell line and tumor data. (<b>E</b>) Distribution of average connectivity in bootstrapped Bayesian networks. (<b>F</b>) Distribution of networks of bimodal coupling coefficients.</p>

    Pan-cancer bimodal coupling between E-cadherin protein expression and genome-wide transcripts defines an EMT signature, predicting EMT- and MET-inducing small molecules.

    <p>(<b>A</b>) The 25 CCLE transcripts with the strongest positive and the 24 with the strongest negative bimodal coupling coefficients (r<sub>b</sub>) to E-cadherin protein expression. Red squares indicate genes in a previous EMT signature for non-small cell lung carcinoma published by Byers <i>et al</i>. [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005911#pcbi.1005911.ref051" target="_blank">51</a>]. To define an EMT signature, we considered transcripts with |r<sub>b</sub>| > 0.5, resulting in 215 epithelial and 24 mesenchymal markers. (<b>B</b>) Distribution of bimodal coupling coefficients, showing that the E-cadherin coupling coefficients are shifted towards negative values compared to those of all measured proteins. (<b>C</b>) Overlap of the EMT signature with previously published transcriptomic EMT signatures. The ‘mesenchymal’ bar plot refers to the inversely correlated (coupled) genes and the ‘epithelial’ bar plot to the positively correlated genes. (<b>D</b>-<b>E</b>) Gene set enrichment analysis of the epithelial part of the EMT signature. The TF enrichment analysis used ChIP-seq data to predict TFs involved in the regulation of the epithelial genes. The pie charts indicate the fraction of signature genes associated with significantly enriched terms or TFs. (<b>F</b>) Small-molecule perturbations predicted to induce EMT and MET based on L1000 cell line data and the L1000CDS<sup>2</sup> method. The top-50 signatures are shown, with results from multiple cell lines or concentrations aggregated as boxplots. PK: protein kinase.</p>
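
    The signature rule in panel A (|r_b| > 0.5, positive coupling marking epithelial and negative coupling marking mesenchymal genes) amounts to a simple split. A minimal sketch, assuming the coupling coefficients are already available as a mapping from transcript to r_b; the function name and the gene names in the example are hypothetical placeholders, not values from the study.

```python
def emt_signature(coupling, threshold=0.5):
    # coupling: hypothetical mapping transcript -> bimodal coupling coefficient r_b
    # with E-cadherin protein expression
    epithelial = sorted(g for g, rb in coupling.items() if rb > threshold)
    mesenchymal = sorted(g for g, rb in coupling.items() if rb < -threshold)
    return epithelial, mesenchymal


# Placeholder genes and values; 0.5 itself is excluded because the cutoff is strict
epi, mes = emt_signature({"gene_a": 0.8, "gene_b": -0.6, "gene_c": 0.2, "gene_d": 0.5})
```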

    Proteins and phosphosites with coupled bimodality form network communities associated with EMT and intermediate transitions.

    <p>(<b>A</b>) Pan-cancer protein communities detected by Spearman’s correlation of the posterior probabilities of cell lines having low or high expression (|r<sub>b</sub>| > 0.3). Only RPPA measurements associated with bimodal fits with high tissue diversity were included. Network communities were detected by calculating the leading non-negative eigenvector according to Newman’s method. Only edges within identified communities are shown, colored by the magnitude of the bimodal coupling coefficients. The size and color of the nodes represent the fitted mixing parameters of the Gaussian mixture models, quantifying whether the underlying switches are common or rare in cancer cell lines. Each community was named manually after a plausible biological mechanism, based on a literature search of its member proteins. Asterisks (*) indicate proteins with reported mechanisms linked to EMT. (<b>B</b>) Proposed interpretation of a two-step transition from the epithelial (E) to the mesenchymal (M) state through two identified modules, EMT1 and EMT3. (<b>C</b>) Supporting protein expression data showing that Claudin7 and E-cadherin are correlated. Cell lines are colored by tissue of origin (see <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005911#pcbi.1005911.g003" target="_blank">Fig 3</a> for tissue labels).</p>
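
    Newman's leading-eigenvector method splits a network according to the signs of the leading eigenvector of its modularity matrix B = A - k k^T / 2m. The sketch below shows a single bisection step on a small unweighted graph, using power iteration with a Gershgorin shift so that the largest algebraic eigenvalue dominates. It is an illustration under our own assumptions: the function names and the toy graph are hypothetical, the network in panel A is built from thresholded (|r_b| > 0.3) couplings rather than this example, and the full method recursively subdivides communities, which is omitted here.

```python
import math


def modularity_matrix(adj):
    # B_ij = A_ij - k_i * k_j / (2m) for an undirected, unweighted adjacency matrix
    n = len(adj)
    k = [sum(row) for row in adj]
    two_m = sum(k)
    return [[adj[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]


def leading_eigenvector_split(adj, n_iter=1000):
    # One bisection step: split nodes by the sign of the leading eigenvector of B
    B = modularity_matrix(adj)
    n = len(B)
    # Gershgorin shift: B + cI has no negative eigenvalues, so power iteration
    # converges to the eigenvector of the largest (algebraic) eigenvalue of B
    c = max(sum(abs(b) for b in row) for row in B)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        w = [sum(B[i][j] * v[j] for j in range(n)) + c * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Bv = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, Bv))  # Rayleigh quotient (unit-norm v)
    if eigenvalue <= 1e-9:
        return [frozenset(range(n))]  # no split increases modularity: indivisible
    return [frozenset(i for i in range(n) if v[i] > 0),
            frozenset(i for i in range(n) if v[i] <= 0)]


# Hypothetical example: two triangles joined by the single edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1
communities = leading_eigenvector_split(adj)
```

    A non-positive leading eigenvalue means no bisection increases modularity, which is the stopping rule when the split is applied recursively.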

    Bimodal coupling between regulatory layers.

    <p>(<b>A</b>) Comparison of mRNA and protein bimodality, colored by the coupling (Spearman’s correlation) between the posterior probabilities of the two-component Gaussian mixture models. The red percentage indicates the fraction of compared genes with coupled (r<sub>b</sub> > 0.5) bimodality at the transcript and protein levels. (<b>B</b>) Bimodal coupling of phosphosites and protein expression. (<b>C</b>) High-confidence assignments (p < 0.1) to low or high expression for a selection of bimodally coupled mRNA-protein pairs. (<b>D</b>) Scatter plot of E-cadherin mRNA and protein expression, indicating in red the 30 cell lines assigned to high mRNA but low protein expression (+–). r is Pearson’s correlation and r<sub>b</sub> the bimodal coupling coefficient. (<b>E</b>) Tissue of origin and <i>CDH1</i> (E-cadherin) mutational status of the 30 cell lines with high E-cadherin mRNA but low protein expression. Of the 4 of these cell lines genotyped for <i>CDH1</i> in COSMIC, 3 had mutations in the coding sequence. fs: frameshift, *: missense.</p>
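
    Panel A defines the bimodal coupling coefficient r_b as Spearman's correlation between the posterior probabilities of two separate two-component fits. A minimal pure-Python sketch, assuming each input lists the posterior probability of the high (+) component per cell line, in the same cell-line order for both layers; Spearman's coefficient is computed as the Pearson correlation of average ranks. The function names and example values are hypothetical.

```python
import math


def average_ranks(values):
    # Ranks starting at 1, with tied values sharing their average rank
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    # Pearson correlation of two equal-length sequences
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def bimodal_coupling(post_a, post_b):
    # r_b: Spearman correlation between high-component posteriors of two fits
    # (for example mRNA and protein of the same gene across the same cell lines)
    return pearson(average_ranks(post_a), average_ranks(post_b))


r_b = bimodal_coupling([0.1, 0.2, 0.9, 0.95], [0.05, 0.3, 0.8, 0.99])  # toy values
```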

    Pan-cancer cell line data from CCLE transcriptomic and reverse phase protein arrays (RPPA) cluster by tissue of origin and E-cadherin expression but not by prior metastasis classification.

    <p>(<b>A</b>) Overlap of available RPPA and CCLE data with regard to cancer cell lines (left), measured transcripts and proteins (middle), and proteins measured for both basal expression and phosphorylation levels (right). The colored areas indicate the data used to calculate and compare Euclidean distances between cell lines. (<b>B</b>) t-SNE plots of overlapping cancer cell lines based on protein, transcript, and equally weighted combined data. Each point represents a cell line and is colored by tissue of origin (top), E-cadherin expression (middle), or tumor classification (bottom). NS: not specified. (<b>C</b>) Comparison of pairwise distances between all cell lines at the mRNA and protein levels using a linear model. Red points mark the 100 cell line pairs with the highest residuals, and blue points the 100 pairs with the lowest residuals. (<b>D</b>) Dendrograms of breast cancer cell lines based on transcriptomic and RPPA data. The leaves of the two trees were arranged to minimize the number of crossing lines between them. L1-L5 represent clusters found within the luminal subtype of breast cancer cell lines.</p>
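
    Panel C relates cell-line distances in the two layers through a linear model. The sketch below is one plausible reading under stated assumptions: Euclidean distances per layer (as panel A describes), an ordinary least-squares fit of protein distance on mRNA distance (the regression direction is our assumption), and cell-line pairs ranked by residual. Both inputs are assumed to cover the same cell lines with aligned feature vectors; the function names and toy profiles are hypothetical.

```python
import itertools
import math


def pairwise_distances(profiles):
    # profiles: cell line -> feature vector (same features in the same order for every line)
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a, b in itertools.combinations(sorted(profiles), 2)
    }


def distance_residuals(mrna, protein):
    # Assumed model: protein_distance = alpha + beta * mrna_distance, fitted by OLS;
    # returns the residual of each cell-line pair (both inputs share the same cell lines)
    d_m = pairwise_distances(mrna)
    d_p = pairwise_distances(protein)
    pairs = sorted(d_m)
    x = [d_m[p] for p in pairs]
    y = [d_p[p] for p in pairs]
    n = len(pairs)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    return {p: yi - (alpha + beta * xi) for p, xi, yi in zip(pairs, x, y)}


def extreme_pairs(residuals, k=100):
    # The k pairs with the highest residuals and the k pairs with the lowest
    ranked = sorted(residuals, key=residuals.get)
    return ranked[-k:][::-1], ranked[:k]


# Hypothetical toy profiles for three cell lines
mrna = {"c1": [0, 0], "c2": [3, 4], "c3": [6, 8]}
protein = {"c1": [0], "c2": [1], "c3": [5]}
res = distance_residuals(mrna, protein)
highest, lowest = extreme_pairs(res, k=1)
```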

    The LINCS Data Portal and FAIR LINCS Dataset Landing Pages

    <p>The LINCS Data Portal (LDP) provides a unified interface to LINCS datasets and metadata, with mappings to several external resources. LDP offers various ways to explore, query, and download LINCS dataset packages and reagents that have been described using the LINCS metadata standards.</p><p>We recently introduced LINCS Dataset Landing Pages to provide integrated access to the key content of each LINCS dataset. The landing pages provide deep metadata for each dataset, including descriptions of the assays, authors, data analysis pipelines, and standardized reagents such as small molecules, cell lines, and antibodies, with rich annotations. The landing pages are key to making LINCS data persistent and reusable: they integrate LINCS datasets, data processing pipelines, analytes, perturbations, model systems, and related concepts as uniquely identifiable digital research objects.</p><p>LDP supports ontology-driven concept search, free-text search, facet filtering, logical combination of filters (AND, OR), and list, table, and matrix views. LDP enables download of LINCS dataset packages, which consist of released datasets and associated metadata. LDP also provides several specialized apps, including apps for small-molecule compounds and cell lines. A landing page facilitates interactive exploration of all LINCS datasets via several classifications.</p><p>LDP is built on a robust API, is integrated with the MetaData Registry, and interfaces with other components of the Integrated Knowledge Environment (IKE) developed in our Center. All LINCS datasets are also indexed in bioCADDIE DataMed.</p>

    FAIR Dataset Landing Pages, Digital Research Objects, and Software Tools for LINCS and BD2K

    <p>The Library of Integrated Network-based Signatures (LINCS, http://lincsproject.org/) program generates a wide variety of cell-based perturbation-response signatures using diverse assay technologies. For example, LINCS includes large-scale transcriptional profiling of genetic and small-molecule perturbations, as well as various proteomics and imaging datasets. The BD2K LINCS Data Coordination and Integration Center (DCIC) has been developing a collection of tools, including data standards specifications, data processing pipelines and infrastructure, a metadata registration system, and a diverse suite of end-user software tools. Together these implement an end-to-end solution: LINCS datasets are submitted by the Data and Signature Generation Centers (DSGCs), published via a Data Portal, and then analyzed with easy-to-use, web-based integrated analytics tools. We will give an overview of LINCS tools, with an emphasis on our long-term goal of persistent and FAIR (findable, accessible, interoperable, reusable) LINCS resources that connect signatures, data processing pipelines, analytes, perturbagens, model systems, related concepts, and analysis software tools via uniquely identifiable digital research objects.</p><p>All LINCS datasets are already indexed in bioCADDIE DataMed. In another example of BD2K and LINCS collaboration, we are working with the CEDAR Metadata Center to develop a LINCS Community Metadata Framework for end-to-end metadata management, supporting authoring, curation, validation, management, and sharing of LINCS metadata. Sharing metadata through reusable, modular, and user-friendly CEDAR templates opens the prospect of cross-searchable, linkable datasets that connect many different data generation programs.</p><p>In addition to building an advanced integrated knowledge environment, our Center supports several internal and external data science research projects, and we have an active outreach and training program.
Our software and data analytics resources, data science projects, and training programs are available at http://bd2k-lincs.org/</p>