94 research outputs found

    ProteoClade: A taxonomic toolkit for multi-species and metaproteomic analysis

    We present ProteoClade, a Python toolkit that performs taxa-specific peptide assignment, protein inference, and quantitation for multi-species proteomics experiments. ProteoClade scales to hundreds of millions of protein sequences, requires minimal computational resources, and is open source, multi-platform, and accessible to non-programmers. We demonstrate its utility for processing quantitative proteomic data derived from patient-derived xenografts, and its speed and scalability enable a novel de novo proteomic workflow for complex microbiota samples.

    Towards a microfluidic disease detection device based on cellular adhesion differences

    Thesis (S.M.)--Massachusetts Institute of Technology, Biological Engineering Division, 2006. Includes bibliographical references (leaves 44-45). There is a great need in the fields of biology, medicine, and pharmaceuticals to create high-throughput devices for the detection of specific cell states in a heterogeneous mixture of cells. The desire is to differentiate among diseased and healthy cells, cell age, and cell type with the minimum amount of sample pretreatment. This project addresses this need by developing microfluidic devices that exploit the adhesion differences between cell states and cell types to rapidly count cells of different types without the need for labels. There are two avenues in which to explore cell adhesion differences with these devices: the first is a net electrostatic change at the surface of the cell wall, and the second is the presence of specific cell-membrane adhesion proteins. It is hypothesized that the forced interaction of the cell wall with the microfabricated microcapillary walls would result in a differential velocity based on cell type that could be detected simply using a microscope and video camera or an interferometer. The eventual integration of cell velocity detection would result in a portable all-inclusive lab-on-a-chip system that could be used in the field for detecting the presence of diseases such as malaria and cancer, as well as in a lab setting for drug discovery. By Kristen M. Naegle. S.M.

    Computational methodologies and resources for discovery of phosphorylation regulation and function in cellular networks

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Biological Engineering, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 145-156). Post-translational modifications (PTMs) regulate cellular signaling networks by modifying activity, localization, turnover and other characteristics of proteins in the cell. For example, signaling in receptor tyrosine kinase (RTK) networks, such as those downstream of epidermal growth factor receptor (EGFR) and insulin receptor, is initiated by binding of cytokines or growth factors, and is generally propagated by phosphorylation of signaling molecules. The rate of discovery of PTM sites is increasing rapidly and is significantly outpacing our biological understanding of the function and regulation of those modifications. The ten-fold increase in known phosphorylation sites over a five-year time span can primarily be attributed to mass spectrometry (MS) measurement methods, which are capable of identifying and monitoring hundreds to thousands of phosphorylation sites across multiple biological samples. There is significant interest in the field in understanding these modifications, due to their important role in basic physiology as well as their implication in disease. In this thesis, we develop algorithms and tools to aid in analysis and organization of these immense datasets, which fundamentally seek to generate novel insights and testable hypotheses regarding the function and regulation of phosphorylation in RTK networks. We have developed a web-accessible analysis and repository resource for high-throughput quantitative measurements of post-translational modifications, called PTMScout. Additionally, we have developed a semi-automatic, high-throughput screen for unsupervised learning parameters based on their relative ability to partition datasets into functionally related and biologically meaningful clusters. We developed methods for comparing the variability and robustness of these clustering solutions and discovered that phosphopeptide co-clustering robustness can recapitulate known protein interaction networks, and extend them. Both of these tools take advantage of a new linear motif discovery algorithm, which we additionally used to find a putative regulatory sequence downstream of the highly tumorigenic EGFRvIII mutation that indicates casein kinase II (CK2) activity may be increased in glioblastoma. By Kristen M. Naegle. Ph.D.

    ProteomeScout: A repository and analysis resource for post-translational modifications and proteins

    ProteomeScout (https://proteomescout.wustl.edu) is a resource for the study of proteins and their post-translational modifications (PTMs) consisting of a database of PTMs, a repository for experimental data, an analysis suite for PTM experiments, and a tool for visualizing the relationships between complex protein annotations. The PTM database is a compendium of public PTM data, coupled with user-uploaded experimental data. ProteomeScout provides analysis tools for experimental datasets, including summary views and subset selection, which can identify relationships within subsets of data by testing for statistically significant enrichment of protein annotations. Protein annotations are incorporated in the ProteomeScout database from external resources and include terms such as Gene Ontology annotations, domains, secondary structure and non-synonymous polymorphisms. These annotations are available in the database download, in the analysis tools and in the protein viewer. The protein viewer allows for the simultaneous visualization of annotations in an interactive web graphic, which can be exported in Scalable Vector Graphics (SVG) format. Finally, quantitative data measurements associated with public experiments are also easily viewable within protein records, allowing researchers to see how PTMs change across different contexts. ProteomeScout should prove useful for protein researchers and should benefit the proteomics community by providing a stable repository for PTM experiments.
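
The enrichment testing mentioned in the abstract above can be illustrated with a one-sided hypergeometric test, a common choice for asking whether an annotation is over-represented in a subset. This is only a minimal sketch: the abstract does not state which test ProteomeScout uses, and the function name and arguments below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, subset, subset_annotated):
    """One-sided hypergeometric p-value (illustrative, not ProteomeScout's code).

    population       -- number of proteins in the background dataset
    annotated        -- how many of those carry the annotation of interest
    subset           -- number of proteins in the selected subset
    subset_annotated -- how many subset members carry the annotation

    Returns P(X >= subset_annotated) when `subset` proteins are drawn
    without replacement from the background.
    """
    upper = min(annotated, subset)
    total = comb(population, subset)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, subset - k)
        for k in range(subset_annotated, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 proteins, 5 annotated, and a subset of 5
# in which every member is annotated.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

In practice such p-values would also need correction for the number of annotation terms tested; that step is omitted here.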

    KSTAR: An algorithm to predict patient-specific kinase activities from phosphoproteomic data

    Kinase inhibitors as targeted therapies have played an important role in improving cancer outcomes. However, there are still considerable challenges, such as resistance, non-response, patient stratification, polypharmacology, and identifying combination therapy, where understanding a tumor kinase activity profile could be transformative. Here, we develop a graph- and statistics-based algorithm, called KSTAR, to convert phosphoproteomic measurements of cells and tissues into a kinase activity score that is generalizable and useful for clinical pipelines, requiring no quantification of the phosphorylation sites. In this work, we demonstrate that KSTAR reliably captures expected kinase activity differences across different tissues and stimulation contexts, allows for the direct comparison of samples from independent experiments, and is robust across a wide range of dataset sizes. Finally, we apply KSTAR to clinical breast cancer phosphoproteomic data and find that there is potential for kinase activity inference from KSTAR to complement the current clinical diagnosis of HER2 status in breast cancer patients.

    A Digital Neuromorphic Architecture Efficiently Facilitating Complex Synaptic Response Functions Applied to Liquid State Machines

    Information in neural networks is represented as weighted connections, or synapses, between neurons. This poses a problem as the primary computational bottleneck for neural networks is the vector-matrix multiply when inputs are multiplied by the neural network weights. Conventional processing architectures are not well suited for simulating neural networks, often requiring large amounts of energy and time. Additionally, synapses in biological neural networks are not binary connections, but exhibit a nonlinear response function as neurotransmitters are emitted and diffuse between neurons. Inspired by neuroscience principles, we present a digital neuromorphic architecture, the Spiking Temporal Processing Unit (STPU), capable of modeling arbitrary complex synaptic response functions without requiring additional hardware components. We consider the paradigm of spiking neurons with temporally coded information as opposed to non-spiking rate coded neurons used in most neural networks. In this paradigm we examine liquid state machines applied to speech recognition and show how a liquid state machine with temporal dynamics maps onto the STPU, demonstrating the flexibility and efficiency of the STPU for instantiating neural algorithms. Comment: 8 pages, 4 Figures, Preprint of 2017 IJCN

    An integrated comparative phosphoproteomic and bioinformatic approach reveals a novel class of MPM-2 motifs upregulated in EGFRvIII-expressing Glioblastoma Cells

    Glioblastoma (GBM, WHO grade IV) is an aggressively proliferative and invasive brain tumor that carries a poor clinical prognosis with a median survival of 9 to 12 months. In a prior phosphoproteomic study performed in the U87MG glioblastoma cell line, we identified tyrosine phosphorylation events that are regulated as a result of titrating EGFRvIII, a constitutively active mutant of the epidermal growth factor receptor (EGFR) associated with poor prognosis in GBM patients. In the present study, we have used the phosphoserine/phosphothreonine-specific antibody MPM-2 (mitotic protein monoclonal #2) to quantify serine/threonine phosphorylation events in the same cell lines. By employing a bioinformatic tool to identify amino acid sequence motifs regulated in response to increasing oncogene levels, a set of previously undescribed MPM-2 epitope sequence motifs orthogonal to the canonical “pS/pT-P” motif was identified. These motifs contain acidic amino acids in combinations of the −5, −2, +1, +3, and +5 positions relative to the phosphorylated amino acid. Phosphopeptides containing these motifs are upregulated in cells expressing EGFRvIII, raising the possibility of a general role for a previously unrecognized acidophilic kinase (e.g. casein kinase II (CK2)) in cell proliferation downstream of EGFR signaling. Funding: National Cancer Institute (U.S.), Integrative Cancer Biology Program (grant U54-CA112967); National Cancer Institute (U.S.), Bioengineering Research Partnership (grant R01-CA96504); National Institutes of Health (U.S.) (grant R01-GM60594).

    Defining phenotypic and functional heterogeneity of glioblastoma stem cells by mass cytometry

    Most patients with glioblastoma (GBM) die within 2 years. A major therapeutic goal is to target GBM stem cells (GSCs), a subpopulation of cells that contribute to treatment resistance and recurrence. Since their discovery in 2003, GSCs have been isolated using single-surface markers, such as CD15, CD44, CD133, and α6 integrin. It remains unknown how these single-surface marker-defined GSC populations compare with each other in terms of signaling and function and whether expression of different combinations of these markers is associated with different functional capacity. Using mass cytometry and fresh operating room specimens, we found 15 distinct GSC subpopulations in patients, and they differed in their MEK/ERK, WNT, and AKT pathway activation status. Once in culture, some subpopulations were lost and previously undetectable ones materialized. GSCs that highly expressed all 4 surface markers had the greatest self-renewal capacity, WNT inhibitor sensitivity, and in vivo tumorigenicity. This work highlights the potential signaling and phenotypic diversity of GSCs. Larger patient sample sizes and antibody panels are required to confirm these findings.

    MCAM: Multiple Clustering Analysis Methodology for Deriving Hypotheses and Insights from High-Throughput Proteomic Datasets

    Advances in proteomic technologies continue to substantially accelerate capability for generating experimental data on protein levels, states, and activities in biological samples. For example, studies on receptor tyrosine kinase signaling networks can now capture the phosphorylation state of hundreds to thousands of proteins across multiple conditions. However, little is known about the function of many of these protein modifications, or the enzymes responsible for modifying them. To address this challenge, we have developed an approach that enhances the power of clustering techniques to infer functional and regulatory meaning of protein states in cell signaling networks. We have created a new computational framework for applying clustering to biological data in order to overcome the typical dependence on specific a priori assumptions and expert knowledge concerning the technical aspects of clustering. Multiple clustering analysis methodology (‘MCAM’) employs an array of diverse data transformations, distance metrics, set sizes, and clustering algorithms, in a combinatorial fashion, to create a suite of clustering sets. These sets are then evaluated based on their ability to produce biological insights through statistical enrichment of metadata relating to knowledge concerning protein functions, kinase substrates, and sequence motifs. We applied MCAM to a set of dynamic phosphorylation measurements of the ERBB network to explore the relationships between algorithmic parameters and the biological meaning that could be inferred and report on interesting biological predictions. Further, we applied MCAM to multiple phosphoproteomic datasets for the ERBB network, which allowed us to compare independent and incomplete overlapping measurements of phosphorylation sites in the network. We report specific and global differences of the ERBB network stimulated with different ligands and with changes in HER2 expression. Overall, we offer MCAM as a broadly applicable approach for analysis of proteomic data which may help increase the current understanding of molecular networks in a variety of biological problems. Funding: National Institutes of Health (U.S.) (NIH-U54-CA112967); National Institutes of Health (U.S.) (NIH-R01-CA096504).
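
The combinatorial construction of clustering sets described in the MCAM abstract can be pictured as a Cartesian product over the parameter axes. The sketch below only enumerates such a grid; the option names on each axis are hypothetical placeholders, not the settings used by the authors, and the clustering and enrichment-scoring steps are left out.

```python
from itertools import product

# Hypothetical options for each axis named in the abstract.
transforms = ["none", "log2", "zscore"]
metrics = ["euclidean", "correlation"]
set_sizes = [5, 10, 20]  # number of clusters per solution
algorithms = ["kmeans", "hierarchical"]


def parameter_grid(*axes):
    """Return every combination of one option per axis as a tuple."""
    return list(product(*axes))


grid = parameter_grid(transforms, metrics, set_sizes, algorithms)

# Each tuple would define one clustering run whose clusters are then
# scored for enrichment of functional annotations.
first_run = grid[0]
```

With these placeholder axes the grid holds 3 x 2 x 3 x 2 = 36 parameter combinations, which shows how quickly the number of clustering sets grows as options are added.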