Computational quality control tools for mass spectrometry proteomics
As mass spectrometry-based proteomics has matured during the past decade a growing emphasis has been placed on quality control. For this purpose multiple computational quality control tools have been introduced. These tools generate a set of metrics that can be used to assess the quality of a mass spectrometry experiment.
Here we review which different types of quality control metrics can be generated, and how they can be used to monitor both intra- and inter-experiment performance. We discuss the principal computational tools for quality control and list their main characteristics and applicability.
As most of these tools have specific use cases it is not straightforward to compare their performance. For this survey we used different sets of quality control metrics derived from information at various stages in a mass spectrometry process and evaluated their effectiveness at capturing qualitative information about an experiment using a supervised learning approach. Furthermore, we discuss currently available algorithmic solutions that enable the usage of these quality control metrics for decision-making.
This is the peer reviewed version of the following article: "Bittremieux, W., Valkenborg, D., Martens, L. & Laukens, K. Computational quality control tools for mass spectrometry proteomics. PROTEOMICS 17, 1600159 (2017)", which has been published in final form at https://doi.org/10.1002/pmic.201600159. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving
Proceedings of the EuBIC Winter School 2019
The 2019 European Bioinformatics Community (EuBIC) Winter School was held from January 15th to January 18th 2019 in Zakopane, Poland. This year’s meeting was the third of its kind and gathered international researchers in the field of (computational) proteomics to discuss (mainly) challenges in proteomics quantification and data independent acquisition (DIA). Here, we present an overview of the scientific program of the 2019 EuBIC Winter School. Furthermore, we can already give a small outlook to the upcoming EuBIC 2020 Developer’s Meeting
SpecHD: Hyperdimensional Computing Framework for FPGA-based Mass Spectrometry Clustering
Mass spectrometry-based proteomics is a key enabler for personalized
healthcare, providing a deep dive into the complex protein compositions of
biological systems. This technology has vast applications in biotechnology and
biomedicine but faces significant computational bottlenecks. Current
methodologies often require multiple hours or even days to process extensive
datasets, particularly in the domain of spectral clustering. To tackle these
inefficiencies, we introduce SpecHD, a hyperdimensional computing (HDC)
framework supplemented by an FPGA-accelerated architecture with integrated
near-storage preprocessing. Utilizing streamlined binary operations in an HDC
environment, SpecHD capitalizes on the low-latency and parallel capabilities of
FPGAs. This approach markedly improves clustering speed and efficiency, serving
as a catalyst for real-time, high-throughput data analysis in future healthcare
applications. Our evaluations demonstrate that SpecHD not only maintains but
often surpasses existing clustering quality metrics while drastically cutting
computational time. Specifically, it can cluster a large-scale human proteome
dataset, comprising 25 million MS/MS spectra and 131 GB of MS data, in just 5
minutes. With energy efficiency exceeding 31x and a speedup factor that spans a
range of 6x to 54x over existing state-of-the-art solutions, SpecHD emerges as
a promising solution for the rapid analysis of mass spectrometry data with
great implications for personalized healthcare
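To make the idea of binary hyperdimensional encoding concrete, the following is a minimal sketch of a generic HDC scheme and not SpecHD's actual encoding or its FPGA implementation. It assumes that each peak is encoded by XOR-binding a random hypervector for its m/z bin with one for its intensity level, that a spectrum is the bitwise majority vote over its peaks, and that spectra are compared by Hamming distance. The dimensionality, bin width, number of levels, and all function names are assumptions chosen for illustration.

```python
import random

DIM = 2048       # hypervector width in bits (assumed value for this sketch)
N_BINS = 2001    # one item vector per 1 m/z unit up to m/z 2000 (assumed)
N_LEVELS = 4     # number of intensity levels (assumed)


def make_item_memory(n_items, rng):
    """Create n_items random binary hypervectors, each stored as a Python int."""
    return [rng.getrandbits(DIM) for _ in range(n_items)]


def encode_spectrum(peaks, mz_hvs, level_hvs):
    """Encode (m/z, intensity) peaks into one binary hypervector.

    Binding is XOR of the m/z-bin vector and the intensity-level vector;
    bundling is a per-bit majority vote over all bound peak vectors.
    """
    max_intensity = max(intensity for _, intensity in peaks)
    n_levels = len(level_hvs)
    counts = [0] * DIM
    for mz, intensity in peaks:
        level = min(int(intensity / max_intensity * n_levels), n_levels - 1)
        bound = mz_hvs[int(mz)] ^ level_hvs[level]
        for bit in range(DIM):
            counts[bit] += (bound >> bit) & 1
    encoded = 0
    for bit, count in enumerate(counts):
        if 2 * count > len(peaks):  # strict majority sets the bit
            encoded |= 1 << bit
    return encoded


def hamming(a, b):
    """Number of differing bits between two hypervectors."""
    return bin(a ^ b).count("1")


rng = random.Random(0)
mz_hvs = make_item_memory(N_BINS, rng)
level_hvs = make_item_memory(N_LEVELS, rng)

# Toy spectra: b shares four of five peaks with a, c shares none.
spectrum_a = [(100.1, 10.0), (200.2, 50.0), (300.3, 100.0), (400.4, 20.0), (500.5, 70.0)]
spectrum_b = spectrum_a[:4] + [(650.6, 70.0)]
spectrum_c = [(110.1, 30.0), (210.2, 100.0), (310.3, 60.0), (410.4, 15.0), (510.5, 80.0)]

hv_a = encode_spectrum(spectrum_a, mz_hvs, level_hvs)
hv_b = encode_spectrum(spectrum_b, mz_hvs, level_hvs)
hv_c = encode_spectrum(spectrum_c, mz_hvs, level_hvs)

print("a vs b:", hamming(hv_a, hv_b))
print("a vs c:", hamming(hv_a, hv_c))
```

In this toy setting, spectra that share most peaks end up closer in Hamming distance than unrelated spectra, which is the property a clustering step would rely on. Because only XOR, counting, and bit comparisons are involved, operations of this kind map naturally onto parallel hardware.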
ER-Mitochondria contact sites : a new regulator of cellular calcium flux comes into play
Endoplasmic reticulum (ER)-mitochondria membrane contacts are hotspots for calcium signaling. In this issue, Raturi et al. (2016. J. Cell Biol. http://dx.doi.org/10.1083/jcb.201512077) show that the thioredoxin TMX1 inhibits the calcium pump SERCA2b at ER-mitochondria contact sites, thereby affecting ER-mitochondrial calcium transfer and mitochondrial bioenergetics
Pattern mining of mass spectrometry quality control data
Mass spectrometry is widely used to identify proteins based on the mass distribution of their peptides. Unfortunately, because of its inherent complexity, the results of a mass spectrometry experiment can be subject to large variability. As a means of quality control, several qualitative metrics have recently been defined. [1] Initially, these quality control metrics were evaluated independently in order to separately assess particular stages of a mass spectrometry experiment. However, this method is insufficient because the different stages of an experiment do not function in isolation; instead, they influence each other. As a result, subsequent work employed a multivariate statistics approach to assess the correlation structure of the different quality control metrics. [2] However, by making use of more advanced data mining techniques, additional useful information can be extracted from these quality control metrics.
Various pattern mining techniques can be employed to discover hidden patterns in this quality control data. Subspace clustering tries to detect clusters of items based on a restricted set of dimensions. [3] This can be leveraged, for example, to detect aberrant experiments where only a few of the quality control metrics are outliers, but the experiment still behaved correctly in general.
In addition, specialized frequent itemset mining and association rule learning techniques can be used to discover relationships between the various stages of a mass spectrometry experiment, as they are exhibited by the different quality control metrics.
Finally, a major source of untapped information lies in the temporal aspect. Most often, problems in a mass spectrometry setup appear gradually, but are only observed after a critical juncture. As previous analyses have not used this temporal information directly, there remains a large potential to detect these problems as soon as they start to manifest by taking this additional dimension of information into account. The previously discovered patterns can then be evaluated over time by making use of sequential pattern mining techniques.
The awareness has risen that suitable quality control information is mandatory to assess the validity of a mass spectrometry experiment. Current efforts aim to standardize this quality control information [4], which will facilitate the dissemination of the data. This results in a large amount of as of yet untapped information, which can be leveraged by making use of specific data mining techniques in order to harness the full power of this new information.
[1] Rudnick, P. A. et al. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Molecular & Cellular Proteomics 9, 225–241 (2010).
[2] Wang, X. et al. QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics. Analytical Chemistry 86, 2497–2509 (2014).
[3] Aksehirli, E., Goethals, B., Müller, E. & Vreeken, J. Cartification: A neighborhood preserving transformation for mining high dimensional data. in Thirteenth IEEE International Conference on Data Mining - ICDM ’13 937–942 (IEEE, 2013). doi:10.1109/ICDM.2013.146
[4] Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular & Cellular Proteomics (2014). doi:10.1074/mcp.M113.03590
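The temporal argument in the abstract above, that problems appear gradually and are only noticed late, can be illustrated with a small sketch. It does not implement the sequential pattern mining proposed there. As a simpler stand-in, it uses a two-sided CUSUM control chart on a single quality control metric. The function name, the reference window size, and the slack and threshold values are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def cusum_alarm(values, n_reference=10, slack=0.5, threshold=4.0):
    """Return the index of the first run at which a CUSUM chart signals drift.

    The first n_reference runs define the baseline mean and standard
    deviation. Later runs are standardized against that baseline, and small
    persistent deviations accumulate until they exceed the threshold.
    Returns None when no drift is signalled.
    """
    reference = values[:n_reference]
    mu, sigma = mean(reference), stdev(reference)
    if sigma == 0:
        raise ValueError("reference runs show no variation")
    upper = lower = 0.0
    for idx in range(n_reference, len(values)):
        z = (values[idx] - mu) / sigma
        upper = max(0.0, upper + z - slack)  # accumulates upward drift
        lower = max(0.0, lower - z - slack)  # accumulates downward drift
        if upper > threshold or lower > threshold:
            return idx
    return None


# Hypothetical metric values per run (e.g. a chromatographic peak width).
baseline = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 9.9]
stable_runs = baseline + baseline
drifting_runs = baseline + [10.0 + 0.05 * k for k in range(1, 21)]

print("stable:", cusum_alarm(stable_runs))
print("drifting:", cusum_alarm(drifting_runs))
```

Because deviations are accumulated across runs, a slow drift can be flagged while each individual run still lies within a few standard deviations of the baseline, which is the situation the abstract describes.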
Constitutive IP3 signaling underlies the sensitivity of B-cell cancers to the Bcl-2/IP3 receptor disruptor BIRD-2
Anti-apoptotic Bcl-2 proteins are upregulated in different cancers, including diffuse large B-cell lymphoma (DLBCL) and chronic lymphocytic leukemia (CLL), enabling survival by inhibiting pro-apoptotic Bcl-2-family members and inositol 1,4,5-trisphosphate (IP3) receptor (IP3R)-mediated Ca2+-signaling. A peptide tool (Bcl-2/IP3R Disruptor-2; BIRD-2) was developed to abrogate the interaction of Bcl-2 with IP3Rs by targeting Bcl-2′s BH4 domain. BIRD-2 triggers cell death in primary CLL cells and in DLBCL cell lines. Particularly, DLBCL cells with high levels of IP3R2 were sensitive to BIRD-2. Here, we report that BIRD-2-induced cell death in DLBCL cells does not only depend on high IP3R2-expression levels, but also on constitutive IP3 signaling, downstream of the tonically active B-cell receptor. The basal Ca2+ level in SU-DHL-4 DLBCL cells was significantly elevated due to the constitutive IP3 production. This constitutive IP3 signaling fulfilled a pro-survival role, since inhibition of phospholipase C (PLC) using U73122 (2.5 µM) caused cell death in SU-DHL-4 cells. Milder inhibition of IP3 signaling using a lower U73122 concentration (1 µM) or expression of an IP3 sponge suppressed both BIRD-2-induced Ca2+ elevation and apoptosis in SU-DHL-4 cells. Basal PLC/IP3 signaling also fulfilled a pro-survival role in other DLBCL cell lines, including Karpas 422, RI-1 and SU-DHL-6 cells, whereas PLC inhibition protected these cells against BIRD-2-evoked apoptosis. Finally, U73122 treatment also suppressed BIRD-2-induced cell death in primary CLL, both in unsupported systems and in co-cultures with CD40L-expressing fibroblasts. Thus, constitutive IP3 signaling in lymphoma and leukemia cells is not only important for cancer cell survival, but also represents a vulnerability, rendering cancer cells dependent on Bcl-2 to limit IP3R activity. BIRD-2 seems to switch constitutive IP3 signaling from pro-survival into pro-death, presenting a plausible therapeutic strategy
Machine learning applications in proteomics research: How the past can boost the future
Machine learning is a subdiscipline within artificial intelligence that focuses on algorithms that allow computers to learn to solve a (complex) problem from existing data. This ability can be used to generate a solution to a particularly intractable problem, given that enough data are available to train and subsequently evaluate an algorithm on. Since MS-based proteomics has no shortage of complex problems, and since publicly available data are becoming available in ever growing amounts, machine learning is fast becoming a very popular tool in the field. We therefore present an overview of the different applications of machine learning in proteomics that together cover nearly the entire wet- and dry-lab workflow, and that address key bottlenecks in experiment planning and design, as well as in data processing and analysis.
Communicating Mass Spectrometry Quality Information in mzQC with Python, R, and Java
Mass spectrometry is a powerful technique for analyzing molecules in complex biological samples. However, inter- and intralaboratory variability and bias can affect the data due to various factors, including sample handling and preparation, instrument calibration and performance, and data acquisition and processing. To address this issue, the Quality Control (QC) working group of the Human Proteome Organization’s Proteomics Standards Initiative has established the standard mzQC file format for reporting and exchanging information relating to data quality. mzQC is based on the JavaScript Object Notation (JSON) format and provides a lightweight yet versatile file format that can be easily implemented in software. Here, we present open-source software libraries to process mzQC data in three programming languages: Python, using pymzqc; R, using rmzqc; and Java, using jmzqc. The libraries follow a common data model and provide shared functionalities, including the (de)serialization and validation of mzQC files. We demonstrate use of the software libraries in a workflow for extracting, analyzing, and visualizing QC metrics from different sources. Additionally, we show how these libraries can be integrated with each other, with existing software tools, and in automated workflows for the QC of mass spectrometry data. All software libraries are available as open source under the MS-Quality-Hub organization on GitHub (https://github.com/MS-Quality-Hub)
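Since mzQC is JSON-based, the general shape of such a file can be sketched with Python's standard `json` module alone. The example below does not use pymzqc, rmzqc, or jmzqc, and it is a reduced approximation of the format: the key names follow our reading of the mzQC layout, a real file requires further metadata (such as input files, analysis software, and controlled vocabulary declarations), and the accession `MS:XXXXXXX` is a placeholder rather than a real controlled vocabulary term. The helper names are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_qc_document(run_label, metrics):
    """Build a reduced, mzQC-like dictionary for one run.

    metrics is a list of (accession, name, value) tuples. Only a subset of
    the fields a complete mzQC file would carry is included here.
    """
    return {
        "mzQC": {
            "version": "1.0.0",
            "creationDate": datetime.now(timezone.utc).isoformat(),
            "runQualities": [
                {
                    "metadata": {"label": run_label},
                    "qualityMetrics": [
                        {"accession": accession, "name": name, "value": value}
                        for accession, name, value in metrics
                    ],
                }
            ],
        }
    }


def metric_values(document):
    """Collect metric name -> value pairs over all runs in the document."""
    values = {}
    for run in document["mzQC"]["runQualities"]:
        for metric in run["qualityMetrics"]:
            values[metric["name"]] = metric["value"]
    return values


# Placeholder accession: look up the real term in the PSI-MS vocabulary.
document = build_qc_document(
    "example_run_01",
    [("MS:XXXXXXX", "number of MS1 spectra", 12345)],
)

serialized = json.dumps(document, indent=2)   # serialization
restored = json.loads(serialized)             # deserialization
print(metric_values(restored))
```

For real data, the dedicated libraries named in the abstract are the better choice, since they also validate files against the standard.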
The HUPO-PSI standardized spectral library format
More and more proteomics datasets are becoming available in public repositories. The knowledge embedded in these datasets can be used to improve peptide identification workflows. Spectral library searching provides a straightforward method to boost identification rates using previously identified spectra. Alternatively, machine learning methods can learn from these spectra to accurately predict the behavior of peptides in a liquid chromatography-mass spectrometry system.
At the basis of both approaches are spectral libraries: Unified collections of previously identified spectra. Organizations and projects such as the National Institute of Standards and Technology (NIST), the Global Proteome Machine, PeptideAtlas, PRIDE Archive and MassIVE have all compiled spectral libraries for a multitude of species and experimental setups. A large obstacle, however, is that each organization provides libraries in a different file format. At the software level the problem propagates (if not expands), as different software tools require different file formats.
The solution is a standardized spectral library format that is sufficiently flexible to meet all users' demands, but that is also standardized enough to be usable across environments and software packages. This balance is achieved by setting up a standardized framework and a controlled vocabulary with metadata terms, and by allowing the format to be represented in different forms, such as plain text, JSON and HDF.
So far, the required (and optional) metadata has been compiled and added to the PSI-MS ontology, and versions of the text and JSON representations have been drafted. The tabular and HDF representations of the format are in development, as well as converters and validators in various programming languages
A community proposal to integrate proteomics activities in ELIXIR
Computational approaches have been major drivers behind the progress of proteomics in recent years. The aim of this white paper is to provide a framework for integrating computational proteomics into ELIXIR in the near future, and thus to broaden the portfolio of omics technologies supported by this European distributed infrastructure. This white paper is the direct result of a strategy meeting on ‘The Future of Proteomics in ELIXIR’ that took place in March 2017 in Tübingen (Germany), and involved representatives of eleven ELIXIR nodes. These discussions led to a list of priority areas in computational proteomics that would complement existing activities and close gaps in the portfolio of tools and services offered by ELIXIR so far. We provide some suggestions on how these activities could be integrated into ELIXIR’s existing platforms, and how it could lead to a new ELIXIR use case in proteomics. We also highlight connections to the related field of metabolomics, where similar activities are ongoing. This white paper could thus serve as a starting point for the integration of computational proteomics into ELIXIR. Over the next few months we will be working closely with all stakeholders involved, and in particular with other representatives of the proteomics community, to further refine this paper