35 research outputs found

    A proteomics sample metadata representation for multiomics integration and big data analysis

    The amount of public proteomics data is rapidly increasing, but there is no standardized format for describing sample metadata and their relationship with the dataset files in a way that fully supports understanding or reanalysis of the data. Here we propose extending the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
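    MAGE-TAB describes samples and their relationship to data files in a tab-delimited Sample and Data Relationship Format (SDRF) table. The following is a minimal sketch of the kind of consistency check a metadata validator might run on such a table. The required column list and the `validate_sdrf` function are hypothetical and do not reproduce the MAGE-TAB-Proteomics specification or the PRIDE validation tools.

```python
import csv
import io

# Hypothetical minimal column set for illustration only; the actual
# MAGE-TAB-Proteomics specification defines its own required columns.
REQUIRED_COLUMNS = [
    "source name",
    "characteristics[organism]",
    "assay name",
    "comment[data file]",
]


def validate_sdrf(text: str) -> list[str]:
    """Return a list of problems found in a tab-separated sample metadata table."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["empty table"]
    # Compare header names case-insensitively.
    header = [h.strip().lower() for h in rows[0]]
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if problems:
        return problems
    for line_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
            continue
        # Every required field must be filled in for every sample row.
        for column, value in zip(header, row):
            if column in REQUIRED_COLUMNS and not value.strip():
                problems.append(f"line {line_no}: empty value in '{column}'")
    return problems


example = (
    "source name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample 1\thomo sapiens\trun 1\tsample1.raw\n"
    "sample 2\t\trun 2\tsample2.raw\n"
)
print(validate_sdrf(example))
```

    Rows are not required to have unique data files, because in multiplexed experiments several samples can map to the same raw file.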

    An intrinsically disordered proteins community for ELIXIR.

    Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled "An intrinsically disordered protein user community proposal for ELIXIR" held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders

    Computational autoimmune biomarker discovery with protein microarrays

    An autoimmune component has been proposed for some neurodegenerative diseases, such as Alzheimer's disease (AD), amyotrophic lateral sclerosis (ALS) and Parkinson's disease (PD). High-throughput protein microarrays such as the ProtoArray are the most appropriate technology for discovering biomarker candidates for autoimmune diseases. However, this thesis shows that the default workflow for analyzing ProtoArray data is not optimal. An improved data analysis workflow was therefore developed and validated. Moreover, the software tool PAA was implemented and published to make the improved workflow generally available. Finally, AD, ALS and PD datasets were analyzed with the improved workflow to confirm previously reported biomarker candidates, discover novel ones, and demonstrate the biological relevance of PAA and the improved workflow.

    Small RNAs as biomarkers to differentiate benign and malignant prostate diseases: An alternative to transrectal punch biopsy of the prostate?

    Prostate cancer (PCa) is the most common cancer and the third most frequent cause of cancer death among men in Germany. MicroRNAs (miRNAs) appear to be involved in the development and progression of PCa. Differentiating PCa from benign prostatic hyperplasia (BPH) is often only possible through transrectal punch biopsy, a procedure described as painful and carrying risks. We investigated whether urinary miRNAs can serve as biomarkers to differentiate between these two conditions. To this end, urine samples from urological patients with BPH (n = 25) or PCa (n = 28) were analysed by next-generation sequencing to determine the expression profiles of total and exosomal miRNAs/piRNAs. In total, 79 miRNAs and 5 piwi-interacting RNAs (piRNAs) were significantly differentially expressed (adjusted p-value 1 or = 0.7). In addition, machine-learning algorithms were used to identify a panel of 22 additional miRNAs whose combined profile also distinguishes the two groups. There are promising individual candidates for potential use as biomarkers in prostate cancer. Applying machine-learning methods to this kind of data could bring into focus further small RNAs that have so far been neglected.

    Advanced Fiber Type-Specific Protein Profiles Derived from Adult Murine Skeletal Muscle

    Skeletal muscle is a heterogeneous tissue consisting of blood vessels, connective tissue, and muscle fibers. The latter are highly adaptive and can change their molecular composition depending on external and internal factors, such as exercise, age, and disease. Thus, examining skeletal muscles at the fiber type level is essential to detect potential alterations. We therefore established a protocol in which muscle fibers immunolabeled for myosin heavy chain isoforms were laser microdissected and separately investigated by mass spectrometry to develop advanced proteomic profiles of all murine skeletal muscle fiber types. All data are available via ProteomeXchange with the identifier PXD025359. Our in-depth mass spectrometric analysis revealed unique fiber type protein profiles, confirming fiber type-specific metabolic properties and revealing a more versatile function of type IIx fibers. Furthermore, we found that multiple myopathy-associated proteins were enriched in type I and IIa fibers. To further optimize the assignment of fiber types based on the protein profile, we developed a hypothesis-free machine-learning approach, identified a discriminative peptide panel, and confirmed our panel using a public data set.

    A Current Encyclopedia of Bioinformatics Tools, Data Formats and Resources for Mass Spectrometry Lipidomics

    Mass spectrometry is a widely used technology to identify and quantify biomolecules such as lipids, metabolites and proteins necessary for biomedical research. In this study, we catalogued freely available software tools, libraries, databases, repositories and resources that support lipidomics data analysis and determined the scope of currently used analytical technologies. Because of the tremendous importance of data interoperability, we assessed the support of standardized data formats in mass spectrometric (MS)-based lipidomics workflows. We included tools in our comparison that support targeted as well as untargeted analysis using direct infusion/shotgun (DI-MS), liquid chromatography−mass spectrometry, ion mobility or MS imaging approaches on MS1 and potentially higher MS levels. As a result, we determined that the Human Proteome Organization-Proteomics Standards Initiative standard data formats, mzML and mzTab-M, are already supported by a substantial number of recent software tools. We further discuss how mzTab-M can serve as a bridge between data acquisition and lipid bioinformatics tools for interpretation, capturing their output and transmitting rich annotated data for downstream processing. However, we identified several challenges of currently available tools and standards. Potential areas for improvement were the adoption of common nomenclature and standardized reporting to enable high-throughput lipidomics and improve its data handling. Finally, we suggest specific areas where tools and repositories need to improve to become FAIRer.

    Protein variability in cerebrospinal fluid and its possible implications for neurological protein biomarker research.

    Cerebrospinal fluid is investigated in biomarker studies for various neurological disorders of the central nervous system due to its proximity to the brain. Currently, only a limited number of biomarkers have been validated in independent studies. The high variability in the protein composition and protein abundance of cerebrospinal fluid between as well as within individuals might be an important reason for this phenomenon. To evaluate this possibility, we investigated the inter- and intraindividual variability in the cerebrospinal fluid proteome globally, with a specific focus on disease biomarkers described in the literature. Cerebrospinal fluid from a longitudinal study group including 12 healthy control subjects was analyzed by label-free quantification (LFQ) via LC-MS/MS. Data were quantified via MaxQuant. Then, the intra- and interindividual variability and the reference change value were calculated for every protein. We identified and quantified 791 proteins, and 216 of these proteins were abundant in all samples and were selected for further analysis. For these proteins, we found an interindividual coefficient of variation of up to 101.5% and an intraindividual coefficient of variation of up to 29.3%. Remarkably, these values were comparably high for both proteins that were published as disease biomarkers and other proteins. Our results support the hypothesis that natural variability greatly impacts cerebrospinal fluid protein biomarkers because high variability can lead to unreliable results. Thus, we suggest controlling the variability of each protein to distinguish between good and bad biomarker candidates, e.g., by utilizing reference change values to improve the process of evaluating potential biomarkers in future studies
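    A minimal sketch of the per-protein variability measures named above. It assumes that the intraindividual CV is the mean of per-subject CVs over repeated samples, that the interindividual CV is the CV of the subject means, and that the reference change value follows the common formula RCV = sqrt(2) · z · sqrt(CV_A² + CV_I²), where the analytical CV_A must be supplied. These conventions, the function names and the intensities are assumptions for illustration; the study may estimate the components differently (for example by nested ANOVA).

```python
import math
import statistics


def cv_percent(values):
    """Coefficient of variation in percent: sample standard deviation / mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def intra_individual_cv(samples_by_subject):
    """Average of the per-subject CVs computed over each subject's repeated samples."""
    return statistics.mean([cv_percent(v) for v in samples_by_subject.values()])


def inter_individual_cv(samples_by_subject):
    """Between-subject CV: CV of the per-subject mean abundances."""
    subject_means = [statistics.mean(v) for v in samples_by_subject.values()]
    return cv_percent(subject_means)


def reference_change_value(cv_analytical, cv_intra, z=1.96):
    """RCV in percent: sqrt(2) * z * sqrt(CV_A^2 + CV_I^2); z = 1.96 for 95% two-sided."""
    return math.sqrt(2) * z * math.sqrt(cv_analytical**2 + cv_intra**2)


# Hypothetical LFQ intensities of one protein, three longitudinal samples per subject.
lfq = {
    "subject_A": [100.0, 110.0, 90.0],
    "subject_B": [200.0, 220.0, 180.0],
}
cv_i = intra_individual_cv(lfq)  # 10.0
cv_g = inter_individual_cv(lfq)  # about 47.1
rcv = reference_change_value(cv_analytical=5.0, cv_intra=cv_i)  # about 31.0
print(f"CV_I={cv_i:.1f}%  CV_G={cv_g:.1f}%  RCV={rcv:.1f}%")
```

    With these toy numbers, a change between two consecutive samples of the same subject would need to exceed roughly 31% before it is considered larger than the expected analytical plus within-subject variation.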

    Characterization of peptide-protein relationships in protein ambiguity groups via bipartite graphs

    In bottom-up proteomics, proteins are enzymatically digested into peptides before measurement with mass spectrometry. The relationship between proteins and their corresponding peptides can be represented by bipartite graphs. We conduct a comprehensive analysis of bipartite graphs using quantified peptides from measured data sets as well as theoretical peptides from an in silico digestion of the corresponding complete taxonomic protein sequence databases. The aim of this study is to characterize and structure the different types of graphs that occur and to compare them between data sets. We observed a large influence of the accepted minimum peptide length during in silico digestion. When changing from theoretical peptides to measured ones, the graph structures are subject to two opposite effects. On the one hand, the graphs based on measured peptides are on average smaller and less complex compared to graphs using theoretical peptides. On the other hand, the proportion of protein nodes without unique peptides, which are a complicated case for protein inference and quantification, is considerably larger for measured data. Additionally, the proportion of graphs containing at least one protein node without unique peptides rises when going from database to quantitative level. The fraction of shared peptides and proteins without unique peptides as well as the complexity and size of the graphs highly depends on the data set and organism. Large differences between the structures of bipartite peptide-protein graphs have been observed between database and quantitative level as well as between analyzed species. In the analyzed measured data sets, the proportion of protein nodes without unique peptides ranged from 6.4% to 55.0%. This highlights the need for novel methods that can quantify proteins without unique peptides. The knowledge about the structure of the bipartite peptide-protein graphs gained in this study will be useful for the development of such algorithms.
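    A minimal sketch of representing peptide-protein relationships as a bipartite graph, splitting it into connected components and flagging protein nodes without unique peptides. The toy protein database and the function names are hypothetical; this is not the study's own analysis code.

```python
from collections import defaultdict


def build_graph(protein_to_peptides):
    """Undirected bipartite adjacency: nodes are ("prot", id) or ("pep", sequence)."""
    adj = defaultdict(set)
    for protein, peptides in protein_to_peptides.items():
        for peptide in peptides:
            adj[("prot", protein)].add(("pep", peptide))
            adj[("pep", peptide)].add(("prot", protein))
    return adj


def connected_components(adj):
    """Split the graph into connected components with an iterative depth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def proteins_without_unique_peptides(adj):
    """Proteins whose every peptide is also shared with at least one other protein."""
    return sorted(
        node[1]
        for node, peptides in adj.items()
        if node[0] == "prot" and all(len(adj[pep]) > 1 for pep in peptides)
    )


# Hypothetical toy database: P1 has a unique peptide (PEPA), P2 and P3 do not.
db = {
    "P1": {"PEPA", "PEPB"},
    "P2": {"PEPB", "PEPC"},
    "P3": {"PEPC"},
    "P4": {"PEPD"},
}
graph = build_graph(db)
print(sorted(len(c) for c in connected_components(graph)))  # [2, 6]
print(proteins_without_unique_peptides(graph))              # ['P2', 'P3']
```

    Each connected component corresponds to one peptide-protein graph; here two of four protein nodes (50%) lack unique peptides, which is the quantity the study reports per data set.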

    Computed tomography-based prediction of complete response following neoadjuvant chemoradiotherapy of locally advanced rectal cancer

    Therapeutic strategies for patients with locally advanced rectal cancer (LARC) who achieve a pathological complete response (pCR) after neoadjuvant chemoradiotherapy (neoCRT) are increasingly being investigated. Recent trials challenge the current standard therapy of total mesorectal excision (TME). For some patients, a "watch-and-wait" strategy appears preferable. The key factor in determining individual treatment strategies following neoCRT is the precise evaluation of tumor response. Contrast-enhanced computed tomography (ceCT) has proven able to discriminate between benign and malignant lesions in multiple cancers. In this study, we retrospectively analyzed the ceCT-based density of LARC in 30 patients undergoing neoCRT followed by TME. We compared the tumors' pre- and post-neoCRT density and correlated the results with the amount of residual vital tumor cells in the resected tissue. Overall, density decreased after neoCRT, with the largest decrease in patients achieving pCR. Densitometry demonstrated a specificity of 88% and a sensitivity of 68% in predicting pCR. We therefore propose that ceCT-based densitometry is a useful tool for identifying patients with LARC who may benefit from a "watch-and-wait" strategy, and we suggest further prospective studies.
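    A minimal sketch of how sensitivity and specificity for predicting pCR can be computed, assuming densitometry is applied as a simple threshold on the relative density decrease between pre- and post-neoCRT ceCT. The threshold, the patient values and the function names are hypothetical; the study's actual cutoff and data are not reproduced here.

```python
def relative_density_decrease(pre_hu, post_hu):
    """Relative decrease (%) in tumor density between pre- and post-treatment ceCT."""
    return 100.0 * (pre_hu - post_hu) / pre_hu


def sensitivity_specificity(scores, has_pcr, threshold):
    """Predict pCR when the density decrease reaches the threshold; pCR is the positive class."""
    tp = fn = tn = fp = 0
    for score, positive in zip(scores, has_pcr):
        predicted = score >= threshold
        if positive and predicted:
            tp += 1
        elif positive:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical patients: (pre-neoCRT HU, post-neoCRT HU, pathological complete response)
patients = [
    (60, 36, True), (55, 38.5, True), (57, 51, True),
    (58, 52, False), (62, 56, False), (50, 35, False), (48, 44, False),
]
scores = [relative_density_decrease(pre, post) for pre, post, _ in patients]
labels = [pcr for _, _, pcr in patients]
sens, spec = sensitivity_specificity(scores, labels, threshold=25.0)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")  # sensitivity=0.67 specificity=0.75
```

    Raising the threshold trades sensitivity for specificity; a real analysis would choose the cutoff from the data (for example via an ROC curve) rather than fixing it in advance.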