131 research outputs found

    Feature selection in the reconstruction of complex network representations of spectral data

    Complex networks have been used extensively in the last decade to characterize and analyze complex systems, and they have recently been proposed as a novel instrument for the analysis of spectra extracted from biological samples. Yet the large number of measurements composing each spectrum, and the consequent high computational cost, make a direct network analysis unfeasible. Here we present a comparative analysis of three customary feature selection algorithms, including the binning of spectral data and the use of information theory metrics. The algorithms are compared by the score they obtain in a classification task in which healthy subjects must be discriminated from people suffering from different types of cancer. Results indicate that a feature selection strategy based on Mutual Information outperforms the more classical data binning, while reducing the dimensionality of the data set by two orders of magnitude.
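
    The comparison above contrasts classical binning with ranking spectral channels by their Mutual Information (MI) with the class label. The sketch below illustrates both under stated assumptions: each channel is discretized into equal-width bins before a plug-in MI estimate, and the top-k channels are kept. The helper names (`discretize`, `mutual_information`, `select_by_mi`, `bin_spectrum`), the bin count, and the toy data are hypothetical; the abstract does not specify which estimators were compared.

```python
import math
from collections import Counter

def discretize(values, n_bins):
    """Equal-width discretization of one spectral channel (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) p(y)))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def select_by_mi(spectra, labels, k, n_bins=4):
    """Indices of the k channels with the highest MI with the class label."""
    scores = []
    for j in range(len(spectra[0])):
        column = discretize([s[j] for s in spectra], n_bins)
        scores.append((mutual_information(column, labels), j))
    scores.sort(key=lambda t: (-t[0], t[1]))  # high MI first, ties by index
    return [j for _, j in scores[:k]]

def bin_spectrum(spectrum, width):
    """Classical binning: sum adjacent channels in windows of `width`."""
    return [sum(spectrum[i:i + width]) for i in range(0, len(spectrum), width)]

# Toy data (hypothetical): 4 spectra x 3 channels, two classes.
spectra = [[1.0, 0.1, 5.0], [1.1, 0.2, 4.0], [0.9, 0.9, 5.0], [1.0, 1.0, 4.0]]
labels = [0, 0, 1, 1]
print(select_by_mi(spectra, labels, k=2))  # channel 1 separates the classes
print(bin_spectrum([1, 2, 3, 4, 5], 2))
```

    Binning reduces dimensionality without looking at the labels, whereas MI ranking keeps only the channels that carry class information, which is one plausible reason a supervised criterion can shrink the data further.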

    Nano Random Forests to mine protein complexes and their relationships in quantitative proteomics data

    Ever-increasing numbers of quantitative proteomics data sets constitute an underexploited resource for investigating protein function. Multiprotein complexes often follow consistent trends in these experiments, which could provide insights into their biology. Yet, as more experiments are considered, a complex’s signature may become conditional and less identifiable. Previously we successfully distinguished the general proteomic signature of genuine chromosomal proteins from hitchhikers using the Random Forests (RF) machine learning algorithm. Here we test whether small protein complexes can define distinguishable signatures of their own, despite the assumption that machine learning needs large training sets. We show, with simulated and real proteomics data, that RF can detect small protein complexes and relationships between them. We identify several complexes in quantitative proteomics results of wild-type and knockout mitotic chromosomes. Other proteins covary strongly with these complexes, suggesting novel functional links for later study. Integrating the RF analysis for several complexes reveals known interdependences among kinetochore subunits and a novel dependence between the inner kinetochore and condensin. Ribosomal proteins, although identified, remained independent of kinetochore subcomplexes. Together these results show that this complex-oriented RF (NanoRF) approach can integrate proteomics data to uncover subtle protein relationships. Our NanoRF pipeline is available online.

    Mass Spectrometry-Based (GeLC-MS/MS) Comparative Proteomic Analysis of Endoscopically (ePFT) Collected Pancreatic and Gastroduodenal Fluids

    Objectives: The secretin-stimulated endoscopic pancreatic function test (ePFT) allows safe collection of gastroduodenal and pancreatic fluid from the duodenum. We test the hypothesis that these endoscopically collected fluids have different proteomes; in doing so, we aim to show that the ePFT method can be used to collect fluid enriched in pancreatic proteins to test pancreatic function. Methods: Gastroduodenal and pancreatic fluids were collected sequentially from chronic pancreatitis patients undergoing an ePFT. Proteins from each fluid type were extracted using previously published optimized methods and subjected to GeLC-MS/MS analysis for protein identification and bioinformatics analysis. Results: Mass spectrometry analysis identified proteins exclusive to either gastroduodenal (46) or pancreatic fluid (234). Subsequent quantitative analysis revealed proteins whose abundance differed with statistical significance. As expected, proteolytic enzymes and protease inhibitors were among the differentially detected proteins. The proteases pepsinogens and gastrin were enriched in gastroduodenal fluid, while common pancreatic enzymes (e.g., aminopeptidase N, chymotrypsin C, elastase-3A, trypsin, carboxypeptidase A1, and elastase-2B) were more abundant in pancreatic fluid. Similarly, among protease inhibitors, members of the cystatin family were exclusive to gastroduodenal fluid, while serpins A11, B4, and D1 were exclusive to pancreatic fluid. Conclusions: We have shown that ePFT collection coupled with mass spectrometry can identify differentially detected proteins in gastroduodenal and pancreatic fluids. The GeLC-MS/MS data provide further evidence supporting the feasibility of using ePFT-collected fluid to study specific diseases of the upper gastrointestinal tract, such as chronic pancreatitis.

    Advances in Quantitative Hepcidin Measurements by Time-of-Flight Mass Spectrometry

    Assays for detecting the iron regulatory hormone hepcidin in plasma or urine have not yet been widely available, and quantitative comparisons of hepcidin levels between these matrices have so far been impossible due to technical restrictions. To circumvent these limitations, we describe several advances in time-of-flight mass spectrometry (TOF MS), the most important of which is spiking a synthetic hepcidin analogue into serum and urine samples as an internal standard. This serves both as a control for experimental variation, such as recovery and matrix-dependent ionization and ion suppression, and as a means of assigning values to the measured hepcidin peak intensities. The assay improvements were clinically evaluated using samples from various patient groups, and their relevance was further underscored by the significant correlation of serum hepcidin levels with serum iron indices in healthy individuals. Most importantly, this approach allowed kinetic studies, as illustrated by paired analyses of serum and urine samples showing that more than 97% of freely filtered serum hepcidin can be reabsorbed in the kidney. Thus, the advances in TOF MS-based hepcidin measurement reported here represent critical steps toward accurate quantification of hepcidin in various body fluids and pave the way for clinical studies of the kinetic behavior of hepcidin in both healthy and diseased states.
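
    The central advance above is a spiked internal standard that lets measured peak intensities be converted into concentrations. A minimal sketch follows, assuming a linear response and equal ionization of hepcidin and its synthetic analogue (the hypothetical `response_factor` parameter relaxes the latter). The reabsorption helper uses the classical creatinine-based fractional excretion formula; this is an assumption, since the abstract does not state how the >97% figure was derived. All function names and numbers are hypothetical.

```python
def quantify_with_internal_standard(analyte_intensity, is_intensity,
                                    is_concentration, response_factor=1.0):
    """Single-point internal-standard quantification (assumed model).

    concentration = (I_analyte / I_IS) * C_IS / response_factor, where
    response_factor = 1.0 assumes analyte and spiked analogue ionize equally.
    """
    if is_intensity <= 0:
        raise ValueError("internal standard peak not detected")
    return (analyte_intensity / is_intensity) * is_concentration / response_factor

def fractional_reabsorption(urine_x, serum_x, urine_creatinine, serum_creatinine):
    """1 - fractional excretion, with FE = (U_x * S_cr) / (S_x * U_cr).

    Meaningful only for a freely filtered solute; each urine/serum pair
    must be expressed in the same units.
    """
    fe = (urine_x * serum_creatinine) / (serum_x * urine_creatinine)
    return 1.0 - fe

# Hypothetical numbers, not study data.
print(quantify_with_internal_standard(2000.0, 1000.0, 5.0))  # analyte peak is 2x the IS peak
print(fractional_reabsorption(10.0, 5.0, 8000.0, 80.0))
```

    Because the analogue experiences the same recovery losses and ion suppression as endogenous hepcidin, taking the intensity ratio cancels much of that matrix-dependent variation, which is what makes serum and urine values comparable.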

    Discovery of serum biomarkers for pancreatic adenocarcinoma using proteomic analysis

    Background and aims: The serum/plasma proteome was explored for biomarkers to improve the diagnostic ability of CA19-9 in pancreatic adenocarcinoma (PC). Methods: A Training Set of serum samples from 20 resectable and 18 stage IV PC patients, 54 disease controls (DCs) and 68 healthy volunteers (HVs) was analysed by surface-enhanced laser desorption and ionisation time-of-flight mass spectrometry (SELDI-TOF MS). The resulting protein panel was validated on 40 resectable PC, 21 DC and 19 HV plasma samples (Validation-1 Set) and further by ELISA on 33 resectable PC, 28 DC and 18 HV serum samples (Validation-2 Set). Diagnostic panels were derived using binary logistic regression incorporating internal cross-validation, followed by receiver operating characteristic (ROC) analysis. Results: A seven-protein panel derived from the training set gave ROC areas under the curve (AUCs) of 0.90 (PC vs DC) and 0.90 (PC vs HV), compared with 0.87 and 0.91 for CA19-9. The AUCs were greater (0.97 and 0.99, P<0.05) when CA19-9 was added to the panels, and this was confirmed on the Validation-1 samples. A simplified panel of apolipoprotein C-I (ApoC-I), apolipoprotein A-II (ApoA-II) and CA19-9 was tested on the Validation-2 Set by ELISA, where its ROC AUC was greater than that of CA19-9 alone for PC vs DC (0.90 vs 0.84) and for PC vs HV (0.96 vs 0.90). Conclusions: A simplified diagnostic panel of CA19-9, ApoC-I and ApoA-II improves on the diagnostic ability of CA19-9 alone and may have clinical utility.
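
    The panels above are built with binary logistic regression and judged by ROC AUC. A minimal sketch of both steps follows, assuming plain batch gradient descent without regularization and without the internal cross-validation the study used. The AUC is computed in its Mann-Whitney form, as the probability that a case outscores a control, with ties counted as one half. Function names and the toy marker values are hypothetical.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Binary logistic regression by batch gradient descent (no regularization)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(case)
            err = p - yi                    # d(log-loss)/dz
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def linear_score(w, b, x):
    """Log-odds of being a case; monotone in probability, so valid for ROC."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def roc_auc(case_scores, control_scores):
    """Mann-Whitney AUC: P(case > control), ties counted as 1/2."""
    wins = 0.0
    for p in case_scores:
        for q in control_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical two-marker toy data (1 = case, 0 = control).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0], [1.0, 3.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
scores = [linear_score(w, b, x) for x in X]
cases = [s for s, t in zip(scores, y) if t == 1]
controls = [s for s, t in zip(scores, y) if t == 0]
print(roc_auc(cases, controls))
```

    Scoring the training data itself, as here, overstates performance; this is why the study reports internal cross-validation and separate validation sets.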

    Estimation of Relevant Variables on High-Dimensional Biological Patterns Using Iterated Weighted Kernel Functions

    BACKGROUND The analysis of complex proteomic and genomic profiles involves identifying significant markers within a set of hundreds or even thousands of variables that represent a high-dimensional problem space. Noise, redundancy, or combinatorial interactions in the profile make the selection of relevant variables harder. METHODOLOGY/PRINCIPAL FINDINGS Here we propose a method to select variables based on their estimated relevance to hidden patterns. Our method combines a weighted-kernel discriminant with an iterative stochastic probability estimation algorithm to discover the relevance distribution over the set of variables. We verified the ability of our method to select predefined relevant variables in synthetic proteome-like data and then assessed its performance on high-dimensional biological problems. Experiments were run on serum proteomic datasets of infectious diseases. The resulting variable subsets achieved classification accuracies of 99% on Human African Trypanosomiasis, 91% on Tuberculosis, and 91% on Malaria serum proteomic profiles, with fewer than 20% of the variables selected. Our method scaled up to dimensionalities several orders of magnitude higher, as shown on gene expression microarray datasets, where we obtained classification accuracies close to 90% with fewer than 1% of the total number of variables. CONCLUSIONS Our method consistently found relevant variables attaining high classification accuracies across synthetic and biological datasets. Notably, it yielded very compact subsets compared with the original number of variables, which should simplify downstream biological experimentation.
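
    The method above relies on a weighted-kernel discriminant in which each variable carries a relevance weight. The sketch below shows only that component, assuming a Gaussian kernel with per-variable weights and a Parzen-style rule that assigns a sample to the class with the highest mean kernel similarity. It does not reproduce the authors' iterative stochastic estimation of the weights; the relevance vector is simply fixed by hand. Names and data are hypothetical.

```python
import math

def weighted_rbf(x, z, weights, gamma=1.0):
    """Gaussian kernel with per-variable relevance weights:
    k(x, z) = exp(-gamma * sum_j w_j * (x_j - z_j)**2)."""
    d2 = sum(w * (a - c) ** 2 for w, a, c in zip(weights, x, z))
    return math.exp(-gamma * d2)

def kernel_discriminant(x, train_X, train_y, weights, gamma=1.0):
    """Assign x to the class with the highest mean weighted-kernel similarity."""
    totals, counts = {}, {}
    for xi, yi in zip(train_X, train_y):
        totals[yi] = totals.get(yi, 0.0) + weighted_rbf(x, xi, weights, gamma)
        counts[yi] = counts.get(yi, 0) + 1
    return max(totals, key=lambda c: totals[c] / counts[c])

# Hypothetical data: variable 0 is informative, variable 1 is noise.
train_X = [[0.0, 5.0], [0.1, -5.0], [1.0, -5.0], [0.9, 5.0]]
train_y = ["A", "A", "B", "B"]
relevance = [1.0, 0.0]  # hand-set weights standing in for an estimated relevance distribution
print(kernel_discriminant([0.05, 5.0], train_X, train_y, relevance))
print(kernel_discriminant([0.95, -5.0], train_X, train_y, relevance))
```

    A variable with weight 0 drops out of the distance entirely, so once most estimated weights approach zero, the discriminant effectively operates on a compact subset of relevant variables.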

    Identification of novel biomarker candidates by proteomic analysis of cerebrospinal fluid from patients with moyamoya disease using SELDI-TOF-MS

    Background: Moyamoya disease (MMD) is an uncommon cerebrovascular condition of unknown etiology, characterized by slowly progressive stenosis or occlusion of the bilateral internal carotid arteries associated with an abnormal vascular network. MMD is a major cause of stroke, particularly in younger people. Diagnosis is based only on radiological features, as no other clinical data are available. The purpose of this study was to identify novel biomarker candidate proteins differentially expressed in the cerebrospinal fluid (CSF) of patients with MMD using proteomic analysis. Methods: For biomarker detection, CSF samples were obtained from 20 patients with MMD and 12 control patients. Mass spectral data were generated by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS) with an anion exchange chip under three different buffer conditions. After expression difference mapping was performed on the resulting protein profiles, a comparative analysis was carried out. Results: Thirty-four proteins were recognized as single biomarker candidates, differentially detected in the CSF of patients with MMD compared with control patients (p < 0.05). The peak intensity profiles of all biomarker candidates underwent classification and regression tree (CART) analysis to produce prediction models. Two important biomarkers successfully classified patients with MMD and control patients. Conclusions: In this study, several novel biomarker candidate proteins differentially expressed in the CSF of patients with MMD were identified by a recently developed proteomic approach. This is a pilot study of CSF proteomics for MMD using SELDI technology. These biomarker candidates have the potential to shed light on the underlying pathogenesis of MMD.
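
    CART builds its prediction model by repeatedly choosing the peak and intensity threshold that best separate the classes. A minimal sketch of the root-node search follows, assuming Gini impurity (one standard CART splitting criterion) and midpoint thresholds between observed values. The study's actual software and settings are not stated, and the toy intensities and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) minimizing weighted Gini.

    Returns (impurity, feature_index, threshold); samples with
    X[i][feature] <= threshold go to the left child.
    """
    n = len(y)
    best = (float("inf"), None, None)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive observed values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, j, t)
    return best

# Hypothetical peak intensities (columns = two candidate peaks).
X = [[0.2, 5.0], [0.3, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = ["control", "control", "MMD", "MMD"]
print(best_split(X, y))  # peak 0 separates the groups perfectly
```

    A full tree applies the same search recursively to each child node until a stopping rule is met; a model built on two biomarkers corresponds to a shallow tree whose splits use only those two peaks.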