    Combining Predictors for Classification using the Area Under the ROC Curve

    No single biomarker for cancer is considered adequately sensitive and specific for cancer screening. It is expected that the results of multiple markers will need to be combined in order to yield adequately accurate classification. Typically the objective function that is optimized for combining markers is the likelihood function. In this paper we consider an alternative objective function -- the area under the empirical receiver operating characteristic curve (AUC). We note that it yields consistent estimates of parameters in a generalized linear model for the risk score but does not require specifying the link function. Like logistic regression it yields consistent estimation with case-control or cohort data. Simulation studies suggest that AUC-based classification scores have performance comparable with logistic likelihood based scores when the logistic regression model holds. Analysis of data from a proteomics biomarker study shows that performance can be far superior to logistic regression derived scores when the logistic regression model does not hold. Model fitting by maximizing the AUC rather than the likelihood should be considered when the goal is to derive a marker combination score for classification or prediction
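
The AUC objective described in this abstract can be illustrated with a minimal sketch. The empirical AUC is the fraction of (case, control) pairs that a score ranks correctly, so it depends only on the ordering of the scores; this is why no link function needs to be specified and only the direction of the weight vector matters. The grid search over directions, the function names, and the toy marker values below are assumptions made for illustration, not the authors' algorithm or data.

```python
import math

def empirical_auc(case_scores, control_scores):
    """Mann-Whitney estimate of the AUC: share of (case, control) pairs
    where the case scores higher; ties count as one half."""
    total = 0.0
    for c in case_scores:
        for d in control_scores:
            if c > d:
                total += 1.0
            elif c == d:
                total += 0.5
    return total / (len(case_scores) * len(control_scores))

def best_linear_combination(cases, controls, n_angles=360):
    """Grid search (an illustrative choice) over unit weight vectors
    (cos t, sin t) for two markers, keeping the direction with the
    highest empirical AUC."""
    best_auc, best_weights = -1.0, None
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        w1, w2 = math.cos(theta), math.sin(theta)
        case_scores = [w1 * x1 + w2 * x2 for x1, x2 in cases]
        control_scores = [w1 * x1 + w2 * x2 for x1, x2 in controls]
        auc = empirical_auc(case_scores, control_scores)
        if auc > best_auc:
            best_auc, best_weights = auc, (w1, w2)
    return best_auc, best_weights

# Hypothetical two-marker measurements (marker 1, marker 2) per subject.
cases = [(2.0, 1.0), (1.5, 2.5), (3.0, 0.5), (2.5, 2.0)]
controls = [(1.0, 1.0), (0.5, 2.0), (2.0, 0.0), (1.0, 0.5)]

auc, weights = best_linear_combination(cases, controls)
print("best empirical AUC:", auc, "weights:", weights)
```

Because the empirical AUC is a step function of the weights, gradient-based optimizers do not apply directly; an exhaustive search like this is only practical for a handful of markers.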

    Data processing and classification analysis of proteomic changes: a case study of oil pollution in the mussel, Mytilus edulis

    BACKGROUND: Proteomics may help to detect subtle pollution-related changes, such as responses to mixture pollution at low concentrations, where clear signs of toxicity are absent. The challenges associated with the analysis of large-scale multivariate proteomic datasets have been widely discussed in medical research and biomarker discovery. This concept has been introduced to ecotoxicology only recently, so data processing and classification analysis need to be refined before they can be readily applied in biomarker discovery and monitoring studies. RESULTS: Data sets obtained from a case study of oil pollution in the blue mussel were investigated for differential protein expression by retentate chromatography-mass spectrometry and decision tree classification. Different tissues and different settings were used to evaluate the discriminatory power of the classifiers. It was found that, due to the intrinsic variability of the data sets, reliable classification of unknown samples could only be achieved on a broad statistical basis (n > 60), with the observed expression changes showing high statistical significance and sufficient amplitude. The application of stringent criteria to guard against overfitting of the models eventually allowed satisfactory classification for only one of the investigated data sets and settings. CONCLUSION: Machine learning techniques provide a promising approach to process and extract informative expression signatures from high-dimensional mass-spectrometry data. Even though characterisation of the proteins forming the expression signatures would be ideal, knowledge of the specific proteins is not mandatory for effective class discrimination. This may constitute a new biomarker approach in ecotoxicology, where working with organisms that lack sequenced genomes renders protein identification by database searching problematic. However, data processing has to be critically evaluated and statistical constraints have to be considered before supervised classification algorithms are employed.

    Multicentric validation of proteomic biomarkers in urine specific for diabetic nephropathy

    Background: Urine proteome analysis is rapidly emerging as a tool for diagnosis and prognosis in disease states. For diagnosis of diabetic nephropathy (DN), urinary proteome analysis was successfully applied in a pilot study. The validity of the previously established proteomic biomarkers with respect to their diagnostic and prognostic potential was assessed on a separate set of patients recruited at three different European centers. In this case-control study of 148 Caucasian patients with diabetes mellitus type 2 and duration ≥5 years, cases of DN were defined as albuminuria >300 mg/d and diabetic retinopathy (n = 66). Controls were matched for gender and diabetes duration (n = 82). Methodology/Principal Findings: Proteome analysis was performed blinded using high-resolution capillary electrophoresis coupled with mass spectrometry (CE-MS). Data were evaluated employing the previously developed model for DN. Upon unblinding, the model for DN showed 93.8% sensitivity and 91.4% specificity, with an AUC of 0.948 (95% CI 0.898-0.978). Of 65 previously identified peptides, 60 were significantly different between cases and controls of this study. In <10% of cases and controls, classification by proteome analysis did not fully agree with the expected clinical outcome. Analysis of the patients' subsequent clinical course revealed later progression to DN in some of the control patients who had been classified as false positives. Conclusions: These data provide the first independent confirmation that profiling of the urinary proteome by CE-MS can adequately identify subjects with DN, supporting the generalizability of this approach. The data further establish urinary collagen fragments as biomarkers for diabetes-induced renal damage that may serve as earlier and more specific biomarkers than the currently used urinary albumin.

    Development of a MALDI MS-based platform for early detection of acute kidney injury

    Purpose: Septic acute kidney injury (AKI) is associated with poor outcome. This can partly be attributed to delayed diagnosis and incomplete understanding of the underlying pathophysiology. Our aim was to develop an early predictive test for AKI based on the analysis of urinary peptide biomarkers by MALDI-MS. Experimental design: Urine samples from 95 patients with sepsis were analyzed by MALDI-MS. Marker search and multimarker model establishment were performed using the peptide profiles from 17 patients who had AKI or developed it within the next 5 days and 17 patients with no change in renal function. Replicates of urine sample pools from the AKI and non-AKI patient groups and normal controls were also included to select the analytically most robust AKI markers. Results: Thirty-nine urinary peptides were selected by cross-validated variable selection to generate a support vector machine multidimensional AKI classifier. Prognostic performance of the AKI classifier on an independent validation set comprising the remaining 61 patients of the study population (17 controls and 44 cases) was good, with an area under the receiver operating characteristic curve of 0.82 and a sensitivity and specificity of 86% and 76%, respectively. Conclusion and clinical relevance: A urinary peptide marker model detects onset of AKI with acceptable accuracy in septic patients. Such a platform can eventually be transferred to the clinic as a fast MALDI-MS test format.

    Assessment of metabolomic and proteomic biomarkers in detection and prognosis of progression of renal function in chronic kidney disease

    Chronic kidney disease (CKD) is part of a number of systemic and renal diseases and may reach epidemic proportions over the next decade. Efforts have been made to improve diagnosis and management of CKD. We hypothesised that combining metabolomic and proteomic approaches could generate a more systemic and complete view of the disease mechanisms. To test this approach, we examined samples from a cohort of 49 patients representing different stages of CKD. Urine samples were analysed for proteomic changes using capillary electrophoresis-mass spectrometry, and urine and plasma samples for metabolomic changes using different mass spectrometry-based techniques. The training set included 20 CKD patients selected according to their estimated glomerular filtration rate (eGFR) at mild (59.9±16.5 mL/min/1.73 m²; n = 10) or advanced (8.9±4.5 mL/min/1.73 m²; n = 10) CKD, and the remaining 29 patients were left for the test set. We identified a panel of 76 statistically significant metabolites and peptides that correlated with CKD in the training set. We combined these biomarkers in different classifiers and then performed correlation analyses with eGFR at baseline and at follow-up after 2.8±0.8 years in the test set. A solely plasma metabolite biomarker-based classifier significantly correlated with the loss of kidney function in the test set at baseline and follow-up (ρ = −0.8031; p<0.0001 and ρ = −0.6009; p = 0.0019, respectively). Similarly, a urinary metabolite biomarker-based classifier showed a significant association with kidney function (ρ = −0.6557; p = 0.0001 and ρ = −0.6574; p = 0.0005). A classifier utilising 46 identified urinary peptide biomarkers performed statistically equivalently to the urinary and plasma metabolite classifiers (ρ = −0.7752; p<0.0001 and ρ = −0.8400; p<0.0001). The combination of both urinary proteomic and urinary and plasma metabolic biomarkers did not improve the correlation with eGFR. In conclusion, we found excellent association of plasma and urinary metabolites and urinary peptides with kidney function and disease progression, but no added value in combining the different biomarker data.
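
The ρ values reported in this abstract are rank correlations between a classifier score and eGFR. As a rough sketch of how such a coefficient is computed, the code below ranks both variables (averaging ranks over ties) and takes the Pearson correlation of the ranks. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical classifier scores and eGFR values for five patients.
scores = [0.9, 0.7, 0.4, 0.2, -0.1]
egfr = [12.0, 25.0, 48.0, 61.0, 75.0]
print("rho:", spearman_rho(scores, egfr))
```

A negative ρ here means that higher classifier scores go with lower eGFR, which matches the sign convention of the values quoted in the abstract.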

    Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data

    Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets

    Mining whole sample mass spectrometry proteomics data for biomarkers: an overview

    In this paper we aim to provide a concise overview of designing and conducting an MS proteomics experiment in such a way as to allow statistical analysis that may lead to the discovery of novel biomarkers. We provide a summary of the various stages that make up such an experiment, highlighting the need for experimental goals to be decided upon in advance. We discuss issues in experimental design at the sample collection stage, and good practice for standardising protocols within the proteomics laboratory. We then describe approaches to the data mining stage of the experiment, including the processing steps that transform a raw mass spectrum into a usable form. We propose a permutation-based procedure for determining the significance of reported error rates. Finally, because of its general advantages in speed and cost, we suggest that MS proteomics may be a good candidate for an early primary screening approach to disease diagnosis, identifying areas of risk and making referrals for more specific tests without necessarily making a diagnosis in its own right. Our discussion is illustrated with examples drawn from experiments on bovine blood serum conducted in the Centre for Proteomic Research (CPR) at Southampton University.
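
The abstract does not spell out its permutation-based procedure, so the following is only a generic sketch of the idea: shuffle the class labels many times, recompute the error rate each time, and report how often a shuffled labelling does at least as well as the real one. The nearest-class-mean classifier, the leave-one-out error estimate, the function names, and the toy intensities are all assumptions chosen to keep the example small.

```python
import random

def loo_error_rate(values, labels):
    """Leave-one-out error of a nearest-class-mean rule on a single feature.
    Assumes labels are 0/1 and each class has at least two members."""
    errors = 0
    for i in range(len(values)):
        means = {}
        for cls in (0, 1):
            members = [v for j, (v, l) in enumerate(zip(values, labels))
                       if j != i and l == cls]
            means[cls] = sum(members) / len(members)
        predicted = min(means, key=lambda cls: abs(values[i] - means[cls]))
        if predicted != labels[i]:
            errors += 1
    return errors / len(values)

def permutation_p_value(values, labels, n_permutations=500, seed=0):
    """Estimate how often randomly permuted labels give an error rate
    at least as low as the observed one."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    observed = loo_error_rate(values, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if loo_error_rate(values, shuffled) <= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical peak intensities for five controls (0) and five cases (1).
intensities = [0.1, 0.3, 0.2, 0.4, 0.25, 1.1, 1.3, 0.9, 1.2, 1.0]
classes = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

error, p = permutation_p_value(intensities, classes)
print("observed error rate:", error, "permutation p-value:", p)
```

Shuffling preserves the number of samples in each class, so a low error rate that survives this comparison cannot be explained by class imbalance or by the flexibility of the classifier alone.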

    Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset

    Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental designs are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.