89 research outputs found

    Controlling the Precision-Recall Tradeoff in Differential Dependency Network Analysis

    Graphical models have gained a lot of attention recently as a tool for learning and representing dependencies among variables in multivariate data. Often, domain scientists are looking specifically for differences among the dependency networks of different conditions or populations (e.g., differences between regulatory networks of different species, or differences between dependency networks of diseased versus healthy populations). The standard method for finding these differences is to learn the dependency networks for each condition independently and compare them. We show that this approach is prone to high false discovery rates (low precision) that can render the analysis useless. We then show that by imposing a bias towards learning similar dependency networks for each condition, the false discovery rates can be reduced to acceptable levels, at the cost of finding a reduced number of differences. Algorithms developed in the transfer learning literature can be used to vary the strength of the imposed similarity bias and provide a natural mechanism to smoothly adjust this differential precision-recall tradeoff to cater to the requirements of the analysis conducted. We present real case studies (oncological and neurological) where domain experts use the proposed technique to extract useful differential networks that shed light on the biological processes involved in cancer and brain function.
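
As a rough illustration of the "learn independently and compare" baseline described in this abstract, the sketch below computes the differential edges between two undirected networks and scores them against a known ground truth. The edge sets, the function names (`differential_edges`, `precision_recall`), and the toy data are all hypothetical; this is not the authors' algorithm, only a minimal picture of how spurious edges in each learned network lower differential precision.

```python
def differential_edges(edges_a, edges_b):
    """Return the undirected edges present in exactly one of the two networks."""
    def normalize(edges):
        # frozenset makes (1, 2) and (2, 1) the same undirected edge
        return {frozenset(edge) for edge in edges}
    return normalize(edges_a) ^ normalize(edges_b)


def precision_recall(predicted, truth):
    """Precision and recall of a predicted edge set against the true edge set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall


# Toy ground-truth networks for two conditions (hypothetical data).
true_a = [(1, 2), (2, 3)]
true_b = [(1, 2), (3, 4)]

# Networks learned independently; each contains one spurious edge.
learned_a = [(1, 2), (2, 3), (1, 4)]
learned_b = [(2, 1), (3, 4), (2, 4)]

truth = differential_edges(true_a, true_b)
predicted = differential_edges(learned_a, learned_b)
precision, recall = precision_recall(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every true difference is recovered, but each spurious edge also shows up as a reported difference, so half of the reported differences are false discoveries.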

    Improving Assessment of Drug Safety Through Proteomics: Early Detection and Mechanistic Characterization of the Unforeseen Harmful Effects of Torcetrapib.

    Background: Early detection of adverse effects of novel therapies and understanding of their mechanisms could improve the safety and efficiency of drug development. We have retrospectively applied large-scale proteomics to blood samples from ILLUMINATE (Investigation of Lipid Level Management to Understand its Impact in Atherosclerotic Events), a trial of torcetrapib (a cholesterol ester transfer protein inhibitor), that involved 15 067 participants at high cardiovascular risk. ILLUMINATE was terminated at a median of 550 days because of significant absolute increases of 1.2% in cardiovascular events and 0.4% in mortality with torcetrapib. The aims of our analysis were to determine whether a proteomic analysis might reveal biological mechanisms responsible for these harmful effects and whether harmful effects of torcetrapib could have been detected early in the ILLUMINATE trial with proteomics. Methods: A nested case-control analysis of paired plasma samples at baseline and at 3 months was performed in 249 participants assigned to torcetrapib plus atorvastatin and 223 participants assigned to atorvastatin only. Within each treatment arm, cases with events were matched to controls 1:1. Main outcomes were a survey of 1129 proteins for discovery of biological pathways altered by torcetrapib and a 9-protein risk score validated to predict myocardial infarction, stroke, heart failure, or death. Results: Plasma concentrations of 200 proteins changed significantly with torcetrapib. Their pathway analysis revealed unexpected and widespread changes in immune and inflammatory functions, as well as changes in endocrine systems, including in aldosterone function and glycemic control. At baseline, 9-protein risk scores were similar in the 2 treatment arms and higher in participants with subsequent events. At 3 months, the absolute 9-protein derived risk increased in the torcetrapib plus atorvastatin arm compared with the atorvastatin-only arm by 1.08% (P=0.0004). Of 49 proteins previously associated with cardiovascular and mortality risk, 37 changed in the direction of increased risk. Conclusions: Heretofore unknown effects of torcetrapib were revealed in immune and inflammatory functions. A protein-based risk score predicted harm from torcetrapib within just 3 months. A protein-based risk assessment embedded within a large proteomic survey may prove to be useful in the evaluation of therapies to prevent harm to patients. Clinical trial registration: URL: https://www.clinicaltrials.gov. Unique identifier: NCT00134264.

    Unlocking biomarker discovery: Large scale application of aptamer proteomic technology for early detection of lung cancer

    Lung cancer is the leading cause of cancer deaths, because ~84% of cases are diagnosed at an advanced stage. Worldwide in 2008, ~1.5 million people were diagnosed and ~1.3 million died, a survival rate unchanged since 1960. However, patients who are diagnosed at an early stage and have surgery experience an 86% overall 5-year survival. New diagnostics are therefore needed to identify lung cancer at this stage. Here we present the first large scale clinical use of aptamers to discover blood protein biomarkers in disease with our breakthrough proteomic technology. This multi-center case-control study was conducted in archived samples from 1,326 subjects from four independent studies of non-small cell lung cancer (NSCLC) in long-term tobacco-exposed populations. We measured >800 proteins in 15 µL of serum, identified 44 candidate biomarkers, and developed a 12-protein panel that distinguished NSCLC from controls with 91% sensitivity and 84% specificity in a training set and 89% sensitivity and 83% specificity in a blinded, independent verification set. Performance was similar for early and late stage NSCLC. This is a significant advance in proteomics in an area of high clinical need.

    Mechanisms of sodium-glucose cotransporter-2 inhibition: insights from large-scale proteomics

    OBJECTIVE: To assess the effects of empagliflozin, a selective sodium–glucose cotransporter 2 (SGLT2) inhibitor, on broad biological systems through proteomics. RESEARCH DESIGN AND METHODS: Aptamer-based proteomics was used to quantify 3,713 proteins in 144 paired plasma samples obtained from 72 participants across the spectrum of glucose tolerance before and after 4 weeks of empagliflozin 25 mg/day. The biology of the plasma proteins significantly changed by empagliflozin (at false discovery rate–corrected P < 0.05) was discerned through Ingenuity Pathway Analysis. RESULTS: Empagliflozin significantly affected levels of 43 proteins, 6 related to cardiomyocyte function (fatty acid–binding protein 3 and 4 [FABPA], neurotrophic receptor tyrosine kinase, renin, thrombospondin 4, and leptin receptor), 5 to iron handling (ferritin heavy chain 1, transferrin receptor protein 1, neogenin, growth differentiation factor 2 [GDF2], and β2-microglobulin), and 1 to sphingosine/ceramide metabolism (neutral ceramidase), a known pathway of cardiovascular disease. Among the protein changes achieving the strongest statistical significance, insulin-like binding factor protein-1 (IGFBP-1), transgelin-2, FABPA, GDF15, and sulphydryl oxidase 2 precursor were increased, while ferritin, thrombospondin 3, and Rearranged during Transfection (RET) were decreased by empagliflozin administration. CONCLUSIONS: SGLT2 inhibition is associated, directly or indirectly, with multiple biological effects, including changes in markers of cardiomyocyte contraction/relaxation, iron handling, and other metabolic and renal targets. The most significant differences were detected in protein species (GDF15, ferritin, IGFBP-1, and FABP) potentially related to the clinical and metabolic changes that were actually measured in the same patients. These novel results may inform further studies using targeted proteomics and a prospective design.

    Proteomic signatures for identification of impaired glucose tolerance

    The implementation of recommendations for type 2 diabetes (T2D) screening and diagnosis focuses on the measurement of glycated hemoglobin (HbA1c) and fasting glucose. This approach leaves a large number of individuals with isolated impaired glucose tolerance (iIGT), who are only detectable through oral glucose tolerance tests (OGTTs), at risk of diabetes and its severe complications. We applied machine learning to the proteomic profiles of a single fasted sample from 11,546 participants of the Fenland study to test discrimination of iIGT defined using the gold-standard OGTTs. We observed significantly improved discriminative performance by adding only three proteins (RTN4R, CBPM and GHR) to the best clinical model (AUROC = 0.80 (95% confidence interval: 0.79–0.86), P = 0.004), which we validated in an external cohort. Increased plasma levels of these candidate proteins were associated with an increased risk for future T2D in an independent cohort and were also increased in individuals genetically susceptible to impaired glucose homeostasis and T2D. Assessment of a limited number of proteins can identify individuals likely to be missed by current diagnostic strategies and at high risk of T2D and its complications

    HFrEF subphenotypes based on 4210 repeatedly measured circulating proteins are driven by different biological mechanisms

    Background: HFrEF is a heterogeneous condition with high mortality. We used serial assessments of 4210 circulating proteins to identify distinct novel protein-based HFrEF subphenotypes and to investigate underlying dynamic biological mechanisms. With this, we aimed to gain pathophysiological insights and fuel opportunities for personalised treatment. Methods: In 382 patients, we performed trimonthly blood sampling during a median follow-up of 2.1 [IQR: 1.1–2.6] years. We selected all baseline samples and two samples closest to the primary endpoint (PEP; composite of cardiovascular mortality, HF hospitalization, LVAD implantation, and heart transplantation) or censoring, and applied an aptamer-based multiplex proteomic approach. Using unsupervised machine learning methods, we derived clusters from 4210 repeatedly measured proteomic biomarkers. Sets of proteins that drove cluster allocation were analysed via an enrichment analysis. Differences in clinical characteristics and PEP occurrence were evaluated. Findings: We identified four subphenotypes with different protein profiles, prognosis and clinical characteristics, including age (median [IQR] for subphenotypes 1–4, respectively: 70 [64, 76], 68 [60, 79], 57 [47, 65], 59 [56, 66] years), EF (30 [26, 36], 26 [20, 38], 26 [22, 32], 33 [28, 37]%), and chronic renal failure (45%, 65%, 36%, 37%). Subphenotype allocation was driven by subsets of proteins associated with various biological functions, such as oxidative stress, inflammation and extracellular matrix organisation. Clinical characteristics of the subphenotypes were aligned with these associations. Subphenotypes 2 and 3 had the worst prognosis compared to subphenotype 1 (adjHR (95% CI): 3.43 (1.76–6.69), and 2.88 (1.37–6.03), respectively). Interpretation: Four circulating-protein based subphenotypes are present in HFrEF, which are driven by varying combinations of protein subsets, and have different clinical characteristics and prognosis. Clinical Trial Registration: ClinicalTrials.gov Identifier: NCT01851538 https://clinicaltrials.gov/ct2/show/NCT01851538. Funding: EU/EFPIA IMI2JU BigData@Heart grant n° 116074, Jaap Schouten Foundation and Noordwest Academie.

    Machine learning–based biomarker profile derived from 4210 serially measured proteins predicts clinical outcome of patients with heart failure

    Aims: Risk assessment tools are needed for timely identification of patients with heart failure (HF) with reduced ejection fraction (HFrEF) who are at high risk of adverse events. In this study, we aim to derive a small set out of 4210 repeatedly measured proteins, which, along with clinical characteristics and established biomarkers, carry optimal prognostic capacity for adverse events, in patients with HFrEF. Methods and results: In 382 patients, we performed repeated blood sampling (median follow-up: 2.1 years) and applied an aptamer-based multiplex proteomic approach. We used machine learning to select the optimal set of predictors for the primary endpoint (PEP: composite of cardiovascular death, heart transplantation, left ventricular assist device implantation, and HF hospitalization). The association between repeated measures of selected proteins and PEP was investigated by multivariable joint models. Internal validation (cross-validated c-index) and external validation (Henry Ford HF PharmacoGenomic Registry cohort) were performed. Nine proteins were selected in addition to the MAGGIC risk score, N-terminal pro-hormone B-type natriuretic peptide, and troponin T: suppression of tumourigenicity 2, tryptophanyl-tRNA synthetase cytoplasmic, histone H2A Type 3, angiotensinogen, deltex-1, thrombospondin-4, ADAMTS-like protein 2, anthrax toxin receptor 1, and cathepsin D. N-terminal pro-hormone B-type natriuretic peptide and angiotensinogen showed the strongest associations [hazard ratio (95% confidence interval): 1.96 (1.17–3.40) and 0.66 (0.49–0.88), respectively]. The multivariable model yielded a c-index of 0.85 upon internal validation and c-indices up to 0.80 upon external validation. The c-index was higher than that of a model containing established risk factors (P = 0.021). Conclusion: Nine serially measured proteins captured the most essential prognostic information for the occurrence of adverse events in patients with HFrEF, and provided incremental value for HF prognostication beyond established risk factors. These proteins could be used for dynamic, individual risk assessment in a prospective setting. These findings also illustrate the potential value of relatively ‘novel’ biomarkers for prognostication.

    Aptamer-based multiplexed proteomic technology for biomarker discovery

    Interrogation of the human proteome in a highly multiplexed and efficient manner remains a coveted and challenging goal in biology. We present a new aptamer-based proteomic technology for biomarker discovery capable of simultaneously measuring thousands of proteins from small sample volumes (15 µL of serum or plasma). Our current assay allows us to measure ~800 proteins with very low limits of detection (1 pM average), 7 logs of overall dynamic range, and 5% average coefficient of variation. This technology is enabled by a new generation of aptamers that contain chemically modified nucleotides, which greatly expand the physicochemical diversity of the large randomized nucleic acid libraries from which the aptamers are selected. Proteins in complex matrices such as plasma are measured with a process that transforms a signature of protein concentrations into a corresponding DNA aptamer concentration signature, which is then quantified with a DNA microarray. In essence, our assay takes advantage of the dual nature of aptamers as both folded binding entities with defined shapes and unique sequences recognizable by specific hybridization probes. To demonstrate the utility of our proteomics biomarker discovery technology, we applied it to a clinical study of chronic kidney disease (CKD). We identified two well known CKD biomarkers as well as an additional 58 potential CKD biomarkers. These results demonstrate the potential utility of our technology to discover unique protein signatures characteristic of various disease states. More generally, we describe a versatile and powerful tool that allows large-scale comparison of proteome profiles among discrete populations. This unbiased and highly multiplexed search engine will enable the discovery of novel biomarkers in a manner that is unencumbered by our incomplete knowledge of biology, thereby helping to advance the next generation of evidence-based medicine.