
    Extending the scope of pooled analyses of individual patient biomarker data from heterogeneous laboratory platforms and cohorts using merging algorithms

    Background: A common challenge in medicine, exemplified in the analysis of biomarker data, is that large studies are needed for sufficient statistical power. Often, this may only be achievable by aggregating multiple cohorts. However, different studies may use disparate platforms for laboratory analysis, which can hinder merging. Methods: Using circulating placental growth factor (PlGF), a potential biomarker for hypertensive disorders of pregnancy (HDP) such as preeclampsia, as an example, we investigated how such issues can be overcome by inter-platform standardization and merging algorithms. We studied 16,462 pregnancies from 22 study cohorts. PlGF measurements (gestational age ≥ 20 weeks), analyzed on one of four platforms (R&D Systems, Alere® Triage, Roche® Elecsys or Abbott® Architect), were available for 13,429 women. Two merging algorithms, using Z-score and multiple of the median (MoM) transformations, were applied. Results: Best reference curves (BRCs), based on merged, transformed PlGF measurements in uncomplicated pregnancy across six gestational age groups, were estimated. Identification of HDP by these PlGF BRCs was compared with that of platform-specific curves. Conclusions: We demonstrate the feasibility of merging PlGF concentrations from different analytical platforms. Overall, BRC identification of HDP performed at least as well as platform-specific curves. Our method can be extended to any set of biomarkers obtained from different laboratory platforms in any field. Merged biomarker data from multiple studies will improve statistical power and enlarge our understanding of the pathophysiology and management of medical syndromes. © 2015 International Society for the Study of Hypertension in Pregnancy. Published by Elsevier B.V. All rights reserved.
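    As a minimal sketch of the two transformations named above, assume (the abstract does not say this) that each PlGF value is compared with uncomplicated pregnancies measured on the same platform and in the same gestational-age group. Each value is then expressed either as a multiple of that reference median (MoM) or as a Z-score. The helper name `merge_plgf` and the record layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical record layout: (platform, gestational_age_group, plgf_value, uncomplicated)
Record = tuple[str, str, float, bool]


def merge_plgf(records: list[Record], method: str = "mom") -> list[float]:
    """Put PlGF values from different platforms on a common scale.

    Reference statistics come from uncomplicated pregnancies measured on the
    same platform and in the same gestational-age group (an assumption).
    """
    # Collect reference values per (platform, gestational-age group)
    ref = defaultdict(list)
    for platform, ga_group, value, uncomplicated in records:
        if uncomplicated:
            ref[(platform, ga_group)].append(value)

    merged = []
    for platform, ga_group, value, _ in records:
        r = ref[(platform, ga_group)]
        if method == "mom":
            # Multiple of the median: 1.0 means "typical for this platform and group"
            merged.append(value / median(r))
        elif method == "z":
            # Z-score: distance from the reference mean in reference SDs
            merged.append((value - mean(r)) / stdev(r))
        else:
            raise ValueError(f"unknown method: {method!r}")
    return merged
```

Because both transformations remove each platform's own scale, values from different assays become directly comparable. PlGF is right-skewed, so log-transforming before computing Z-scores may be preferable; that choice is also an assumption here.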

    Multi-class gene expression biomarker panel identification for the diagnosis of paediatric febrile illness

    Febrile illness in children can result from infection by diverse viral or bacterial pathogens, as well as from inflammatory conditions or cancer. The limitations of the existing diagnostic pipeline, which relies on clinical symptoms and signs, pathogen detection, empirical treatment and diagnoses of exclusion, contribute to missed or delayed diagnosis and unnecessary antibiotic use. The potential of host gene expression biomarkers measured in blood has been demonstrated for simplified binary diagnostic questions; however, the clinical reality is that multiple potential aetiologies must be considered and prioritised on the basis of likelihood and risk of severe disease. To identify a biomarker panel that better reflects this clinical reality, we applied a multi-class supervised learning approach to whole blood transcriptomic datasets from children with infectious and inflammatory disease. Three datasets were used for the analyses presented here: a single microarray dataset, a meta-analysis of 12 publicly available microarray datasets and a newly generated RNA-sequencing dataset. These were used, respectively, for preliminary investigation of the approach, discovery of a multi-class biomarker panel of febrile illness, and validation of the biomarker panel. In the merged microarray discovery dataset, a two-stage approach to feature selection and classification, based on LASSO and Ridge penalised regression, was applied to distinguish 18 disease classes. Cost-sensitivity was incorporated into the approach because aetiologies of febrile illness vary considerably in their risk of severe disease. The resulting 161-transcript biomarker panel could reliably distinguish bacterial, viral, inflammatory, tuberculosis and malarial disease, as well as pathogen-specific aetiologies. The panel was then validated in the newly generated RNA-Seq dataset and compared with previously published binary biomarker panels. 
The analyses presented here demonstrate that a single test for the diagnosis of acute febrile illness in children is possible using host RNA biomarkers. A test that could distinguish multiple aetiologies soon after presentation could be used to reduce unnecessary antibiotic use, improve targeting of antibiotics to bacterial species and reduce delays in the diagnosis of inflammatory diseases.
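    The abstract does not say how cost-sensitivity was built into the penalised regression. One generic way to make a multi-class classifier cost-sensitive at decision time is the minimum-expected-cost rule sketched below. It is a swapped-in illustration, not the authors' method, and the class names, probabilities and costs are hypothetical. They show how a costly miss (here, calling a bacterial infection viral) can override the most probable class.

```python
def min_expected_cost_class(probs: dict[str, float], cost: dict[tuple[str, str], float]) -> str:
    """Pick the predicted class with the lowest expected misclassification cost.

    probs: predicted probability of each true class for one patient
    cost:  cost[(predicted, true)] of predicting `predicted` when `true` holds
    """
    def expected_cost(predicted: str) -> float:
        return sum(p * cost[(predicted, true)] for true, p in probs.items())

    return min(probs, key=expected_cost)


classes = ["bacterial", "viral", "inflammatory"]
# Hypothetical costs: 0 when correct, 1 for most errors...
cost = {(p, t): 0.0 if p == t else 1.0 for p in classes for t in classes}
# ...but calling a bacterial infection "viral" (withholding antibiotics) costs 10
cost[("viral", "bacterial")] = 10.0

probs = {"bacterial": 0.3, "viral": 0.6, "inflammatory": 0.1}
print(max(probs, key=probs.get))             # most probable class: viral
print(min_expected_cost_class(probs, cost))  # lowest expected cost: bacterial
```

Here the expected costs are 0.7 (bacterial), 3.1 (viral) and 0.9 (inflammatory), so the rule prefers "bacterial" even though "viral" is most probable.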

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are essential disciplines for translating data-driven findings into medical practice, chiefly by developing computational tools and algorithms that help detect and clarify the underlying causes of disease. Continuous advances in high-throughput technologies, coupled with recently promoted data-sharing policies, have produced a wealth of data with remarkable potential to improve human health care. Keeping pace with this growth in data production requires innovative data analysis tools and methods. The data analyzed by bioinformaticians and computational biologists can be broadly divided into molecular and conventional clinical data. The aim of this thesis was to develop novel statistical and machine learning tools, and to apply existing state-of-the-art methods, to analyze bio-clinical data with medical applications. The findings demonstrate the impact of computational approaches on clinical decision making through improved patient risk stratification and prediction of disease outcomes. The thesis comprises five studies describing method development for 1) genomic data, 2) conventional clinical data and 3) integration of genomic and clinical data. For genomic data, the main focus is the detection of differentially expressed genes, the most common task in transcriptome profiling projects. In addition to a review of available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. To demonstrate the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC). 
In addition to previously known markers, novel genes with potential prognostic and therapeutic roles in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the value of including genomic data for the predictive ability of clinical models. Again using ensemble-based learners, a novel model is proposed to predict adulthood obesity from both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex, highly heterogeneous diseases. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of such markers would increase the chance of early detection and improve prognostic assessment and treatment choice.
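    ROTS ranks genes with a regularised statistic of the form |m1 − m2| / (α1 + α2·s) and chooses α1 and α2 to maximise the reproducibility of the top-ranked genes across bootstrap resamples. The sketch below computes only the statistic for fixed α values and omits the bootstrap optimisation. Taking s as the pooled two-sample standard error is an assumption, and the helper names are hypothetical. With α1 = 0 and α2 = 1 the statistic reduces to the absolute two-sample t-statistic. With α2 = 0 it ranks genes by absolute mean difference alone.

```python
from math import sqrt
from statistics import mean, variance


def pooled_se(x: list[float], y: list[float]) -> float:
    """Pooled standard error of the difference in group means (t-test form)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return sqrt(pooled_var * (1 / nx + 1 / ny))


def rots_statistic(x: list[float], y: list[float], alpha1: float, alpha2: float) -> float:
    """Regularised statistic |mean(x) - mean(y)| / (alpha1 + alpha2 * SE).

    alpha1 and alpha2 are taken as given here; ROTS itself selects them by
    maximising bootstrap reproducibility, which this sketch omits.
    """
    return abs(mean(x) - mean(y)) / (alpha1 + alpha2 * pooled_se(x, y))


def rank_genes(expr: dict[str, tuple[list[float], list[float]]],
               alpha1: float, alpha2: float) -> list[str]:
    """Gene names sorted from most to least differentially expressed."""
    return sorted(expr, key=lambda g: rots_statistic(*expr[g], alpha1, alpha2), reverse=True)
```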

    The PRECISE (PREgnancy Care Integrating translational Science, Everywhere) database: open-access data collection in maternal and newborn health

    In less-resourced settings, adverse pregnancy outcome rates are unacceptably high. To effect improvement, we need accurate epidemiological data on rates of death and morbidity, social determinants of health and processes of care from each country (or region) to contextualise strategies. The PRECISE database is a unique core infrastructure of a generic, unified data collection platform. It builds on previous work in data harmonisation, outcome and data field standardisation, open-access software (District Health Information System 2 and the Baobab Laboratory Information Management System) and clinical research networks. The database contains globally recommended indicators included in Health Management Information System recording and reporting forms. It comprises key outcomes (maternal and perinatal death), life-saving interventions (Human Immunodeficiency Virus testing, blood pressure measurement, iron therapy, uterotonic use after delivery, postpartum maternal assessment within 48 h of birth, newborn resuscitation, immediate skin-to-skin contact and immediate drying) and an additional 17 core administrative variables for mothers and babies. In addition, the database has a suite of modules for ‘deep phenotyping’ based on established tools. These cover social determinants of health (including socioeconomic status, nutrition and the environment), maternal comorbidities, mental health, violence against women and health systems. The database has the potential to enable future high-quality epidemiological research integrated with clinical care and discovery bioscience.

    “Rethinking High-Grade Serous Carcinoma: Development of new tools for deep tissue profiling”

    Background: High-grade serous ovarian cancer (HGSOC) is the most frequently occurring and most fatal epithelial ovarian cancer (EOC) subtype. The reciprocal interplay of the different components of the tumour microenvironment (TME) is fundamental for tumour growth, progression and therapy response. It is therefore important to be able to characterise the complex and diverse TME in depth with multidimensional approaches. Aims: The main aim of this project was to establish novel multiparametric mass cytometry panels and thoroughly characterise the HGSOC TME. Methods: We first developed a novel 35-marker ovarian TME-based cytometry by time-of-flight (CyTOF) panel (pan-tumour panel) and used it to examine the effects of six different tissue dissociation methods on cell surface antigen expression profiles in HGSOC tumour samples (Paper I). We further established a unique immune panel (pan-immune) for the detailed immunophenotyping of chemo-naïve HGSOC patients. The individual tumour immune microenvironments were characterised with tailored computational analysis (Paper II). Using an established merging algorithm, CyTOFmerge, the pan-tumour and pan-immune datasets were merged for a more in-depth immune delineation of the ten chemo-naïve ovarian TME profiles, in addition to tumour and stromal cell phenotyping (Paper III). Results: We have established a novel ovarian TME-based CyTOF panel for HGSOC that is capable of delineating the immune, tumour and stromal cells of the TME. Using this panel, we demonstrated that, although the six tissue dissociation methods influence the TME antigen expression profiles to some degree, inter-patient differences between the tumour samples remain clear. In addition, we identified a previously undescribed stem-like cell subset (Paper I). 
We have developed a unique 34-marker immune panel and provided a detailed characterisation of the ovarian tumour immune microenvironment of chemo-naïve patients. We identified a high degree of inter-patient immune cell heterogeneity and an abundance of conventional dendritic cells (DCs), natural killer (NK) cells and unassigned haematopoietic cells. Certain monocyte and DC clusters showed prognostic relevance within the ovarian TME (Paper II). The merged dataset analysis revealed a new level of complexity, with a more in-depth delineation of immune (myeloid) cells in addition to tumour and stromal (fibroblast subset) cell phenotypes. We identified an even higher degree of inter-patient TME heterogeneity and a novel tumour cell metacluster, CD45-CD56-(EpCAM-FOLR1-CD24-). A benefit of integrating the datasets was a larger number of clinical associations (20 in the merged dataset versus 12 in the pan-tumour dataset), most of them between progression-free survival (PFS), overall survival (OS) and infiltrating immune cell subsets (Paper III). Conclusions and consequences: (Paper I) The panel is a promising profiling tool for the in-depth phenotyping of HGSOC TME cell subsets. Although the tissue dissociation methods influence the TME antigen expression profiles, inter-patient differences remain clear. (Paper II) Our findings revealed a high degree of heterogeneity and identified phenotypic profiles that can be explored for HGSOC phenotypic profiling. (Paper III) Together, the merged analysis illustrates that comprehensive individual TME mapping of HGSOC patients can contribute to a better understanding of each patient’s unique micromilieu, given the need for more personalised treatment approaches.
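    Panel-merging tools such as CyTOFmerge rely on the markers that two panels share. The sketch below shows a generic k-nearest-neighbour imputation of that kind. For each cell from one panel, markers it lacks are filled in with the median value among its k most similar cells in the other panel, where similarity is Euclidean distance on the shared markers only. It illustrates the idea under these assumptions and is not CyTOFmerge's actual implementation. The function name, marker choices and k are hypothetical.

```python
from math import dist
from statistics import median


def knn_impute(query_cells: list[dict[str, float]], reference_cells: list[dict[str, float]],
               shared: list[str], missing: list[str], k: int = 3) -> list[dict[str, float]]:
    """Fill markers absent from the query panel using k nearest reference cells.

    Distance is Euclidean on the shared markers only; each missing marker is
    set to the median of that marker across the k neighbours.
    """
    merged = []
    for cell in query_cells:
        q = [cell[m] for m in shared]
        # The k reference cells closest to this cell in shared-marker space
        neighbours = sorted(reference_cells, key=lambda r: dist(q, [r[m] for m in shared]))[:k]
        imputed = {m: median(n[m] for n in neighbours) for m in missing}
        # Keep the cell's measured markers and add the imputed ones
        merged.append({**cell, **imputed})
    return merged
```

Using the median of the neighbours rather than the mean keeps a single outlying reference cell from dominating the imputed value.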

    Development and application of a platform for harmonisation and integration of metabolomics data

    Integrating diverse metabolomics data for molecular epidemiology analyses provides both opportunities and challenges in human health research. Combining patient cohorts may improve the power and sensitivity of analyses but is challenging because of significant technical and analytical variability. Additionally, current systems for storing and analysing metabolomics data suffer from limitations in scalability, queryability and integration that restrict their adoption for molecular epidemiological research. Here, a novel platform for integrative metabolomics is developed that addresses the storage, harmonisation, querying, scaling and analysis of large-scale metabolomics data. Its use is demonstrated through an investigation of molecular trends of ageing in an integrated four-cohort dataset, in which the advantages and disadvantages of combining balanced and unbalanced cohorts are explored and robust metabolite trends are identified that are concordant with previous studies.

    Transparent reporting of multivariable prediction models developed or validated using clustered data (TRIPOD-Cluster): explanation and elaboration.

    The TRIPOD-Cluster (transparent reporting of multivariable prediction models developed or validated using clustered data) statement comprises a 19-item checklist that aims to improve the reporting of studies developing or validating a prediction model in clustered data, such as individual participant data meta-analyses (clustering by study) and electronic health records (clustering by practice or hospital). This explanation and elaboration document describes the rationale, clarifies the meaning of each item and discusses why transparent reporting is important, with a view to assessing the risk of bias and clinical usefulness of the prediction model. Each checklist item of the TRIPOD-Cluster statement is explained in detail and accompanied by published examples of good reporting. The document also serves as a reference for factors to consider when designing, conducting and analysing prediction model development or validation studies in clustered data. To aid the editorial process and help peer reviewers and, ultimately, readers and systematic reviewers of prediction model studies, authors are recommended to include a completed checklist in their submission.

    Incorporating standardised drift-tube ion mobility to enhance non-targeted assessment of the wine metabolome (LC×IM-MS)

    Liquid chromatography with drift-tube ion mobility spectrometry-mass spectrometry (LC×IM-MS) is emerging as a powerful addition to existing LC-MS workflows for addressing a diverse range of metabolomics-related questions [1,2]. Importantly, the excellent precision of drift-tube IM separations under repeatability and reproducibility conditions [3] supports the development of non-targeted approaches for complex metabolome assessment, such as wine characterisation [4]. In this work, the fundamentals of this new analytical metabolomics approach are introduced and its application to the analysis of 90 authentic red and white wine samples originating from Macedonia is presented. After measurement, inter-sample alignment of metabolites, using non-targeted extraction and three-dimensional alignment of molecular features (retention time, collision cross section and high-resolution mass spectra), provides confidence in metabolite identity confirmation. Applying a fingerprinting metabolomics workflow allows statistical assessment of the influence of geographic region, variety and age. This approach is a state-of-the-art tool for assessing wine chemodiversity and is particularly beneficial for discovering wine biomarkers and establishing product authenticity through the development of fingerprint libraries.
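    The three-dimensional alignment of molecular features can be sketched as tolerance matching. Two features are treated as the same metabolite only if retention time, collision cross section and m/z all agree. For simplicity, the precursor m/z stands in for the full high-resolution spectrum. The tolerances (0.1 min, 0.5 % CCS, 10 ppm) and the helper names are assumptions for illustration, not values from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    rt_min: float  # retention time, minutes
    ccs: float     # collision cross section, square angstroms
    mz: float      # precursor mass-to-charge ratio


def same_feature(a: Feature, b: Feature, rt_tol: float = 0.1,
                 ccs_tol_pct: float = 0.5, mz_tol_ppm: float = 10.0) -> bool:
    """True if a and b agree in all three dimensions within (hypothetical) tolerances."""
    return (
        abs(a.rt_min - b.rt_min) <= rt_tol
        and abs(a.ccs - b.ccs) / b.ccs * 100 <= ccs_tol_pct
        and abs(a.mz - b.mz) / b.mz * 1e6 <= mz_tol_ppm
    )


def align(reference: list[Feature], sample: list[Feature]) -> dict[int, Optional[int]]:
    """Map each reference feature to the index of its first match in sample (None if absent)."""
    return {
        i: next((j for j, f in enumerate(sample) if same_feature(ref, f)), None)
        for i, ref in enumerate(reference)
    }
```

Requiring agreement in all three dimensions is what adds confidence. Two compounds that co-elute with near-identical m/z can still be told apart if their collision cross sections differ.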