1,571 research outputs found

    Machine learning and computational methods to identify molecular and clinical markers for complex diseases – case studies in cancer and obesity

    In biomedical research, applied machine learning and bioinformatics are essential disciplines heavily involved in translating data-driven findings into medical practice. This task is accomplished chiefly by developing computational tools and algorithms that assist in detecting and clarifying the underlying causes of diseases. Continuous advances in high-throughput technologies, coupled with recently promoted data-sharing policies, have produced a massive wealth of data with remarkable potential to improve human health care. In line with this boost in data production, innovative data analysis tools and methods are required to meet the growing demand. The data analyzed by bioinformaticians and computational biology experts can be broadly divided into molecular and conventional clinical data. The aim of this thesis was to develop novel statistical and machine learning tools and to incorporate existing state-of-the-art methods to analyze bio-clinical data with medical applications. The findings of the studies demonstrate the impact of computational approaches on clinical decision making by improving patient risk stratification and prediction of disease outcomes. This thesis comprises five studies describing method development for 1) genomic data, 2) conventional clinical data and 3) integration of genomic and clinical data. With genomic data, the main focus is the detection of differentially expressed genes, the most common task in transcriptome profiling projects. In addition to reviewing available differential expression tools, a data-adaptive statistical method called Reproducibility Optimized Test Statistic (ROTS) is proposed for detecting differential expression in RNA-sequencing studies. To demonstrate the efficacy of ROTS in real biomedical applications, the method is used to identify prognostic markers in clear cell renal cell carcinoma (ccRCC). 
In addition to previously known markers, novel genes with a potential prognostic and therapeutic role in ccRCC are detected. For conventional clinical data, ensemble-based predictive models are developed to provide clinical decision support in the treatment of patients with metastatic castration-resistant prostate cancer (mCRPC). The proposed predictive models cover treatment and survival stratification tasks for both trial-based and real-world patient cohorts. Finally, genomic and conventional clinical data are integrated to demonstrate the importance of including genomic data for the predictive ability of clinical models. Again utilizing ensemble-based learners, a novel model is proposed to predict adulthood obesity using both genetic and social-environmental factors. Overall, the ultimate objective of this work is to demonstrate the importance of clinical bioinformatics and machine learning for bio-clinical marker discovery in complex, highly heterogeneous diseases. In the case of cancer, the interpretability of clinical models strongly depends on predictive markers with high reproducibility supported by validation data. The discovery of these markers would increase the chance of early detection and improve prognosis assessment and treatment choice.

    Doctor of Philosophy

    The primary objective of cancer registries is to capture clinical care data of cancer populations and to aid in prevention, allow early detection, determine prognosis, and assess the quality of various treatments and interventions. Furthermore, the role of cancer registries is paramount in supporting cancer epidemiological studies and medical research. Existing cancer registries depend mostly on humans, known as Cancer Tumor Registrars (CTRs), to manually abstract electronic health records in order to find reportable cancer cases and extract other data elements required for regulatory reporting. This is often a time-consuming and laborious task prone to human error, which affects the quality, completeness and timeliness of cancer registries. Central state cancer registries take responsibility for consolidating data received from multiple sources for each cancer case and for assigning the most accurate information. The Utah Cancer Registry (UCR) at the University of Utah, for instance, leads and oversees more than 70 cancer treatment facilities in the state of Utah to collect data for each diagnosed cancer case and consolidate multiple sources of information. Although software tools that help with the manual abstraction process exist, they mainly focus on cancer case finding based on pathology reports and do not support automatic extraction of other data elements such as TNM cancer stage information, an important prognostic factor required before initiating clinical treatment. In this study, I present novel applications of natural language processing (NLP) and machine learning (ML) to automatically extract clinical and pathological TNM stage information from unconsolidated clinical records of cancer patients available at the central Utah Cancer Registry. To further support CTRs in their manual efforts, I demonstrate a new approach based on machine learning to consolidate TNM stages from multiple records at the patient level.

    Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review

    Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset.

    Precision Oncology, Artificial Intelligence, and Novel Therapeutic Advancements in the Diagnosis, Prevention, and Treatment of Cancer: Highlights from the 59th Irish Association for Cancer Research (IACR) Annual Conference

    Advancements in oncology, especially in the era of precision oncology, are resulting in a paradigm shift in cancer care. Indeed, innovative technologies, such as artificial intelligence, are paving the way towards enhanced diagnosis, prevention, and personalised treatments as well as novel drug discoveries. Despite excellent progress, the emergence of resistant cancers has curtailed both the pace and extent to which we can advance. By combining their understanding of the fundamental biological mechanisms with technological advancements such as artificial intelligence and data science, cancer researchers are now beginning to address this. Together, this will revolutionise cancer care by enhancing molecular interventions that may aid cancer prevention, inform clinical decision making, and accelerate the development of novel therapeutic drugs. Here, we discuss the advances and approaches in both artificial intelligence and precision oncology presented at the 59th Irish Association for Cancer Research annual conference.

    Translational Bioinformatics for Human Reproductive Biology Research: Examples, Opportunities and Challenges for a Future Reproductive Medicine

    Since the birth of the first IVF (in vitro fertilization) baby in Manchester (England) in 1978, more than eight million IVF babies have been born throughout the world, and many new techniques and discoveries have emerged in reproductive medicine. To summarize the modern technology and progress in reproductive medicine, scientific papers related to reproductive medicine, especially papers related to reproductive translational medicine, were comprehensively searched, manually curated and reviewed. Results indicated that both male and female reproductive medicine have made significant progress, with their markers progressing from karyotype analysis to single-cell omics. However, due to the lack of comprehensive databases, especially databases collecting risk exposures, disease markers and models, prevention drugs and effective treatment methods, the application of the latest precision medicine technologies and methods in reproductive medicine is limited. This research was funded by Project of Natural Science Foundation of Gansu Province (20JR5RA363); Project of Gansu Provincial Education Department (2020B-003).

    Statistical methods for estimating personalized risk profiles based on precision cancer medicine in oncology patients

    Precision medicine is beginning to emerge as a well-defined discipline with specific goals, areas of focus, and tailored methodology. Specifically, the primary goal is to discover treatment rules that leverage heterogeneity to improve clinical decision making in a manner that is reproducible, generalizable, and adaptable as needed. This endeavor spans a broad range of scientific areas including drug discovery, genetics/genomics, health communication, and causal inference, all in support of evidence-based, i.e., data-driven, decision making. Precision medicine allows patients to be discriminated according to their level of risk (e.g. low or high) and identifies subgroups of patients according to their characteristics in order to assign the treatment to those who are likely to benefit. Statistics research in precision medicine is broadly focused on methodological development for estimation of and inference for treatment regimens that maximize some cumulative clinical outcome. The process of using statistical inference to establish personalized treatment strategies requires specific data analysis techniques that optimize the combination of competing therapies with candidate genetic features and characteristics of the patient and disease. The present dissertation focuses on the implementation and application of statistical methods for establishing optimal treatment rules for personalized medicine and discusses specific examples in various medical contexts, with an emphasis on oncology. I have focused my research activity mainly on the following topics. 1) Statistical methods to analyze continuous biomarkers. 
Several approaches were considered according to the study design, from classical approaches (median or mean value, percentiles, optimal cut-point identified by means of standard receiver operating characteristic (ROC) analysis) to more complex analyses (time-dependent ROC, conditional inference trees and the Subpopulation Treatment Effect Pattern Plot (STEPP) method). 2) Statistical methods for time-to-event endpoints. Competing risks occur commonly in medical research. In the analysis of competing risks data, standard survival analysis methods lead to incorrect and biased results. In the presence of competing risks, the analysis has to include methods to calculate the cumulative incidence of an event of interest, to compare cumulative incidence curves, and to perform competing risks regression analysis. 3) Meta-analysis for synthesizing evidence. 4) A review of the use of several statistical methods that handle the issue of treatment switching. The contribution aims at assessing the effect of tamoxifen treatment while taking treatment switches into account, in order to provide a robust assessment of treatment effect by applying causal inference methods. 5) The last topic deals with the use of population-based registries and administrative databases. The objective of this project is to develop an acceptable claims-based algorithm to identify second breast cancer events during a 10-year follow-up through a record linkage of two data sources: the Friuli Venezia Giulia population-based cancer registry and the administrative individual-record FVG database.
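    The point about competing risks in the abstract above can be illustrated with a small sketch. It is not taken from the dissertation: the function names, the cause coding (0 = censored, 1 and 2 = event types) and the six-subject toy data are all assumptions made for illustration. It compares a nonparametric cumulative incidence estimate (of the Aalen-Johansen type) with the naive "1 minus Kaplan-Meier" estimate that treats competing events as censoring.

```python
from collections import Counter


def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause.

    times  : follow-up time of each subject
    causes : 0 = censored, any other integer = event type (assumed coding)
    Returns a list of (time, cumulative incidence) at each event time.
    """
    n_at_risk = len(times)
    events = Counter()   # (time, cause) -> number of events
    leaving = Counter()  # time -> subjects leaving the risk set
    for t, c in zip(times, causes):
        leaving[t] += 1
        if c != 0:
            events[(t, c)] += 1

    surv = 1.0  # probability of being free of ANY event just before t
    cif = 0.0
    curve = []
    for t in sorted(leaving):
        d_all = sum(n for (tt, _), n in events.items() if tt == t)
        d_k = events[(t, cause_of_interest)]
        if d_all > 0:
            cif += surv * d_k / n_at_risk
            surv *= 1.0 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= leaving[t]
    return curve


def naive_one_minus_km(times, causes, cause_of_interest):
    """1 - Kaplan-Meier, treating competing events as censored."""
    n_at_risk = len(times)
    d_k = Counter()
    leaving = Counter()
    for t, c in zip(times, causes):
        leaving[t] += 1
        if c == cause_of_interest:
            d_k[t] += 1

    surv = 1.0
    curve = []
    for t in sorted(leaving):
        if d_k[t] > 0:
            surv *= 1.0 - d_k[t] / n_at_risk
            curve.append((t, 1.0 - surv))
        n_at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort: six subjects, two event types, one censored.
toy_times = [1, 2, 3, 4, 5, 6]
toy_causes = [1, 2, 1, 0, 2, 1]

cif_1 = cumulative_incidence(toy_times, toy_causes, 1)
cif_2 = cumulative_incidence(toy_times, toy_causes, 2)
naive_1 = naive_one_minus_km(toy_times, toy_causes, 1)
naive_2 = naive_one_minus_km(toy_times, toy_causes, 2)

print("CIF cause 1:", cif_1[-1][1], " naive:", naive_1[-1][1])
print("CIF cause 2:", cif_2[-1][1], " naive:", naive_2[-1][1])
```

    On this toy cohort the two cumulative incidence estimates sum to at most one, whereas the two naive estimates sum to more than one, which is the kind of bias the abstract refers to. In practice an established survival analysis package would be preferable to a hand-written estimator.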