
    Developing genomic models for cancer prevention and treatment stratification

    Malignant tumors remain one of the leading causes of mortality, with over 8.2 million deaths worldwide in 2012. Over the last two decades, high-throughput profiling of the human transcriptome has become an essential tool to investigate molecular processes involved in carcinogenesis. In this thesis I explore how gene expression profiling (GEP) can be used in multiple aspects of cancer research, including prevention, patient stratification, and subtype discovery. The first part details how GEP could be used to supplement or even replace the current gold-standard assay for testing the carcinogenic potential of chemicals. This toxicogenomic approach, coupled with a Random Forest algorithm, allowed me to build models capable of predicting carcinogenicity with an area under the curve of up to 86.8% and provided valuable insights into the underlying mechanisms that may contribute to cancer development. The second part describes how GEP could be used to stratify heterogeneous populations of lymphoma patients into therapeutically relevant disease sub-classes, with a particular focus on diffuse large B-cell lymphoma (DLBCL). Here, I successfully translated established biomarkers from the Affymetrix platform to the clinically relevant NanoString nCounter assay. This translation allowed us to profile custom sets of transcripts from formalin-fixed samples, transforming these biomarkers into clinically relevant diagnostic tools. Finally, I describe my effort to discover tumor samples dependent on altered metabolism driven by oxidative phosphorylation (OxPhos) across multiple tissue types. This work was motivated by previous studies that identified a therapeutically relevant OxPhos subtype in DLBCL, and by the hypothesis that this stratification might be applicable to other solid tumor types. To that end, I carried out a transcriptomics-based pan-cancer analysis, derived a generalized PanOxPhos gene signature, and identified mTOR as a potential regulator in primary tumor samples. High-throughput GEP coupled with statistical machine learning methods represents an important toolbox in modern cancer research. It provides a cost-effective and promising new approach for predicting the cancer risk associated with chemical exposure, it can reduce the cost of the ever more expensive drug development process by identifying therapeutically actionable disease subtypes, and it can improve patients' survival by matching them with the most effective drugs.
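
    As a hedged illustration of the modeling approach described above, the sketch below trains a Random Forest on a gene expression matrix and reports a cross-validated AUC. The data are synthetic placeholders and the hyperparameters are illustrative, not the thesis's actual configuration.

    ```python
    # Sketch: toxicogenomic carcinogenicity prediction with a Random Forest,
    # scored by cross-validated area under the ROC curve (AUC).
    # X and y stand in for a real gene expression matrix and chemical labels.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold, cross_val_predict

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 1000))   # 200 chemicals x 1000 transcripts (synthetic)
    y = rng.integers(0, 2, size=200)   # 1 = carcinogenic, 0 = non-carcinogenic (synthetic)

    model = RandomForestClassifier(n_estimators=500, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    probs = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    print(f"cross-validated AUC: {roc_auc_score(y, probs):.3f}")

    # Feature importances highlight the transcripts driving the prediction,
    # a first step toward the mechanistic insights mentioned in the abstract.
    model.fit(X, y)
    top = np.argsort(model.feature_importances_)[::-1][:10]
    print("top transcript indices:", top)
    ```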

    Developing tools for determination of parameters involved in CO₂ based EOR methods

    To mitigate the effects of climate change, CO₂ reduction strategies have been proposed to lower the anthropogenic greenhouse gas emissions caused by fossil fuel use. Consequently, CO₂-based enhanced oil recovery (EOR) methods in petroleum reservoirs have become a hot topic among oil and gas researchers. This thesis includes two sections. In the first section, we developed deterministic tools for determining three parameters important to CO₂ injection performance: minimum miscibility pressure (MMP), equilibrium ratios (Kᵢ), and the swelling factor of oil in the presence of CO₂. For this purpose, we employed two inverse-based methods: gene expression programming (GEP) and least squares support vector machines (LSSVM). In the second part, we developed an easy-to-use, cheap, and robust data-driven proxy model to determine the performance of CO₂-based EOR methods. Here, we first determine the input parameters and perform a sensitivity analysis on them. The next step is designing the simulation runs and determining the performance of CO₂ injection from a technical viewpoint (recovery factor, RF). Finally, using the outputs of the reservoir simulators and applying the LSSVM method, we develop the data-driven proxy model. The proxy model can be considered an alternative for determining the efficiency of CO₂-based EOR methods in oil reservoirs when the required experimental data are not available or accessible.
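
    Since LSSVM is central to both parts, here is a minimal sketch of its standard formulation, in which training reduces to solving a single linear system; the reservoir inputs and recovery-factor targets below are synthetic placeholders, not data from this thesis.

    ```python
    # Sketch: least squares support vector machine (LSSVM) regression with an
    # RBF kernel, the kind of model used here as a proxy for CO2-EOR performance.
    import numpy as np

    def rbf_kernel(A, B, sigma=1.0):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma**2))

    def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
        # Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y].
        n = len(y)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        return sol[0], sol[1:]  # bias b, dual weights alpha

    def lssvm_predict(Xtrain, b, alpha, Xnew, sigma=1.0):
        return rbf_kernel(Xnew, Xtrain, sigma) @ alpha + b

    # Synthetic stand-ins for reservoir/injection parameters and recovery factor.
    rng = np.random.default_rng(1)
    X = rng.uniform(size=(100, 5))  # hypothetical inputs: pressure, porosity, ...
    y = X @ np.array([0.30, 0.20, 0.10, 0.25, 0.15]) + 0.05 * rng.normal(size=100)
    b, alpha = lssvm_fit(X, y)
    print(lssvm_predict(X, b, alpha, X[:3]))  # predicted RF for three cases
    ```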

    Towards Interpretable Machine Learning in Medical Image Analysis

    Over the past few years, machine learning (ML) has demonstrated human-expert-level performance in many medical image analysis tasks. However, due to the black-box nature of classic deep ML models, translating these models from the bench to the bedside to support the corresponding stakeholders in the desired tasks brings substantial challenges. One solution is interpretable ML, which attempts to reveal the working mechanisms of complex models. From a human-centered design perspective, interpretability is not a property of the ML model but an affordance, i.e., a relationship between algorithm and user. Thus, prototyping and user evaluations are critical to attaining solutions that afford interpretability. However, following human-centered design principles in highly specialized and high-stakes domains such as medical image analysis is challenging due to limited access to end users. This dilemma is further exacerbated by the large knowledge imbalance between ML designers and end users. To overcome this predicament, we first define four levels of clinical evidence that can be used to justify interpretability in ML model design. We argue that designing with two of these levels, namely 1) commonly used clinical evidence, such as clinical guidelines, and 2) clinical evidence developed iteratively with end users, is most likely to yield models that are indeed interpretable to end users. In this dissertation, we first address how to design interpretable ML for medical image analysis with these two levels of clinical evidence. We strongly recommend formative user research as the first step of interpretable model design, to understand user needs and domain requirements, and we highlight the importance of empirical user evaluation to support transparent ML design choices and facilitate the adoption of human-centered design principles. Together, these aspects increase the likelihood that the algorithms afford interpretability and enable stakeholders to capitalize on the benefits of interpretable ML. In detail, we first propose neural-symbolic reasoning to embed public clinical evidence into the designed models for routinely performed clinical tasks. We utilize the routinely applied clinical taxonomy for abnormality classification in chest X-rays, and we establish a spleen injury grading system that strictly follows clinical guidelines, reasoning symbolically over detected and segmented salient clinical features. Then, we propose an entire interpretable pipeline for uveal melanoma (UM) prognostication with cytopathology images. We first perform formative user research and find that pathologists believe cell composition is informative for UM prognostication; we therefore build a model that analyzes cell composition directly. Finally, we conduct a comprehensive user study to assess the human factors of human-machine teaming with the designed model, e.g., whether the proposed model indeed affords interpretability to pathologists. The model resulting from this human-centered design process proved truly interpretable to pathologists for UM prognostication. All in all, this dissertation introduces a comprehensive human-centered design approach for interpretable ML solutions in medical image analysis that afford interpretability to end users.
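
    To make the neural-symbolic pattern concrete, here is a toy sketch: an upstream model (not shown) is assumed to supply detected and segmented findings, and explicit guideline-style rules map them to a grade. The feature names and thresholds are hypothetical placeholders, not the actual clinical guideline values used in the dissertation.

    ```python
    # Sketch: symbolic reasoning over detected/segmented clinical features.
    # A neural detector/segmenter (not shown) would produce these findings;
    # explicit rules then assign a grade a clinician can audit line by line.
    from dataclasses import dataclass

    @dataclass
    class SpleenFindings:
        laceration_depth_cm: float  # from a segmentation model (placeholder)
        hematoma_area_pct: float    # % of spleen surface involved (placeholder)
        active_bleeding: bool       # from a detection model (placeholder)

    def grade_injury(f: SpleenFindings) -> int:
        """Hypothetical thresholds, NOT real clinical guideline values."""
        if f.active_bleeding:
            return 4
        if f.laceration_depth_cm > 3.0 or f.hematoma_area_pct > 50:
            return 3
        if f.laceration_depth_cm > 1.0 or f.hematoma_area_pct > 10:
            return 2
        return 1

    print(grade_injury(SpleenFindings(2.0, 30.0, False)))  # -> grade 2
    ```

    Because every decision is an explicit predicate over named findings, the output grade can be justified to an end user, which is the interpretability affordance the dissertation argues for.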

    The best treatment for every patient: New algorithms to predict treatment benefit in cancer using genomics and transcriptomics

    Many cancer drugs only benefit a subset of the patients who receive them. Because these drugs are often associated with serious side effects, it is very important to be able to predict who will benefit and who will not. This thesis presents several algorithms that build models predicting whether a patient will benefit more from a drug of interest than from an alternative treatment. We show these algorithms can be used for various types of cancer and different data types.
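
    The thesis's own algorithms are not specified in this abstract; as a generic baseline for the same task, the sketch below uses a two-model "T-learner": one outcome model per treatment arm, with patients scored by the difference in predicted response probability. All data and the decision threshold are synthetic.

    ```python
    # Sketch: a T-learner baseline for predicted treatment benefit.
    # Fit separate response models for the drug of interest and the alternative,
    # then score each patient by the difference in predicted response probability.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 20))    # genomic/transcriptomic features (synthetic)
    t = rng.integers(0, 2, size=500)  # 1 = drug of interest, 0 = alternative
    y = rng.integers(0, 2, size=500)  # 1 = response (synthetic)

    m_drug = LogisticRegression(max_iter=1000).fit(X[t == 1], y[t == 1])
    m_alt = LogisticRegression(max_iter=1000).fit(X[t == 0], y[t == 0])

    benefit = m_drug.predict_proba(X)[:, 1] - m_alt.predict_proba(X)[:, 1]
    flagged = benefit > 0.1  # illustrative threshold for "benefits more from drug"
    print(f"{flagged.mean():.0%} of patients flagged as likely to benefit")
    ```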

    Classification of clinical outcomes using high-throughput and clinical informatics.

    It is widely recognized that many cancer therapies are effective only for a subset of patients. However, clinical studies are most often powered to detect an overall treatment effect. To address this issue, classification methods are increasingly being used to predict the subset of patients who respond differently to treatment. This study begins with a brief history of classification methods, with an emphasis on applications involving melanoma. Nonparametric methods suitable for predicting subsets of patients responding differently to treatment are then reviewed. Each method has different ways of incorporating continuous, categorical, clinical, and high-throughput covariates. For nonparametric and parametric methods alike, distance measures specific to the method are used to make classification decisions, and approaches are outlined that employ these distances to measure treatment interactions and identify patients more sensitive to treatment. Simulations are also carried out to examine the empirical power of some of these classification methods in an adaptive signature design, with results compared against logistic regression models. Both parametric and nonparametric methods performed reasonably well, and their relative performance depends on the simulation scenario. Finally, a method was developed to evaluate the power and sample size needed for an adaptive signature design to predict the subset of patients sensitive to treatment. It is hoped that this study will stimulate further development of nonparametric and parametric methods for predicting subsets of patients who respond differently to treatment.
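
    As a rough sketch of such a power simulation (a simplified stand-in for the adaptive signature design studied here, not the study's actual procedure), the code below learns a responder signature on a training half and tests the treatment effect in the predicted-sensitive validation patients; the trial size, effect size, and choice of test are all illustrative assumptions.

    ```python
    # Sketch: empirical power of an adaptive-signature-style design by simulation.
    # Stage 1 learns who is treatment-sensitive; stage 2 tests the treatment
    # effect only in the predicted-sensitive half reserved for validation.
    import numpy as np
    from scipy.stats import fisher_exact
    from sklearn.linear_model import LogisticRegression

    def one_trial(rng, n=400, p=10, effect=2.0):
        X = rng.normal(size=(n, p))
        t = rng.integers(0, 2, size=n)
        truly_sensitive = X[:, 0] > 0                  # hidden ground truth
        logit = -1.0 + effect * t * truly_sensitive    # drug helps only them
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

        half = n // 2
        # Stage 1: learn a responder signature from treated training patients.
        train_treated = (np.arange(n) < half) & (t == 1)
        clf = LogisticRegression(max_iter=1000).fit(X[train_treated], y[train_treated])

        # Stage 2: Fisher test of drug vs. alternative among predicted-sensitive
        # validation patients.
        sens = (np.arange(n) >= half) & (clf.predict(X) == 1)
        resp_d, n_d = int(y[sens & (t == 1)].sum()), int((sens & (t == 1)).sum())
        resp_c, n_c = int(y[sens & (t == 0)].sum()), int((sens & (t == 0)).sum())
        table = [[resp_d, n_d - resp_d], [resp_c, n_c - resp_c]]
        return fisher_exact(table)[1]

    rng = np.random.default_rng(3)
    pvals = np.array([one_trial(rng) for _ in range(200)])
    print(f"empirical power at alpha = 0.05: {(pvals < 0.05).mean():.2f}")
    ```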

    Advances in Binders for Construction Materials

    The global binder production for construction materials is approximately 7.5 billion tons per year, contributing ~6% of global anthropogenic atmospheric CO₂ emissions. Reducing this carbon footprint is a key aim of the construction industry, and current research focuses on developing innovative ways to attain more sustainable binders and concretes/mortars as a real alternative to the current global demand for Portland cement. With this aim, several potential alternative binders are currently being investigated by scientists worldwide, based on calcium aluminate cement, calcium sulfoaluminate cement, alkali-activated binders, calcined clay limestone cements, nanomaterials, or supersulfated cements. This Special Issue presents contributions that address research and practical advances in i) alternative binder manufacturing processes; ii) chemical, microstructural, and structural characterization of unhydrated binders and of hydrated systems; iii) the properties and modelling of concrete and mortars; iv) applications and durability of concrete and mortars; and v) the conservation and repair of historic concrete/mortar structures using alternative binders. We believe this Special Issue will be of high interest to the binder industry and construction community, based upon the novelty and quality of the results and the real potential for applying the findings in practice and industry.

    Analysis, Characterization, Prediction and Attribution of Extreme Atmospheric Events with Machine Learning: a Review

    Atmospheric Extreme Events (EEs) cause severe damage to human societies and ecosystems. The frequency and intensity of EEs and other associated events are increasing under ongoing climate change and global warming. The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working by applying different methodologies and computational tools. Machine Learning (ML) methods have emerged in recent years as powerful techniques to tackle many of the problems related to atmospheric EEs. This paper reviews the ML algorithms applied to the analysis, characterization, prediction, and attribution of the most important atmospheric EEs. A summary of the most used ML techniques in this area and a comprehensive critical review of the literature related to ML in EEs are provided. A number of examples are discussed, and perspectives and outlooks on the field are drawn.

    Archives of Data Science, Series A. Vol. 1,1: Special Issue: Selected Papers of the 3rd German-Polish Symposium on Data Analysis and Applications

    The first volume of Archives of Data Science, Series A is a special issue containing a selection of contributions originally presented at the 3rd Bilateral German-Polish Symposium on Data Analysis and Its Applications (GPSDAA 2013). All selected papers fit into the emerging field of data science, which combines the mathematical sciences (computer science, mathematics, operations research, and statistics) with an application domain (e.g., marketing, biology, economics, engineering).

    Interpretable methods in cancer diagnostics

    Cancer is a hard problem. It is hard for the patients, for the doctors and nurses, and for the researchers working on understanding the disease and finding better treatments for it. The challenges faced by a pathologist diagnosing the disease for a patient are not necessarily the same as those faced by cell biologists working on experimental treatments and on understanding the fundamentals of cancer. In this thesis we work on different challenges faced by both of these groups. The thesis first presents methods to improve the analysis of the flow cytometry data frequently used in the diagnosis process, specifically for the two subtypes of non-Hodgkin lymphoma that are our focus: Follicular Lymphoma and Diffuse Large B-Cell Lymphoma. With a combination of concepts from graph theory, dynamic programming, and machine learning, we present methods to improve the diagnosis process and the analysis of these data. The interpretability of the method helps a pathologist better understand a patient's disease, which in turn improves their treatment choices. In the second part, we focus on the analysis of DNA methylation and gene expression data, both of which present the challenge of being very high dimensional yet having comparatively few samples. We present an ensemble model that adapts to the different patterns seen in each given dataset, in order to cope with noise and batch effects. At the same time, the interpretability of our model helps a pathologist better find and tune the treatment for the patient: a step further towards personalized medicine.
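
    As a toy illustration of such an ensemble (the thesis model itself is more elaborate), the sketch below weights heterogeneous base learners by their cross-validated accuracy on the dataset at hand, so the combination adapts to whatever pattern, noise level, or batch structure a given cohort exhibits; the data and the choice of learners are placeholders.

    ```python
    # Sketch: an adaptive ensemble for high-dimensional, low-sample-size data.
    # Base learners are weighted per dataset by cross-validated accuracy,
    # so the combination adapts to the patterns (and noise) of each cohort.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(4)
    X = rng.random(size=(60, 2000))   # e.g. methylation beta values (synthetic)
    y = rng.integers(0, 2, size=60)   # e.g. diagnostic class labels (synthetic)

    learners = {
        "lasso-logit": LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
        "random-forest": RandomForestClassifier(n_estimators=300, random_state=0),
        "naive-bayes": GaussianNB(),
    }

    # Weight each learner by its cross-validated accuracy on this dataset.
    weights = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in learners.items()}
    for m in learners.values():
        m.fit(X, y)

    # Weighted soft vote; the weights themselves are readable, which keeps
    # the combination inspectable for a pathologist.
    proba = sum(w * learners[name].predict_proba(X)[:, 1] for name, w in weights.items())
    proba /= sum(weights.values())
    print({name: round(w, 2) for name, w in weights.items()})
    ```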