130 research outputs found

    Decision Tree and Random Forest Methodology for Clustered and Longitudinal Binary Outcomes

    Clustered binary outcomes are frequently encountered in medical research (e.g. longitudinal studies). Generalized linear mixed models (GLMMs) typically employed for clustered endpoints have challenges for some scenarios (e.g. high dimensional data). In the first dissertation aim, we develop an alternative, data-driven method called Binary Mixed Model (BiMM) tree, which combines decision tree and GLMM. We propose a procedure akin to the expectation maximization algorithm, which iterates between developing a classification and regression tree using all predictors and developing a GLMM which includes indicator variables for terminal nodes from the tree as predictors along with a random effect for the clustering variable. Since prediction accuracy may be increased through ensemble methods, we extend BiMM tree methodology within the random forest setting in the second dissertation aim. BiMM forest combines random forest and GLMM within a unified framework using an algorithmic procedure which iterates between developing a random forest and using the predicted probabilities of observations from the random forest within a GLMM that contains a random effect for the clustering variable. Simulation studies show that BiMM tree and BiMM forest methodology offer similar or superior prediction accuracy compared to standard classification and regression tree, random forest and GLMM for clustered binary outcomes. The new BiMM methods are used to develop prediction models within the acute liver failure setting using the first seven days of hospital data for the third dissertation aim. Acute liver failure is a rare and devastating condition characterized by rapid onset of severe liver damage. The majority of prediction models developed for acute liver failure patients use admission data only, even though many clinical and laboratory variables are collected daily. 
The novel BiMM tree and forest methodology developed in this dissertation can be used in diverse research settings to provide highly accurate and efficient prediction models for clustered and longitudinal binary outcomes.
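
The GLMM step of the BiMM tree procedure described in this abstract can be written out as a single model equation. This is only a sketch under assumed notation, since none of the symbols appear in the abstract: $y_{ij}$ is the binary outcome for observation $j$ in cluster $i$, $x_{ij}$ is its predictor vector, $K$ is the number of terminal nodes of the current tree, and $b_i$ is the random intercept for cluster $i$. Treating node 1 as the reference category and giving $b_i$ a normal distribution are conventional choices assumed here, not details taken from the dissertation.

```latex
\[
\operatorname{logit}\, P(y_{ij} = 1 \mid b_i)
  = \beta_0 + \sum_{k=2}^{K} \beta_k \, \mathbf{1}\{x_{ij} \in \text{node } k\} + b_i,
\qquad b_i \sim N(0, \sigma_b^2)
\]
```

In this reading, each iteration refits the tree and then re-estimates the $\beta_k$ and $b_i$ given the tree's terminal nodes.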

    Deep Learning Applications for Biomedical Data and Natural Language Processing

    The human brain can be seen as an ensemble of interconnected neurons, more or less specialized to solve different cognitive and motor tasks. In computer science, the term deep learning is often applied to sets of interconnected nodes, where deep means that they have several computational layers. Development of deep learning is essentially a quest to mimic how the human brain, at least partially, operates. In this thesis, I use machine learning techniques to tackle two different problem domains. The first is a problem in natural language processing: we improved the classification of relations within images using text associated with the pictures. The second domain is heart transplantation. We created models for pre- and post-transplant survival and simulated a whole transplantation queue in order to assess the impact of different allocation policies. We used deep learning models to solve these problems. As an introduction to these problems, I present the basic concepts of machine learning: how to represent data, how to evaluate prediction results, and how to create different models to predict values from data. Following that, I introduce the field of heart transplantation and some background on simulation.

    Partial order label decomposition approaches for melanoma diagnosis

    Melanoma is a type of cancer that develops from the pigment-containing cells known as melanocytes. It usually occurs on the skin, and early detection and diagnosis are strongly related to survival rates. Melanoma recognition is a challenging task that is currently performed by well-trained dermatologists, who may produce varying diagnoses due to the complexity of the task. This motivates the development of automated diagnosis tools, in spite of the inherent difficulties (intra-class variation, visual similarity between melanoma and non-melanoma lesions, among others). In the present work, we propose a system combining image analysis and machine learning to detect melanoma presence and severity. The severity is assessed in terms of melanoma thickness, which is measured by the Breslow index. Previous works mainly focus on the binary problem of detecting the presence of the melanoma. However, the system proposed in this paper goes a step further by also considering the stage of the lesion in the classification task. To do so, we extract 100 features that consider the shape, colour, pigment network and texture of the benign and malignant lesions. The problem is tackled as a five-class classification problem, where the first class represents benign lesions, and the remaining four classes represent the different stages of the melanoma (via the Breslow index). Based on the problem definition, we identify the learning setting as a partial order problem, in which the patterns belonging to the different melanoma stages present an order relationship, but where there is no order arrangement with respect to the benign lesions. Under this assumption about the class topology, we design several proposals to exploit this structure and improve data preprocessing.
In this sense, we experimentally demonstrate that those proposals exploiting the partial order assumption achieve better performance than 12 baseline nominal and ordinal classifiers (including a deep learning model) which do not consider this partial order. To deal with class imbalance, we additionally propose specific over-sampling techniques that consider the structure of the problem for the creation of synthetic patterns. The experimental study is carried out with clinician-curated images from the Interactive Atlas of Dermoscopy, which eases reproducibility of experiments. Concerning the results obtained, in spite of having increased the complexity of the classification problem with more classes, the performance of our proposals in the binary problem is similar to that reported in the literature.
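
One way to picture the partial order setting described in this abstract is to split the five-class labels into a nominal task (benign vs. melanoma) and a set of ordinal threshold tasks defined only on the melanoma stages. The sketch below is a minimal illustration under stated assumptions, not the authors' decomposition: the label encoding (0 = benign, 1 to 4 = stages by increasing Breslow thickness), the function name `decompose_labels`, and the "stage > k" threshold scheme are all hypothetical choices made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding (not from the abstract): 0 = benign,
# 1..4 = melanoma stages ordered by increasing Breslow thickness.
BENIGN = 0


def decompose_labels(
    labels: List[int], n_stages: int = 4
) -> Tuple[List[int], Dict[int, List[Optional[int]]]]:
    """Split partially ordered labels into binary sub-problems.

    Returns:
      presence   -- 0/1 target for the nominal task "benign vs. melanoma".
      thresholds -- for each k in 1..n_stages-1, a 0/1 target for the
                    ordinal question "is the stage greater than k?".
                    Benign samples get None because they have no position
                    in the stage ordering and are left out of these tasks.
    """
    presence = [0 if y == BENIGN else 1 for y in labels]
    thresholds: Dict[int, List[Optional[int]]] = {}
    for k in range(1, n_stages):
        thresholds[k] = [None if y == BENIGN else int(y > k) for y in labels]
    return presence, thresholds


labels = [0, 1, 3, 4, 0, 2]
presence, thresholds = decompose_labels(labels)
print(presence)
for k, target in thresholds.items():
    print(k, target)
```

Each resulting target could then be handed to any binary classifier, with the `None` entries filtered out before training the ordinal sub-problems.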

    Data-Driven Decision Making in Healthcare

    The increasing availability of healthcare data has provided a great opportunity for the development of data-driven models to guide health policy and medical practice. The objective of this dissertation is to present new methods that use these data to make better healthcare decisions at a population and patient level. We first model the supply, demand, and allocation of organs for transplantation using data from the Organ Procurement and Transplantation Network and the US Census Bureau. Then, we introduce personalized treatment plans and genetic testing strategies for the management of cardiovascular diseases. We evaluate the clinical and policy implications of the treatment and testing strategies at a population level using data from the National Health and Nutrition Examination Survey. Lastly, we propose a modeling framework to consider physicians' judgment and patients' preferences in the implementation of treatment protocols. To illustrate how this method can be implemented in medical practice, we find ranges of near-optimal antihypertensive treatment choices for 16.72 million adults in the US. This research has the potential to improve healthcare practice by giving flexible and achievable guidelines to policymakers and medical professionals based on patient and population-level data.
    PhD, Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/167908/1/wmarrero_1.pd