1,575 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and to advances in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.

    Contributions on metric spaces with applications in personalized medicine

    This thesis aims to propose new distributional representations and statistical methods in metric spaces to effectively model data from the continuous monitoring of patients during their activities of daily living. We propose new hypothesis tests for paired data, regression models, uncertainty quantification algorithms, statistical independence tests, and cluster analysis algorithms for the new distributional representations and other complex statistical objects. The results presented throughout the thesis show the advantages of the new proposals over existing methods in terms of prediction, interpretability, and modeling capacity.

    Bayesian Nonparametric Methods For Causal Inference And Prediction

    In this thesis we present novel approaches to regression and causal inference using popular Bayesian nonparametric methods. Bayesian Additive Regression Trees (BART) is a Bayesian machine learning algorithm in which the conditional distribution is modeled as a sum of regression trees. We extend BART into a semiparametric generalized linear model framework so that some covariates are modeled nonparametrically with BART while a subset of covariates has a parametric form. This is an attractive option for research in which only a few covariates are of scientific interest but other covariates must be controlled for. Under certain causal assumptions, this model can be used as a structural mean model. We demonstrate the method by examining the effect that initiating certain antiretroviral medications has on mortality among HIV/HCV-coinfected subjects. In later chapters, we propose a joint model for a continuous longitudinal outcome and baseline covariates using penalized splines and an enriched Dirichlet process (EDP) prior. This joint model decomposes into local linear mixed models for the outcome given the covariates and marginal models for the covariates. The EDP prior placed on the regression parameters and on the covariate parameters induces clustering of subjects by similarity in their regression parameters and, nested within those clusters, sub-clusters based on similarity in the covariate space. When there is a large number of covariates, we find improved prediction over the same model with Dirichlet process (DP) priors. Because the model clusters subjects by their regression parameters, it also serves as a functional clustering algorithm in which the number of clusters need not be chosen beforehand. We use the method to estimate incidence rates of diabetes when longitudinal laboratory values from electronic health records are used to augment diagnostic codes for outcome identification.
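    The nested clustering described above can be illustrated by drawing a partition from the EDP's predictive (urn) rule: a new subject joins an existing regression-parameter cluster j with probability proportional to its size n_j, or opens a new one with probability proportional to alpha_theta; within cluster j it then joins a covariate sub-cluster with probability proportional to that sub-cluster's size, or opens a new one with probability proportional to alpha_omega. The following is a minimal sketch of the prior partition only, assuming a constant covariate-level concentration parameter; the function name and parameters are hypothetical, and it omits the penalized-spline outcome model and posterior inference used in the thesis.

```python
import random


def sample_edp_partition(n, alpha_theta, alpha_omega, seed=0):
    """Hypothetical sketch: draw a nested partition of n subjects from the
    enriched Dirichlet process predictive rule (prior only, no data).

    Returns one (y_cluster, x_subcluster) label pair per subject.
    """
    rng = random.Random(seed)
    y_sizes = []   # y_sizes[j]: subjects in regression-parameter cluster j
    x_sizes = []   # x_sizes[j][l]: subjects in covariate sub-cluster l of cluster j
    labels = []
    for _ in range(n):
        # Outer level: existing cluster j with weight n_j, new cluster with alpha_theta
        weights = y_sizes + [alpha_theta]
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(y_sizes):
            y_sizes.append(0)
            x_sizes.append([])
        # Inner level: existing sub-cluster l of j with weight n_{l|j}, new with alpha_omega
        sub_weights = x_sizes[j] + [alpha_omega]
        l = rng.choices(range(len(sub_weights)), weights=sub_weights)[0]
        if l == len(x_sizes[j]):
            x_sizes[j].append(0)
        y_sizes[j] += 1
        x_sizes[j][l] += 1
        labels.append((j, l))
    return labels


if __name__ == "__main__":
    labels = sample_edp_partition(50, alpha_theta=1.0, alpha_omega=1.0, seed=1)
    print("clusters:", len({j for j, _ in labels}))
    print("sub-clusters:", len(set(labels)))
```

    The number of clusters and sub-clusters is not fixed in advance; it emerges from the draw and grows with n and with the concentration parameters, which is the property that lets such a model act as a clustering algorithm without preselecting the number of clusters.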
    We later extend this work by applying our EDP model in a causal inference setting via the parametric g-formula. We demonstrate this using electronic health record data on subjects initiating second-generation antipsychotics.

    Machine Learning Techniques for Screening and Diagnosis of Diabetes: a Survey

    Diabetes has become one of the major causes of disease and death in most countries. By 2015, diabetes had affected more than 415 million people worldwide. According to the International Diabetes Federation report, this figure is expected to rise to more than 642 million in 2040, so early screening and diagnosis are of great importance for detecting and treating diabetes in time. Diabetes is a multifactorial metabolic disease, and its diagnostic criteria can hardly cover all of its etiology, degree of damage, pathogenesis, and other factors; as a result, the medical diagnosis process involves uncertainty and imprecision in various respects. With the development of data mining, researchers have found that machine learning plays an increasingly important role in diabetes research. Machine learning techniques can identify risk factors for diabetes and reasonable thresholds for physiological parameters, uncovering hidden knowledge in large amounts of diabetes-related data, which is highly significant for the diagnosis and treatment of diabetes. This paper therefore surveys machine learning techniques that have been applied to diabetes screening and diagnosis. It describes conventional machine learning techniques for early screening and diagnosis of diabetes, as well as deep learning techniques of biomedical significance.
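    As a concrete illustration of finding a "reasonable threshold" for a physiological parameter, one standard approach (not necessarily one used by the surveyed papers) is to pick the cutoff that maximizes Youden's J statistic, J = sensitivity + specificity - 1. The minimal sketch below assumes a single continuous measurement, binary labels with both classes present, and the rule "positive if value >= cutoff"; the function name is hypothetical.

```python
def youden_threshold(values, labels):
    """Hypothetical sketch: return (cutoff, J) maximizing Youden's J.

    A subject is classified positive when value >= cutoff.
    labels: 1 = positive (e.g. diabetic), 0 = negative; both must occur.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    best = (None, -1.0)
    # Candidate cutoffs are the observed values themselves
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= cut and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < cut and y == 0)
        j = tp / pos + tn / neg - 1  # sensitivity + specificity - 1
        if j > best[1]:
            best = (cut, j)
    return best


if __name__ == "__main__":
    # Toy, made-up measurements for illustration only
    values = [80, 90, 95, 100, 130, 140, 150, 160]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    print(youden_threshold(values, labels))  # (130, 1.0)
```

    Youden's J weights sensitivity and specificity equally; a screening setting that penalizes missed cases more heavily would use a weighted criterion instead.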