Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability.
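As a hedged illustration of two of the challenges named above (data heterogeneity and missing data), the following minimal Python sketch performs simple "early" integration by concatenation. It assumes every omics block covers the same samples in the same row order; the function name `integrate_blocks`, mean imputation, and the square-root block weighting are illustrative choices, not methods taken from the review.

```python
import numpy as np

def integrate_blocks(blocks):
    """Early (concatenation-based) integration of several omics blocks.

    Each block is an (n_samples, n_features) array for the same samples,
    in the same row order; NaN marks a missing measurement.
    """
    processed = []
    for block in blocks:
        block = np.asarray(block, dtype=float)
        # Mean-impute missing entries column by column (a simple baseline).
        col_means = np.nanmean(block, axis=0)
        filled = np.where(np.isnan(block), col_means, block)
        # Z-score each feature so blocks measured on different scales are comparable.
        std = filled.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant features
        z = (filled - filled.mean(axis=0)) / std
        # Down-weight each block by sqrt(#features) so large blocks do not dominate.
        processed.append(z / np.sqrt(z.shape[1]))
    return np.hstack(processed)
```

Dedicated methods discussed in such reviews (e.g., dimensionality reduction or model-based imputation) would replace these baseline steps; the sketch only shows where each challenge enters a pipeline.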
Contributions on metric spaces with applications in personalized medicine
This thesis aims to propose new distributional representations and statistical methods in metric spaces to effectively model data from the continuous monitoring of patients during their activities of daily living. We propose new hypothesis tests for paired data, regression models, uncertainty quantification algorithms, statistical independence tests and clustering algorithms for the new distributional representations and other complex statistical objects. The results presented throughout the thesis show the advantages of the new proposals over existing methods in terms of prediction, interpretability and modeling capacity.
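The abstract does not specify which distributional representation the thesis uses. One common choice, assumed here purely for illustration, is to summarize each patient's monitoring record by its empirical quantile function; for univariate distributions, the 2-Wasserstein distance is then the L2 distance between quantile functions. The function names and the probability grid below are hypothetical.

```python
import numpy as np

def quantile_representation(samples, n_grid=100):
    """Represent one patient's monitoring record as an empirical quantile function."""
    # Midpoint probability grid avoids the extreme 0 and 1 quantiles.
    probs = (np.arange(n_grid) + 0.5) / n_grid
    return np.quantile(np.asarray(samples, dtype=float), probs)

def wasserstein2(q1, q2):
    """2-Wasserstein distance between two 1-D distributions on a common grid.

    For univariate distributions, W2 equals the L2 distance between
    their quantile functions (approximated here by an average over the grid).
    """
    return float(np.sqrt(np.mean((q1 - q2) ** 2)))
```

Under this metric, the pointwise average of several quantile functions is itself a valid quantile function, which is one reason such representations are convenient for regression and clustering.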
Bayesian Nonparametric Methods For Causal Inference And Prediction
In this thesis we present novel approaches to regression and causal inference using popular Bayesian nonparametric methods. Bayesian Additive Regression Trees (BART) is a Bayesian machine learning algorithm in which the conditional distribution is modeled as a sum of regression trees. We extend BART into a semiparametric generalized linear model framework so that a portion of the covariates are modeled nonparametrically using BART while a subset of the covariates have a parametric form. This presents an attractive option for research in which only a few covariates are of scientific interest but other covariates must be controlled for. Under certain causal assumptions, this model can be used as a structural mean model. We demonstrate this method by examining the effect that initiating certain antiretroviral medications has on mortality among HIV/HCV coinfected subjects. In later chapters, we propose a joint model for a continuous longitudinal outcome and baseline covariates using penalized splines and an enriched Dirichlet process (EDP) prior. This joint model decomposes into local linear mixed models for the outcome given the covariates and marginals for the covariates. The EDP prior placed on the regression parameters and on the covariate parameters induces clustering among subjects based on similarity in their regression parameters and, nested within those clusters, sub-clusters based on similarity in the covariate space. When there is a large number of covariates, we find improved prediction over the same model with Dirichlet process (DP) priors. Because the model clusters on regression parameters, it also serves as a functional clustering algorithm in which the number of clusters need not be chosen beforehand. We use the method to estimate incidence rates of diabetes when longitudinal laboratory values from electronic health records are used to augment diagnostic codes for outcome identification.
We later extend this work by using our EDP model in a causal inference setting with the parametric g-formula. We demonstrate this using electronic health record data on subjects initiating second-generation antipsychotics.
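The parametric g-formula mentioned above can be sketched for the simplest case of a single point treatment. This is a hedged illustration, not the thesis's method: an ordinary least-squares outcome model with treatment-by-covariate interactions stands in for the EDP model, and the function name `g_formula_ate` is hypothetical.

```python
import numpy as np

def g_formula_ate(X, treatment, outcome):
    """Average treatment effect via the point-treatment g-formula.

    An OLS outcome model E[Y | A, X] stands in for the EDP model
    (a simplifying assumption for this sketch).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    a = np.asarray(treatment, dtype=float)
    y = np.asarray(outcome, dtype=float)
    n = len(y)
    ones = np.ones(n)

    def design(a_vec):
        # Intercept, treatment, covariates and treatment-by-covariate interactions.
        return np.column_stack([ones, a_vec, X, a_vec[:, None] * X])

    # Step 1: fit the outcome model by ordinary least squares.
    beta, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    # Step 2: predict every subject's outcome with treatment set to 1 and to 0.
    y1 = design(np.ones(n)) @ beta
    y0 = design(np.zeros(n)) @ beta
    # Step 3: average over the observed covariate distribution (standardization).
    return float(np.mean(y1) - np.mean(y0))
```

When a covariate drives both treatment and outcome, a naive difference in means is biased, whereas this standardized estimate recovers the effect if the outcome model is correctly specified and there is no unmeasured confounding.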
Building trajectories through clinical data to model disease progression
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Clinical trials are typically conducted over a population within a defined time period in order to illuminate certain characteristics of a health issue or disease process. These cross-sectional studies provide a snapshot of disease processes across a large number of people but do not allow us to model the temporal nature of disease, which is essential for detailed prognostic prediction. Longitudinal studies, on the other hand, explore how these processes develop over time in a number of people, but they can be expensive and time-consuming, and many cover only a relatively small window of the disease process. This thesis describes the application of intelligent data analysis techniques to extract information from time series generated by different diseases. Its aim is to identify intermediate stages in a disease process and sub-categories of the disease that exhibit subtly different symptoms. It explores a bootstrap technique that fits trajectories through the data, generating “pseudo time-series”. It addresses issues including how clinical variables interact as a disease progresses along the trajectories in the data, and how to automatically identify different disease states along these trajectories, as well as the transitions between them. The thesis documents how reliable time-series models can be created from large amounts of historical cross-sectional data, and how a novel relabeling/latent variable approach has enabled exploration of the temporal nature of disease progression. The proposed algorithms are tested extensively on simulated data and on three real clinical datasets. Finally, a study explores whether pseudo time-series models can be “calibrated” with real longitudinal data in order to improve them. Plausible directions for future research are discussed at the end of the thesis.
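The abstract does not give the exact algorithm, so the following is only a simplified, hypothetical sketch of the bootstrap pseudo time-series idea: resample the cross-sectional data, then order samples along the shortest path between a designated “healthy” start sample and a “severe” end sample. Weighting edges by squared Euclidean distance (an assumption here) favors paths through many nearby samples; the function names are illustrative.

```python
import heapq
import numpy as np

def shortest_path(points, start, end):
    """Dijkstra on a complete graph weighted by squared Euclidean distance.

    Squaring the distances makes many short hops cheaper than one long jump,
    so the path threads through intermediate samples.
    """
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]
    weights = np.sum(diffs ** 2, axis=-1)
    dist = np.full(n, np.inf)
    prev = np.full(n, -1)
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == end:
            break
        for v in range(n):
            nd = d + weights[u, v]
            if nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    # Walk back from end to start to recover the ordered path.
    path = [end]
    while path[-1] != start:
        path.append(int(prev[path[-1]]))
    return path[::-1]

def pseudo_time_series(data, start, end, n_boot=10, seed=0):
    """Build bootstrap pseudo time-series from cross-sectional samples."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    others = np.array([i for i in range(len(data)) if i not in (start, end)])
    series = []
    for _ in range(n_boot):
        # Resample the other samples with replacement; always keep both endpoints.
        resampled = rng.choice(others, size=len(others), replace=True)
        idx = np.concatenate(([start, end], resampled))
        # Endpoints sit at local positions 0 and 1 of the resampled set.
        path = shortest_path(data[idx], 0, 1)
        series.append(data[idx][path])
    return series
```

Each bootstrap replicate yields one ordered trajectory; the spread across replicates gives a rough sense of how stable the inferred ordering is.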
Machine Learning Techniques for Screening and Diagnosis of Diabetes: a Survey
Diabetes has become one of the major causes of disease and death in most countries. By 2015, diabetes had affected more than 415 million people worldwide. According to the International Diabetes Federation report, this figure is expected to rise to more than 642 million by 2040, so early screening and diagnosis are of great significance for detecting and treating diabetes in time. Diabetes is a multifactorial metabolic disease, and its diagnostic criteria can hardly cover all of its etiology, degree of damage, pathogenesis and other factors, so the medical diagnosis process involves uncertainty and imprecision in various respects. With the development of data mining, researchers have found that machine learning plays an increasingly important role in diabetes research. Machine learning techniques can identify risk factors for diabetes and reasonable thresholds for physiological parameters, unearthing hidden knowledge from large amounts of diabetes-related data, which is highly significant for the diagnosis and treatment of diabetes. This paper therefore surveys machine learning techniques that have been applied to diabetes data for screening and diagnosis of the disease. It describes conventional machine learning techniques for early screening and diagnosis of diabetes, as well as deep learning techniques of biomedical significance.
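One simple way to learn a “reasonable threshold of a physiological parameter” from labeled data, offered here as a hedged illustration rather than a technique taken from the survey, is a one-variable decision stump: try each observed value as a cut-off and keep the one with the highest training accuracy. The function name `best_threshold` and the example values are made up.

```python
import numpy as np

def best_threshold(values, labels):
    """Find the cut-off on one variable that best separates two classes.

    Predicts positive (1) when value >= threshold; returns (threshold, accuracy).
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_t, best_acc = None, -1.0
    # Candidate thresholds: every distinct observed value.
    for t in np.unique(values):
        acc = np.mean((values >= t).astype(int) == labels)
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc
```

For example, `best_threshold([85, 90, 95, 110, 130, 140, 150], [0, 0, 0, 0, 1, 1, 1])` returns `(130.0, 1.0)`. Decision trees grow by repeating this kind of search over many variables; in practice the cut-off would be chosen on training data and checked on held-out data.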
- …