662 research outputs found

    Bayesian network structure learning using characteristic properties of permutation representations with applications to prostate cancer treatment.

    Get PDF
    Over the last decades, Bayesian Networks (BNs) have become an increasingly popular technique to model data under presence of uncertainty. BNs are probabilistic models that represent relationships between variables by means of a node structure and a set of parameters. Learning efficiently the structure that models a particular dataset is a NP-hard task that requires substantial computational efforts to be successful. Although there exist many families of techniques for this purpose, this thesis focuses on the study and improvement of search and score methods such as Evolutionary Algorithms (EAs). In the domain of BN structure learning, previous work has investigated the use of permutations to represent variable orderings within EAs. In this thesis, the characteristic properties of permutation representations are analysed and used in order to enhance BN structure learning. The thesis assesses well-established algorithms to provide a detailed analysis of the difficulty of learning BN structures using permutation representations. Using selected benchmarks, rugged and plateaued fitness landscapes are identified that result in a loss of population diversity throughout the search. The thesis proposes two approaches to handle the loss of diversity. First, the benefits of introducing the Island Model (IM) paradigm are studied, showing that diversity loss can be significantly reduced. Second, a novel agent-based metaheuristic is presented in which evolution is based on the use of several mutation operators and the definition of a distance metric in permutation spaces. The latter approach shows that diversity can be maintained throughout the search while exploring efficiently the solution space. In addition, the use of IM is investigated in the context of distributed data, a common property of real-world problems. Experiments prove that privacy can be preserved while learning BNs of high quality. Finally, using UK-wide data related to prostate cancer patients, the thesis assesses the general suitability of BNs alongside the proposed learning approaches for medical data modeling. Following comparisons with tools currently used in clinical settings and with alternative classifiers, it is shown that BNs can improve the predictive power of prostate cancer staging tools, a major concern in the field of urology

    RIDDLE: Race and ethnicity Imputation from Disease history with Deep LEarning

    Full text link
    Anonymized electronic medical records are an increasingly popular source of research data. However, these datasets often lack race and ethnicity information. This creates problems for researchers modeling human disease, as race and ethnicity are powerful confounders for many health exposures and treatment outcomes; race and ethnicity are closely linked to population-specific genetic variation. We showed that deep neural networks generate more accurate estimates for missing racial and ethnic information than competing methods (e.g., logistic regression, random forest). RIDDLE yielded significantly better classification performance across all metrics that were considered: accuracy, cross-entropy loss (error), and area under the curve for receiver operating characteristic plots (all p<10βˆ’6p < 10^{-6}). We made specific efforts to interpret the trained neural network models to identify, quantify, and visualize medical features which are predictive of race and ethnicity. We used these characterizations of informative features to perform a systematic comparison of differential disease patterns by race and ethnicity. The fact that clinical histories are informative for imputing race and ethnicity could reflect (1) a skewed distribution of blue- and white-collar professions across racial and ethnic groups, (2) uneven accessibility and subjective importance of prophylactic health, (3) possible variation in lifestyle, such as dietary habits, and (4) differences in background genetic variation which predispose to diseases

    OCDaf: Ordered Causal Discovery with Autoregressive Flows

    Full text link
    We propose OCDaf, a novel order-based method for learning causal graphs from observational data. We establish the identifiability of causal graphs within multivariate heteroscedastic noise models, a generalization of additive noise models that allow for non-constant noise variances. Drawing upon the structural similarities between these models and affine autoregressive normalizing flows, we introduce a continuous search algorithm to find causal structures. Our experiments demonstrate state-of-the-art performance across the Sachs and SynTReN benchmarks in Structural Hamming Distance (SHD) and Structural Intervention Distance (SID). Furthermore, we validate our identifiability theory across various parametric and nonparametric synthetic datasets and showcase superior performance compared to existing baselines
    • …
    corecore