Learning to Address Health Inequality in the United States with a Bayesian Decision Network
Life-expectancy is a complex outcome driven by genetic, socio-demographic,
environmental and geographic factors. Increasing socio-economic and health
disparities in the United States are propagating the longevity-gap, making it a
cause for concern. Earlier studies have probed individual factors but an
integrated picture to reveal quantifiable actions has been missing. There is a
growing concern about a further widening of healthcare inequality caused by
Artificial Intelligence (AI) due to differential access to AI-driven services.
Hence, it is imperative to explore and exploit the potential of AI for
illuminating biases and enabling transparent policy decisions for positive
social and health impact. In this work, we reveal actionable interventions for
decreasing the longevity-gap in the United States by analyzing a County-level
data resource containing healthcare, socio-economic, behavioral, education and
demographic features. We learn an ensemble-averaged structure, draw inferences
using the joint probability distribution and extend it to a Bayesian Decision
Network for identifying policy actions. We draw quantitative estimates for the
impact of diversity, preventive-care quality and stable-families within the
unified framework of our decision network. Finally, we make this analysis and
dashboard available as an interactive web-application for enabling users and
policy-makers to validate our reported findings and to explore the impact of
interventions beyond those reported in this work.
Comment: 8 pages, 4 figures, 1 table (excluding the supplementary material),
accepted for publication in AAAI 201
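To illustrate how a decision network can rank policy actions, the following is a minimal sketch, not the authors' model. It assumes a hypothetical network with one decision node, one chance node and one utility node; the action names, probabilities and utilities are invented for illustration only.

```python
# Hypothetical toy decision network: one decision node (action), one chance
# node (outcome) and one utility node. All numbers are illustrative only and
# are NOT taken from the paper.

# P(outcome | action): made-up conditional probabilities.
P_OUTCOME = {
    "no_intervention": {"low_longevity": 0.5, "high_longevity": 0.5},
    "improve_preventive_care": {"low_longevity": 0.25, "high_longevity": 0.75},
}

# Utility assigned to each outcome (arbitrary units).
UTILITY = {"low_longevity": 0.0, "high_longevity": 1.0}


def expected_utility(action):
    """Sum the utility of each outcome weighted by P(outcome | action)."""
    return sum(p * UTILITY[outcome] for outcome, p in P_OUTCOME[action].items())


def best_action():
    """Return the action with the highest expected utility."""
    return max(P_OUTCOME, key=expected_utility)
```

A real decision network would compute the outcome distribution by inference over many chance nodes rather than reading it from a single table.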
bnstruct: an R package for Bayesian Network structure learning in the presence of missing data.
Abstract
Motivation
A Bayesian Network is a probabilistic graphical model that encodes probabilistic dependencies between a set of random variables. We introduce bnstruct, an open source R package to (i) learn the structure and the parameters of a Bayesian Network from data in the presence of missing values and (ii) perform reasoning and inference on the learned Bayesian Networks. To the best of our knowledge, there is no other open source software that provides methods for all of these tasks, particularly the manipulation of missing data, which is a common situation in practice.
Availability and Implementation
The software is implemented in R and C and is available on CRAN under a GPL licence.
Supplementary information
Supplementary data are available at Bioinformatics online.
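As a rough illustration of what "encoding probabilistic dependencies" and "inference" mean for a Bayesian Network, here is a minimal sketch in plain Python. It does not use bnstruct or its API; the two-node network A -> B and all probabilities are hypothetical.

```python
# Hypothetical two-node Bayesian network A -> B with binary variables.
# The joint distribution factorizes as P(A, B) = P(A) * P(B | A).

P_A = {True: 0.25, False: 0.75}            # prior P(A), made-up values
P_B_GIVEN_A = {                            # conditional table P(B | A)
    True: {True: 0.75, False: 0.25},
    False: {True: 0.25, False: 0.75},
}


def joint(a, b):
    """Joint probability P(A=a, B=b) from the network factorization."""
    return P_A[a] * P_B_GIVEN_A[a][b]


def posterior_a_given_b(b):
    """P(A=True | B=b), computed by enumerating the values of A."""
    evidence = sum(joint(a, b) for a in (True, False))
    return joint(True, b) / evidence
```

With these numbers, observing B raises the probability of A from the prior 0.25 to 0.5. Enumeration like this only scales to small networks; larger ones need dedicated inference algorithms.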
A temporal prognostic model based on dynamic Bayesian networks: mining medical insurance data
A prognostic model is a formal combination of multiple predictors from which the risk probability of a specific diagnosis can be modelled for patients. Prognostic models have become essential instruments in medicine. They are used to guide doctors in making a smart diagnosis and patient-specific decisions, and to help plan the utilization of resources for patient groups who have similar prognostic paths. Dynamic Bayesian networks (DBNs) theoretically provide a very expressive and flexible model for solving temporal problems in medicine. However, applying them involves various challenges, due both to the nature of the clinical domain and to the nature of the DBN modelling and inference process itself. The challenges from the clinical domain include insufficient knowledge of the temporal interactions of processes in the medical literature, the sparse nature and variability of medical data collection, and the difficulty of preparing and abstracting clinical data into a suitable format without losing valuable information in the process. Challenges in the DBN methodology and implementation include the lack of tools that allow easy modelling of temporal processes. Overcoming this challenge will help to solve various clinical temporal reasoning problems. In this thesis, we addressed these challenges while building a temporal network that explains the effects of predisposing factors, such as age and gender, and the progression of all diagnoses, using claims data from an insurance company in Kenya. We showed that our network could differentiate the probability of exposure to a diagnosis given age and gender, and the possible paths given a patient's history. We also presented evidence that the more patient history is provided, the better the prediction of future diagnoses.
Effective techniques for handling incomplete data using decision trees
Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge representation and reasoning and are currently applied to a variety of data mining or knowledge discovery applications, particularly for classification problems. There are several efficient methods to learn a DT from data. However, these methods are often limited to the assumption that data are complete.
In this thesis, some contributions to the field of machine learning and statistics that solve the problem of extracting DTs for learning and classification tasks from incomplete databases are presented. The methodology underlying the thesis blends together well-established statistical theories with the most advanced techniques for machine learning and automated reasoning with uncertainty.
The first contribution is a set of extensive simulations that study the impact of missing data on the predictive accuracy of existing DTs that can cope with missing values, when missing values are in both the training and test sets or in only one of the two. All simulations are performed under missing completely at random, missing at random and informatively missing mechanisms, and for different missing data patterns and proportions.
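The three missingness mechanisms named above differ in what the deletion probability depends on. The following is a minimal sketch of how each could be simulated, not the thesis's simulation code. The function names, the `(x, y)` row layout and the threshold rule are assumptions made for illustration.

```python
import random

# Each row is a pair (x, y); x is always observed and y may be deleted.
# A deleted value is represented by None.


def mcar(rows, rate, rng):
    """Missing completely at random: every y is deleted with the same rate."""
    return [(x, None if rng.random() < rate else y) for x, y in rows]


def mar(rows, threshold, p_high, p_low, rng):
    """Missing at random: the deletion rate of y depends only on observed x."""
    out = []
    for x, y in rows:
        rate = p_high if x > threshold else p_low
        out.append((x, None if rng.random() < rate else y))
    return out


def informative(rows, threshold, p_high, p_low, rng):
    """Informatively missing: the deletion rate of y depends on y itself."""
    out = []
    for x, y in rows:
        rate = p_high if y > threshold else p_low
        out.append((x, None if rng.random() < rate else y))
    return out
```

For example, `mar(rows, 4, 0.5, 0.1, random.Random(0))` deletes `y` more often in rows where `x` exceeds 4, whereas `informative` ties deletion to the unobserved value itself, which is what makes that mechanism hard to correct for.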
The next contribution is a simple, novel, yet effective procedure for training and testing decision trees in the presence of missing data. Original and simple splitting criteria for attribute selection in tree building are put forward. The proposed technique is evaluated and validated in empirical tests over many real-world application domains. In this work, the proposed algorithm maintains (and sometimes exceeds) the outstanding accuracy of multiple imputation, especially on datasets containing mixed attributes and purely nominal attributes. The proposed algorithm also greatly improves accuracy for informatively missing (IM) data. Another major advantage of this method over multiple imputation is the substantial saving in computational resources due to its simplicity.
The next contribution is the proposal of three versions of simple probabilistic techniques that could be used for classifying incomplete vectors using decision trees based on complete data. The proposed procedure is superficially similar to that of fractional cases but more effective. The experimental results demonstrate that these approaches can achieve quality comparable to sophisticated algorithms like multiple imputation and are therefore applicable to all kinds of datasets.
Finally, novel uses of two ensemble procedures for handling incomplete training and test data are proposed and discussed. The algorithms combine the two best approaches either with resampling (REMIMIA) or without resampling (EMIMIA) of the training data before growing the decision trees. Empirical tests are used to evaluate and validate the proposed ensemble methods against individual missing data techniques. EMIMIA attains the highest overall level of prediction accuracy.
Open problems in causal structure learning: A case study of COVID-19 in the UK
Causal machine learning (ML) algorithms recover graphical structures that
tell us something about cause-and-effect relationships. The causal
representation provided by these algorithms enables transparency and
explainability, which is necessary for decision making in critical real-world
problems. Yet, causal ML has had limited impact in practice compared to
associational ML. This paper investigates the challenges of causal ML with
application to COVID-19 UK pandemic data. We collate data from various public
sources and investigate what the various structure learning algorithms learn
from these data. We explore the impact of different data formats on algorithms
spanning different classes of learning, and assess the results produced by each
algorithm, and groups of algorithms, in terms of graphical structure, model
dimensionality, sensitivity analysis, confounding variables, predictive and
interventional inference. We use these results to highlight open problems in
causal structure learning and directions for future research. To facilitate
future work, we make all graphs, models, data sets, and source code publicly
available online.