Search CORE

625 research outputs found

Learning to Address Health Inequality in the United States with a Bayesian Decision Network

Author: Chugh Samarth
Maheshwari Shubham
Mittal Anant
Sethi Tavpritesh
Publication venue
Publication date: 16/11/2018
Field of study

Life-expectancy is a complex outcome driven by genetic, socio-demographic, environmental and geographic factors. Increasing socio-economic and health disparities in the United States are propagating the longevity-gap, making it a cause for concern. Earlier studies have probed individual factors but an integrated picture to reveal quantifiable actions has been missing. There is a growing concern about a further widening of healthcare inequality caused by Artificial Intelligence (AI) due to differential access to AI-driven services. Hence, it is imperative to explore and exploit the potential of AI for illuminating biases and enabling transparent policy decisions for positive social and health impact. In this work, we reveal actionable interventions for decreasing the longevity-gap in the United States by analyzing a County-level data resource containing healthcare, socio-economic, behavioral, education and demographic features. We learn an ensemble-averaged structure, draw inferences using the joint probability distribution and extend it to a Bayesian Decision Network for identifying policy actions. We draw quantitative estimates for the impact of diversity, preventive-care quality and stable-families within the unified framework of our decision network. Finally, we make this analysis and dashboard available as an interactive web-application for enabling users and policy-makers to validate our reported findings and to explore the impact of ones beyond reported in this work.Comment: 8 pages, 4 figures, 1 table (excluding the supplementary material), accepted for publication in AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

bnstruct: an R package for Bayesian Network structure learning in the presence of missing data.

Author: Alberto Franzin
Alberto Franzin
Barbara Di Camillo
Francesco Sambo
Publication venue
Publication date: 21/12/2016
Field of study

Abstract Motivation A Bayesian Network is a probabilistic graphical model that encodes probabilistic dependencies between a set of random variables. We introduce bnstruct, an open source R package to (i) learn the structure and the parameters of a Bayesian Network from data in the presence of missing values and (ii) perform reasoning and inference on the learned Bayesian Networks. To the best of our knowledge, there is no other open source software that provides methods for all of these tasks, particularly the manipulation of missing data, which is a common situation in practice. Availability and Implementation The software is implemented in R and C and is available on CRAN under a GPL licence. Supplementary information Supplementary data are available at Bioinformatics online

Open Access Repository

A temporal prognostic model based on dynamic Bayesian networks: mining medical insurance data

Author: Mbaka Sarah Kerubo
Publication venue: Department of Statistical Sciences
Publication date: 10/09/2021
Field of study

A prognostic model is a formal combination of multiple predictors from which risk probability of a specific diagnosis can be modelled for patients. Prognostic models have become essential instruments in medicine. The models are used for prediction purposes of guiding doctors to make a smart diagnosis, patient-specific decisions or help in planning the utilization of resources for patient groups who have similar prognostic paths. Dynamic Bayesian networks theoretically provide a very expressive and flexible model to solve temporal problems in medicine. However, this involves various challenges due both to the nature of the clinical domain, and the nature of the DBN modelling and inference process itself. The challenges from the clinical domain include insufficient knowledge of temporal interactions of processes in the medical literature, the sparse nature and variability of medical data collection, and the difficulty in preparing and abstracting clinical data in a suitable format without losing valuable information in the process. Challenges about the DBN methodology and implementation include the lack of tools that allow easy modelling of temporal processes. Overcoming this challenge will help to solve various clinical temporal reasoning problems. In this thesis, we addressed these challenges while building a temporal network with explanations of the effects of predisposing factors, such as age and gender, and the progression information of all diagnoses using claims data from an insurance company in Kenya. We showed that our network could differentiate the possible probability exposure to a diagnosis given the age and gender and possible paths given a patient's history. We also presented evidence that the more patient history is provided, the better the prediction of future diagnosis

Cape Town University OpenUCT

Bayesian and non-Bayesian techniques applied to censored survival data with missing values

Author: Andersen Morten Nonboe
Publication venue
Publication date: 01/11/2007
Field of study

Online Research Database In Technology

Recommended from our members

Effective techniques for handling incomplete data using decision trees

Author: Twala Bhekisipho E.T.H.
Publication venue
Publication date: 01/01/2005
Field of study

Decision Trees (DTs) have been recognized as one of the most successful formalisms for knowledge representation and reasoning and are currently applied to a variety of data mining or knowledge discovery applications, particularly for classification problems. There are several efficient methods to learn a DT from data. However, these methods are often limited to the assumption that data are complete. In this thesis, some contributions to the field of machine learning and statistics that solve the problem of extracting DTs for learning and classification tasks from incomplete databases are presented. The methodology underlying the thesis blends together well-established statistical theories with the most advanced techniques for machine learning and automated reasoning with uncertainty. The first contribution is the extensive simulations which study the impact of missing data on predictive accuracy of existing DTs which can cope with missing values, when missing values are in both the training and test sets or when they are in either of the two sets. All simulations are performed under missing completely at random, missing at random and informatively missing mechanisms and for different missing data patterns and proportions. The proposal of a simple, novel, yet effective proposed procedure for training and testing using decision trees in the presence of missing data is the next contribution. Original and simple splitting criteria for attribute selection in tree building are put forward. The proposed technique is evaluated and validated in empirical tests over many real world application domains. In this work, the proposed algorithm maintains (sometimes exceeds) the outstanding accuracy of multiple imputation, especially on datasets containing mixed attributes and purely nominal attributes. Also, the proposed algorithm greatly improves in accuracy for IM data. Another major advantage of this method over multiple imputation is the important saving in computational resources due to it simplicity. The next contribution is the proposal of three versions of simple probabilistic techniques that could be used for classifying incomplete vectors using decision trees based on complete data. The proposed procedure is superficially similar to that of fractional cases but more effective. The experimental results demonstrate that these approaches can achieve comparative quality to sophisticated algorithms like multiple imputation and therefore are applicable to all kinds of datasets. Finally, novel uses of two proposed ensemble procedures for handling incomplete training and test data are proposed and discussed. The algorithms combine the two best approaches either with resampling (REMIMIA) or without resampling (EMIMIA) of the training data before growing the decision trees. Experiments are used to evaluate and validate the success of the proposed ensemble methods with respect to individual missing data techniques in the form of empirical tests. EMIMIA attains the highest overall level of prediction accuracy

Open Research Online (The Open University)

Open problems in causal structure learning: A case study of COVID-19 in the UK

Author: Chobtham Kiattikun
Constantinou Anthony
Hashemzadeh Arian
Kitson Neville K.
Liu Yang
Mbuvha Rendani
Nanavati Praharsh A.
Petrungaro Bruno
Publication venue
Publication date: 06/09/2023
Field of study

Causal machine learning (ML) algorithms recover graphical structures that tell us something about cause-and-effect relationships. The causal representation praovided by these algorithms enables transparency and explainability, which is necessary for decision making in critical real-world problems. Yet, causal ML has had limited impact in practice compared to associational ML. This paper investigates the challenges of causal ML with application to COVID-19 UK pandemic data. We collate data from various public sources and investigate what the various structure learning algorithms learn from these data. We explore the impact of different data formats on algorithms spanning different classes of learning, and assess the results produced by each algorithm, and groups of algorithms, in terms of graphical structure, model dimensionality, sensitivity analysis, confounding variables, predictive and interventional inference. We use these results to highlight open problems in causal structure learning and directions for future research. To facilitate future work, we make all graphs, models, data sets, and source code publicly available online

arXiv.org e-Print Archive