859 research outputs found
Impact of Bayesian network model structure on the accuracy of medical diagnostic systems
While Bayesian network models may contain a handful of numerical parameters that are important for their quality, several empirical studies have confirmed that overall precision of their probabilities is not crucial. In this paper, we study the impact of the structure of a Bayesian network on the precision of medical diagnostic systems. We show that also the structure is not that important - diagnostic accuracy of several medical diagnostic models changes minimally when we subject their structures to such transformations as arc removal and arc reversal. © 2014 Springer International Publishing
Cost-Sensitive Decision Tree with Multiple Resource Constraints
Resource constraints are commonly found in classification tasks. For example, there could be a budget limit on implementation and a deadline for finishing the classification task. Applying the top-down approach for tree induction in this situation may have significant drawbacks. In particular, it is difficult, especially in an early stage of tree induction, to assess an attribute’s contribution to improving the total implementation cost and its impact on attribute selection in later stages because of the deadline constraint. To address this problem, we propose an innovative algorithm, namely, the Cost-Sensitive Associative Tree (CAT) algorithm. Essentially, the algorithm first extracts and retains association classification rules from the training data which satisfy resource constraints, and then uses the rules to construct the final decision tree. The approach has advantages over the traditional top-down approach, first because only feasible classification rules are considered in the tree induction and, second, because their costs and resource use are known. In contrast, in the top-down approach, the information is not available for selecting splitting attributes. The experiment results show that the CAT algorithm significantly outperforms the top-down approach and adapts very well to available resources.Cost-sensitive learning, mining methods and algorithms, decision trees
Impact of Quality of Bayesian Network Parameters on Accuracy of Medical Diagnostic Systems
While most knowledge engineers believe that the quality of results obtained by means of Bayesian networks is not too sensitive to imprecision in probabilities, this remains a conjecture with only modest empirical support. We summarize the results of several previously presented experiments involving Hepar II model, in which we manipulated the quality of the model's numerical parameters and checked the impact of these manipulations on the model's accuracy. The chief contribution of this paper are results of replicating our experiments on several medical diagnostic models derived from data sets available at the Irvine Machine Learning Repository. We show that the results of our experiments are qualitatively identical to those obtained earlier with Hepar II
Interpretable Narrative Explanation for ML Predictors with LP: A Case Study for XAI
In the era of digital revolution, individual lives are going to cross and interconnect ubiquitous online domains and offline reality based on smart technologies\u2014discovering, storing, processing, learning, analysing, and predicting from huge amounts of environment-collected data. Sub-symbolic techniques, such as deep learning, play a key role there, yet they are often built as black boxes, which are not inspectable, interpretable, explainable. New research efforts towards explainable artificial intelligence (XAI) are trying to address those issues, with the final purpose of building understandable, accountable, and trustable AI systems\u2014still, seemingly with a long way to go. Generally speaking, while we fully understand and appreciate the power of sub-symbolic approaches, we believe that symbolic approaches to machine intelligence, once properly combined with sub-symbolic ones, have a critical role to play in order to achieve key properties of XAI such as observability, interpretability, explainability, accountability, and trustability. In this paper we describe an example of integration of symbolic and sub-symbolic techniques. First, we sketch a general framework where symbolic and sub-symbolic approaches could fruitfully combine to produce intelligent behaviour in AI applications. Then, we focus in particular on the goal of building a narrative explanation for ML predictors: to this end, we exploit the logical knowledge obtained translating decision tree predictors into logical programs
Distinguishing cause from effect using observational data: methods and benchmarks
The discovery of causal relationships from purely observational data is a
fundamental problem in science. The most elementary form of such a causal
discovery problem is to decide whether X causes Y or, alternatively, Y causes
X, given joint observations of two variables X, Y. An example is to decide
whether altitude causes temperature, or vice versa, given only joint
measurements of both variables. Even under the simplifying assumptions of no
confounding, no feedback loops, and no selection bias, such bivariate causal
discovery problems are challenging. Nevertheless, several approaches for
addressing those problems have been proposed in recent years. We review two
families of such methods: Additive Noise Methods (ANM) and Information
Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs
that consists of data for 100 different cause-effect pairs selected from 37
datasets from various domains (e.g., meteorology, biology, medicine,
engineering, economy, etc.) and motivate our decisions regarding the "ground
truth" causal directions of all pairs. We evaluate the performance of several
bivariate causal discovery methods on these real-world benchmark data and in
addition on artificially simulated data. Our empirical results on real-world
data indicate that certain methods are indeed able to distinguish cause from
effect using only purely observational data, although more benchmark data would
be needed to obtain statistically significant conclusions. One of the best
performing methods overall is the additive-noise method originally proposed by
Hoyer et al. (2009), which obtains an accuracy of 63+-10 % and an AUC of
0.74+-0.05 on the real-world benchmark. As the main theoretical contribution of
this work we prove the consistency of that method.Comment: 101 pages, second revision submitted to Journal of Machine Learning
Researc
DANMAP 2015:DANMAP 2015 - Use of antimicrobial agents and occurrence of antimicrobial resistance in bacteria from food animals, food and humans in Denmark
- …
