
    Inferring Causal Direction from Observational Data: A Complexity Approach

    At the heart of causal structure learning from observational data lies a deceptively simple question: given two statistically dependent random variables, which one has a causal effect on the other? This question cannot be answered by statistical dependence testing alone; it requires additional assumptions. We propose several fast and simple criteria for distinguishing cause from effect in pairs of discrete or continuous random variables. The intuition behind them is that predicting the effect variable from the cause variable should be ‘simpler’ than the reverse, with different notions of ‘simplicity’ giving rise to different criteria. We demonstrate the accuracy of the criteria on synthetic data generated under a broad family of causal mechanisms and types of noise.

    Automated Selection and Configuration of Multi-Label Classification Algorithms with Grammar-Based Genetic Programming

    This paper proposes Auto-MEKAGGP, an Automated Machine Learning (Auto-ML) method for Multi-Label Classification (MLC) based on the MEKA tool, which offers a number of MLC algorithms. In MLC, each example can be associated with one or more class labels, making MLC problems harder than conventional (single-label) classification problems. Hence, it is essential to select an MLC algorithm and a configuration tailored to (optimized for) the input dataset. Auto-MEKAGGP addresses this problem with two key ideas. First, a large number of choices of MLC algorithms and configurations from MEKA are represented in a grammar. Second, our proposed Grammar-based Genetic Programming (GGP) method uses that grammar to search for the best MLC algorithm and configuration for the input dataset. Auto-MEKAGGP was tested on 10 datasets and compared with two well-known MLC methods, namely Binary Relevance and Classifier Chain, and with GA-Auto-MLC, a genetic algorithm we recently proposed for the same task. Two versions of Auto-MEKAGGP were tested: a full version with the proposed grammar, and a simplified version whose grammar includes only the algorithmic components used by GA-Auto-MLC. Overall, the full version of Auto-MEKAGGP achieved the best predictive accuracy among all five evaluated methods, winning on six of the 10 datasets.

    Markov blanket discovery in positive-unlabelled and semi-supervised data

    The importance of Markov blanket discovery algorithms is twofold: as the main building block in constraint-based structure learning of Bayesian network algorithms and as a technique to derive the optimal set of features in filter feature selection approaches. Equally, learning from partially labelled data is a crucial and demanding area of machine learning, and extending techniques from fully to partially supervised scenarios is a challenging problem. While there are many different algorithms to derive the Markov blanket of fully supervised nodes, the partially-labelled problem is far more challenging, and there is a lack of principled approaches in the literature. Our work derives a generalization of the conditional tests of independence for partially labelled binary target variables, which can handle the two main partially labelled scenarios: positive-unlabelled and semi-supervised. The result is a significantly deeper understanding of how to control false negative errors in Markov blanket discovery procedures and how unlabelled data can help.

    Multi-label LeGo — Enhancing Multi-label Classifiers with Local Patterns

    The straightforward approach to multi-label classification is based on decomposition, which essentially treats all labels independently and ignores interactions between labels. We propose to enhance multi-label classifiers with features constructed from local patterns that explicitly represent such interdependencies. An Exceptional Model Mining instance is employed to find local patterns representing parts of the data where the conditional dependence relations between the labels are exceptional. We construct binary features from these patterns that can be interpreted as partial solutions to local complexities in the data. These features are then used as input for multi-label classifiers. We experimentally show that using such constructed features can improve the classification performance of decompositive multi-label learning techniques. Keywords: Exceptional Model Mining; Multi-Label Classification; LeGo
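
    The decomposition approach described in this abstract (often called Binary Relevance) can be illustrated with a minimal sketch: one independent binary classifier is trained per label, so no label interaction is modelled. The nearest-centroid base learner, the class names, and the toy data below are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not include the local-pattern features the paper proposes.

```python
# Minimal Binary Relevance sketch (illustrative; not the authors' code).
# Each label gets its own binary model, trained independently of the others.

def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroidBinary:
    """Hypothetical stand-in base learner for a single 0/1 label."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.pos = centroid(pos) if pos else None
        self.neg = centroid(neg) if neg else None
        return self

    def predict_one(self, x):
        # Fall back to the only observed class if one side is empty.
        if self.pos is None:
            return 0
        if self.neg is None:
            return 1
        return 1 if sq_dist(x, self.pos) < sq_dist(x, self.neg) else 0

class BinaryRelevance:
    """Decomposes a multi-label problem into one binary problem per label."""

    def fit(self, X, Y):
        n_labels = len(Y[0])
        self.models = [
            NearestCentroidBinary().fit(X, [row[j] for row in Y])
            for j in range(n_labels)
        ]
        return self

    def predict_one(self, x):
        # Labels are predicted independently; interactions are ignored.
        return [m.predict_one(x) for m in self.models]

# Toy data (assumed): two features, two labels.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
Y = [[1, 0], [1, 0], [0, 1], [1, 1]]
br = BinaryRelevance().fit(X, Y)
```

    Because each per-label model sees only its own column of `Y`, any dependence between labels is invisible to it, which is the limitation the abstract sets out to address with pattern-based features.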