19 research outputs found

    Neural Graphical Models

    Full text link
    Probabilistic Graphical Models are often used to understand dynamics of a system. They can model relationships between features (nodes) and the underlying distribution. Theoretically these models can represent very complex dependency functions, but in practice often simplifying assumptions are made due to computational limitations associated with graph operations. In this work we introduce Neural Graphical Models (NGMs) which attempt to represent complex feature dependencies with reasonable computational costs. Given a graph of feature relationships and corresponding samples, we capture the dependency structure between the features along with their complex function representations by using a neural network as a multi-task learning framework. We provide efficient learning, inference and sampling algorithms. NGMs can fit generic graph structures including directed, undirected and mixed-edge graphs as well as support mixed input data types. We present empirical studies that show NGMs' capability to represent Gaussian graphical models, perform inference analysis of a lung cancer data and extract insights from a real world infant mortality data provided by Centers for Disease Control and Prevention

    Federated Learning with Neural Graphical Models

    Full text link
    Federated Learning (FL) addresses the need to create models based on proprietary data in such a way that multiple clients retain exclusive control over their data, while all benefit from improved model accuracy due to pooled resources. Recently proposed Neural Graphical Models (NGMs) are Probabilistic Graphical models that utilize the expressive power of neural networks to learn complex non-linear dependencies between the input features. They learn to capture the underlying data distribution and have efficient algorithms for inference and sampling. We develop a FL framework which maintains a global NGM model that learns the averaged information from the local NGM models while keeping the training data within the client's environment. Our design, FedNGMs, avoids the pitfalls and shortcomings of neuron matching frameworks like Federated Matched Averaging that suffers from model parameter explosion. Our global model size remains constant throughout the process. In the cases where clients have local variables that are not part of the combined global distribution, we propose a `Stitching' algorithm, which personalizes the global NGM models by merging the additional variables using the client's data. FedNGM is robust to data heterogeneity, large number of participants, and limited communication bandwidth

    Axiomatic Interpretability for Multiclass Additive Models

    Full text link
    Generalized additive models (GAMs) are favored in many regression and binary classification problems because they are able to fit complex, nonlinear functions while still remaining interpretable. In the first part of this paper, we generalize a state-of-the-art GAM learning algorithm based on boosted trees to the multiclass setting, and show that this multiclass algorithm outperforms existing GAM learning algorithms and sometimes matches the performance of full complexity models such as gradient boosted trees. In the second part, we turn our attention to the interpretability of GAMs in the multiclass setting. Surprisingly, the natural interpretability of GAMs breaks down when there are more than two classes. Naive interpretation of multiclass GAMs can lead to false conclusions. Inspired by binary GAMs, we identify two axioms that any additive model must satisfy in order to not be visually misleading. We then develop a technique called Additive Post-Processing for Interpretability (API), that provably transforms a pre-trained additive model to satisfy the interpretability axioms without sacrificing accuracy. The technique works not just on models trained with our learning algorithm, but on any multiclass additive model, including multiclass linear and logistic regression. We demonstrate the effectiveness of API on a 12-class infant mortality dataset.Comment: KDD 201

    Defining explanation in probabilistic systems

    No full text
    As probabilistic systems gain popularity and are coming into wider use, the need for a mechanism that explains the system’s findings and recommendations becomes more critical. The system will also need a mechanism for ordering competing explanations. We examine two representative approaches to explanation in the literature— one due to Gärdenfors and one due to Pearl—and show that both suffer from significant problems. We propose an approach to defining a notion of “better explanation ” that combines some of the features of both together with more recent work by Pearl and others on causality.

    Making rational decisions using adaptive utility elicitation

    No full text
    Rational decision making requires full knowledge of the utility function of the person affected by the decisions. However, in many cases, the task of acquiring such knowledge is not feasible due to the size of the outcome space and the complexity of the utility elicitation process. Given that the amount of utility information we can acquire is limited, we need to make decisions with partial utility information and should carefully select which utility elicitation questions we ask. In this paper, we propose a new approach for this problem that utilizes a prior probability distribution over the person’s utility function, perhaps learned from a population of similar people. The relevance of a utility elicitation question for the current decision problem can then be measured using its value of information. We propose an algorithm that interleaves the analysis of the decision problem and utility elicitation to allow these two tasks to inform each other. At every step, it asks the utility elicitation question giving us the highest value of information and computes the best strategy based on the information acquired so far, stopping when the expected utility loss resulting from our recommendation falls below a pre-specified threshold. We show how the various steps of this algorithm can be implemented efficiently
    corecore