Structure and Parameter Learning for Causal Independence and Causal Interaction Models
This paper discusses causal independence models and a generalization of these
models called causal interaction models. Causal interaction models are models
that have independent mechanisms where a mechanism can have several causes. In
addition to introducing several particular types of causal interaction models,
we show how to apply the Bayesian approach to learning causal interaction
models, obtaining approximate posterior distributions over models and
MAP and ML estimates for the parameters. We illustrate the approach with a
simulation study of learning model posteriors.
Comment: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997).
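A canonical instance of the causal independence models the abstract discusses is the noisy-OR gate, in which each cause acts through an independent mechanism that can fail. A minimal sketch (the function name and parameterization are illustrative, not from the paper):

```python
def noisy_or(active, inhibit):
    """p(Y = 1 | causes) under a noisy-OR model: each active cause i
    independently fails to trigger Y with probability inhibit[i], and
    Y fires unless every active mechanism fails.  A "leak" term can be
    modeled as an always-active extra cause."""
    p_all_fail = 1.0
    for on, q in zip(active, inhibit):
        if on:
            p_all_fail *= q
    return 1.0 - p_all_fail
```

For example, with two active causes whose inhibition probabilities are 0.2 and 0.5, the event fires with probability 1 - 0.2 * 0.5 = 0.9.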
Models and Selection Criteria for Regression and Classification
When performing regression or classification, we are interested in the
conditional probability distribution for an outcome or class variable Y given a
set of explanatory or input variables X. We consider Bayesian models for this
task. In particular, we examine a special class of models, which we call
Bayesian regression/classification (BRC) models, that can be factored into
independent conditional (y|x) and input (x) models. These models are
convenient, because the conditional model (the portion of the full model that
we care about) can be analyzed by itself. We examine the practice of
transforming arbitrary Bayesian models to BRC models, and argue that this
practice is often inappropriate because it ignores prior knowledge that may be
important for learning. In addition, we examine Bayesian methods for learning
models from data. We discuss two criteria for Bayesian model selection that are
appropriate for regression/classification: one described by Spiegelhalter et
al. (1993), and another by Buntine (1993). We contrast these two criteria using
the prequential framework of Dawid (1984), and give sufficient conditions under
which the criteria agree.
Comment: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997).
A Bayesian Approach to Learning Bayesian Networks with Local Structure
Recently several researchers have investigated techniques for using data to
learn Bayesian networks containing compact representations for the conditional
probability distributions (CPDs) stored at each node. The majority of this work
has concentrated on using decision-tree representations for the CPDs. In
addition, researchers typically apply non-Bayesian (or asymptotically Bayesian)
scoring functions such as MDL to evaluate the goodness-of-fit of networks to
the data. In this paper we investigate a Bayesian approach to learning Bayesian
networks that contain the more general decision-graph representations of the
CPDs. First, we describe how to evaluate the posterior probability (that is,
the Bayesian score) of such a network, given a database of observed cases. Second,
we describe various search spaces that can be used, in conjunction with a
scoring function and a search procedure, to identify one or more high-scoring
networks. Finally, we present an experimental evaluation of the search spaces,
using a greedy algorithm and a Bayesian scoring function.
Comment: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997).
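The experimental setup described above, a greedy algorithm driven by a Bayesian scoring function, can be sketched as follows. This sketch uses a uniform-prior Bayesian-Dirichlet family score with full-table CPDs rather than the paper's decision-graph score, and all names are illustrative:

```python
import math
from collections import Counter

def family_score(data, child, parents):
    """Log marginal likelihood of one node's CPD under a uniform
    Dirichlet prior (alpha = 1 per cell).  Illustrative stand-in for
    the paper's decision-graph Bayesian score."""
    states = sorted({row[child] for row in data})
    r = len(states)
    joint, marg = Counter(), Counter()
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        marg[pa] += 1
    score = 0.0
    for pa, n_pa in marg.items():
        score += math.lgamma(r) - math.lgamma(r + n_pa)
        for s in states:
            score += math.lgamma(1 + joint[(pa, s)])  # lgamma(1) = 0
    return score

def creates_cycle(edges, u, v):
    """Would adding u -> v create a directed cycle? (DFS from v to u.)"""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, []))
    return False

def greedy_search(data, variables):
    """Greedy hill-climbing over single-edge additions: at each step,
    add the acyclicity-preserving edge with the largest score gain."""
    edges = set()
    parents = {v: [] for v in variables}
    current = {v: family_score(data, v, parents[v]) for v in variables}
    while True:
        best, best_gain = None, 1e-9
        for u in variables:
            for v in variables:
                if u == v or (u, v) in edges or creates_cycle(edges, u, v):
                    continue
                gain = family_score(data, v, parents[v] + [u]) - current[v]
                if gain > best_gain:
                    best, best_gain = (u, v), gain
        if best is None:
            return edges
        u, v = best
        edges.add((u, v))
        parents[v].append(u)
        current[v] += best_gain
```

Because the score is a log marginal likelihood, it automatically penalizes parents that add no predictive value, so the search stops rather than connecting independent variables.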
Large-Sample Learning of Bayesian Networks is NP-Hard
In this paper, we provide new complexity results for algorithms that learn
discrete-variable Bayesian networks from data. Our results apply whenever the
learning algorithm uses a scoring criterion that favors the simplest model able
to represent the generative distribution exactly. Our results therefore hold
whenever the learning algorithm uses a consistent scoring criterion and is
applied to a sufficiently large dataset. We show that identifying high-scoring
structures is hard, even when we are given an independence oracle, an inference
oracle, and/or an information oracle. Our negative results also apply to the
learning of discrete-variable Bayesian networks in which each node has at most
k parents, for all k > 3.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003).
Finding Optimal Bayesian Networks
In this paper, we derive optimality results for greedy Bayesian-network
search algorithms that perform single-edge modifications at each step and use
asymptotically consistent scoring criteria. Our results extend those of Meek
(1997) and Chickering (2002), who demonstrate that in the limit of large
datasets, if the generative distribution is perfect with respect to a DAG
defined over the observable variables, such search algorithms will identify
this optimal (i.e. generative) DAG model. We relax their assumption about the
generative distribution, and assume only that this distribution satisfies the
{\em composition property} over the observable variables, which is a more
realistic assumption for real domains. Under this assumption, we guarantee that
the search algorithms identify an {\em inclusion-optimal} model; that is, a
model that (1) contains the generative distribution and (2) has no sub-model
that contains this distribution. In addition, we show that the composition
property is guaranteed to hold whenever the dependence relationships in the
generative distribution can be characterized by paths between singleton
elements in some generative graphical model (e.g. a DAG, a chain graph, or a
Markov network) even when the generative model includes unobserved variables,
and even when the observed data is subject to selection bias.
Comment: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002).
Selective Greedy Equivalence Search: Finding Optimal Bayesian Networks Using a Polynomial Number of Score Evaluations
We introduce Selective Greedy Equivalence Search (SGES), a restricted version
of Greedy Equivalence Search (GES). SGES retains the asymptotic correctness of
GES but, unlike GES, has polynomial performance guarantees. In particular, we
show that when data are sampled independently from a distribution that is
perfect with respect to some DAG defined over the observable variables,
then, in the limit of large data, SGES will identify that DAG's equivalence
class after a number of score evaluations that is (1) polynomial in the number
of nodes and (2) exponential in various complexity measures including
maximum-number-of-parents, maximum-clique-size, and a new measure called {\em
v-width} that is at least as small as---and potentially much smaller than---the
other two. More generally, we show that for any hereditary and
equivalence-invariant property known to hold in the generative DAG, we retain
the large-sample optimality guarantees of GES even if we ignore any GES
deletion operator during the backward phase that results in a state for which
the property does not hold in the common-descendants subgraph.
Comment: Full version of UAI paper.
Asymptotic Model Selection for Directed Networks with Hidden Variables
We extend the Bayesian Information Criterion (BIC), an asymptotic
approximation for the marginal likelihood, to Bayesian networks with hidden
variables. This approximation can be used to select models given large samples
of data. The standard BIC as well as our extension punishes the complexity of a
model according to the dimension of its parameters. We argue that the dimension
of a Bayesian network with hidden variables is the rank of the Jacobian matrix
of the transformation between the parameters of the network and the parameters
of the observable variables. We compute the dimensions of several networks
including the naive Bayes model with a hidden root node.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI1996).
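The proposed dimension can be checked numerically: build the map from network parameters to the distribution over the observables, estimate its Jacobian by finite differences, and take the matrix rank. A sketch for the naive Bayes model with a binary hidden root and binary observables (function names are illustrative; numpy is assumed):

```python
import numpy as np

def naive_bayes_joint(theta, n_obs):
    """Map parameters of a naive Bayes net with a binary hidden root H
    and n_obs binary children to the joint distribution over the children."""
    h1 = theta[0]                      # p(H = 1)
    px = theta[1:].reshape(n_obs, 2)   # px[i, h] = p(X_i = 1 | H = h)
    joint = np.zeros(2 ** n_obs)
    for idx in range(2 ** n_obs):
        bits = [(idx >> i) & 1 for i in range(n_obs)]
        for h, ph in ((0, 1.0 - h1), (1, h1)):
            term = ph
            for i, b in enumerate(bits):
                term *= px[i, h] if b else 1.0 - px[i, h]
            joint[idx] += term
    return joint

def effective_dimension(n_obs, seed=0, eps=1e-5):
    """Rank of the Jacobian of the parameter-to-distribution map,
    estimated by central differences at a random interior point."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.2, 0.8, size=1 + 2 * n_obs)
    jac = np.empty((2 ** n_obs, theta.size))
    for j in range(theta.size):
        hi, lo = theta.copy(), theta.copy()
        hi[j] += eps
        lo[j] -= eps
        jac[:, j] = (naive_bayes_joint(hi, n_obs)
                     - naive_bayes_joint(lo, n_obs)) / (2 * eps)
    return int(np.linalg.matrix_rank(jac, tol=1e-7))
```

With two binary observables the model has 5 standard parameters but the Jacobian rank, and hence the BIC penalty the abstract argues for, is only 3; with three observables the rank matches the 7 standard parameters.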
CFW: A Collaborative Filtering System Using Posteriors Over Weights Of Evidence
We describe CFW, a computationally efficient algorithm for collaborative
filtering that uses posteriors over weights of evidence. In experiments on real
data, we show that this method predicts as well or better than other methods in
situations where the size of the user query is small. The new approach works
particularly well when the user's query contains low-frequency (unpopular)
items. The approach complements that of dependency networks, which perform well
when the size of the query is large. Also in this paper, we argue that the use
of posteriors over weights of evidence is a natural way to recommend similar
items in the collaborative-filtering task.
Comment: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002).
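The weight of evidence that a query item provides for a candidate item is the log-likelihood ratio log p(query item | target) / p(query item | not target). A toy sketch of ranking candidates by summed smoothed weights; note this shows only the underlying quantity estimated from counts, whereas CFW itself maintains posteriors over the weights, and all names here are illustrative:

```python
import math

def weight_of_evidence(users, query_item, target_item, alpha=1.0):
    """log p(query_item | target) / p(query_item | not target), with a
    Beta(alpha, alpha) prior giving smoothed posterior-mean estimates.
    `users` is a list of sets of item ids."""
    n11 = sum(1 for u in users if query_item in u and target_item in u)
    n1 = sum(1 for u in users if target_item in u)
    n01 = sum(1 for u in users if query_item in u and target_item not in u)
    n0 = len(users) - n1
    p_q_given_t = (n11 + alpha) / (n1 + 2 * alpha)
    p_q_given_not_t = (n01 + alpha) / (n0 + 2 * alpha)
    return math.log(p_q_given_t / p_q_given_not_t)

def recommend(users, query, candidates):
    """Rank candidate items by their summed weights of evidence
    from the items in the user's query; return the best one."""
    scores = {c: sum(weight_of_evidence(users, q, c) for q in query)
              for c in candidates if c not in query}
    return max(scores, key=scores.get)
```

The smoothing keeps the ratio finite for rare (unpopular) query items, which is exactly the small-query regime the abstract highlights.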
ARMA Time-Series Modeling with Graphical Models
We express the classic ARMA time-series model as a directed graphical model.
In doing so, we find that the deterministic relationships in the model make it
effectively impossible to use the EM algorithm for learning model parameters.
To remedy this problem, we replace the deterministic relationships with
Gaussian distributions having a small variance, yielding the stochastic ARMA
(σARMA) model. This modification allows us to use the EM algorithm to learn
parameters and to forecast, even in situations where some data is missing. This
modification, in conjunction with the graphical-model approach, also allows us
to include cross predictors in situations where there are multiple time series
and/or additional nontemporal covariates. More surprisingly, experiments suggest
that the move to stochastic ARMA yields improved accuracy through better
smoothing. We demonstrate improvements afforded by cross prediction and better
smoothing on real data.
Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004).
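The small-variance trick can be illustrated in the simplest setting, an AR(1) state with an almost-deterministic observation: replacing y_t = x_t by y_t = x_t + v_t with a tiny Var(v_t) makes the filtering (and hence EM) recursions well defined and handles missing observations gracefully. A sketch under those assumptions; the AR(1) simplification and parameter names are mine, not the paper's:

```python
def kalman_filter_ar1(ys, a, q, r):
    """Scalar Kalman filter for x_t = a * x_{t-1} + w_t, w ~ N(0, q),
    observed through y_t = x_t + v_t, v ~ N(0, r).  Taking r small and
    positive mimics the stochastic-ARMA trick: a near-deterministic
    link that still admits filtering, with missing data (y_t is None)
    handled by simply skipping the update step."""
    mean, var = 0.0, 1.0              # prior on the state (illustrative)
    means = []
    for y in ys:
        # predict through the AR(1) transition
        mean, var = a * mean, a * a * var + q
        # update against the (noisy) observation, unless it is missing
        if y is not None:
            k = var / (var + r)       # Kalman gain; near 1 when r is tiny
            mean = mean + k * (y - mean)
            var = (1.0 - k) * var
        means.append(mean)
    return means
```

With r tiny, the filtered mean snaps to each observed value, while missing steps fall back on the AR(1) prediction, which is what enables forecasting with gaps in the series.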
Learning Mixtures of DAG Models
We describe computationally efficient methods for learning mixtures in which
each component is a directed acyclic graphical model (mixtures of DAGs or
MDAGs). We argue that simple search-and-score algorithms are infeasible for a
variety of problems, and introduce a feasible approach in which parameter and
structure search is interleaved and expected data is treated as real data. Our
approach can be viewed as a combination of (1) the Cheeseman--Stutz asymptotic
approximation for model posterior probability and (2) the
Expectation--Maximization algorithm. We evaluate our procedure for selecting
among MDAGs on synthetic and real examples.
Comment: Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998).