Structure and Parameter Learning for Causal Independence and Causal Interaction Models
This paper discusses causal independence models and a generalization of these
models called causal interaction models. Causal interaction models are models
that have independent mechanisms where a mechanism can have several causes. In
addition to introducing several particular types of causal interaction models,
we show how to apply the Bayesian approach to learning causal interaction
models, obtaining approximate posterior distributions over models and
MAP and ML estimates for the parameters. We illustrate the approach with a
simulation study of learning model posteriors.
Comment: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997).
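A canonical instance of the causal independence models the abstract discusses is the noisy-OR gate, in which each cause acts through an independent mechanism that can fail. A minimal sketch (the function name and parameterization are illustrative, not from the paper):

```python
def noisy_or(active, inhibit):
    """p(Y = 1 | causes) under a noisy-OR model: each active cause i
    independently fails to trigger Y with probability inhibit[i], and
    Y fires unless every active mechanism fails.  A "leak" term can be
    modeled as an always-active extra cause."""
    p_all_fail = 1.0
    for on, q in zip(active, inhibit):
        if on:
            p_all_fail *= q
    return 1.0 - p_all_fail
```

For example, with two active causes whose inhibition probabilities are 0.2 and 0.5, the event fires with probability 1 - 0.2 * 0.5 = 0.9.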
Models and Selection Criteria for Regression and Classification
When performing regression or classification, we are interested in the
conditional probability distribution for an outcome or class variable Y given a
set of explanatory or input variables X. We consider Bayesian models for this
task. In particular, we examine a special class of models, which we call
Bayesian regression/classification (BRC) models, that can be factored into
independent conditional (y|x) and input (x) models. These models are
convenient, because the conditional model (the portion of the full model that
we care about) can be analyzed by itself. We examine the practice of
transforming arbitrary Bayesian models to BRC models, and argue that this
practice is often inappropriate because it ignores prior knowledge that may be
important for learning. In addition, we examine Bayesian methods for learning
models from data. We discuss two criteria for Bayesian model selection that are
appropriate for regression/classification: one described by Spiegelhalter et
al. (1993), and another by Buntine (1993). We contrast these two criteria using
the prequential framework of Dawid (1984), and give sufficient conditions under
which the criteria agree.
Comment: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997).
A Bayesian Approach to Learning Bayesian Networks with Local Structure
Recently several researchers have investigated techniques for using data to
learn Bayesian networks containing compact representations for the conditional
probability distributions (CPDs) stored at each node. The majority of this work
has concentrated on using decision-tree representations for the CPDs. In
addition, researchers typically apply non-Bayesian (or asymptotically Bayesian)
scoring functions such as MDL to evaluate the goodness-of-fit of networks to
the data. In this paper we investigate a Bayesian approach to learning Bayesian
networks that contain the more general decision-graph representations of the
CPDs. First, we describe how to evaluate the posterior probability (that is,
the Bayesian score) of such a network, given a database of observed cases. Second,
we describe various search spaces that can be used, in conjunction with a
scoring function and a search procedure, to identify one or more high-scoring
networks. Finally, we present an experimental evaluation of the search spaces,
using a greedy algorithm and a Bayesian scoring function.
Comment: Appears in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI1997).
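The experimental setup described above, a greedy algorithm driven by a Bayesian scoring function, can be sketched as follows. This sketch uses a uniform-prior Bayesian-Dirichlet family score with full-table CPDs rather than the paper's decision-graph score, and all names are illustrative:

```python
import math
from collections import Counter

def family_score(data, child, parents):
    """Log marginal likelihood of one node's CPD under a uniform
    Dirichlet prior (alpha = 1 per cell).  Illustrative stand-in for
    the paper's decision-graph Bayesian score."""
    states = sorted({row[child] for row in data})
    r = len(states)
    joint, marg = Counter(), Counter()
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        marg[pa] += 1
    score = 0.0
    for pa, n_pa in marg.items():
        score += math.lgamma(r) - math.lgamma(r + n_pa)
        for s in states:
            score += math.lgamma(1 + joint[(pa, s)])  # lgamma(1) = 0
    return score

def creates_cycle(edges, u, v):
    """Would adding u -> v create a directed cycle? (DFS from v to u.)"""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, []))
    return False

def greedy_search(data, variables):
    """Greedy hill-climbing over single-edge additions: at each step,
    add the acyclicity-preserving edge with the largest score gain."""
    edges = set()
    parents = {v: [] for v in variables}
    current = {v: family_score(data, v, parents[v]) for v in variables}
    while True:
        best, best_gain = None, 1e-9
        for u in variables:
            for v in variables:
                if u == v or (u, v) in edges or creates_cycle(edges, u, v):
                    continue
                gain = family_score(data, v, parents[v] + [u]) - current[v]
                if gain > best_gain:
                    best, best_gain = (u, v), gain
        if best is None:
            return edges
        u, v = best
        edges.add((u, v))
        parents[v].append(u)
        current[v] += best_gain
```

Because the score is a log marginal likelihood, it automatically penalizes parents that add no predictive value, so the search stops rather than connecting independent variables.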
Large-Sample Learning of Bayesian Networks is NP-Hard
In this paper, we provide new complexity results for algorithms that learn
discrete-variable Bayesian networks from data. Our results apply whenever the
learning algorithm uses a scoring criterion that favors the simplest model able
to represent the generative distribution exactly. Our results therefore hold
whenever the learning algorithm uses a consistent scoring criterion and is
applied to a sufficiently large dataset. We show that identifying high-scoring
structures is hard, even when we are given an independence oracle, an inference
oracle, and/or an information oracle. Our negative results also apply to the
learning of discrete-variable Bayesian networks in which each node has at most
k parents, for all k > 3.
Comment: Appears in Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI2003).
Finding Optimal Bayesian Networks
In this paper, we derive optimality results for greedy Bayesian-network
search algorithms that perform single-edge modifications at each step and use
asymptotically consistent scoring criteria. Our results extend those of Meek
(1997) and Chickering (2002), who demonstrate that in the limit of large
datasets, if the generative distribution is perfect with respect to a DAG
defined over the observable variables, such search algorithms will identify
this optimal (i.e. generative) DAG model. We relax their assumption about the
generative distribution, and assume only that this distribution satisfies the
{\em composition property} over the observable variables, which is a more
realistic assumption for real domains. Under this assumption, we guarantee that
the search algorithms identify an {\em inclusion-optimal} model; that is, a
model that (1) contains the generative distribution and (2) has no sub-model
that contains this distribution. In addition, we show that the composition
property is guaranteed to hold whenever the dependence relationships in the
generative distribution can be characterized by paths between singleton
elements in some generative graphical model (e.g. a DAG, a chain graph, or a
Markov network) even when the generative model includes unobserved variables,
and even when the observed data is subject to selection bias.
Comment: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002).
Selective Greedy Equivalence Search: Finding Optimal Bayesian Networks Using a Polynomial Number of Score Evaluations
We introduce Selective Greedy Equivalence Search (SGES), a restricted version
of Greedy Equivalence Search (GES). SGES retains the asymptotic correctness of
GES but, unlike GES, has polynomial performance guarantees. In particular, we
show that when data are sampled independently from a distribution that is
perfect with respect to some DAG defined over the observable variables,
then, in the limit of large data, SGES will identify that DAG's equivalence
class after a number of score evaluations that is (1) polynomial in the number
of nodes and (2) exponential in various complexity measures including
maximum-number-of-parents, maximum-clique-size, and a new measure called {\em
v-width} that is at least as small as---and potentially much smaller than---the
other two. More generally, we show that for any hereditary and
equivalence-invariant property known to hold in the generative DAG, we retain
the large-sample optimality guarantees of GES even if we ignore any GES
deletion operator during the backward phase that results in a state for which
the property does not hold in the common-descendants subgraph.
Comment: Full version of UAI paper.
Asymptotic Model Selection for Directed Networks with Hidden Variables
We extend the Bayesian Information Criterion (BIC), an asymptotic
approximation for the marginal likelihood, to Bayesian networks with hidden
variables. This approximation can be used to select models given large samples
of data. The standard BIC as well as our extension punishes the complexity of a
model according to the dimension of its parameters. We argue that the dimension
of a Bayesian network with hidden variables is the rank of the Jacobian matrix
of the transformation between the parameters of the network and the parameters
of the observable variables. We compute the dimensions of several networks
including the naive Bayes model with a hidden root node.
Comment: Appears in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence (UAI1996).
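The proposed dimension can be checked numerically: build the map from network parameters to the distribution over the observables, estimate its Jacobian by finite differences, and take the matrix rank. A sketch for the naive Bayes model with a binary hidden root and binary observables (function names are illustrative; numpy is assumed):

```python
import numpy as np

def naive_bayes_joint(theta, n_obs):
    """Map parameters of a naive Bayes net with a binary hidden root H
    and n_obs binary children to the joint distribution over the children."""
    h1 = theta[0]                      # p(H = 1)
    px = theta[1:].reshape(n_obs, 2)   # px[i, h] = p(X_i = 1 | H = h)
    joint = np.zeros(2 ** n_obs)
    for idx in range(2 ** n_obs):
        bits = [(idx >> i) & 1 for i in range(n_obs)]
        for h, ph in ((0, 1.0 - h1), (1, h1)):
            term = ph
            for i, b in enumerate(bits):
                term *= px[i, h] if b else 1.0 - px[i, h]
            joint[idx] += term
    return joint

def effective_dimension(n_obs, seed=0, eps=1e-5):
    """Rank of the Jacobian of the parameter-to-distribution map,
    estimated by central differences at a random interior point."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.2, 0.8, size=1 + 2 * n_obs)
    jac = np.empty((2 ** n_obs, theta.size))
    for j in range(theta.size):
        hi, lo = theta.copy(), theta.copy()
        hi[j] += eps
        lo[j] -= eps
        jac[:, j] = (naive_bayes_joint(hi, n_obs)
                     - naive_bayes_joint(lo, n_obs)) / (2 * eps)
    return int(np.linalg.matrix_rank(jac, tol=1e-7))
```

With two binary observables the model has 5 standard parameters but the Jacobian rank, and hence the BIC penalty the abstract argues for, is only 3; with three observables the rank matches the 7 standard parameters.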
CFW: A Collaborative Filtering System Using Posteriors Over Weights Of Evidence
We describe CFW, a computationally efficient algorithm for collaborative
filtering that uses posteriors over weights of evidence. In experiments on real
data, we show that this method predicts as well or better than other methods in
situations where the size of the user query is small. The new approach works
particularly well when the user's query contains low-frequency (unpopular)
items. The approach complements that of dependency networks, which perform well
when the size of the query is large. Also in this paper, we argue that the use
of posteriors over weights of evidence is a natural way to recommend similar
items in the collaborative-filtering task.
Comment: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI2002).
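The weight of evidence that a query item provides for a candidate item is the log-likelihood ratio log p(query item | target) / p(query item | not target). A toy sketch of ranking candidates by summed smoothed weights; note this shows only the underlying quantity estimated from counts, whereas CFW itself maintains posteriors over the weights, and all names here are illustrative:

```python
import math

def weight_of_evidence(users, query_item, target_item, alpha=1.0):
    """log p(query_item | target) / p(query_item | not target), with a
    Beta(alpha, alpha) prior giving smoothed posterior-mean estimates.
    `users` is a list of sets of item ids."""
    n11 = sum(1 for u in users if query_item in u and target_item in u)
    n1 = sum(1 for u in users if target_item in u)
    n01 = sum(1 for u in users if query_item in u and target_item not in u)
    n0 = len(users) - n1
    p_q_given_t = (n11 + alpha) / (n1 + 2 * alpha)
    p_q_given_not_t = (n01 + alpha) / (n0 + 2 * alpha)
    return math.log(p_q_given_t / p_q_given_not_t)

def recommend(users, query, candidates):
    """Rank candidate items by their summed weights of evidence
    from the items in the user's query; return the best one."""
    scores = {c: sum(weight_of_evidence(users, q, c) for q in query)
              for c in candidates if c not in query}
    return max(scores, key=scores.get)
```

The smoothing keeps the ratio finite for rare (unpopular) query items, which is exactly the small-query regime the abstract highlights.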
ARMA Time-Series Modeling with Graphical Models
We express the classic ARMA time-series model as a directed graphical model.
In doing so, we find that the deterministic relationships in the model make it
effectively impossible to use the EM algorithm for learning model parameters.
To remedy this problem, we replace the deterministic relationships with
Gaussian distributions having a small variance, yielding the stochastic ARMA
(σARMA) model. This modification allows us to use the EM algorithm to learn
parameters and to forecast, even in situations where some data is missing. This
modification, in conjunction with the graphical-model approach, also allows us
to include cross predictors in situations where there are multiple time series
and/or additional nontemporal covariates. More surprisingly, experiments suggest
that the move to stochastic ARMA yields improved accuracy through better
smoothing. We demonstrate improvements afforded by cross prediction and better
smoothing on real data.
Comment: Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004).
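The small-variance trick can be illustrated in the simplest setting, an AR(1) state with an almost-deterministic observation: replacing y_t = x_t by y_t = x_t + v_t with a tiny Var(v_t) makes the filtering (and hence EM) recursions well defined and handles missing observations gracefully. A sketch under those assumptions; the AR(1) simplification and parameter names are mine, not the paper's:

```python
def kalman_filter_ar1(ys, a, q, r):
    """Scalar Kalman filter for x_t = a * x_{t-1} + w_t, w ~ N(0, q),
    observed through y_t = x_t + v_t, v ~ N(0, r).  Taking r small and
    positive mimics the stochastic-ARMA trick: a near-deterministic
    link that still admits filtering, with missing data (y_t is None)
    handled by simply skipping the update step."""
    mean, var = 0.0, 1.0              # prior on the state (illustrative)
    means = []
    for y in ys:
        # predict through the AR(1) transition
        mean, var = a * mean, a * a * var + q
        # update against the (noisy) observation, unless it is missing
        if y is not None:
            k = var / (var + r)       # Kalman gain; near 1 when r is tiny
            mean = mean + k * (y - mean)
            var = (1.0 - k) * var
        means.append(mean)
    return means
```

With r tiny, the filtered mean snaps to each observed value, while missing steps fall back on the AR(1) prediction, which is what enables forecasting with gaps in the series.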
Learning Mixtures of DAG Models
We describe computationally efficient methods for learning mixtures in which
each component is a directed acyclic graphical model (mixtures of DAGs or
MDAGs). We argue that simple search-and-score algorithms are infeasible for a
variety of problems, and introduce a feasible approach in which parameter and
structure search is interleaved and expected data is treated as real data. Our
approach can be viewed as a combination of (1) the Cheeseman--Stutz asymptotic
approximation for model posterior probability and (2) the
Expectation--Maximization algorithm. We evaluate our procedure for selecting
among MDAGs on synthetic and real examples.
Comment: Appears in Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI1998).