5 research outputs found
Scaling Bayesian network discovery through incremental recovery
Bayesian networks are a type of graphical model that, among other things, allows one to analyze the interactions among the variables in a database. A well-known problem with discovering such models from a database is the "problem of high dimensionality": the discovery of a network from a database with a moderate to large number of variables quickly becomes intractable. Most solutions to this problem have relied on prior knowledge of the structure of the network, e.g., through the definition of an order on the variables. With a growing number of variables, however, this becomes a considerable burden on the data miner. Moreover, mistakes in such prior knowledge have large effects on the final network. Another approach is to ask the database, rather than the expert, for insight into the structure of the final network. Our work fits in this approach. More specifically, before we start recovering the network, we first cluster the variables based on a chi-squared measure of association. Then we use an incremental algorithm to discover the network. This algorithm uses the small networks discovered for the individual clusters of variables as its starting point. We illustrate the feasibility of our approach with some experiments. In particular, we show that in the case where one knows the network, and thus the order, our algorithm yields almost the same network, which is, moreover, still an I-map.
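The clustering step described above can be illustrated with a minimal Python sketch. The abstract does not say how the chi-squared measure is turned into clusters, so the thresholding and connected-components grouping below are assumptions of this sketch, as are the function names `chi_squared` and `cluster_variables` and the toy data. The threshold 3.84 is the 5% critical value of the chi-squared distribution with one degree of freedom, which fits the binary variables used here.

```python
from collections import Counter
from itertools import combinations


def chi_squared(xs, ys):
    """Pearson chi-squared statistic for two equally long categorical columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row = Counter(xs)
    col = Counter(ys)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def cluster_variables(data, threshold):
    """Group variables whose pairwise chi-squared statistic exceeds `threshold`.

    `data` maps a variable name to its list of categorical values.
    Clusters are the connected components of the thresholded association
    graph (an assumption of this sketch, not the paper's stated procedure).
    """
    parent = {v: v for v in data}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in combinations(data, 2):
        if chi_squared(data[u], data[v]) > threshold:
            parent[find(u)] = find(v)

    clusters = {}
    for v in data:
        clusters.setdefault(find(v), []).append(v)
    return sorted(sorted(c) for c in clusters.values())


# Toy data for illustration only.
data = {
    "A": [0, 0, 1, 1, 0, 0, 1, 1],
    "B": [0, 0, 1, 1, 0, 0, 1, 1],  # identical to A, so strongly associated
    "C": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of A and B in this sample
}
print(cluster_variables(data, threshold=3.84))
```

In an incremental scheme of the kind the abstract describes, a small network would then be learned for each cluster and those networks would seed the search over the full variable set; that step is not sketched here.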
Genetic algorithms and Gaussian Bayesian networks to uncover the predictive core set of bibliometric indices
The diversity of bibliometric indices today poses the
challenge of exploiting the relationships among them.
Our research uncovers the best core set of relevant
indices for predicting other bibliometric indices. An
added difficulty is to select the role of each variable, that
is, which bibliometric indices are predictive variables
and which are response variables. This results in a novel
multioutput regression problem where the role of each
variable (predictor or response) is unknown beforehand.
We use Gaussian Bayesian networks to solve this
problem and discover multivariate relationships among
bibliometric indices. These networks are learnt by a
genetic algorithm that looks for the optimal models that
best predict bibliometric data. Results show that the
optimal induced Gaussian Bayesian networks corroborate
previous relationships between several indices, but
also suggest new, previously unreported interactions.
An extended analysis of the best model illustrates that a
set of 12 bibliometric indices can be accurately predicted
using only a smaller predictive core subset composed
of citations, g-index, q2-index, and hr-index. This
research is performed using bibliometric data on
Spanish full professors associated with the computer
science area.
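As background for the abstract above: in a Gaussian Bayesian network, each variable is modeled as a linear function of its parents plus Gaussian noise. The sketch below fits one such conditional density for a single parent by ordinary least squares. It is a minimal illustration under stated assumptions, not the authors' genetic-algorithm procedure; the function name `fit_linear_gaussian` and the numbers are made up.

```python
def fit_linear_gaussian(predictor, response):
    """Fit response = b0 + b1 * predictor + noise by ordinary least squares.

    Returns (b0, b1, residual_variance), the parameters of one linear
    Gaussian conditional density with a single parent.
    """
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_r = sum(response) / n
    sxx = sum((p - mean_p) ** 2 for p in predictor)
    sxy = sum((p - mean_p) * (r - mean_r) for p, r in zip(predictor, response))
    b1 = sxy / sxx
    b0 = mean_r - b1 * mean_p
    residuals = [r - (b0 + b1 * p) for p, r in zip(predictor, response)]
    variance = sum(e * e for e in residuals) / n
    return b0, b1, variance


# Made-up numbers for illustration only; not data from the study.
predictor = [10.0, 20.0, 30.0, 40.0]
response = [3.0, 5.0, 7.0, 9.0]  # exactly 1 + 0.2 * predictor
b0, b1, variance = fit_linear_gaussian(predictor, response)
print(b0, b1, variance)
```

A structure search such as the one the abstract describes would decide which variables act as parents (predictors) and which as children (responses); this sketch covers only the parameter fit once that choice is made.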
Learning Bayesian Networks Using Feature Selection
This paper introduces a novel enhancement for learning Bayesian networks with a bias for small, high-predictive-accuracy networks. The new approach selects a subset of features that maximizes predictive accuracy prior to the network learning phase. We examine explicitly the effects of two aspects of the algorithm, feature selection and node ordering. Our approach generates networks that are computationally simpler to evaluate and display predictive accuracy comparable to that of Bayesian networks which model all attributes.
28.1 Introduction
Bayesian networks are being increasingly recognized as an important representation for probabilistic reasoning. For many domains, the need to specify the probability distributions for a Bayesian network is considerable, and learning these probabilities from data using an algorithm like K2 [Cooper92] could alleviate such specification difficulties. We describe an extension to the Bayesian network learning approaches introduced in K2. Our goal is to..