Probabilistic Relational Model Benchmark Generation
The validation of any database mining methodology goes through an evaluation
process in which the availability of benchmarks is essential. In this paper, we
aim to randomly generate relational database benchmarks that allow checking
probabilistic dependencies among the attributes. We are particularly interested
in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs)
to a relational data mining context and enable effective and robust reasoning
over relational data. Even though a panoply of works has focused, separately,
on the generation of random Bayesian networks and of relational databases, no
work has been identified for PRMs on that track. This paper provides an
algorithmic approach for generating random PRMs from scratch to fill this gap.
The proposed method allows generating PRMs as well as synthetic relational data
from a randomly generated relational schema and a random set of probabilistic
dependencies. This can be of interest not only to machine learning researchers,
to evaluate their proposals in a common framework, but also to database
designers, to evaluate the effectiveness of the components of a database
management system.
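The two generation stages described above can be made concrete with a minimal Python sketch. Stage one draws a random schema and a random acyclic set of dependencies. Stage two draws synthetic data from them. This is not the authors' algorithm. It relies on several simplifying assumptions of our own:
- every class after the first gets exactly one foreign key to an earlier class;
- attributes are binary;
- every class has the same number of objects;
- an attribute may only depend on attributes of its own class or of the class it references.

All names (`PRM`, `random_prm`, `sample_database`) are hypothetical. The sketch guarantees acyclicity by drawing a random total order on attributes and allowing only parents that precede their child.

```python
import itertools
import random
from typing import NamedTuple


class PRM(NamedTuple):
    attrs: dict    # class -> list of attribute names
    fk: dict       # class -> referenced class (or None)
    owner: dict    # attribute -> class it belongs to
    order: list    # random total order on attributes (parents come first)
    parents: dict  # attribute -> list of parent attributes
    cpt: dict      # attribute -> {parent configuration: P(attribute = 1)}


def random_prm(n_classes=3, max_attrs=3, max_parents=2, seed=0) -> PRM:
    rng = random.Random(seed)
    classes = [f"C{i}" for i in range(n_classes)]
    attrs = {c: [f"{c}.a{j}" for j in range(rng.randint(1, max_attrs))] for c in classes}
    # Hypothetical schema rule: class i > 0 has one foreign key to some class j < i.
    fk = {c: classes[rng.randrange(i)] if i > 0 else None for i, c in enumerate(classes)}
    owner = {a: c for c in classes for a in attrs[c]}
    order = list(owner)
    rng.shuffle(order)
    rank = {a: i for i, a in enumerate(order)}
    parents, cpt = {}, {}
    for a in order:
        c = owner[a]
        scope = [c] + ([fk[c]] if fk[c] else [])
        # Only attributes earlier in `order` are eligible, so the dependency graph is acyclic.
        cands = [b for d in scope for b in attrs[d] if rank[b] < rank[a]]
        parents[a] = rng.sample(cands, rng.randint(0, min(max_parents, len(cands))))
        cpt[a] = {cfg: rng.random() for cfg in itertools.product((0, 1), repeat=len(parents[a]))}
    return PRM(attrs, fk, owner, order, parents, cpt)


def sample_database(prm: PRM, n_objects=5, seed=1):
    rng = random.Random(seed)
    # link[c][o]: index of the object of class fk[c] referenced by object o of class c.
    link = {c: [rng.randrange(n_objects) for _ in range(n_objects)] if prm.fk[c] else None
            for c in prm.attrs}
    value = {}  # (attribute, object index) -> 0 or 1
    for a in prm.order:  # ancestral sampling along the acyclic order
        c = prm.owner[a]
        for o in range(n_objects):
            cfg = tuple(value[b, o if prm.owner[b] == c else link[c][o]] for b in prm.parents[a])
            value[a, o] = int(rng.random() < prm.cpt[a][cfg])
    return link, value
```

Every parent precedes its child in `order`. Filling in attribute values along that order is therefore ordinary ancestral sampling, as in a Bayesian network. A parent reached through the foreign key is always single-valued in this sketch. For that reason, no aggregation function is needed.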
Is Hyper-extensionality Preservable Under Deletions of Graph Elements?
Any hereditarily finite set S can be represented as a finite pointed graph, dubbed membership graph, whose nodes denote elements of the transitive closure of {S} and whose edges model the membership relation. Membership graphs must be hyper-extensional, that is, pairwise distinct nodes are not bisimilar and (uniquely) represent hereditarily finite sets.
We will see that the removal of even a single node or edge from a membership graph can cause “collapses” of different nodes and, therefore, the loss of hyper-extensionality of the graph itself. With the intent of gaining a deeper understanding of the class of hyper-extensional hereditarily finite sets, this paper investigates whether pointed hyper-extensional graphs always contain either a node or an edge whose removal does not disrupt the hyper-extensionality property.
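The collapse phenomenon can be reproduced on a tiny example with the Python sketch below. It assumes a convention of our own: an edge `u -> v` means that the set denoted by `v` is an element of the set denoted by `u`. It also restricts itself to acyclic graphs, which is the hereditarily finite case. In such graphs, two nodes are bisimilar exactly when they decode to the same set, so hyper-extensionality can be checked by decoding every node. The helper names are hypothetical. `safe_edges` is only a brute-force search for edges whose deletion keeps the graph pointed and hyper-extensional, not the paper's method.

```python
def decode(graph, node, memo):
    """Set denoted by `node`: the frozenset of the sets denoted by its children."""
    if node not in memo:
        memo[node] = frozenset(decode(graph, child, memo) for child in graph[node])
    return memo[node]


def is_hyperextensional(graph):
    """In an acyclic graph, distinct nodes are non-bisimilar iff they denote distinct sets."""
    memo = {}
    denoted = [decode(graph, v, memo) for v in graph]
    return len(set(denoted)) == len(denoted)


def reachable(graph, root):
    seen, stack = {root}, [root]
    while stack:
        for w in graph[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def remove_edge(graph, u, v):
    return {w: [x for x in kids if (w, x) != (u, v)] for w, kids in graph.items()}


def safe_edges(graph, root):
    """Edges whose deletion keeps the graph pointed at `root` and hyper-extensional."""
    result = []
    for u in graph:
        for v in graph[u]:
            g = remove_edge(graph, u, v)
            if reachable(g, root) == set(g) and is_hyperextensional(g):
                result.append((u, v))
    return result


# The von Neumann ordinal 2 = {0, 1} = {∅, {∅}}.
two = {"2": ["0", "1"], "1": ["0"], "0": []}
print(is_hyperextensional(two))                         # True
print(is_hyperextensional(remove_edge(two, "1", "0")))  # False: node 1 collapses onto 0
print(safe_edges(two, "2"))                             # [('2', '0')]
```

Each of the three edges behaves differently when deleted:
- Deleting `1 -> 0` makes node `1` denote ∅, so it collapses onto node `0`.
- Deleting `2 -> 1` leaves node `1` unreachable from the root.
- Deleting `2 -> 0` preserves both properties.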
Uniform random generation of large acyclic digraphs
Directed acyclic graphs are the basic representation of the structure
underlying Bayesian networks, which represent multivariate probability
distributions. In many practical applications, such as the reverse engineering
of gene regulatory networks, not only the estimation of model parameters but
also the reconstruction of the structure itself is of great interest. For the
assessment of different structure learning algorithms in simulation studies, as
well as for evaluating the prevalence of certain structural features, a uniform
sample from the space of directed acyclic graphs is required. Here we analyse how
to sample acyclic digraphs uniformly at random through recursive enumeration,
an approach previously thought too computationally involved. Based on
complexity considerations, we discuss in particular how the enumeration
directly provides an exact method, which avoids the convergence issues of the
alternative Markov chain methods and is actually computationally much faster.
The limiting behaviour of the distribution of acyclic digraphs then allows us
to sample arbitrarily large graphs. Building on the ideas of recursive
enumeration based sampling we also introduce a novel hybrid Markov chain with
much faster convergence than current alternatives while still being easy to
adapt to various restrictions. Finally, we discuss how to include such
restrictions in the combinatorial enumeration and the new hybrid Markov chain
method for efficient uniform sampling of the corresponding graphs.
Comment: 15 pages, 2 figures. To appear in Statistics and Computing
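The recursive-enumeration idea can be illustrated with a short exact sampler. This Python sketch is written from scratch around a standard decomposition by sources, meaning nodes without incoming arcs. It is not claimed to be the paper's code. Let $a_{n,k}$ count labelled DAGs on $n$ nodes with exactly $k$ sources. Then $a_{n,n}=1$ and $a_{n,k}=\binom{n}{k}\sum_{s=1}^{n-k}(2^k-1)^s\,2^{k(n-k-s)}\,a_{n-k,s}$. Deleting the $k$ sources leaves a DAG with some number $s$ of sources of its own. Each of those $s$ nodes needs at least one arc from the deleted layer, while each of the other $n-k-s$ nodes may receive any subset of arcs from it.

The sampler proceeds in layers:
1. Draw the layer sizes with these exact conditional weights.
2. Give each node a uniformly random non-empty parent set in the layer just above.
3. Add independent fair-coin arcs from all earlier layers.

Together these steps make every labelled DAG equally likely. The function names are ours. The limiting-distribution shortcut for very large graphs is not reproduced.

```python
import random
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def sources_count(n: int, k: int) -> int:
    """a(n, k): labelled DAGs on n nodes with exactly k sources (1 <= k <= n)."""
    if k == n:
        return 1  # only the empty graph has every node as a source
    m = n - k
    return comb(n, k) * sum(
        (2**k - 1) ** s * 2 ** (k * (m - s)) * sources_count(m, s) for s in range(1, m + 1)
    )


def count_dags(n: int) -> int:
    """Total number of labelled DAGs on n >= 1 nodes."""
    return sum(sources_count(n, k) for k in range(1, n + 1))


def pick(weights, rng) -> int:
    """Index i with probability weights[i] / sum(weights), exact for big integers."""
    r = rng.randrange(sum(weights))
    for i, w in enumerate(weights):
        if r < w:
            return i
        r -= w
    raise AssertionError("unreachable")


def layer_sizes(n, rng):
    """Sizes of the successive source layers, drawn with the exact conditional weights."""
    k = 1 + pick([sources_count(n, j) for j in range(1, n + 1)], rng)
    sizes, m = [k], n - k
    while m > 0:
        weights = [(2**k - 1) ** s * 2 ** (k * (m - s)) * sources_count(m, s) for s in range(1, m + 1)]
        k = 1 + pick(weights, rng)
        sizes.append(k)
        m -= k
    return sizes


def sample_dag(n, rng=random):
    """Uniformly random labelled DAG on nodes 0..n-1, as a set of arcs (u, v)."""
    nodes = list(range(n))
    rng.shuffle(nodes)  # uniform assignment of labels to the layers
    layers, start = [], 0
    for size in layer_sizes(n, rng):
        layers.append(nodes[start:start + size])
        start += size
    arcs = set()
    for j in range(1, len(layers)):
        higher = [u for layer in layers[: j - 1] for u in layer]
        for v in layers[j]:
            while True:  # rejection sampling: uniform non-empty subset of the layer above
                parents = [u for u in layers[j - 1] if rng.random() < 0.5]
                if parents:
                    break
            parents += [u for u in higher if rng.random() < 0.5]  # any subset of older layers
            arcs.update((u, v) for u in parents)
    return arcs
```

`count_dags(n)` for n = 1, ..., 5 returns 1, 3, 25, 543, 29281, the known numbers of labelled DAGs. The cached table holds $O(n^2)$ exact integers of roughly $n^2/2$ bits each, so this sketch is meant for moderate $n$.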
The birth of the strong components
Random directed graphs $D(n,p)$ undergo a phase transition around the point
$p = 1/n$, and the width of the transition window has been known since the
works of Łuczak and Seierstad. They have established that as $n \to \infty$
when $p = (1 + \mu n^{-1/3})/n$, the asymptotic probability that the strongly
connected components of a random directed graph are only cycles and single
vertices decreases from 1 to 0 as $\mu$ goes from $-\infty$ to $+\infty$.
By using techniques from analytic combinatorics, we establish the exact
limiting value of this probability as a function of $\mu$ and provide more
properties of the structure of a random digraph around, below and above its
transition point. We obtain the limiting probability that a random digraph is
acyclic and the probability that it has one strongly connected complex
component with a given difference between the number of edges and vertices
(called excess). Our result can be extended to the case of several complex
components with given excesses, as well as to the whole range of sparse digraphs.
Our study is based on a general symbolic method which can deal with a great
variety of possible digraph families, and a version of the saddle-point method
which can be systematically applied to the complex contour integrals arising
from the symbolic method. While the technically easiest model is the model of
random multidigraphs, in which multiple edges are allowed, and where edge
multiplicities are sampled independently according to a Poisson distribution
with a fixed parameter, we also show how to systematically approach the
family of simple digraphs, where multiple edges are forbidden, and where
2-cycles are either allowed or not.
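A sampler for the multidigraph model just described takes only a few lines. This Python sketch draws each arc multiplicity independently from a Poisson distribution with rate `p`. It uses Knuth's multiplication method, which is adequate for the small rates of the sparse regime. The text does not say whether loops are included, so the hypothetical `allow_loops` flag leaves that choice open. The function names are ours.

```python
import math
import random


def poisson(lam, rng):
    """Poisson(lam) variate by Knuth's multiplication method (fine for small lam)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def random_multidigraph(n, p, rng, allow_loops=False):
    """Arc multiplicities m(u, v), independent Poisson(p), over ordered pairs of nodes.

    Only non-zero multiplicities are stored; `allow_loops` is a hypothetical switch."""
    arcs = {}
    for u in range(n):
        for v in range(n):
            if u != v or allow_loops:
                m = poisson(p, rng)
                if m:
                    arcs[u, v] = m
    return arcs
```

A multiplicity greater than 1 is exactly a multiple arc between the same ordered pair of nodes, which the simple-digraph families forbid.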
Our theoretical predictions are supported by numerical simulations, and we
provide tables of numerical values for the integrals of Airy functions that
appear in this study.
Comment: 62 pages, 12 figures, 6 tables. Supplementary computer algebra
computations available at https://gitlab.com/vit.north/strong-components-au
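A Monte Carlo check of the kind mentioned above can be sketched in Python. The sketch assumes:
- the simple-digraph model D(n, p), with 2-cycles allowed and no loops;
- the critical-window parametrisation p = (1 + μ n^(-1/3))/n;
- hypothetical helper names.

It uses Kosaraju's algorithm for strong components and classifies each component by its excess, defined as internal arcs minus vertices:
- An isolated vertex has excess −1.
- A cycle has excess 0.
- A complex component has positive excess.

The last case follows because a strongly connected component on k ≥ 2 vertices has at least k arcs, with equality only for a cycle.

```python
import random


def random_digraph(n, p, rng):
    """Simple digraph D(n, p): every ordered pair u != v is an arc with probability p."""
    return [(u, v) for u in range(n) for v in range(n) if u != v and rng.random() < p]


def strong_components(n, arcs):
    """Kosaraju's algorithm with explicit stacks; returns comp[v] for every vertex v."""
    succ, pred = [[] for _ in range(n)], [[] for _ in range(n)]
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)
    seen, order = [False] * n, []
    for s in range(n):  # pass 1: vertices in order of DFS completion
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(succ[s]))]
        while stack:
            v, children = stack[-1]
            for w in children:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(succ[w])))
                    break
            else:
                stack.pop()
                order.append(v)
    comp, label = [-1] * n, 0
    for s in reversed(order):  # pass 2: flood the reversed graph
        if comp[s] == -1:
            comp[s], stack = label, [s]
            while stack:
                for w in pred[stack.pop()]:
                    if comp[w] == -1:
                        comp[w] = label
                        stack.append(w)
            label += 1
    return comp


def excesses(n, arcs):
    """Excess (internal arcs minus vertices) of each strong component."""
    comp = strong_components(n, arcs)
    excess = {}
    for v in range(n):
        excess[comp[v]] = excess.get(comp[v], 0) - 1
    for u, v in arcs:
        if comp[u] == comp[v]:
            excess[comp[u]] += 1
    return list(excess.values())


def only_cycles_and_vertices(n, arcs):
    return all(e <= 0 for e in excesses(n, arcs))  # no complex component


def is_acyclic(n, arcs):
    return all(e == -1 for e in excesses(n, arcs))  # every component is a single vertex


def estimate(n, mu, trials, seed=0):
    """Monte Carlo estimate of P(no complex component) at p = (1 + mu * n**(-1/3)) / n."""
    rng = random.Random(seed)
    p = (1 + mu * n ** (-1 / 3)) / n
    hits = sum(only_cycles_and_vertices(n, random_digraph(n, p, rng)) for _ in range(trials))
    return hits / trials
```

Comparing, for instance, `estimate(300, -2.0, 100)` with `estimate(300, 2.0, 100)` should show the probability falling across the window. At such small n, finite-size effects and sampling noise are substantial.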
Sets as graphs
The aim of this thesis is a mutual transfer of computational and structural results and techniques between sets and graphs. We study the combinatorial enumeration of sets, canonical encodings, random generation, and digraph immersions. We also investigate the underlying structure of sets in algorithmic terms, or in connection with hereditary graph classes. Finally, we employ a set-based proof-checker to verify two classical results on claw-free graphs.
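As an illustration of what a canonical encoding of sets can look like, here is Ackermann's classical bijection between hereditarily finite sets and natural numbers, $\mathrm{code}(S)=\sum_{x\in S}2^{\mathrm{code}(x)}$, as a Python sketch. We do not claim it is the encoding studied in the thesis, and the function names are ours.

```python
def ackermann_code(s: frozenset) -> int:
    """Ackermann's bijection: code(S) = sum of 2**code(x) over the elements x of S."""
    return sum(1 << ackermann_code(x) for x in s)


def ackermann_decode(n: int) -> frozenset:
    """Inverse map: the elements are the decodings of the positions of the 1-bits of n."""
    return frozenset(ackermann_decode(i) for i in range(n.bit_length()) if n >> i & 1)


empty = frozenset()
two = frozenset({empty, frozenset({empty})})  # von Neumann ordinal 2 = {∅, {∅}}
print(ackermann_code(two))         # 3
print(ackermann_decode(3) == two)  # True
```

Distinct elements receive distinct codes, so the sum is simply a bitmask. This is why decoding can read the elements off the binary expansion.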