8 research outputs found

    Probabilistic Relational Model Benchmark Generation

    Get PDF
    The validation of any database mining methodology goes through an evaluation process where benchmarks availability is essential. In this paper, we aim to randomly generate relational database benchmarks that allow to check probabilistic dependencies among the attributes. We are particularly interested in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs) to a relational data mining context and enable effective and robust reasoning over relational data. Even though a panoply of works have focused, separately , on the generation of random Bayesian networks and relational databases, no work has been identified for PRMs on that track. This paper provides an algorithmic approach for generating random PRMs from scratch to fill this gap. The proposed method allows to generate PRMs as well as synthetic relational data from a randomly generated relational schema and a random set of probabilistic dependencies. This can be of interest not only for machine learning researchers to evaluate their proposals in a common framework, but also for databases designers to evaluate the effectiveness of the components of a database management system

    Is Hyper-extensionality Preservable Under Deletions of Graph Elements?

    Get PDF
    Any hereditarily finite set S can be represented as a finite pointed graph \u2013dubbed membership graph\u2013 whose nodes denote elements of the transitive closure of {S} and whose edges model the membership relation. Membership graphs must be hyper-extensional, that is pairwise distinct nodes are not bisimilar and (uniquely) represent hereditarily finite sets. We will see that the removal of even a single node or edge from a membership graph can cause \u201ccollapses\u201d of different nodes and, therefore, the loss of hyper-extensionality of the graph itself. With the intent of gaining a deeper understanding on the class of hyper-extensional hereditarily finite sets, this paper investigates whether pointed hyper-extensional graphs always contain either a node or an edge whose removal does not disrupt the hyper-extensionality property

    Uniform random generation of large acyclic digraphs

    Full text link
    Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but the reconstruction of the structure itself is of great interest. As well as for the assessment of different structure learning algorithms in simulation studies, a uniform sample from the space of directed acyclic graphs is required to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of recursive enumeration based sampling we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives while still being easy to adapt to various restrictions. Finally we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs.Comment: 15 pages, 2 figures. To appear in Statistics and Computin

    The birth of the strong components

    Full text link
    Random directed graphs D(n,p)D(n,p) undergo a phase transition around the point p=1/np = 1/n, and the width of the transition window has been known since the works of Luczak and Seierstad. They have established that as n→∞n \to \infty when p=(1+μn−1/3)/np = (1 + \mu n^{-1/3})/n, the asymptotic probability that the strongly connected components of a random directed graph are only cycles and single vertices decreases from 1 to 0 as μ\mu goes from −∞-\infty to ∞\infty. By using techniques from analytic combinatorics, we establish the exact limiting value of this probability as a function of μ\mu and provide more properties of the structure of a random digraph around, below and above its transition point. We obtain the limiting probability that a random digraph is acyclic and the probability that it has one strongly connected complex component with a given difference between the number of edges and vertices (called excess). Our result can be extended to the case of several complex components with given excesses as well in the whole range of sparse digraphs. Our study is based on a general symbolic method which can deal with a great variety of possible digraph families, and a version of the saddle-point method which can be systematically applied to the complex contour integrals appearing from the symbolic method. While the technically easiest model is the model of random multidigraphs, in which multiple edges are allowed, and where edge multiplicities are sampled independently according to a Poisson distribution with a fixed parameter pp, we also show how to systematically approach the family of simple digraphs, where multiple edges are forbidden, and where 2-cycles are either allowed or not. Our theoretical predictions are supported by numerical simulations, and we provide tables of numerical values for the integrals of Airy functions that appear in this study.Comment: 62 pages, 12 figures, 6 tables. Supplementary computer algebra computations available at https://gitlab.com/vit.north/strong-components-au

    Sets as graphs

    Get PDF
    The aim of this thesis is a mutual transfer of computational and structural results and techniques between sets and graphs. We study combinatorial enumeration of sets, canonical encodings, random generation, digraph immersions. We also investigate the underlying structure of sets in algorithmic terms, or in connection with hereditary graphs classes. Finally, we employ a set-based proof-checker to verify two classical results on claw-free graph
    corecore