71 research outputs found

    Uniform random generation of large acyclic digraphs

    Directed acyclic graphs are the basic representation of the structure underlying Bayesian networks, which represent multivariate probability distributions. In many practical applications, such as the reverse engineering of gene regulatory networks, not only the estimation of model parameters but also the reconstruction of the structure itself is of great interest. A uniform sample from the space of directed acyclic graphs is required both to assess different structure learning algorithms in simulation studies and to evaluate the prevalence of certain structural features. Here we analyse how to sample acyclic digraphs uniformly at random through recursive enumeration, an approach previously thought too computationally involved. Based on complexity considerations, we discuss in particular how the enumeration directly provides an exact method, which avoids the convergence issues of the alternative Markov chain methods and is actually computationally much faster. The limiting behaviour of the distribution of acyclic digraphs then allows us to sample arbitrarily large graphs. Building on the ideas of sampling based on recursive enumeration, we also introduce a novel hybrid Markov chain with much faster convergence than current alternatives while still being easy to adapt to various restrictions. Finally, we discuss how to include such restrictions in the combinatorial enumeration and the new hybrid Markov chain method for efficient uniform sampling of the corresponding graphs. Comment: 15 pages, 2 figures. To appear in Statistics and Computing
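As a rough illustration of the enumeration that such sampling builds on, the sketch below counts labelled DAGs on n nodes. The abstract does not state the recurrence it uses; this assumes the classical recurrence usually attributed to Robinson, and the function name `count_dags` is hypothetical. It only counts graphs and does not reproduce the paper's sampling procedure.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes (assumed: Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over a set of k nodes with no incoming arcs:
    # each of them may point to any of the remaining n - k nodes.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(6):
        print(n, count_dags(n))
```

Python's arbitrary-precision integers keep the counts exact, which matters because they grow super-exponentially with n.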

    Partition MCMC for inference on acyclic digraphs

    Acyclic digraphs are the underlying representation of Bayesian networks, a widely used class of probabilistic graphical models. Learning the underlying graph from data is a way of gaining insights about the structural properties of a domain. Structure learning forms one of the inference challenges of statistical graphical models. MCMC methods, notably structure MCMC, which sample graphs from the posterior distribution given the data, are probably the only viable option for Bayesian model averaging. Score modularity and restrictions on the number of parents of each node allow the graphs to be grouped into larger collections, which can be scored as a whole to improve the chain's convergence. Current examples of algorithms taking advantage of grouping are the biased order MCMC, which acts on the alternative space of permuted triangular matrices, and non-ergodic edge reversal moves. Here we propose a novel algorithm, which employs the underlying combinatorial structure of DAGs to define a new grouping. As a result, convergence is improved compared to structure MCMC, while still retaining the property of producing an unbiased sample. Finally, the method can be combined with edge reversal moves to improve the sampler further. Comment: Revised version. 34 pages, 16 figures. R code available at https://github.com/annlia/partitionMCM

    Counting and Sampling Markov Equivalent Directed Acyclic Graphs

    Exploring directed acyclic graphs (DAGs) in a Markov equivalence class is pivotal for inferring causal effects or for discovering the causal DAG via appropriate interventional data. We consider counting and uniform sampling of DAGs that are Markov equivalent to a given DAG. These problems efficiently reduce to counting the moral acyclic orientations of a given undirected connected chordal graph on $n$ vertices, for which we give two algorithms. Our first algorithm requires $O(2^n n^4)$ arithmetic operations, improving a previous super-exponential upper bound. The second requires $O(k!\, 2^k k^2 n)$ operations, where $k$ is the size of the largest clique in the graph; for bounded-degree graphs this bound is linear in $n$. After a single run, both algorithms enable uniform sampling from the equivalence class at a computational cost linear in the graph size. Empirical results indicate that our algorithms are superior to previously presented algorithms over a range of inputs; graphs with hundreds of vertices and thousands of edges are processed in a second on a desktop computer. Peer reviewed
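To make the counted object concrete, here is a brute-force sketch that counts the moral acyclic orientations of a small undirected graph, meaning orientations with no directed cycle and no v-structure (two non-adjacent parents of a common child). This is not either of the algorithms from the abstract; it enumerates all orientations, so it is exponential in the number of edges and meant only for tiny examples. All function names are hypothetical.

```python
from itertools import product


def is_acyclic(n, arcs):
    """Kahn's algorithm: True if the directed graph on nodes 0..n-1 has no cycle."""
    indegree = [0] * n
    children = [[] for _ in range(n)]
    for u, v in arcs:
        children[u].append(v)
        indegree[v] += 1
    stack = [v for v in range(n) if indegree[v] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                stack.append(v)
    return removed == n


def has_v_structure(n, arcs, adjacent):
    """True if some node has two parents that are not adjacent to each other."""
    parents = [[] for _ in range(n)]
    for u, v in arcs:
        parents[v].append(u)
    return any(
        frozenset((a, b)) not in adjacent
        for ps in parents
        for i, a in enumerate(ps)
        for b in ps[i + 1:]
    )


def count_moral_acyclic_orientations(n, edges):
    """Try every orientation of the undirected edges and count the valid ones."""
    adjacent = {frozenset(e) for e in edges}
    count = 0
    for flips in product([False, True], repeat=len(edges)):
        arcs = [(v, u) if flip else (u, v) for (u, v), flip in zip(edges, flips)]
        if is_acyclic(n, arcs) and not has_v_structure(n, arcs, adjacent):
            count += 1
    return count
```

For instance, a path on three vertices has four orientations, all acyclic, and exactly one of them creates a v-structure at the middle vertex.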

    Sets as graphs

    The aim of this thesis is a mutual transfer of computational and structural results and techniques between sets and graphs. We study combinatorial enumeration of sets, canonical encodings, random generation, and digraph immersions. We also investigate the underlying structure of sets in algorithmic terms, or in connection with hereditary graph classes. Finally, we employ a set-based proof-checker to verify two classical results on claw-free graphs.

    Renormalization: an advanced overview

    We present several approaches to renormalization in QFT: the multi-scale analysis in perturbative renormalization, the functional methods à la Wetterich equation, and the loop-vertex expansion in non-perturbative renormalization. While each of these is quite well-established, they go beyond standard QFT textbook material, and may be little-known to specialists of each other approach. This review is aimed at bridging this gap. Comment: Review, 130 pages, 33 figures; v2: misprints corrected, refs. added, minor improvements; v3: some changes to sect. 5, refs. added

    The combinatorics of minimal unsatisfiability: connecting to graph theory

    Minimally Unsatisfiable CNFs (MUs) are unsatisfiable CNFs where removing any clause destroys unsatisfiability. MUs are the building blocks of unsatisfiability, and our understanding of them can be very helpful in answering various algorithmic and structural questions relating to unsatisfiability. In this thesis we study MUs from a combinatorial point of view, with the aim of extending the understanding of the structure of MUs. We show that some important classes of MUs are very closely related to known classes of digraphs, and using arguments from logic and graph theory we characterise these MUs.
    Two main concepts in this thesis are isomorphism of CNFs and the implication digraph of 2-CNFs (at most two literals per disjunction). Isomorphism of CNFs involves renaming the variables, and flipping the literals. The implication digraph of a 2-CNF F has both arcs (¬a → b) and (¬b → a) for every binary clause (a ∨ b) in F.
    In the first part we introduce a novel connection between MUs and Minimal Strong Digraphs (MSDs), strongly connected digraphs, where removing any arc destroys the strong connectedness. We introduce the new class DFM of special MUs, which are in close correspondence to MSDs. The known relation between 2-CNFs and implication digraphs is used, but in a simpler and more direct way, namely that we have a canonical choice of one of the two arcs. As an application of this new framework we provide short and intuitive new proofs for two important but isolated characterisations for nonsingular MUs (every literal occurs at least twice), both with ingenious but complicated proofs: characterising 2-MUs (minimally unsatisfiable 2-CNFs), and characterising MUs with deficiency 2 (two more clauses than variables).
    In the second part, we provide a fundamental addition to the study of 2-CNFs which have efficient algorithms for many interesting problems, namely that we provide a full classification of 2-MUs and a polytime isomorphism decision of this class. We show that implication digraphs of 2-MUs are “Weak Double Cycles” (WDCs), big cycles of small cycles (with possible overlaps). Combining logical and graph-theoretical methods, we prove that WDCs have at most one skew-symmetry (a self-inverse fixed-point free anti-symmetry, reversing the direction of arcs). It follows that the isomorphisms between 2-MUs are exactly the isomorphisms between their implication digraphs (since digraphs with given skew-symmetry are the same as 2-CNFs). This reduces the classification of 2-MUs to the classification of a nice class of digraphs.
    Finally, in the outlook we discuss further applications, including an alternative framework for enumerating some special Minimally Unsatisfiable Sub-clause-sets (MUSs).
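The implication digraph defined in the abstract can be sketched in a few lines. The encoding is an assumption made for illustration: literals are non-zero integers in the DIMACS style, with `-x` standing for ¬x, and the names `implication_digraph` and `is_skew_symmetric` are hypothetical. The second function checks the skew-symmetry the abstract refers to, taken here to be complementation of literals combined with arc reversal.

```python
def implication_digraph(clauses):
    """Arcs (¬a → b) and (¬b → a) for every binary clause (a ∨ b).

    Assumed encoding: a literal is a non-zero int and -x denotes ¬x.
    """
    arcs = set()
    for a, b in clauses:
        arcs.add((-a, b))
        arcs.add((-b, a))
    return arcs


def is_skew_symmetric(arcs):
    """True if complementing both endpoints and reversing each arc maps arcs to arcs."""
    return all((-v, -u) in arcs for u, v in arcs)


if __name__ == "__main__":
    # (x1 ∨ x2) ∧ (¬x1 ∨ x2)
    digraph = implication_digraph([(1, 2), (-1, 2)])
    print(sorted(digraph))
    print(is_skew_symmetric(digraph))
```

Every implication digraph built this way passes the check by construction, since the two arcs added for a clause are images of each other under the map.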

    Counting and Sampling Directed Acyclic Graphs for Learning Bayesian Networks

    Bayesian networks are probabilistic models that represent dependencies between random variables via directed acyclic graphs (DAGs). They provide a succinct representation for the joint distribution in cases where the dependency structure is sparse. Specifying the network by hand is often unfeasible, and thus it would be desirable to learn the model from observed data over the variables. In this thesis, we study computational problems encountered in different approaches to learning Bayesian networks. All of the problems involve counting or sampling DAGs under various constraints. One important computational problem in the fully Bayesian approach to structure learning is the problem of sampling DAGs from the posterior distribution over all the possible structures for the Bayesian network. From the typical modeling assumptions it follows that the distribution is modular, which means that the probability of each DAG factorizes into per-node weights, each of which depends only on the parent set of the node. For this problem, we give the first exact algorithm with a time complexity bound exponential in the number of nodes, and thus polynomial in the size of the input, which consists of all the possible per-node weights. We also adapt the algorithm such that it outperforms the previous methods in the special case of sampling DAGs from the uniform distribution. We also study the problem of counting the linear extensions of a given partial order, which appears as a subroutine in some importance sampling methods for modular distributions. This problem is a classic example of a #P-complete problem that can be approximately solved in polynomial time by reduction to sampling linear extensions uniformly at random. We present two new randomized approximation algorithms for the problem. The first algorithm extends the applicable range of an exact dynamic programming algorithm by using sampling to reduce the given instance into an easier instance. 
The second algorithm is obtained by combining a novel, Markov chain-based exact sampler with the Tootsie Pop algorithm, a recent generic scheme for reducing counting into sampling. Together, these two algorithms speed up approximate linear extension counting by multiple orders of magnitude in practice. Finally, we investigate the problem of counting and sampling DAGs that are Markov equivalent to a given DAG. This problem is important in learning causal Bayesian networks, because distinct Markov equivalent DAGs cannot be distinguished only based on observational data, yet they are different from the causal viewpoint. We speed up the state-of-the-art recursive algorithm for the problem by using dynamic programming. We also present a new, tree decomposition-based algorithm, which runs in linear time if the size of the maximum clique is bounded.
    Bayesian networks model the statistical relationships between random variables as directed acyclic graphs, in which nodes correspond to random variables and arcs to the dependencies between them. The graph structure illustrates the structure of the phenomenon described by the variables and allows the joint distribution of the variables to be represented compactly. Although a Bayesian network can in principle be built by hand, doing so is impractical when there are many variables or the modelled phenomenon is not fully understood. It is therefore useful to learn the network structure from data collected about the phenomenon. This thesis studies computational problems related to learning the structure of a Bayesian network. All of these problems concern counting or random sampling of directed acyclic graphs under various constraints. One central problem in structure learning is sampling a structure from the posterior distribution, which gives the most weight to the structures that best fit the data. The thesis presents the first exact algorithm for this problem which, by exploiting special properties of the posterior distribution, achieves a time complexity polynomial in the size of the data structure that defines the distribution. The algorithm also offers a more efficient way than previous algorithms to sample directed acyclic graphs from the uniform distribution. The second problem studied in the thesis is counting the linear extensions of a partial order. This problem is known to belong to the class #P of hard counting problems, but it can nevertheless be solved approximately in polynomial time by reducing it to the corresponding sampling problem. The thesis presents two new randomized approximation algorithms for counting linear extensions. The first algorithm turns a known exact counting algorithm into an approximate one by combining it with sample-based estimation. The second algorithm reduces the counting problem to a new Markov chain-based sampling method. Together, these two algorithms speed up approximate linear extension counting in practical cases by several orders of magnitude. The final part of the work studies the problems of counting and sampling the directed acyclic graphs in a given Markov equivalence class. The problem is important when Bayesian networks are used to model causal dependencies, because Markov equivalent structures cannot be distinguished from observational data alone, even though they differ from the causal viewpoint. The work presents a way to speed up the best known algorithm by means of dynamic programming. In addition, the thesis presents a new method based on a tree decomposition of the graph, whose time complexity is linear if the size of the largest clique in the graph is bounded.
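The abstract mentions an exact dynamic programming algorithm for counting linear extensions without describing it. A common textbook formulation, assumed here and not necessarily the one used in the thesis, runs a memoised recursion over the sets of elements already placed. The function name and the input encoding (elements 0..n-1, pairs `(a, b)` meaning a must precede b) are hypothetical.

```python
from functools import lru_cache


def count_linear_extensions(n, relations):
    """Count total orders of 0..n-1 that respect every pair (a, b): a before b.

    Exact dynamic programming over bitmasks of placed elements; the number of
    states can reach 2**n, so this is only practical for small n.
    """
    predecessors = [0] * n
    for a, b in relations:
        predecessors[b] |= 1 << a
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def ways(placed):
        if placed == full:
            return 1
        total = 0
        for v in range(n):
            bit = 1 << v
            # v can come next if it is unplaced and all its predecessors are placed.
            if not (placed & bit) and (predecessors[v] & placed) == predecessors[v]:
                total += ways(placed | bit)
        return total

    return ways(0)


if __name__ == "__main__":
    # Two independent chains 0 < 1 and 2 < 3.
    print(count_linear_extensions(4, [(0, 1), (2, 3)]))
```

If the given relations contain a cycle, no element on the cycle can ever be placed, so the function returns 0.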