143 research outputs found

    Scalable Learning of Bayesian Networks Using Feedback Arc Set-Based Heuristics

    Bayesian networks form an important class of probabilistic graphical models. They consist of a structure (a directed acyclic graph) expressing conditional independencies among random variables, as well as parameters (local probability distributions). As such, Bayesian networks are generative models encoding joint probability distributions in a compact form. The main difficulty in learning a Bayesian network comes from the structure itself: owing to the combinatorial nature of the acyclicity property, the structure learning problem is well known to be NP-hard in general. Exact algorithms solving this problem exist: dynamic programming and integer linear programming are prime contenders when one seeks to recover the structure of small-to-medium sized Bayesian networks from data. On the other hand, heuristics such as hill climbing variants are commonly used when attempting to approximately learn the structure of larger networks with thousands of variables, although these heuristics typically lack theoretical guarantees and their performance in practice may become unreliable in large-scale learning. This thesis is concerned with the development of scalable methods tackling the Bayesian network structure learning problem, while attempting to maintain a level of theoretical control.
This was achieved via the use of related combinatorial problems, namely the maximum acyclic subgraph problem and its dual, the minimum feedback arc set problem. Although these problems are themselves NP-hard, they are significantly more tractable in practice. This thesis explores ways to map Bayesian network structure learning into maximum acyclic subgraph instances and to extract approximate solutions for the first problem from the solutions obtained for the second. Our research suggests that although increased scalability can be achieved this way, maintaining theoretical understanding with this approach is much more challenging. Furthermore, we found that learning the structure of Bayesian networks based on maximum acyclic subgraph/minimum feedback arc set may not be the go-to method in general, but we identified a setting (linear structural equation models) in which we could experimentally validate the benefits of this approach, leading to fast and scalable structure recovery with the ability to learn complex structures competitively with state-of-the-art baselines. Doctoral thesis.
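The duality between maximum acyclic subgraph and minimum feedback arc set mentioned in the abstract can be illustrated with a classic ordering-based heuristic. The sketch below is not the thesis's method; the function names and the toy arc weights are hypothetical, and the graph is assumed to have no self-loops. Given any vertex ordering, the arcs pointing forward form an acyclic subgraph and the arcs pointing backward form a feedback arc set; reversing the ordering swaps the two, so the heavier side can always be kept.

```python
from typing import Dict, Hashable, List, Tuple

Arc = Tuple[Hashable, Hashable]


def split_by_ordering(weights: Dict[Arc, float], order: List[Hashable]):
    """Split arcs into those that go forward and those that go backward in `order`."""
    position = {v: i for i, v in enumerate(order)}
    forward = {a: w for a, w in weights.items() if position[a[0]] < position[a[1]]}
    backward = {a: w for a, w in weights.items() if position[a[0]] > position[a[1]]}
    return forward, backward


def acyclic_subgraph(weights: Dict[Arc, float], order: List[Hashable]):
    """Return (kept, removed): an acyclic arc set and the matching feedback arc set.

    Forward arcs are acyclic under `order`; backward arcs are acyclic under the
    reversed order. Keeping the heavier side retains at least half the total weight.
    """
    forward, backward = split_by_ordering(weights, order)
    if sum(forward.values()) >= sum(backward.values()):
        return forward, backward
    return backward, forward


# Hypothetical toy instance: a weighted 3-cycle.
toy_weights = {("a", "b"): 3.0, ("b", "c"): 2.0, ("c", "a"): 1.0}
kept, removed = acyclic_subgraph(toy_weights, ["a", "b", "c"])
```

On this toy cycle the single arc from c to a is removed and the two heavier arcs are kept, whichever direction the ordering is read in.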

    Formally Verified Compositional Algorithms for Factored Transition Systems

    Artificial Intelligence (AI) planning and model checking are two disciplines that have found wide practical application. A problem in these two fields often concerns a transition system whose behaviour can be encoded in a digraph that models the system's state space. However, because the state spaces of realistic systems are very large, they are compactly represented as propositionally factored transition systems. These representations have the advantage of being exponentially smaller than the state space of the represented system. Many problems in AI planning and model checking involve questions about state spaces, which correspond to graph theoretic questions on the digraphs modelling those state spaces. However, existing techniques for answering those graph theoretic questions effectively require, in the worst case, constructing the digraph that models the state space by expanding the propositionally factored representation of the system. This is impractical, if not impossible, in many cases because of the size of the state space compared to the factored representation. One common way to avoid constructing the state space is the compositional approach, where only smaller abstractions of the system at hand are processed and the given problem (e.g. reachability) is solved for them. A solution for the problem on the concrete system is then derived from the solutions of the problem on the abstract systems. The motivation for this approach is that, in the worst case, one need only construct the state spaces of the abstractions, which can be exponentially smaller than the state space of the concrete system. We study the application of the compositional approach to two fundamental problems on transition systems: upper-bounding topological properties of the state space (e.g. the diameter, i.e. the largest distance between any two states), and computing reachability between states.
We provide new compositional algorithms to solve both problems by exploiting different structures of the given system. In addition to using an existing abstraction (usually referred to as projection) based on removing state variables, we develop two new abstractions for use within our compositional algorithms. One of the new abstractions is also based on state variables, while the other is based on assignments to state variables. We show theoretically and experimentally that our new compositional algorithms improve the state of the art in solving both problems: upper-bounding state space topological parameters and reachability. We designed the algorithms and formally verified them with the aid of an interactive theorem prover. To our knowledge, this is the first application of such a theorem-prover-based methodology to the design of new algorithms in either AI planning or model checking.

    A Survey of Pipelined Workflow Scheduling: Models and Algorithms

    A large class of applications need to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task-, data-, pipelined-, and/or replicated-parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors or optimization goals. This paper surveys the field by summing up and structuring known results and approaches.

    Semi-automated Design of High-performance Digital Circuits with Xilinx FPGAs

    This master's thesis deals with the design of sequential digital circuits with a focus on delay optimization. Two techniques commonly used for this optimization are described: retiming is covered briefly, while pipelining is treated in more depth. In the practical part of the thesis, an abstraction of sequential digital circuits based on directed acyclic graphs (DAGs) was developed, which represents the circuit in a form that is easier to transform. A tool for semi-automatic optimization of digital circuits using pipelining is also introduced; it targets circuits developed in the Xilinx ISE Design Suite environment.
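To make the DAG view of a circuit concrete, here is a minimal sketch of the quantity that pipelining tries to reduce: the longest combinational delay through the graph. It is not the thesis's tool; the node names, the delay values, and the function `arrival_times` are hypothetical, and the graph is assumed to be a DAG whose nodes all have an entry in `delay`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def arrival_times(preds: Dict[str, List[str]], delay: Dict[str, float]) -> Dict[str, float]:
    """Latest signal arrival time at the output of each node of a combinational DAG.

    `preds` maps a node to the nodes feeding it; `delay` gives each node's own delay.
    """
    arrival: Dict[str, float] = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(preds).static_order():
        start = max((arrival[p] for p in preds.get(node, ())), default=0.0)
        arrival[node] = start + delay[node]
    return arrival


# Hypothetical circuit: out = (in_a * in_b) + in_c
preds = {"mul": ["in_a", "in_b"], "add": ["mul", "in_c"], "out": ["add"]}
delay = {"in_a": 0.0, "in_b": 0.0, "in_c": 0.0, "mul": 4.0, "add": 2.0, "out": 0.0}

arrival = arrival_times(preds, delay)
critical_path_delay = max(arrival.values())
```

In this toy circuit the critical path passes through the multiplier and then the adder. Placing a pipeline register between them would split that path into two shorter stages, at the cost of one extra cycle of latency.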

    Acyclic n-Level Hypergraph Partitioning


    Centrality measures and analyzing dot-product graphs

    In this thesis we investigate two topics in data mining on graphs: in the first part we investigate the notion of centrality in graphs, and in the second part we look at reconstructing graphs from aggregate information. In many graph-related problems the goal is to rank nodes based on an importance score. This score is generally referred to as node centrality. In Part I we start by giving a novel and more efficient algorithm for computing betweenness centrality. In many applications it is not an individual node but rather a set of nodes that is chosen to perform some task. We therefore generalize the notion of centrality to groups of nodes. While group centrality was first formally defined by Everett and Borgatti (1999), we are the first to pose it as a combinatorial optimization problem: find a group of k nodes with the largest centrality. We give an algorithm for solving this optimization problem for a general notion of centrality that subsumes various path-based instantiations of centrality. We prove that this problem is NP-hard for specific centrality definitions, and we provide a universal algorithm that can be modified to optimize the specific measures. We also investigate the problem of increasing node centrality by adding or deleting edges in the graph. We conclude this part by solving the optimization problem for two specific applications: one for minimizing redundancy in information propagation networks and one for optimizing the expected number of interceptions of a group in a random navigational network. In the second part of the thesis we investigate what we can infer about a bipartite graph if only some aggregate information (the number of common neighbors of each pair of nodes) is given. First, we observe that the given data is equivalent to the dot products of the nodes' adjacency vectors.
Based on this observation we develop an SVD-based algorithm that is capable of almost perfectly reconstructing graphs from such neighborhood data. We investigate two versions of this problem, in which the dot products of nodes with themselves, i.e. the node degrees, are either known or hidden.
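The observation that common-neighbor counts are dot products of adjacency vectors can be checked on a small example. The sketch below only illustrates that forward direction, not the SVD-based reconstruction; the function name and the toy adjacency matrix are hypothetical. Rows are nodes on one side of the bipartite graph and columns are nodes on the other side.

```python
from typing import List


def common_neighbor_counts(adjacency: List[List[int]]) -> List[List[int]]:
    """Entry (i, j) is the dot product of rows i and j of a 0/1 adjacency matrix.

    Off-diagonal entries count neighbors shared by nodes i and j;
    diagonal entries equal the node degrees.
    """
    n = len(adjacency)
    return [
        [sum(a * b for a, b in zip(adjacency[i], adjacency[j])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical bipartite graph: 3 left nodes (rows), 3 right nodes (columns).
toy_adjacency = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
counts = common_neighbor_counts(toy_adjacency)
degrees = [counts[i][i] for i in range(len(counts))]
```

The "hidden degrees" version of the problem described in the abstract corresponds to being given this matrix without its diagonal.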

    A survey of Bayesian Network structure learning


    Practical and effective higher-order optimizations

    Inlining is an optimization that replaces a call to a function with that function’s body. This optimization not only reduces the overhead of a function call, but can expose additional optimization opportunities to the compiler, such as removing redundant operations or unused conditional branches. Another optimization, copy propagation, replaces a redundant copy of a still-live variable with the original. Copy propagation can reduce the total number of live variables, reducing register pressure and memory usage, and possibly eliminating redundant memory-to-memory copies. In practice, both of these optimizations are implemented in nearly every modern compiler. These two optimizations are practical to implement and effective in first-order languages, but in languages with lexically-scoped first-class functions (a.k.a. closures), these optimizations are no