    Bayesian Discovery of Multiple Bayesian Networks via Transfer Learning

    Bayesian network structure learning algorithms are being used with limited data in domains such as systems biology and neuroscience to gain insight into the underlying processes that produce observed data. Learning reliable networks from limited data is difficult; transfer learning can therefore improve the robustness of learned networks by leveraging data from related tasks. Existing transfer learning algorithms for Bayesian network structure learning give a single maximum a posteriori estimate of network models. Yet many other models may be equally likely, so Bayesian structure discovery provides a more informative result. Bayesian structure discovery algorithms estimate posterior probabilities of structural features, such as edges. We present transfer learning for Bayesian structure discovery, which allows us to explore the shared and unique structural features among related tasks. Efficient computation requires that our transfer learning objective factor into local calculations, which we prove holds for a broad class of transfer biases. Theoretically, we show the efficiency of our approach. Empirically, we show that, compared to single-task learning, transfer learning is better able to positively identify true edges. We apply the method to whole-brain neuroimaging data.
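
To make "posterior probability of an edge" concrete, the following minimal sketch averages edge indicators over a small set of candidate structures, weighted by their normalized scores. It is an illustration only and not the algorithm of the paper: the function name `edge_posteriors`, the three candidate graphs and their log scores are all hypothetical, and a real structure discovery method would not enumerate graphs explicitly.

```python
import math


def edge_posteriors(scored_graphs):
    """Return {edge: posterior probability} from (edges, log_score) pairs.

    Each log_score is an unnormalized log posterior of one candidate graph
    (hypothetical values here). Scores are normalized over the given
    candidates only, so the result is exact just for this restricted set.
    """
    max_log = max(log_score for _, log_score in scored_graphs)
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = [math.exp(log_score - max_log) for _, log_score in scored_graphs]
    total = sum(weights)

    posterior = {}
    for (edges, _), weight in zip(scored_graphs, weights):
        for edge in edges:
            posterior[edge] = posterior.get(edge, 0.0) + weight / total
    return posterior


# Three hypothetical candidate structures over variables A, B, C.
candidates = [
    (frozenset({("A", "B")}), math.log(0.5)),
    (frozenset({("A", "B"), ("B", "C")}), math.log(0.3)),
    (frozenset({("B", "C")}), math.log(0.2)),
]
posterior = edge_posteriors(candidates)
```

Here the edge A -> B appears in candidates carrying 0.5 + 0.3 of the total weight, which is the kind of per-feature summary a single maximum a posteriori graph cannot provide.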

    Improving Structure MCMC for Bayesian Networks through Markov Blanket Resampling

    Algorithms for inferring the structure of Bayesian networks from data have become an increasingly popular method for uncovering the direct and indirect influences among variables in complex systems. A Bayesian approach to structure learning uses posterior probabilities to quantify the strength with which the data and prior knowledge jointly support each possible graph feature. Existing Markov Chain Monte Carlo (MCMC) algorithms for estimating these posterior probabilities mix and converge slowly, especially for large networks. We present a novel Markov blanket resampling (MBR) scheme that intermittently reconstructs the Markov blanket of nodes, thus allowing the sampler to more effectively traverse low-probability regions between local maxima. Because we can derive the complementary forward and backward directions of the MBR proposal distribution, the Metropolis-Hastings algorithm can be used to account for any asymmetries in these proposals. Experiments across a range of network sizes show that the MBR scheme outperforms other state-of-the-art algorithms in terms of both learning performance and convergence rate. In particular, MBR achieves better learning performance than the other algorithms when the number of observations is relatively small, and faster convergence when the number of variables in the network is large.
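
The role of the forward and backward proposal probabilities can be sketched with the generic Metropolis-Hastings acceptance rule. This is a minimal sketch of the standard rule, not the MBR move itself: the function names and the numeric values are assumptions for illustration, and the log-posterior and log-proposal terms would come from the sampler.

```python
import math
import random


def acceptance_probability(log_post_current, log_post_proposed,
                           log_q_forward, log_q_backward):
    """Metropolis-Hastings acceptance probability for an asymmetric proposal.

    log_q_forward  = log q(proposed | current)
    log_q_backward = log q(current | proposed)
    The (log_q_backward - log_q_forward) term is the Hastings correction.
    """
    log_ratio = (log_post_proposed - log_post_current) \
        + (log_q_backward - log_q_forward)
    if log_ratio >= 0.0:
        return 1.0
    return math.exp(log_ratio)


def mh_accept(log_post_current, log_post_proposed,
              log_q_forward, log_q_backward, rng=random):
    """Draw the accept/reject decision for one proposed move."""
    prob = acceptance_probability(log_post_current, log_post_proposed,
                                  log_q_forward, log_q_backward)
    return rng.random() < prob


# Hypothetical numbers: the proposed graph is half as probable, and the
# reverse move is half as likely to be proposed as the forward move.
prob = acceptance_probability(math.log(0.2), math.log(0.1),
                              math.log(0.5), math.log(0.25))
```

With a symmetric proposal the two `log_q` terms cancel and the rule reduces to the plain Metropolis ratio.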

    Bayesian Network Approximation from Local Structures

    This work addresses the problem of Bayesian network structure learning and covers two main areas. The first area is theoretical. We consider some aspects of the hardness of Bayesian network structure learning. In particular, we prove that finding a Bayesian network structure with the minimal number of edges that encodes the joint probability distribution of a given dataset is NP-hard. This result offers a view of the NP-hardness of Bayesian network structure learning that differs significantly from the standard one. The most notable results in this area so far focus on a specific formulation of the problem, in which the aim is to find a Bayesian network structure maximizing a given probabilistic criterion. These criteria arise from fairly advanced statistical considerations, and their interpretation may not be intuitive, especially for readers unfamiliar with Bayesian networks. In contrast, the criterion proposed here, for which NP-hardness is proved, requires no advanced knowledge and is easy to understand. The second area concerns concrete algorithms. We focus on one of the most interesting branches in the history of Bayesian network structure learning methods, one that has led to very significant solutions: local structure learning methods, whose main aim is first to gather information describing local properties of the network under construction and then to use this information to construct the whole network structure. The algorithm at the root of this branch builds on an important local characterization of Bayesian networks, the so-called Markov blankets. 
The Markov blanket of a given attribute consists of those other attributes that, in the probabilistic sense, form the set of its causes that is maximal in strength and minimal in size. The first algorithm in this branch rests on one important observation: under appropriate assumptions, the optimal Bayesian network structure can be determined by examining relations between attributes only within the Markov blankets. For datasets derived from sufficiently sparse distributions, where the Markov blanket of each attribute has size bounded by a common constant, this procedure yields a structure learning approach that scales well in time. The local learning branch has evolved mainly toward reducing the gathered local information to even smaller and more reliably learned patterns. This reduction grew out of parallel progress in Markov blanket approximation. The main result of this dissertation is a Bayesian network structure learning procedure that belongs to the branch of local learning methods and in fact creates a fork at its root. The fundamental idea is to aggregate the local knowledge learned over the Markov blankets not in the form of dependencies derived within these blankets, as in the root method, but in the form of local Bayesian networks. The user thereby gains considerable influence over the character of this local knowledge by choosing the structure learning method, suited to their needs, that is used to learn the local structures. The procedure for merging local structures into a global one is justified theoretically and evaluated empirically, showing that it can enhance even very advanced Bayesian network structure learning algorithms when they are applied locally in the proposed scheme.
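
For a known directed acyclic graph, the Markov blanket of a node has a simple graphical reading: its parents, its children, and the other parents of its children. The sketch below computes it from a parent map. It assumes the structure is already given, whereas the methods discussed above must estimate blankets from data; the function name and the example graph are hypothetical.

```python
def markov_blanket(parents, node):
    """Return the Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parents. The blanket is
    the union of the node's parents, its children, and its children's
    other parents (co-parents), excluding the node itself.
    """
    children = {v for v, ps in parents.items() if node in ps}
    co_parents = set()
    for child in children:
        co_parents |= parents[child]
    blanket = set(parents.get(node, set())) | children | co_parents
    blanket.discard(node)
    return blanket


# Hypothetical DAG: A -> C <- B, C -> D, and an isolated node E.
dag = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}, "E": set()}
blanket_a = markov_blanket(dag, "A")
blanket_c = markov_blanket(dag, "C")
```

Node B enters the blanket of A only because the two share the child C, which is why blanket-based local learning has to look beyond direct neighbors.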

    Learning Patient-Specific Models From Clinical Data

    A key purpose of building a model from clinical data is to predict the outcomes of future individual patients. This work introduces a Bayesian patient-specific predictive framework for constructing predictive models from data that are optimized to predict well for a particular patient case. The construction of such patient-specific models is influenced by the particular history, symptoms, laboratory results, and other features of the patient case at hand. This approach contrasts with the commonly used population-wide models, which are constructed to perform well on average on all future cases. The new patient-specific method described in this research uses Bayesian network models, carries out Bayesian model averaging over a set of models to predict the outcome of interest for the patient case at hand, and employs a patient-specific heuristic to locate a set of suitable models to average over. Two versions of the method are developed that differ in the representation used for the conditional probability distributions in the Bayesian networks. One version uses a representation that captures only the so-called global structure among the variables of a Bayesian network; the other uses a representation that captures additional local structure among the variables. The patient-specific methods were experimentally evaluated on one synthetic dataset, 21 UCI datasets, and three medical datasets. Their performance was measured using five different performance measures and compared to that of several commonly used methods for constructing predictive models, including naïve Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, and Lazy Bayesian Rules. Over all the datasets, both patient-specific methods performed better on average on all performance measures and against all the comparison algorithms. 
The global structure method, which performs Bayesian model averaging in conjunction with the patient-specific search heuristic, performed better than either model selection with the patient-specific heuristic or non-patient-specific Bayesian model averaging. However, the additional learning of local structure by the local structure method did not lead to significant improvements over the use of global structure alone. Limitations specific to the implementation of the local structure method may have limited its performance.
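
Averaging over a selected set of models, as described above, can be written in the usual selective model averaging form. The notation is an assumption made for illustration, since the abstract gives no formula: x is the patient case at hand, y the outcome, D the training data, and \mathcal{M}_x the set of models located by the patient-specific heuristic.

```latex
P(y \mid x, D) \;\approx\; \sum_{M \in \mathcal{M}_x} P(y \mid x, M, D)\,
  \frac{P(M \mid D)}{\sum_{M' \in \mathcal{M}_x} P(M' \mid D)}
```

Model selection corresponds to the special case in which \mathcal{M}_x contains a single model, so its weight is 1.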

    Scalable Learning of Bayesian Networks Using Feedback Arc Set-Based Heuristics

    Bayesian networks form an important class of probabilistic graphical models. They consist of a structure (a directed acyclic graph) expressing conditional independencies among random variables, as well as parameters (local probability distributions). As such, Bayesian networks are generative models encoding joint probability distributions in a compact form. The main difficulty in learning a Bayesian network comes from the structure itself, owing to the combinatorial nature of the acyclicity property; it is well known, and unsurprising, that the structure learning problem is NP-hard in general. Exact algorithms solving this problem exist: dynamic programming and integer linear programming are prime contenders when one seeks to recover the structure of small-to-medium-sized Bayesian networks from data. On the other hand, heuristics such as hill-climbing variants are commonly used when attempting to approximately learn the structure of larger networks with thousands of variables, although these heuristics typically lack theoretical guarantees and their performance in practice may become unreliable in large-scale learning. This thesis is concerned with the development of scalable methods for the Bayesian network structure learning problem that attempt to maintain a level of theoretical control. 
This was achieved via the use of related combinatorial problems, namely the maximum acyclic subgraph problem and its dual, the minimum feedback arc set problem. Although these problems are themselves NP-hard, they are significantly more tractable in practice. This thesis explores ways to map Bayesian network structure learning into maximum acyclic subgraph instances and to extract approximate solutions for the first problem based on the solutions obtained for the second. Our research suggests that although increased scalability can be achieved this way, maintaining theoretical understanding with this approach is much more challenging. Furthermore, we found that learning the structure of Bayesian networks based on maximum acyclic subgraph/minimum feedback arc set may not be the go-to method in general, but we identified a setting, linear structural equation models, in which we could experimentally validate the benefits of this approach, leading to fast and scalable structure recovery with the ability to learn complex structures competitively with state-of-the-art baselines.
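
The relationship between the two combinatorial problems can be illustrated with the textbook 1/2-approximation for maximum acyclic subgraph: fix any node ordering and keep either the forward or the backward edges, whichever set is heavier. This is a baseline sketch under stated assumptions, not one of the heuristics developed in the thesis. Edge weights are assumed non-negative, and the function name, the example graph and its weights are hypothetical; how structure learning scores are mapped to edge weights is not reproduced here.

```python
def half_approx_acyclic_subgraph(order, weighted_edges):
    """Split edges into an acyclic kept set and a removed feedback arc set.

    `order` is any sequence of all nodes; `weighted_edges` is a list of
    (source, target, weight) tuples with non-negative weights. Edges that
    agree with the order form one acyclic set, edges that oppose it form
    another, and the heavier set carries at least half the total weight.
    """
    position = {node: i for i, node in enumerate(order)}
    forward = [e for e in weighted_edges if position[e[0]] < position[e[1]]]
    backward = [e for e in weighted_edges if position[e[0]] > position[e[1]]]

    forward_weight = sum(w for _, _, w in forward)
    backward_weight = sum(w for _, _, w in backward)
    if forward_weight >= backward_weight:
        return forward, backward
    return backward, forward


# Hypothetical weighted digraph containing the cycle A -> B -> C -> A.
edges = [("A", "B", 3.0), ("B", "C", 2.0), ("C", "A", 1.0)]
kept, removed = half_approx_acyclic_subgraph(["A", "B", "C"], edges)
```

The removed edges form a feedback arc set, so maximizing the kept weight and minimizing the removed weight are two views of the same optimization.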