108 research outputs found
Learning Logistic Circuits
This paper proposes a new classification model called logistic circuits. On
MNIST and Fashion datasets, our learning algorithm outperforms neural networks
that have an order of magnitude more parameters. Yet, logistic circuits have a
distinct origin in symbolic AI, forming a discriminative counterpart to
probabilistic-logical circuits such as ACs, SPNs, and PSDDs. We show that
parameter learning for logistic circuits is convex optimization, and that a
simple local search algorithm can induce strong model structures from data.
Comment: Published in the Proceedings of the Thirty-Third AAAI Conference on
Artificial Intelligence (AAAI-19)
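The convexity claim above can be illustrated with a small sketch: once the circuit structure is fixed, each example maps to a vector of circuit-flow features, and the parameters are fit by ordinary (convex) logistic regression over those features. The `circuit_flows` feature map below is a hypothetical stand-in for the structure-dependent computation, not the authors' implementation.

```python
# Minimal sketch, assuming a fixed circuit structure whose feed-forward pass
# yields one 0/1 "flow" feature per parameterised wire; parameter learning
# then reduces to plain logistic regression over those features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def circuit_flows(X):
    """Hypothetical feature map: one feature per circuit wire.

    In a real logistic circuit these are flow indicators computed by a single
    pass over the circuit; here we fake them with a fixed random binary
    projection purely to keep the sketch runnable.
    """
    rng = np.random.default_rng(0)
    W = rng.integers(0, 2, size=(X.shape[1], 64))
    return (X @ W > 0).astype(float)

X = np.random.default_rng(1).integers(0, 2, size=(200, 28 * 28))
y = (X.sum(axis=1) > X.shape[1] // 2).astype(int)   # toy labels

clf = LogisticRegression(max_iter=1000).fit(circuit_flows(X), y)
print("training accuracy:", clf.score(circuit_flows(X), y))
```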
Joints in Random Forests
Decision Trees (DTs) and Random Forests (RFs) are powerful discriminative learners and tools of central importance to the everyday machine learning practitioner and data scientist. Due to their discriminative nature, however, they lack principled methods to process inputs with missing features or to detect outliers, which requires pairing them with imputation techniques or a separate generative model. In this paper, we demonstrate that DTs and RFs can naturally be interpreted as generative models, by drawing a connection to Probabilistic Circuits, a prominent class of tractable probabilistic models. This reinterpretation equips them with a full joint distribution over the feature space and leads to Generative Decision Trees (GeDTs) and Generative Forests (GeFs), a family of novel hybrid generative-discriminative models. This family of models retains the overall characteristics of DTs and RFs while additionally being able to handle missing features by means of marginalisation. Under certain assumptions, frequently made for Bayes consistency results, we show that consistency in GeDTs and GeFs extends to any pattern of missing input features, provided they are missing at random. Empirically, we show that our models often outperform common routines to treat missing data, such as K-nearest neighbour imputation, and moreover, that our models can naturally detect outliers by monitoring the marginal probability of input features.
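One way to picture the marginalisation described above is to read each decision node as a sum node whose weights are the fractions of training data routed left and right; a query with a missing split feature then mixes both subtrees instead of following a single branch. The sketch below is a simplification under that reading (real GeFs also place full joint densities at the leaves); the tree and numbers are invented for illustration.

```python
# Minimal sketch, not the authors' code: a decision tree read as a generative
# model, where a missing split feature is marginalised by mixing both children.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None      # split feature index, None for a leaf
    threshold: float = 0.0
    w_left: float = 0.5                # fraction of training data going left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    class_probs: Optional[list] = None # class distribution at a leaf

def predict_proba(node, x):
    """Class distribution for x, where missing features are None."""
    if node.class_probs is not None:
        return node.class_probs
    v = x[node.feature]
    if v is None:  # missing: marginalise by mixing both children
        left = predict_proba(node.left, x)
        right = predict_proba(node.right, x)
        w = node.w_left
        return [w * l + (1 - w) * r for l, r in zip(left, right)]
    child = node.left if v <= node.threshold else node.right
    return predict_proba(child, x)

# Tiny hand-built tree: split on feature 0, then on feature 1.
tree = Node(feature=0, threshold=0.5, w_left=0.6,
            left=Node(class_probs=[0.9, 0.1]),
            right=Node(feature=1, threshold=0.0, w_left=0.3,
                       left=Node(class_probs=[0.2, 0.8]),
                       right=Node(class_probs=[0.5, 0.5])))
print(predict_proba(tree, [None, 1.0]))   # feature 0 missing
```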
Conditional Sum-Product Networks: Imposing Structure on Deep Probabilistic Architectures
Probabilistic graphical models are a central tool in AI; however, they are
generally not as expressive as deep neural models, and inference is notoriously
hard and slow. In contrast, deep probabilistic models such as sum-product
networks (SPNs) capture joint distributions in a tractable fashion, but still
lack the expressive power of intractable models based on deep neural networks.
Therefore, we introduce conditional SPNs (CSPNs), conditional density
estimators for multivariate and potentially hybrid domains which allow
harnessing the expressive power of neural networks while still maintaining
tractability guarantees. One way to implement CSPNs is to use an existing SPN
structure and condition its parameters on the input, e.g., via a deep neural
network. This approach, however, might misrepresent the conditional
independence structure present in data. Consequently, we also develop a
structure-learning approach that derives both the structure and parameters of
CSPNs from data. Our experimental evidence demonstrates that CSPNs are
competitive with other probabilistic models and yield superior performance on
multilabel image classification compared to mean field and mixture density
networks. Furthermore, they can successfully be employed as building blocks for
structured probabilistic models, such as autoregressive image models.
Comment: 13 pages, 6 figures
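A minimal sketch of the "condition the parameters on the input" variant mentioned above: a fixed sum-product structure over two target variables whose sum-node weights and Gaussian leaf parameters are produced from the input by a small (here untrained, randomly initialised) neural network. All names and shapes are illustrative assumptions, not the CSPN implementation.

```python
# Sketch, assuming a toy gating network that maps x to the parameters of a
# fixed sum-product structure over two target variables (y1, y2).
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 16)), rng.normal(size=(16, 10))

def gating_network(x):
    h = np.tanh(x @ W1)
    out = h @ W2
    # 2 mixture weights + 4 means + 4 log-stds (2 components x 2 leaves)
    weights = np.exp(out[:2]) / np.exp(out[:2]).sum()
    means, log_stds = out[2:6].reshape(2, 2), out[6:10].reshape(2, 2)
    return weights, means, log_stds

def log_gauss(y, mean, log_std):
    return -0.5 * ((y - mean) / np.exp(log_std)) ** 2 - log_std - 0.5 * np.log(2 * np.pi)

def cspn_log_density(y, x):
    """log p(y1, y2 | x): a sum node over two product nodes of Gaussian leaves."""
    weights, means, log_stds = gating_network(x)
    comp = [np.log(weights[k]) + log_gauss(y[0], means[k, 0], log_stds[k, 0])
                               + log_gauss(y[1], means[k, 1], log_stds[k, 1])
            for k in range(2)]
    return np.logaddexp(comp[0], comp[1])

print(cspn_log_density(np.array([0.1, -0.3]), np.array([1.0, 0.0, 0.5, -1.0])))
```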
Group Fairness by Probabilistic Modeling with Latent Fair Decisions
Machine learning systems are increasingly being used to make impactful
decisions such as loan applications and criminal justice risk assessments, and
as such, ensuring fairness of these systems is critical. This is often
challenging as the labels in the data are biased. This paper studies learning
fair probability distributions from biased data by explicitly modeling a latent
variable that represents a hidden, unbiased label. In particular, we aim to
achieve demographic parity by enforcing certain independencies in the learned
model. We also show that group fairness guarantees are meaningful only if the
distribution used to provide those guarantees indeed captures the real-world
data. In order to closely model the data distribution, we employ probabilistic
circuits, an expressive and tractable probabilistic model, and propose an
algorithm to learn them from incomplete data. We evaluate our approach on a
synthetic dataset in which observed labels indeed come from fair labels but
with added bias, and demonstrate that the fair labels are successfully
retrieved. Moreover, we show on real-world datasets that our approach is not
only a better model of how the data was generated than existing methods, but
also achieves competitive accuracy.
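The role of the enforced independencies can be seen in a toy joint distribution: if the latent fair decision Df is modelled as independent of the sensitive attribute S, then P(Df=1 | S) is identical across groups by construction, while the observed biased label D need not satisfy parity. The numbers below are invented purely for illustration.

```python
# Toy numeric sketch of the independence behind demographic parity.
import itertools

p_S = {0: 0.5, 1: 0.5}
p_Df = {0: 0.7, 1: 0.3}                       # fair decision, independent of S
p_D_given = {(s, df): ({df: 0.9, 1 - df: 0.1} if s == 0 else {df: 0.7, 1 - df: 0.3})
             for s in (0, 1) for df in (0, 1)}  # group-dependent observation bias

def joint(s, df, d):
    return p_S[s] * p_Df[df] * p_D_given[(s, df)][d]

def conditional(var, value, s):
    num = sum(joint(s, df, d) for df, d in itertools.product((0, 1), repeat=2)
              if (df if var == "Df" else d) == value)
    den = sum(joint(s, df, d) for df, d in itertools.product((0, 1), repeat=2))
    return num / den

for s in (0, 1):
    print(f"P(Df=1 | S={s}) = {conditional('Df', 1, s):.3f}   "
          f"P(D=1 | S={s}) = {conditional('D', 1, s):.3f}")
```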
The Circle of Meaning: From Translation to Paraphrasing and Back
The preservation of meaning between inputs and outputs is perhaps
the most ambitious and, often, the most elusive goal of systems
that attempt to process natural language. Nowhere is this goal of
more obvious importance than for the tasks of machine translation
and paraphrase generation. Preserving meaning between the input and
the output is paramount for both, the monolingual vs bilingual distinction
notwithstanding. In this thesis, I present a novel, symbiotic relationship
between these two tasks that I term the "circle of meaning".
Today's statistical machine translation (SMT) systems require high
quality human translations for parameter tuning, in addition to
large bi-texts for learning the translation units. This parameter
tuning usually involves generating translations at different points
in the parameter space and obtaining feedback against human-authored
reference translations as to how good the translations are. This feedback
then dictates what point in the parameter space should be explored
next. To measure this feedback, it is generally considered wise to have
multiple (usually 4) reference translations to avoid unfair penalization of
translation hypotheses, which could easily happen given the large number of
ways in which a sentence can be translated from one language to another.
However, this reliance on multiple reference translations creates a problem
since they are labor-intensive and expensive to obtain.
Therefore, most current MT datasets only contain a single reference.
This leads to the problem of reference sparsity---the primary open problem
that I address in this dissertation---one that has a serious effect on the
SMT parameter tuning process.
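The tuning loop just described can be sketched generically (this is a stand-in for the process, not for MERT or any specific tuner): candidate weight vectors are proposed, the decoder is run with each, the outputs are scored against the references, and the best-scoring point seeds the next round. `decode` and `bleu` below are hypothetical placeholders for a real decoder and a real metric.

```python
# Minimal sketch of metric-driven parameter tuning with placeholder components.
import random

def decode(sources, weights):
    """Stand-in decoder: returns one translation string per source sentence."""
    return [f"translation-of({s})" for s in sources]

def bleu(hypotheses, references):
    """Stand-in metric: fraction of hypotheses matching any of their references."""
    return sum(h in refs for h, refs in zip(hypotheses, references)) / len(hypotheses)

def tune(sources, references, dim=8, rounds=20, step=0.1):
    weights = [0.0] * dim
    best = bleu(decode(sources, weights), references)
    for _ in range(rounds):
        candidate = [w + random.uniform(-step, step) for w in weights]
        score = bleu(decode(sources, candidate), references)
        if score >= best:               # the feedback decides where to move next
            weights, best = candidate, score
    return weights, best

srcs = ["la maison bleue", "un petit chat"]
refs = [["the blue house", "translation-of(la maison bleue)"], ["a small cat"]]
print(tune(srcs, refs))
```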
Bannard and Callison-Burch (2005) were the first to provide a practical
connection between phrase-based statistical machine translation and paraphrase
generation. However, their technique is restricted to generating phrasal
paraphrases. I build upon their approach and augment a phrasal paraphrase
extractor into a sentential paraphraser with extremely broad coverage.
The novelty in this augmentation lies in the further strengthening of
the connection between statistical machine translation and paraphrase
generation; whereas Bannard and Callison-Burch only relied on SMT machinery
to extract phrasal paraphrase rules and stopped there, I take it a few
steps further and build a full English-to-English SMT system. This system
can, as expected, ``translate'' any English input sentence into a new English
sentence with the same degree of meaning preservation that exists in a bilingual
SMT system. In fact, being a state-of-the-art SMT system, it is able to generate
n-best "translations" for any given input sentence. This sentential
paraphraser, built almost entirely from existing SMT machinery, represents
the first 180 degrees of the circle of meaning.
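The pivoting idea underlying the phrasal paraphrase extractor can be written down compactly: the paraphrase probability of English phrase e2 given e1 marginalises over foreign pivot phrases f, p(e2 | e1) = sum_f p(e2 | f) p(f | e1), using the two phrase tables of an ordinary bilingual SMT system. The sketch below uses tiny invented tables just to show the computation.

```python
# Sketch of phrase-pivoting paraphrase probabilities over toy phrase tables.
from collections import defaultdict

p_f_given_e = {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}}
p_e_given_f = {"unter kontrolle": {"under control": 0.7, "in check": 0.3},
               "im griff":        {"under control": 0.5, "in hand": 0.5}}

def paraphrase_probs(e1):
    probs = defaultdict(float)
    for f, p_fe in p_f_given_e.get(e1, {}).items():      # pivot through f
        for e2, p_ef in p_e_given_f.get(f, {}).items():
            if e2 != e1:
                probs[e2] += p_ef * p_fe
    return dict(probs)

print(paraphrase_probs("under control"))   # {'in check': 0.24, 'in hand': 0.1}
```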
To complete the circle, I describe a novel connection in the other direction.
I claim that the sentential paraphraser, once built in this fashion, can
provide a solution to the reference sparsity problem and, hence, be used
to improve the performance of a bilingual SMT system. I discuss two different
instantiations of the sentential paraphraser and show several results that
provide empirical validation for this connection.
Tractable probabilistic models for causal learning and reasoning
This thesis examines the application of tractable probabilistic modelling principles to causal learning and reasoning. Tractable probabilistic modelling is a promising paradigm that has emerged in recent years, which focuses on probabilistic models that enable exact and efficient probabilistic reasoning. In particular, the framework of probabilistic circuits provides a systematic language for characterizing the tractability of models for various inference queries based on their structural properties, with recent proposals pushing the boundaries of expressiveness and tractability.
However, not all information about a system can be captured through a probability distribution over observed variables; for example, the causal direction between two variables can be indistinguishable from data alone. Formalizing this, Pearl’s Causal Hierarchy (also known as the information hierarchy) delineates three levels of causal queries, namely associational, interventional, and counterfactual, that require increasingly greater knowledge of the underlying causal system, represented by a structural causal model and associated causal diagram.
Motivated by this, we investigate the possibility of tractable causal modelling; that is, exact and efficient reasoning with respect to classes of causal queries. In particular, we identify three scenarios, separated by the amount of knowledge available to the modeler: when the full causal diagram/model is available, when only the observational distribution and an identifiable causal estimand are available, and when there is additionally uncertainty over the causal diagram. In each scenario, we propose probabilistic circuit representations, structural properties, and algorithms that enable efficient and exact causal reasoning. These models are distinguished from tractable probabilistic models in that they can answer not only different probabilistic inference queries, but also causal queries involving different interventions and even different causal diagrams. However, we also identify key limitations that cast doubt on the existence of a fully general tractable causal model. Our contributions also extend the theory of probabilistic circuits by proposing new properties and circuit architectures, which enable the analysis of advanced inference queries including, but not limited to, causal inference estimands.
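To make the notion of tractability concrete, the sketch below evaluates a toy probabilistic circuit bottom-up: product nodes multiply child values, sum nodes take weighted sums, and marginalising a variable amounts to setting its leaf indicators to 1, so any marginal query costs one linear pass over the circuit. The circuit is a made-up example, not one from the thesis.

```python
# Minimal sketch of exact marginal inference in a toy probabilistic circuit.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PCNode:
    kind: str                       # "leaf", "sum", or "product"
    var: Optional[int] = None       # for leaves: variable index
    value: Optional[int] = None     # for leaves: value the indicator tests
    weights: list = field(default_factory=list)
    children: list = field(default_factory=list)

def evaluate(node, assignment):
    """assignment maps variable index -> value; missing vars are marginalised."""
    if node.kind == "leaf":
        if node.var not in assignment:
            return 1.0                                  # marginalised indicator
        return 1.0 if assignment[node.var] == node.value else 0.0
    child_vals = [evaluate(c, assignment) for c in node.children]
    if node.kind == "product":
        out = 1.0
        for v in child_vals:
            out *= v
        return out
    return sum(w * v for w, v in zip(node.weights, child_vals))   # sum node

def bernoulli(var, p):
    return PCNode("sum", weights=[p, 1 - p],
                  children=[PCNode("leaf", var=var, value=1),
                            PCNode("leaf", var=var, value=0)])

# p(X0, X1) = 0.6 * p0(X0) p0(X1) + 0.4 * p1(X0) p1(X1)
circuit = PCNode("sum", weights=[0.6, 0.4],
                 children=[PCNode("product", children=[bernoulli(0, 0.9), bernoulli(1, 0.2)]),
                           PCNode("product", children=[bernoulli(0, 0.1), bernoulli(1, 0.8)])])

print(evaluate(circuit, {0: 1}))        # p(X0 = 1), with X1 marginalised out
```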
The Syntactic Bits of Nouns: How Prior Syntactic Distributions Affect Comprehension, Production, and Acquisition
Usage-based linguistic theory argues that experience is the fundamental organizing principle of language. Linguistic representations are extracted from – and continuously tuned by – probabilistic features of language use. Much psycholinguistic evidence supports this argument, particularly in the domain of lexical processing. For example, how a word is distributed across its various lexical and morphological contexts influences how quickly it is recognized and produced in isolation. Fewer studies have explored how syntactic distributions affect lexical processing, and of these, even fewer have adopted comprehensive, abstract measurements of syntax. In this dissertation, I present several new information-theoretic tools for measuring the syntactic distributions of words based on the Dependency Grammar formalism. This formalism allows me to contrast two independent dimensions of syntactic structure: hierarchical status and word order. Further, I provide a new method for teasing apart information bound to syntactic and lexical contexts. I compute these measures for nouns based on two large corpora of English.
These measures are correlated with behavior in several contexts. First, I re-analyze the noun-based trials of two previously published databases of visual lexical decision response time data, one simple and the other primed. I then turn to production, reporting two picture-naming studies. In the first, participants produce nouns in isolation. This task constitutes a strong attack on the hypothesis that syntactic distributions affect noun production; at least on its face, it does not require participants to access syntactic information in order to successfully complete the task. In a follow-up, participants were asked to name the images using a syntactic frame (the + NAME). This task should promote syntactic access, increasing the likelihood that prior syntactic distributions should play a role. Finally, I test whether children are sensitive to these syntactic distributions (based on adult speech) as they begin to produce nouns in syntactic contexts for the first time, using a large, densely sampled longitudinal corpus of child speech.
Results show that isolated noun processing is affected by prior syntactic distributions in both comprehension and production. However, the specific nature of these effects differs across modalities, and in production, as a function of whether the nouns were produced in isolation or within a syntactic frame. The measures also predict the age at which nouns first emerge in the speech of children.
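As an illustration of the kind of information-theoretic measure involved (the dissertation's exact measures are not reproduced here), the sketch below computes the entropy of a noun's distribution over dependency relations from hypothetical corpus counts; higher entropy means the noun is spread more evenly across syntactic contexts.

```python
# Illustrative sketch: entropy of a noun's dependency-relation distribution.
import math
from collections import Counter

# Hypothetical counts of the dependency relations a noun appears in.
counts = Counter({"nsubj": 120, "dobj": 80, "pobj": 60, "nmod": 30, "conj": 10})

def syntactic_entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(f"H(relation | 'dog') = {syntactic_entropy(counts):.3f} bits")
```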
Probabilistic Modelling of Morphologically Rich Languages
This thesis investigates how the sub-structure of words can be accounted for
in probabilistic models of language. Such models play an important role in
natural language processing tasks such as translation or speech recognition,
but often rely on the simplistic assumption that words are opaque symbols. This
assumption does not fit morphologically complex language well, where words can
have rich internal structure and sub-word elements are shared across distinct
word forms.
Our approach is to encode basic notions of morphology into the assumptions of
three different types of language models, with the intention that leveraging
shared sub-word structure can improve model performance and help overcome data
sparsity that arises from morphological processes.
In the context of n-gram language modelling, we formulate a new Bayesian
model that relies on the decomposition of compound words to attain better
smoothing, and we develop a new distributed language model that learns vector
representations of morphemes and leverages them to link together
morphologically related words. In both cases, we show that accounting for word
sub-structure improves the models' intrinsic performance and provides benefits
when applied to other tasks, including machine translation.
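A minimal sketch of the shared sub-word structure idea in the distributed model: if each word vector is composed additively from the vectors of its morphemes, morphologically related forms automatically share parameters and end up close in the vector space. The segmentations and vectors below are invented, and the additive composition is a simplification of the thesis's actual model and training procedure.

```python
# Sketch, assuming toy morpheme segmentations and random morpheme vectors.
import numpy as np

rng = np.random.default_rng(0)
morpheme_vecs = {m: rng.normal(size=16) for m in ["un", "friend", "ly", "ness", "book"]}
segmentations = {"friendly": ["friend", "ly"],
                 "unfriendly": ["un", "friend", "ly"],
                 "friendliness": ["friend", "ly", "ness"],
                 "book": ["book"]}

def word_vector(word):
    # Additive composition: the word vector is the sum of its morpheme vectors.
    return sum(morpheme_vecs[m] for m in segmentations[word])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(word_vector("friendly"), word_vector("unfriendly")))   # shared morphemes
print(cosine(word_vector("friendly"), word_vector("book")))          # unrelated
```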
We then shift the focus beyond the modelling of word sequences and consider
models that automatically learn what the sub-word elements of a given language
are, given an unannotated list of words. We formulate a novel model that can
learn discontiguous morphemes in addition to the more conventional contiguous
morphemes that most previous models are limited to. This approach is
demonstrated on Semitic languages, and we find that modelling discontiguous
sub-word structures leads to improvements in the task of segmenting words into
their contiguous morphemes.
Comment: DPhil thesis, University of Oxford, submitted and accepted 2014.
http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c
- …