79 research outputs found

    Advances in identifiability of nonlinear probabilistic models

    Get PDF
    Identifiability is a highly prized property of statistical models. This thesis investigates this property in nonlinear models encountered in two fields of statistics: representation learning and causal discovery. In representation learning, identifiability leads to interpretable and reproducible representations, while in causal discovery it is necessary for estimating the correct causal directions. We begin by leveraging recent advances in nonlinear ICA to show that the latent space of a VAE is identifiable up to a permutation and pointwise nonlinear transformations of its components. Our result requires a factorized prior distribution over the latent variables, conditioned on an auxiliary observed variable such as a class label or nearly any other observation. We also extend previous identifiability results in nonlinear ICA to the case of noisy or undercomplete observations, and incorporate them into a maximum likelihood framework. Our second contribution is the Independently Modulated Component Analysis (IMCA) framework, a generalization of nonlinear ICA to non-independent latent variables. We show that the independence assumption in ICA can be dropped while maintaining identifiability, resulting in a very flexible and generic framework for principled disentangled representation learning. This finding is predicated on the existence of an auxiliary variable that modulates the joint distribution of the latent variables in a factorizable manner. As a third contribution, we extend the identifiability theory to a broad family of conditional energy-based models (EBMs). This novel model generalizes earlier results by removing any distributional assumptions on the representations, which are ubiquitous in the latent variable setting. The conditional EBM can learn identifiable overcomplete representations and has universal approximation capabilities.
Finally, we investigate a connection between autoregressive normalizing flow models and causal discovery. Causal models derived from affine autoregressive flows are shown to be identifiable, generalizing the well-known additive noise model. Using normalizing flows, we can compute the exact likelihood of the causal model, from which we derive a likelihood ratio measure for causal discovery. The flows are also invertible, making them well suited to causal inference tasks such as interventions and counterfactuals.
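The conditionally factorized prior underlying the first result can be sketched in a few lines. The snippet below is an illustration only, not the thesis's implementation: the linear modulation through hypothetical weights `W_mu` and `W_logvar` is one simple way an auxiliary variable u can set per-component means and variances of a factorized Gaussian prior.

```python
import numpy as np

def conditional_log_prior(z, u, W_mu, W_logvar):
    """Log-density of a factorized Gaussian prior p(z | u) whose
    per-component mean and log-variance depend (here, linearly)
    on the auxiliary variable u."""
    mu = u @ W_mu.T            # (n, d) component-wise means
    logvar = u @ W_logvar.T    # (n, d) component-wise log-variances
    # Components are independent given u, so the joint log-density
    # is a sum of univariate Gaussian log-densities over dimensions.
    return -0.5 * np.sum(
        logvar + np.log(2 * np.pi) + (z - mu) ** 2 / np.exp(logvar),
        axis=1,
    )

rng = np.random.default_rng(0)
n, d, k = 4, 3, 2              # samples, latent dim, auxiliary dim
z = rng.normal(size=(n, d))
u = rng.normal(size=(n, k))
W_mu = rng.normal(size=(d, k))
W_logvar = 0.1 * rng.normal(size=(d, k))
lp = conditional_log_prior(z, u, W_mu, W_logvar)
print(lp.shape)                # one log-density per sample
```

Conditioning on u is what breaks the symmetry that makes unconditional nonlinear ICA unidentifiable.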

    Towards music perception by redundancy reduction and unsupervised learning in probabilistic models

    Get PDF
    The study of music perception lies at the intersection of several disciplines: perceptual psychology and cognitive science, musicology, psychoacoustics, and acoustical signal processing, amongst others. Developments in perceptual theory over the last fifty years have emphasised an approach based on Shannon’s information theory and its basis in probabilistic systems, and in particular the idea that perceptual systems in animals develop through a process of unsupervised learning in response to natural sensory stimulation, whereby the emerging computational structures are well adapted to the statistical structure of natural scenes. In turn, these ideas are being applied to problems in music perception. This thesis is an investigation of the principle of redundancy reduction through unsupervised learning, as applied to representations of sound and music. In the first part, previous work is reviewed, drawing on literature from some of the fields mentioned above, and an argument is presented in support of the idea that perception in general, and music perception in particular, can indeed be accommodated within a framework of unsupervised learning in probabilistic models. In the second part, two related methods are applied to two different low-level representations. Firstly, linear redundancy reduction (Independent Component Analysis) is applied to acoustic waveforms of speech and music. Secondly, the related method of sparse coding is applied to a spectral representation of polyphonic music, which proves sufficient both to recognise that the individual notes are the important structural elements, and to recover a rough transcription of the music. Finally, the concepts of distance and similarity are considered, drawing in ideas about noise, phase invariance, and topological maps.
Some ecologically and information-theoretically motivated distance measures are suggested and put into practice in a novel method, using multidimensional scaling (MDS), for visualising geometrically the dependency structure in a distributed representation. Funded by the Engineering and Physical Sciences Research Council.
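The sparse-coding step described above can be illustrated with a toy experiment: given a dictionary of "note" spectra (here purely synthetic random templates, not real note spectra), an l1-penalized decomposition of a mixed spectrum recovers which notes are active. This uses plain ISTA, a standard sparse-coding solver chosen for brevity, not necessarily the method used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dictionary: 12 synthetic "note" templates over 64 bins.
D = rng.normal(size=(64, 12))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms

# A "chord" mixing notes 2 and 7.
x = D[:, 2] + D[:, 7]

# ISTA: iterative soft-thresholding for min 0.5*||x - D a||^2 + lam*||a||_1
lam = 0.05
step = 1.0 / np.linalg.norm(D.T @ D, 2)  # 1 / Lipschitz constant of gradient
a = np.zeros(12)
for _ in range(500):
    g = a - step * (D.T @ (D @ a - x))                        # gradient step
    a = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-threshold

active = sorted(int(i) for i in np.argsort(np.abs(a))[-2:])
print(active)  # → [2, 7]
```

The l1 penalty drives inactive coefficients to exactly zero, which is why the code recovers the two contributing atoms rather than spreading energy across the whole dictionary.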

    Causal discovery beyond Markov equivalence

    Get PDF
    The focus of this dissertation is on learning causal diagrams beyond Markov equivalence. The baseline assumptions in causal structure learning are the acyclicity of the underlying structure and causal sufficiency, which requires that there are no unobserved confounder variables in the system. Under these assumptions, conditional independence relationships contain all the information in the distribution that can be used for structure learning. Therefore, the causal diagram can be identified only up to Markov equivalence, i.e., the set of structures reflecting the same conditional independence relationships, and for many ground-truth structures the direction of a large portion of the edges will remain unidentified. To learn the structure beyond Markov equivalence, one must therefore generate, or have access to, extra joint distributions from the perturbed causal system. There are two main scenarios for acquiring these extra joint distributions. The first and main scenario is when an experimenter directly performs a sequence of interventions on subsets of the variables of the system to generate interventional distributions. We refer to the task of causal discovery from such interventional data as interventional causal structure learning. In this setting, the key question is determining which variables should be intervened on to gain the most information. This is the first focus of this dissertation. The second scenario is when a subset of the causal mechanisms, and consequently the joint distribution of the system, have varied or evolved for reasons beyond the control of the experimenter. In this case, it is not even known a priori to the experimenter which causal mechanisms have varied. We refer to the task of causal discovery from such multi-domain data as multi-domain causal structure learning.
In this setup, the main question is how to take the most advantage of the changes across domains for causal discovery. This is the second focus of this dissertation. Next, we consider cases in which conditional independence may not reflect all the information in the distribution that can be used to identify the underlying structure. One such case is when cycles are allowed in the underlying structure. Unfortunately, a suitable characterization of equivalence for cyclic directed graphs has so far been unknown. The third focus of this dissertation is on bridging the gap between cyclic and acyclic directed graphs by introducing a general approach for equivalence characterization and structure learning. Another case in which conditional independence may not reflect all the information in the distribution is when there are extra assumptions on the generating causal modules. A seminal result in this direction is that a linear model with non-Gaussian exogenous variables is uniquely identifiable. As the fourth focus of this dissertation, we consider this setup, yet go one step further and allow for violation of causal sufficiency, and investigate how this generalization affects identifiability.
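The linear non-Gaussian identifiability result mentioned above can be illustrated with a toy check: with non-Gaussian (here uniform) noise, only regression in the true causal direction leaves residuals independent of the regressor. The dependence proxy below, a correlation of squares, is a deliberately crude stand-in for a proper independence test and is used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Ground truth: X -> Y, linear with non-Gaussian (uniform) noise.
x = rng.uniform(-1, 1, n)
y = 0.8 * x + rng.uniform(-1, 1, n)

def dependence_after_regression(cause, effect):
    """OLS-regress effect on cause, then measure a crude dependence
    proxy between the regressor and the residual (correlation of
    squares).  Near zero only if the residual is independent of
    the regressor."""
    b = np.cov(cause, effect)[0, 1] / np.var(cause)
    resid = effect - b * cause
    return abs(np.corrcoef(cause ** 2, resid ** 2)[0, 1])

forward = dependence_after_regression(x, y)   # causal direction
backward = dependence_after_regression(y, x)  # anticausal direction
print(forward < backward)  # → True: only X -> Y leaves independent residuals
```

With Gaussian noise both directions would produce independent residuals, which is exactly why the Gaussian linear case is identifiable only up to Markov equivalence.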

    Book reports

    Get PDF

    Nonlinearity, Feedback and Uniform Consistency in Causal Structural Learning

    Full text link
    The goal of causal discovery is to find automated search methods for learning causal structures from observational data. In some cases, all variables of the causal mechanism of interest are measured, and the task is to predict the effects that one measured variable has on another. In other cases, the variables of primary interest are not directly observable but are instead inferred from their manifestations in the data; these are referred to as latent variables. One commonly known example is the psychological construct of intelligence, which cannot be measured directly, so researchers try to assess it through various indicators such as IQ tests. In this case, causal discovery algorithms can uncover underlying patterns and structures to reveal the causal connections between the latent variables and between the latent and observed variables. This thesis focuses on two questions in causal discovery: providing an alternative definition of k-Triangle Faithfulness that (i) is weaker than strong faithfulness when applied to the Gaussian family of distributions, (ii) can be applied to non-Gaussian families of distributions, and (iii) under the assumption that this modified version of Strong Faithfulness holds, can be used to show the uniform consistency of a modified causal discovery algorithm; and relaxing the sufficiency assumption to learn causal structures with latent variables. Given the importance of inferring cause-and-effect relationships for understanding and forecasting complex systems, the work in this thesis on relaxing various simplifying assumptions is expected to extend causal discovery methods to a wider range of causal mechanisms and statistical phenomena.
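Why faithfulness assumptions (and strengthened variants like the one studied here) matter can be seen in a classic cancellation example. In the sketch below the coefficients are hypothetical, chosen so that the direct effect of X on Y exactly cancels the path through Z: the true edge X → Y then becomes invisible to a marginal independence test, even though conditioning on Z reveals it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50000

# True graph: X -> Z -> Y and X -> Y, with the direct effect (-1)
# exactly cancelling the path through Z (+1 * +1): an unfaithful model.
x = rng.normal(size=n)
z = x + 0.5 * rng.normal(size=n)
y = z - x + 0.5 * rng.normal(size=n)

def partial_corr(a, b, c):
    """Correlation of a and b after OLS-removing c from both."""
    ra = a - np.cov(a, c)[0, 1] / np.var(c) * c
    rb = b - np.cov(b, c)[0, 1] / np.var(c) * c
    return np.corrcoef(ra, rb)[0, 1]

print(abs(np.corrcoef(x, y)[0, 1]))  # ≈ 0: X and Y look marginally independent
print(abs(partial_corr(x, y, z)))    # clearly nonzero once Z is conditioned on
```

A constraint-based algorithm that trusts the vanishing marginal correlation would wrongly delete the X → Y edge, which is why uniform consistency results need faithfulness-type assumptions that bound how close to cancellation the true parameters may be.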

    Learning causal relations under the influence of latent variables

    Get PDF
    The causal relationships determining the behaviour of a system under study are inherently directional: by manipulating a cause we can control its effect, but an effect cannot be used to control its cause. Understanding the network of causal relationships is necessary, for example, if we want to predict the behaviour of the system when it is subject to different manipulations. However, we are rarely able to directly observe the causal processes in action; we only see the statistical associations they induce in the collected data. This thesis considers the discovery of the underlying causal relationships from data in several different learning settings and under various modeling assumptions. Although the research is mostly theoretical, possible application areas include biology, medicine, economics and the social sciences. Latent confounders, unobserved common causes of two or more observed parts of a system, are especially troublesome when discovering causal relations: the statistical dependence relations they induce often cannot be distinguished from those induced by directed causal relationships. The possible presence of feedback, which induces a cyclic causal structure, is another complicating factor. To achieve informative learning results in this challenging setting, some restricting assumptions need to be made. One option is to constrain the functional forms of the causal relationships to be smooth and simple; in particular, we explore how linearity of the causal relations can be effectively exploited. Another common assumption under study is causal faithfulness, under which the lack of statistical associations allows us to deduce the lack of causal relations. Along with these assumptions, we use data from randomized experiments, in which the system under study is observed under different interventions and manipulations.
In particular, we present a full theoretical foundation for learning linear cyclic models with latent variables using second-order statistics from several experimental data sets. This includes sufficient and necessary conditions on the different experimental settings needed for full model identification, a provably complete learning algorithm, and a characterization of the underdetermination when the data do not allow for full model identification. We also consider several ways of exploiting the faithfulness assumption for this model class. We are able to learn from overlapping data sets, in which different (but overlapping) subsets of variables are observed. In addition, we formulate a model class called Noisy-OR models with latent confounding. We prove sufficient and worst-case necessary conditions for the identifiability of the full model and derive several learning algorithms. The thesis also suggests the optimal sets of experiments for the identification of the above models and others. For settings without latent confounders, we develop a Bayesian learning algorithm that is able to exploit non-Gaussianity in passively observed data.
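The Noisy-OR mechanism referred to above has a simple closed form: the child fires unless every active parent independently fails to trigger it. Below is a minimal sketch of the standard conditional probability; parameter names are illustrative, and the latent confounding studied in the thesis is not modeled here.

```python
import numpy as np

def noisy_or(parents, strengths, leak=0.0):
    """P(child = 1 | parent values) under a noisy-OR mechanism:
    each active parent i independently fails to trigger the child
    with probability (1 - strengths[i]); leak is a background cause
    that can fire the child even when all parents are off."""
    parents = np.asarray(parents, dtype=float)
    # Inactive parents contribute a factor of 1 (exponent 0).
    fail = (1.0 - np.asarray(strengths)) ** parents
    return 1.0 - (1.0 - leak) * np.prod(fail)

# Two active causes with strengths 0.9 and 0.5, no leak:
# both must fail (prob 0.1 * 0.5) for the child to stay off.
p = noisy_or([1, 1], [0.9, 0.5])
print(round(p, 6))  # → 0.95
```

The factorized failure probabilities are what keep the parameter count linear in the number of parents, which in turn is what makes identifiability analysis of this model class tractable.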