
    Efficient Algorithms for Searching the Minimum Information Partition in Integrated Information Theory

    The ability to integrate information in the brain is considered to be an essential property for cognition and consciousness. Integrated Information Theory (IIT) hypothesizes that the amount of integrated information (Φ) in the brain is related to the level of consciousness. IIT proposes that to quantify information integration in a system as a whole, integrated information should be measured across the partition of the system at which the information loss caused by partitioning is minimized, called the Minimum Information Partition (MIP). The computational cost of exhaustively searching for the MIP grows exponentially with system size, making it difficult to apply IIT to real neural data. It has previously been shown that if a measure of Φ satisfies a mathematical property, submodularity, the MIP can be found in polynomial time by an optimization algorithm. However, although the first version of Φ is submodular, the later versions are not. In this study, we empirically explore to what extent the algorithm can be applied to the non-submodular measures of Φ by evaluating its accuracy on simulated data and real neural data. We find that the algorithm identifies the MIP nearly perfectly even for the non-submodular measures. Our results show that the algorithm allows us to measure Φ in large systems within a practical amount of time.
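
    The abstract's central computational point, that exhaustive MIP search is exponential, is easy to see in code. Below is a minimal Python sketch of the brute-force bipartition search, using the mutual information between the two parts as a toy stand-in for Φ (the paper's actual measures differ); all names are illustrative.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information (bits) between two discrete sequences."""
    n = len(x)
    joint = {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
    px = {a: sum(c for (u, _), c in joint.items() if u == a) / n for a, _ in joint}
    py = {b: sum(c for (_, v), c in joint.items() if v == b) / n for _, b in joint}
    return sum((c / n) * np.log2((c / n) / (px[a] * py[b]))
               for (a, b), c in joint.items())

def find_mip(data):
    """Exhaustive search over bipartitions of an (n_samples, n_nodes) array.

    The number of bipartitions grows exponentially in n_nodes, which is
    exactly why a polynomial-time algorithm matters.
    """
    nodes = range(data.shape[1])
    best_phi, best_partition = np.inf, None
    for k in range(1, data.shape[1] // 2 + 1):
        for part in combinations(nodes, k):
            rest = tuple(v for v in nodes if v not in part)
            # Collapse each side of the bipartition into one joint symbol.
            left = [tuple(row) for row in data[:, list(part)]]
            right = [tuple(row) for row in data[:, list(rest)]]
            phi = mutual_information(left, right)
            if phi < best_phi:
                best_phi, best_partition = phi, (part, rest)
    return best_phi, best_partition
```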

    Generalized Permutohedra from Probabilistic Graphical Models

    A graphical model encodes conditional independence relations via the Markov properties. For an undirected graph, these conditional independence relations can be represented by a simple polytope known as the graph associahedron, which can be constructed as a Minkowski sum of standard simplices. There is an analogous polytope for conditional independence relations coming from a regular Gaussian model, and it can be defined using multiinformation or relative entropy. For directed acyclic graphical models and also for mixed graphical models containing undirected, directed, and bidirected edges, we give a construction of this polytope, up to equivalence of normal fans, as a Minkowski sum of matroid polytopes. Finally, we apply this geometric insight to construct a new ordering-based search algorithm for causal inference via directed acyclic graphical models. (Comment: Appendix B is expanded. Final version to appear in SIAM J. Discrete Math.)
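
    To make the Minkowski-sum construction concrete, here is a hedged Python sketch that enumerates candidate vertices of a sum of standard simplices (every vertex of a Minkowski sum is a sum of vertices of the summands). The supports chosen below, the connected subgraphs of a 3-node path, mirror the graph-associahedron case, not the paper's matroid-polytope construction; function names are illustrative.

```python
import numpy as np
from itertools import product

def simplex_vertices(support, dim):
    """Vertices of the standard simplex on `support`: the unit vectors e_i."""
    return [np.eye(dim)[i] for i in support]

def minkowski_vertex_candidates(supports, dim):
    """Sums of one vertex from each simplex: a superset of the sum's vertices."""
    sums = {tuple(sum(choice))
            for choice in product(*(simplex_vertices(s, dim) for s in supports))}
    return sorted(sums)

# Connected subgraphs ("tubes") of the path 0 - 1 - 2.
supports = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 1, 2)]
for v in minkowski_vertex_candidates(supports, dim=3):
    print(v)
```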

    Information-theoretic inference of common ancestors

    A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds, that is, if every variable is independent of its non-descendants given its parents. In general, there is a whole class of DAGs that represents a given set of conditional independence relations. We are interested in properties of this class that can be derived from observations of a subsystem only. To this end, we prove an information-theoretic inequality that allows for the inference of common ancestors of observed parts in any DAG representing some unknown larger system. More explicitly, we show that a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information. Within the causal interpretation of DAGs, our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause to more than two variables. Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version. (Comment: 18 pages, 4 figures)
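
    As a toy illustration of the Reichenbach-style reading (only the classical two-variable case, not the paper's multi-variable inequality or its algorithmic version), the following Python sketch generates two noisy copies of a hidden ancestor and estimates their mutual information; all names are illustrative.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (bits) of two binary 0/1 arrays."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                mi += pxy * np.log2(pxy / (np.mean(x == a) * np.mean(y == b)))
    return mi

rng = np.random.default_rng(0)
n = 100_000

# Hidden common ancestor Z; X and Y are independent noisy copies given Z.
z = rng.integers(0, 2, size=n)
x = z ^ (rng.random(n) < 0.1)
y = z ^ (rng.random(n) < 0.1)

# The dependence between X and Y is "paid for" by the ancestor: I(X;Y)
# stays below H(Z) = 1 bit, the information Z can distribute.
print(f"I(X;Y) = {mutual_information(x, y):.3f} bits")
```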

    Discovering robust dependencies from data

    Science revolves around forming hypotheses, designing experiments, collecting data, and testing. It was not until recently, with the advent of modern hardware and data analytics, that science shifted towards a big-data-driven paradigm, leading to unprecedented success across various fields. Perhaps the most astounding feature of this new era is that interesting hypotheses can now be automatically discovered from observational data. This dissertation investigates knowledge discovery procedures that do exactly this. In particular, we seek algorithms that discover the most informative models able to compactly “describe” aspects of the phenomena under investigation, in both supervised and unsupervised settings. We consider interpretable models in the form of subsets of the original variable set. We want the models to capture all possible interactions, e.g., linear and non-linear, between all types of variables, e.g., discrete and continuous, and lastly, we want their quality to be meaningfully assessed. For this, we employ information-theoretic measures: the fraction of information for the supervised setting and the normalized total correlation for the unsupervised one. The former measures the uncertainty reduction of the target variable conditioned on a model, and the latter measures the information overlap of the variables included in a model. Without access to the true underlying data-generating process, we estimate the aforementioned measures from observational data. This process is prone to statistical errors, which in our case manifest as biases towards larger models. This can lead to situations where the results are utterly random, thereby hindering further analysis. We correct this behavior with notions from statistical learning theory. In particular, we propose regularized estimators that are unbiased under the hypothesis of independence, leading to robust estimation from limited data samples and arbitrary dimensionalities. Moreover, we do this for models consisting of both discrete and continuous variables. Lastly, to discover the top-scoring models, we derive effective optimization algorithms for exact, approximate, and heuristic search. These algorithms are powered by admissible, tight, and efficient-to-compute bounding functions for our proposed estimators that can be used to greatly prune the search space. Overall, the products of this dissertation can successfully assist data analysts with data exploration, with discovering powerful description models, or with concluding that no satisfactory models exist, implying that new experiments and data are required for the phenomena under investigation. This statement is supported by Materials Science researchers who corroborated our discoveries.
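
    For concreteness, here is a hedged Python sketch of naive plug-in versions of the two scores the dissertation builds on, the fraction of information and the normalized total correlation. It omits the regularized, bias-corrected estimators that are the dissertation's actual contribution, and the normalization below is one common convention.

```python
import numpy as np
from collections import Counter

def entropy(columns):
    """Plug-in Shannon entropy (bits) of the rows of an (n, k) discrete array."""
    counts = np.array(list(Counter(map(tuple, columns)).values()))
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def fraction_of_information(S, y):
    """I(S; Y) / H(Y): the share of Y's uncertainty removed by the model S."""
    h_y = entropy(y.reshape(-1, 1))
    return (entropy(S) + h_y - entropy(np.column_stack([S, y]))) / h_y

def normalized_total_correlation(S):
    """Total correlation sum_i H(X_i) - H(X_1,...,X_k), divided by its
    upper bound sum_i H(X_i) - max_i H(X_i)."""
    marginals = [entropy(S[:, [i]]) for i in range(S.shape[1])]
    tc = sum(marginals) - entropy(S)
    bound = sum(marginals) - max(marginals)
    return tc / bound if bound > 0 else 0.0
```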

    Entropic Inequalities and Marginal Problems

    A marginal problem asks whether a given family of marginal distributions for some set of random variables arises from some joint distribution of these variables. Here we point out that the existence of such a joint distribution imposes non-trivial conditions already on the level of Shannon entropies of the given marginals. These entropic inequalities are necessary (but not sufficient) criteria for the existence of a joint distribution. For every marginal problem, a list of such Shannon-type entropic inequalities can be calculated by Fourier-Motzkin elimination, and we offer a software interface to a Fourier-Motzkin solver for doing so. For the case that the hypergraph of given marginals is a cycle graph, we provide a complete analytic solution to the problem of classifying all relevant entropic inequalities, and use this result to bound the decay of correlations in stochastic processes. Furthermore, we show that Shannon-type inequalities for differential entropies are not relevant for continuous-variable marginal problems; non-Shannon-type inequalities are, both in the discrete and in the continuous case. In contrast to other approaches, our general framework easily adapts to situations where one has additional (conditional) independence requirements on the joint distribution, as in the case of graphical models. We end with a list of open problems. A complementary article discusses applications to quantum nonlocality and contextuality. (Comment: 26 pages, 3 figures)
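
    The core machinery, Fourier-Motzkin elimination, fits in a few lines. Below is a standalone Python sketch (no redundancy removal, illustrative only, not the paper's software interface) applied to the smallest marginal problem: eliminating H(XY) from the two-variable Shannon inequalities leaves only the trivial constraints on the marginal entropies.

```python
def eliminate(inequalities, j):
    """One Fourier-Motzkin step: each inequality is (a, b) meaning
    sum_i a[i] * x[i] <= b; returns a system with variable j eliminated."""
    pos, neg, zero = [], [], []
    for a, b in inequalities:
        (pos if a[j] > 0 else neg if a[j] < 0 else zero).append((a, b))
    out = list(zero)
    for ap, bp in pos:
        for an, bn in neg:
            lam, mu = -an[j], ap[j]  # positive scalars cancelling x_j
            out.append(([lam * p + mu * q for p, q in zip(ap, an)],
                        lam * bp + mu * bn))
    return out

# Coordinates: (H(X), H(Y), H(XY)). Eliminating H(XY) leaves only
# H(X) >= 0 and H(Y) >= 0: single-variable marginals always extend.
shannon = [
    ([1, 0, -1], 0),    # H(X)  <= H(XY)        monotonicity
    ([0, 1, -1], 0),    # H(Y)  <= H(XY)        monotonicity
    ([-1, -1, 1], 0),   # H(XY) <= H(X) + H(Y)  subadditivity
    ([-1, 0, 0], 0),    # H(X)  >= 0
    ([0, -1, 0], 0),    # H(Y)  >= 0
]
for a, b in eliminate(shannon, j=2):
    print(a, "<=", b)
```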

    Total positivity in exponential families with application to binary variables

    We study exponential families of distributions that are multivariate totally positive of order 2 (MTP2), show that these are convex exponential families, and derive conditions for the existence of the MLE. Quadratic exponential families of MTP2 distributions contain attractive Gaussian graphical models and ferromagnetic Ising models as special examples. We show that these are defined by intersecting the space of canonical parameters with a polyhedral cone whose faces correspond to conditional independence relations. Hence MTP2 serves as an implicit regularizer for quadratic exponential families and leads to sparsity in the estimated graphical model. We prove that the maximum likelihood estimator (MLE) in an MTP2 binary exponential family exists if and only if both of the sign patterns (1,-1) and (-1,1) are represented in the sample for every pair of variables; in particular, this implies that the MLE may exist with n = d observations, in stark contrast to unrestricted binary exponential families, where 2^d observations are required. Finally, we provide a novel and globally convergent algorithm, similar to iterative proportional scaling, for computing the MLE for MTP2 Ising models, and apply it to the analysis of data from two psychological disorders.
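
    The existence criterion is directly checkable from data. Here is a minimal Python sketch (function name illustrative) of the pairwise sign-pattern test, together with a sample that realizes the n = d claim.

```python
import numpy as np

def mle_exists(X):
    """Pairwise condition for MLE existence in an MTP2 binary family:
    every pair of variables must exhibit both sign patterns (1, -1)
    and (-1, 1) somewhere in the sample; X is (n, d) with entries +/-1."""
    d = X.shape[1]
    for i in range(d):
        for j in range(i + 1, d):
            if not (np.any((X[:, i] == 1) & (X[:, j] == -1))
                    and np.any((X[:, i] == -1) & (X[:, j] == 1))):
                return False
    return True

# n = d suffices: d rows, each all -1 except one +1, give both patterns
# for every pair of columns.
d = 5
X = 2 * np.eye(d, dtype=int) - 1
print(mle_exists(X))  # True
```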

    Causal discovery beyond Markov equivalence

    The focus of the dissertation is on learning causal diagrams beyond Markov equivalence. The baseline assumptions in causal structure learning are the acyclicity of the underlying structure and causal sufficiency, which requires that there are no unobserved confounder variables in the system. Under these assumptions, conditional independence relationships contain all the information in the distribution that can be used for structure learning. Therefore, the causal diagram can be identified only up to Markov equivalence, that is, up to the set of structures reflecting the same conditional independence relationships. Hence, for many ground-truth structures, the direction of a large portion of the edges will remain unidentified, and in order to learn the structure beyond Markov equivalence, generating or having access to extra joint distributions from the perturbed causal system is required. There are two main scenarios for acquiring the extra joint distributions. The first and main scenario is when an experimenter directly performs a sequence of interventions on subsets of the variables of the system to generate interventional distributions. We refer to the task of causal discovery from such interventional data as interventional causal structure learning. In this setting, the key question is determining which variables should be intervened on to gain the most information. This is the first focus of this dissertation. The second scenario for acquiring the extra joint distributions is when a subset of causal mechanisms, and consequently the joint distribution of the system, have varied or evolved due to reasons beyond the control of the experimenter. In this case, it is not even known a priori to the experimenter which causal mechanisms have varied. We refer to the task of causal discovery from such multi-domain data as multi-domain causal structure learning. In this setup, the main question is how one can take the greatest advantage of the changes across domains for the task of causal discovery. This is the second focus of this dissertation. Next, we consider cases in which conditional independence may not reflect all the information in the distribution that can be used to identify the underlying structure. One such case is when cycles are allowed in the underlying structure. Unfortunately, a suitable characterization of equivalence for the case of cyclic directed graphs has been unknown so far. The third focus of this dissertation is on bridging the gap between cyclic and acyclic directed graphs by introducing a general approach for equivalence characterization and structure learning. Another case in which conditional independence may not reflect all the information in the distribution is when there are extra assumptions on the generating causal modules. A seminal result in this direction is that a linear model with non-Gaussian exogenous variables is uniquely identifiable. As the fourth focus of this dissertation, we consider this setup, yet go one step further and allow for violation of causal sufficiency, and investigate how this generalization affects identifiability.
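
    Since everything in the dissertation starts from what Markov equivalence does and does not pin down, here is a small Python sketch of the classical Verma-Pearl characterization for acyclic graphs, same skeleton and same v-structures; the point of the third focus above is precisely that no such characterization was available for cyclic graphs.

```python
from itertools import combinations

def skeleton(dag):
    """Undirected adjacencies of a DAG given as {node: set_of_parents}."""
    return {frozenset((p, c)) for c, ps in dag.items() for p in ps}

def v_structures(dag):
    """Colliders a -> c <- b whose endpoints a, b are non-adjacent."""
    skel = skeleton(dag)
    return {(a, c, b)
            for c, ps in dag.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in skel}

def markov_equivalent(d1, d2):
    """Verma-Pearl: equivalent iff same skeleton and same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

# X -> Y -> Z and X <- Y <- Z are equivalent; the collider X -> Y <- Z is not.
chain_fwd = {"X": set(), "Y": {"X"}, "Z": {"Y"}}
chain_bwd = {"X": {"Y"}, "Y": {"Z"}, "Z": set()}
collider = {"X": set(), "Y": {"X", "Z"}, "Z": set()}
print(markov_equivalent(chain_fwd, chain_bwd))  # True
print(markov_equivalent(chain_fwd, collider))   # False
```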