Information Bottlenecks, Causal States, and Statistical Relevance Bases: How to Represent Relevant Information in Memoryless Transduction
Discovering relevant, but possibly hidden, variables is a key step in
constructing useful and predictive theories about the natural world. This brief
note explains the connections between three approaches to this problem: the
recently introduced information-bottleneck method, the computational mechanics
approach to inferring optimal models, and Salmon's statistical relevance basis.
Comment: 3 pages, no figures, submitted to PRE as a "brief report". Revision: added an acknowledgements section originally omitted by a LaTeX bug.
Beyond Word N-Grams
We describe, analyze, and evaluate experimentally a new probabilistic model
for word-sequence prediction in natural language based on prediction suffix
trees (PSTs). By using efficient data structures, we extend the notion of PST
to unbounded vocabularies. We also show how to use a Bayesian approach based on
recursive priors over all possible PSTs to efficiently maintain tree mixtures.
These mixtures have provably and practically better performance than almost any
single model. We evaluate the model on several corpora. The low perplexity
achieved by relatively small PST mixture models suggests that they may be an
advantageous alternative, both theoretically and practically, to the widely
used n-gram models.
Comment: 15 pages, one PostScript figure, uses psfig.sty and fullname.sty. Revised version of a paper in the Proceedings of the Third Workshop on Very Large Corpora, MIT, 199
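The backoff idea behind prediction suffix trees can be sketched in a few lines of Python. Everything below is an illustration, not the paper's method: the fixed mixing weight `alpha` and the uniform fallback are crude stand-ins for the recursive Bayesian priors and tree mixtures the abstract describes.

```python
from collections import defaultdict

class PST:
    """Toy prediction suffix tree: stores next-word counts for every
    context suffix up to max_depth and backs off to shorter suffixes."""

    def __init__(self, max_depth=2):
        self.max_depth = max_depth
        self.counts = defaultdict(lambda: defaultdict(int))  # context tuple -> {word: count}

    def train(self, words):
        for i, w in enumerate(words):
            for d in range(min(i, self.max_depth) + 1):
                self.counts[tuple(words[i - d:i])][w] += 1

    def prob(self, context, word, alpha=0.5):
        """Blend per-depth estimates: each observed suffix level gets
        weight alpha of the remaining mass, and leftover mass is spread
        uniformly over the seen vocabulary."""
        context = tuple(context)[-self.max_depth:]
        p, weight = 0.0, 1.0
        for d in range(len(context), -1, -1):
            ctx = context[len(context) - d:]
            total = sum(self.counts[ctx].values())
            if total:
                p += weight * alpha * self.counts[ctx][word] / total
                weight *= 1 - alpha
        vocab = max(len(self.counts[()]), 1)
        return p + weight / vocab  # leftover mass spread uniformly
```

On a repetitive corpus such as "a b a b a b", the model correctly assigns higher probability to "b" than to "a" after seeing the context "a".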
Objective Classification of Galaxy Spectra using the Information Bottleneck Method
A new method for classification of galaxy spectra is presented, based on a
recently introduced information theoretical principle, the `Information
Bottleneck'. For any desired number of classes, galaxies are classified such
that the information content about the spectra is maximally preserved. The
result is classes of galaxies with similar spectra, where the similarity is
determined via a measure of information. We apply our method to approximately
6000 galaxy spectra from the ongoing 2dF redshift survey, and a mock-2dF
catalogue produced by a Cold Dark Matter-based semi-analytic model of galaxy
formation. We find a good match between the mean spectra of the classes found
in the data and in the models. For the mock catalogue, we find that the classes
produced by our algorithm form an intuitively sensible sequence in terms of
physical properties such as colour, star formation activity, morphology, and
internal velocity dispersion. We also show the correlation of the classes with
the projections resulting from a Principal Component Analysis.
Comment: submitted to MNRAS, 17 pages, LaTeX, with 14 figures embedded.
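The clustering principle at work here can be illustrated with a tiny agglomerative information-bottleneck sketch on a discrete joint distribution p(x, y). The toy distribution and the greedy pairwise merging below are inventions for illustration; the actual analysis operates on binned galaxy spectra and uses the full IB machinery.

```python
import numpy as np

def mutual_info(pxy):
    """I(X;Y) in bits for a joint distribution given as a 2-D array."""
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def agglomerative_ib(pxy, n_clusters):
    """Greedily merge the pair of clusters whose merge best preserves
    I(T; Y), starting from one cluster per row of pxy."""
    clusters = [[i] for i in range(pxy.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                trial = [c for k, c in enumerate(clusters) if k not in (i, j)]
                trial.append(clusters[i] + clusters[j])
                # joint distribution of the merged clusters and Y
                pty = np.array([pxy[c].sum(axis=0) for c in trial])
                score = mutual_info(pty)
                if best is None or score > best[0]:
                    best = (score, trial)
        clusters = best[1]
    return clusters
```

Rows with identical conditionals p(y|x) can be merged at no information cost, so the algorithm groups them first, which is exactly the "similarity determined via a measure of information" idea.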
Using state space differential geometry for nonlinear blind source separation
Given a time series of multicomponent measurements of an evolving stimulus,
nonlinear blind source separation (BSS) seeks to find a "source" time series,
comprised of statistically independent combinations of the measured components.
In this paper, we seek a source time series with local velocity cross
correlations that vanish everywhere in stimulus state space. In an earlier
paper, the local velocity correlation matrix was shown to constitute a
metric on state space. Therefore, nonlinear BSS maps onto a problem of
differential geometry: given the metric observed in the measurement coordinate
system, find another coordinate system in which the metric is diagonal
everywhere. We show how to determine if the observed data are separable in this
way, and, if they are, we show how to construct the required transformation to
the source coordinate system, which is essentially unique except for an unknown
rotation that can be found by applying the methods of linear BSS. Thus, the
proposed technique solves nonlinear BSS in many situations or, at least,
reduces it to linear BSS, without the use of probabilistic, parametric, or
iterative procedures. This paper also describes a generalization of this
methodology that performs nonlinear independent subspace separation. In every
case, the resulting decomposition of the observed data is an intrinsic property
of the stimulus' evolution in the sense that it does not depend on the way the
observer chooses to view it (e.g., the choice of the observing machine's
sensors). In other words, the decomposition is a property of the evolution of
the "real" stimulus that is "out there" broadcasting energy to the observer.
The technique is illustrated with analytic and numerical examples.
Comment: Contains 14 pages and 3 figures. For related papers, see http://www.geocities.com/dlevin2001/ . New version is identical to original version except for URL in the byline.
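A minimal numerical illustration of the central object, the local second moment of velocities, estimated separately in coarse cells of state space. The median-split binning and the thresholds are inventions for illustration; the paper's construction is differential-geometric, not this crude partition.

```python
import numpy as np

def local_velocity_metric(x):
    """Estimate <dx_i dx_j> separately in each of up to 2^D coarse cells
    of state space (cells split at the per-coordinate median). If the
    off-diagonal entries vanish in every cell, the coordinates already
    diagonalize the metric, i.e. the data are separable as given."""
    v = np.diff(x, axis=0)                        # velocities between samples
    pos = x[:-1]
    cells = (pos > np.median(pos, axis=0)).astype(int)
    metrics = {}
    for cell in {tuple(r) for r in cells}:
        mask = (cells == cell).all(axis=1)
        metrics[cell] = v[mask].T @ v[mask] / mask.sum()
    return metrics
```

For two independent random walks, the off-diagonal entries of every local metric are near zero while the diagonals sit near the increment variance, consistent with the data being separable in the measured coordinates.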
A Bivariate Measure of Redundant Information
We define a measure of redundant information based on projections in the
space of probability distributions. Redundant information between random
variables is information that is shared between those variables. But in
contrast to mutual information, redundant information denotes information that
is shared about the outcome of a third variable. Formalizing this concept, and
being able to measure it, is required for the non-negative decomposition of
mutual information into redundant and synergistic information. Previous
attempts to formalize redundant or synergistic information struggle to capture
some desired properties. We introduce a new formalism for redundant information
and prove that it satisfies all the necessary properties outlined in earlier
work, as well as an additional criterion that we propose to be necessary to
capture redundancy. We also demonstrate the behaviour of this new measure for
several examples, compare it to previous measures and apply it to the
decomposition of transfer entropy.
Comment: 16 pages, 15 figures, 1 table, added citation to Griffith et al 2012, Maurer et al 199
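To make the notion of "shared information about a third variable" concrete, here is the earlier Williams-Beer I_min redundancy measure, one of the previous attempts the abstract compares against, not the projection-based measure the paper itself defines.

```python
import numpy as np

def i_min(p):
    """Williams-Beer I_min redundancy for p[y, x1, x2], in bits: for each
    outcome y, take the smaller of the two sources' specific
    informations, then average over y."""
    py = p.sum(axis=(1, 2))
    red = 0.0
    for y in range(p.shape[0]):
        specifics = []
        for drop in (2, 1):                   # marginalize out the other source
            pyx = p.sum(axis=drop)[y]         # p(y, x_i) as a function of x_i
            px = p.sum(axis=(0, drop))        # p(x_i)
            nz = pyx > 0
            specifics.append((pyx[nz] / py[y]
                              * np.log2(pyx[nz] / (py[y] * px[nz]))).sum())
        red += py[y] * min(specifics)
    return float(red)
```

Two sanity checks: when both sources are exact copies of a uniform bit Y, the redundancy is the full 1 bit; when Y is the XOR of two independent bits, each source alone carries no information about Y, so the redundancy is zero (the information is purely synergistic).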
Machine learning and the physical sciences
Machine learning encompasses a broad range of algorithms and modeling tools
used for a vast array of data processing tasks, and it has entered most
scientific disciplines in recent years. We selectively review recent
research on the interface between machine learning and physical sciences. This
includes conceptual developments in machine learning (ML) motivated by physical
insights, applications of machine learning techniques to several domains in
physics, and cross-fertilization between the two fields. After giving a basic
notion of machine learning methods and principles, we describe examples of how
statistical physics is used to understand methods in ML. We then move to
describe applications of ML methods in particle physics and cosmology, quantum
many-body physics, quantum computing, and chemical and material physics. We
also highlight research and development into novel computing architectures
aimed at accelerating ML. In each of the sections we describe recent successes
as well as domain-specific methodology and challenges.
Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes
Information-theoretic principles for learning and acting have been proposed
to solve particular classes of Markov Decision Problems. Mathematically, such
approaches are governed by a variational free energy principle and allow
solving MDP planning problems with information-processing constraints expressed
in terms of a Kullback-Leibler divergence with respect to a reference
distribution. Here we consider a generalization of such MDP planners by taking
model uncertainty into account. As model uncertainty can also be formalized as
an information-processing constraint, we can derive a unified solution from a
single generalized variational principle. We provide a generalized value
iteration scheme together with a convergence proof. As limit cases, this
generalized scheme includes standard value iteration with a known model,
Bayesian MDP planning, and robust planning. We demonstrate the benefits of this
approach in a grid world simulation.
Comment: 16 pages, 3 figures.
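The free-energy form of the Bellman backup described above can be sketched as a "soft" value iteration: the hard max over actions becomes a log-sum-exp relative to a reference policy, with an inverse temperature encoding the KL information-processing constraint. The toy MDP and all parameter values below are invented for illustration.

```python
import numpy as np

def kl_regularized_vi(P, R, q, beta=50.0, gamma=0.9, iters=500):
    """Soft value iteration sketch: V(s) = (1/beta) log sum_a q(a|s)
    exp(beta * Q(s, a)), where beta -> infinity recovers the standard
    hard max. P[s, a, s'] are transition probabilities, R[s, a] rewards,
    q[s, a] the reference policy."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * np.einsum('san,n->sa', P, V)
        B = beta * Q
        m = B.max(axis=1, keepdims=True)      # stabilized log-sum-exp
        V = (m[:, 0] + np.log((q * np.exp(B - m)).sum(axis=1))) / beta
    return V
```

On a symmetric two-state MDP where switching states always pays reward 1, standard value iteration gives V = 1/(1 - gamma) = 10; the soft values lie below that bound and approach it as beta grows, reflecting a loosening information-processing constraint.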
Shannon Meets Carnot: Generalized Second Thermodynamic Law
The classical thermodynamic laws fail to capture the behavior of systems whose
energy Hamiltonian is an explicit function of the temperature. Such a
Hamiltonian arises, for example, when modeling information processing systems,
like communication channels, as thermal systems. Here we generalize the second
thermodynamic law to encompass systems with temperature-dependent energy
levels: dQ = TdS + <dε>, where <·> denotes averaging over the Boltzmann
distribution. This reveals a new definition of the basic notion of temperature.
The generalization makes it possible to express, for instance, the mutual
information of the Gaussian channel as a consequence of the fundamental laws of
nature - the laws of thermodynamics.
Causal blankets: Theory and algorithmic framework
Funding Information: F.R. was supported by the Ad Astra Chandaria foundation. P.M. was funded by the Wellcome Trust (grant no. 210920/Z/18/Z). M.B. was supported by a grant from Templeton World Charity Foundation, Inc. (TWCF). The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of TWCF. Publisher Copyright: © 2020, Springer Nature Switzerland AG. This is a post-peer-review, pre-copyedit version of Rosas, F. E., Mediano, P. A. M., Biehl, M., Chandaria, S., & Polani, D. (2020). Causal blankets: Theory and algorithmic framework. In T. Verbelen, P. Lanillos, C. L. Buckley, & C. De Boom (Eds.), Active Inference - First International Workshop, IWAI 2020, Co-located with ECML/PKDD 2020, Proceedings (pp. 187-198). (Communications in Computer and Information Science; Vol. 1326). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-64919-7_19
We introduce a novel framework to identify perception-action loops (PALOs) directly from data based on the principles of computational mechanics. Our approach is based on the notion of causal blanket, which captures sensory and active variables as dynamical sufficient statistics—i.e. as the "differences that make a difference." Furthermore, our theory provides a broadly applicable procedure to construct PALOs that requires neither a steady state nor Markovian dynamics. Using our theory, we show that every bipartite stochastic process has a causal blanket, but the extent to which this leads to an effective PALO formulation varies depending on the integrated information of the bipartition.