The equivalence of information-theoretic and likelihood-based methods for neural dimensionality reduction
Stimulus dimensionality-reduction methods in neuroscience seek to identify a
low-dimensional space of stimulus features that affect a neuron's probability
of spiking. One popular method, known as maximally informative dimensions
(MID), uses an information-theoretic quantity known as "single-spike
information" to identify this space. Here we examine MID from a model-based
perspective. We show that MID is a maximum-likelihood estimator for the
parameters of a linear-nonlinear-Poisson (LNP) model, and that the empirical
single-spike information corresponds to the normalized log-likelihood under a
Poisson model. This equivalence implies that MID does not necessarily find
maximally informative stimulus dimensions when spiking is not well described as
Poisson. We provide several examples to illustrate this shortcoming, and derive
a lower bound on the information lost when spiking is Bernoulli in discrete
time bins. To overcome this limitation, we introduce model-based dimensionality
reduction methods for neurons with non-Poisson firing statistics, and show that
they can be framed equivalently in likelihood-based or information-theoretic
terms. Finally, we show how to overcome practical limitations on the number of
stimulus dimensions that MID can estimate by constraining the form of the
non-parametric nonlinearity in an LNP model. We illustrate these methods with
simulations and data from primate visual cortex.
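To make the claimed equivalence concrete, here is a minimal numpy sketch under illustrative assumptions (a simulated LNP neuron with a single filter and an exponential nonlinearity, histogram plug-in density estimates; none of this is the paper's code). It computes the empirical single-spike information and the normalized Poisson log-likelihood for a candidate filter, and shows they differ only by a parameter-independent offset, log(n_sp/n) - 1 nats:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an LNP neuron: 1D filter, exponential nonlinearity (illustrative choices).
n, d = 100_000, 20
X = rng.standard_normal((n, d))                      # white-noise stimuli, one row per bin
w_true = rng.standard_normal(d)
w_true /= np.linalg.norm(w_true)
y = rng.poisson(0.05 * np.exp(2.0 * (X @ w_true)))   # Poisson spike counts per bin
n_sp = y.sum()

def iss_and_loglik(w, n_bins=30):
    """Plug-in single-spike information (nats) and normalized Poisson log-likelihood
    for the 1D projection defined by w, with the histogram-ratio MLE of the
    nonlinearity plugged in."""
    z = X @ w
    edges = np.linspace(z.min(), z.max(), n_bins + 1)
    p_raw = np.histogram(z, edges)[0] / n                  # p(x)
    p_spk = np.histogram(z, edges, weights=y)[0] / n_sp    # p(x | spike)
    m = p_spk > 0
    iss = np.sum(p_spk[m] * np.log(p_spk[m] / p_raw[m]))
    # MLE nonlinearity per bin: lambda = (n_sp / n) * p_spk / p_raw
    lam = np.where(p_raw > 0, (n_sp / n) * p_spk / np.maximum(p_raw, 1e-300), 0.0)
    lam_t = lam[np.digitize(z, edges[1:-1])]               # rate assigned to each time bin
    loglik = (np.where(y > 0, y * np.log(np.maximum(lam_t, 1e-300)), 0.0).sum()
              - lam_t.sum()) / n_sp
    return iss, loglik

for w in (w_true, rng.standard_normal(d)):
    iss, ll = iss_and_loglik(w / np.linalg.norm(w))
    # The two objectives differ only by the constant log(n_sp / n) - 1.
    print(f"I_ss = {iss:.4f}  logL/n_sp = {ll:.4f}  offset = {ll - iss:.4f}")
```

Because the offset does not depend on the candidate filter, maximizing either quantity selects the same stimulus dimensions, which is the equivalence the abstract describes.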
Correction: The Equivalence of Information-Theoretic and Likelihood-Based Methods for Neural Dimensionality Reduction
Notice of republication:
This article was republished on April 23, 2015, to correct errors in the equations that were introduced during the typesetting process. The publisher apologizes for the errors. Please download this article again to view the correct version.
Transfer Entropy as a Log-likelihood Ratio
Transfer entropy, an information-theoretic measure of time-directed
information transfer between joint processes, has steadily gained popularity in
the analysis of complex stochastic dynamics in diverse fields, including the
neurosciences, ecology, climatology and econometrics. We show that for a broad
class of predictive models, the log-likelihood ratio test statistic for the
null hypothesis of zero transfer entropy is a consistent estimator for the
transfer entropy itself. For finite Markov chains, furthermore, no explicit
model is required. In the general case, an asymptotic chi-squared distribution
is established for the transfer entropy estimator. The result generalises the
equivalence in the Gaussian case of transfer entropy and Granger causality, a
statistical notion of causal influence based on prediction via vector
autoregression, and establishes a fundamental connection between directed
information transfer and causality in the Wiener-Granger sense.
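As a concrete illustration of the Gaussian special case mentioned above, the following sketch (toy coupled AR(1) processes; all parameters are illustrative) estimates the transfer entropy from x to y as half the log-ratio of restricted to full regression residual variances, which is exactly the Granger form, and forms the corresponding log-likelihood ratio statistic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two coupled AR(1) processes: x drives y with coefficient c (illustrative values).
n, c = 10_000, 0.3
x = np.zeros(n); y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.4 * y[t - 1] + c * x[t - 1] + rng.standard_normal()

def residual_var(target, *regressors):
    """Least-squares residual variance of target on the given regressors (+ intercept)."""
    A = np.column_stack([np.ones_like(target)] + list(regressors))
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.mean((target - A @ beta) ** 2)

# Restricted model: y's own past only; full model adds x's past.
s2_restricted = residual_var(y[1:], y[:-1])
s2_full = residual_var(y[1:], y[:-1], x[:-1])

te_hat = 0.5 * np.log(s2_restricted / s2_full)   # Gaussian TE = Granger causality x -> y
lr_stat = 2 * (n - 1) * te_hat                   # log-likelihood ratio test statistic
print(f"TE(x -> y) ~= {te_hat:.4f} nats; LR = {lr_stat:.1f}")
```

Under the null of zero transfer entropy, the LR statistic is asymptotically chi-squared with one degree of freedom here (the single extra regression coefficient), matching the asymptotic distribution established for the estimator.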
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields over the past ten years. This survey presents a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.
Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and added a new method.
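To fix ideas about the Mahalanobis framework the survey emphasizes, here is a toy sketch, not any specific method from the survey: it learns a matrix L, so that M = L^T L is positive semidefinite by construction, with a simple pairwise hinge loss that pulls similar pairs inside a margin and pushes dissimilar pairs beyond it. All names and parameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def learn_mahalanobis(X, pairs, labels, margin=1.0, lr=0.01, epochs=500):
    """Toy Mahalanobis metric learning with a pairwise hinge loss.

    Learns L so that d(a, b)^2 = ||L (a - b)||^2, i.e. M = L^T L is PSD by
    construction. pairs: (k, 2) index array; labels: +1 similar, -1 dissimilar.
    """
    L = np.eye(X.shape[1])
    diff = X[pairs[:, 0]] - X[pairs[:, 1]]
    for _ in range(epochs):
        dist2 = np.sum((diff @ L.T) ** 2, axis=1)
        viol = labels * (dist2 - margin) > 0          # constraint-violating pairs
        if not viol.any():
            break
        signed = diff[viol] * labels[viol, None]
        grad = 2.0 * L @ (signed.T @ diff[viol]) / viol.sum()
        L -= lr * grad
    return L

# Usage on toy data: similar = same class, dissimilar = different class.
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
y = np.repeat([0, 1], 50)
idx = rng.integers(0, 100, (400, 2))
labels = np.where(y[idx[:, 0]] == y[idx[:, 1]], 1, -1)
L = learn_mahalanobis(X, idx, labels)

d2 = lambda L_: np.sum(((X[idx[:, 0]] - X[idx[:, 1]]) @ L_.T) ** 2, axis=1)
for name, L_ in [("euclidean", np.eye(5)), ("learned", L)]:
    print(name, "similar:", d2(L_)[labels == 1].mean().round(2),
          "dissimilar:", d2(L_)[labels == -1].mean().round(2))
```

The L-parameterization is a common design choice in this literature because it keeps M positive semidefinite without projection steps, at the cost of a non-convex objective.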
Parametric inference in the large data limit using maximally informative models
Motivated by data-rich experiments in transcriptional regulation and sensory
neuroscience, we consider the following general problem in statistical
inference. When exposed to a high-dimensional signal S, a system of interest
computes a representation R of that signal which is then observed through a
noisy measurement M. From a large number of signals and measurements, we wish
to infer the "filter" that maps S to R. However, the standard method for
solving such problems, likelihood-based inference, requires perfect a priori
knowledge of the "noise function" mapping R to M. In practice such noise
functions are usually known only approximately, if at all, and using an
incorrect noise function will typically bias the inferred filter. Here we show
that, in the large data limit, this need for a pre-characterized noise function
can be circumvented by searching for filters that instead maximize the mutual
information I[M;R] between observed measurements and predicted representations.
Moreover, if the correct filter lies within the space of filters being
explored, maximizing mutual information becomes equivalent to simultaneously
maximizing every dependence measure that satisfies the Data Processing
Inequality. It is important to note that maximizing mutual information will
typically leave a small number of directions in parameter space unconstrained.
We term these directions "diffeomorphic modes" and present an equation that
allows these modes to be derived systematically. The presence of diffeomorphic
modes reflects a fundamental and nontrivial substructure within parameter
space, one that is obscured by standard likelihood-based inference.
Comment: To appear in Neural Computation.
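A minimal numpy sketch of the idea, under illustrative assumptions (a 2D signal, a linear filter parameterized by its angle, a saturating noise function that the inference never sees, and a plug-in histogram MI estimator): the filter is recovered by maximizing I[M;R] with no noise model anywhere in the objective.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setup: representation R = S @ w_true; the measurement M passes R through
# a saturating noise function that is *unknown* to the inference below.
n = 50_000
S = rng.standard_normal((n, 2))
theta_true = 0.9
w_true = np.array([np.cos(theta_true), np.sin(theta_true)])
M = np.tanh(S @ w_true) + 0.3 * rng.standard_normal(n)

def plugin_mi(a, b, bins=30):
    """Plug-in mutual information (nats) from a 2D histogram."""
    pab = np.histogram2d(a, b, bins=bins)[0]
    pab /= pab.sum()
    pa, pb = pab.sum(1, keepdims=True), pab.sum(0, keepdims=True)
    m = pab > 0
    return np.sum(pab[m] * np.log(pab[m] / (pa @ pb)[m]))

# Scan the filter angle; no noise function appears in the objective.
thetas = np.linspace(0, np.pi, 181)
mi = [plugin_mi(M, S @ np.array([np.cos(t), np.sin(t)])) for t in thetas]
print(f"true angle {theta_true:.3f}, MI-optimal angle {thetas[int(np.argmax(mi))]:.3f}")
```

Note that the filter's norm is fixed to 1 and only the angle is scanned: rescaling R is an invertible transformation that leaves I[M;R] unchanged, so the norm is precisely one of the unconstrained "diffeomorphic modes" the abstract describes.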
A nonuniform popularity-similarity optimization (nPSO) model to efficiently generate realistic complex networks with communities
The hidden metric space behind complex network topologies is an active topic in current network science, and hyperbolic space is one of the most studied candidates, because it appears to be associated with the structural organization of many real complex systems. The Popularity-Similarity-Optimization (PSO) model simulates how random geometric graphs grow in hyperbolic space, reproducing strong clustering and a scale-free degree distribution; however, it fails to reproduce an important feature of real complex networks: community organization. The Geometrical-Preferential-Attachment (GPA) model was recently developed to endow the PSO model with a community structure as well, obtained by forcing different angular regions of the hyperbolic disk to have different levels of attractiveness. However, the number and size of the communities cannot be explicitly controlled in the GPA model, which is a clear limitation for real applications. Here we introduce the nonuniform PSO (nPSO) model, which, unlike GPA, induces heterogeneous angular node attractiveness by sampling the angular coordinates from a tailored nonuniform probability distribution, for instance a mixture of Gaussians. The nPSO differs from GPA in three other respects: it allows the number and size of the communities to be fixed explicitly; it allows their mixing to be tuned through the network temperature; and it generates networks with high clustering efficiently. After several tests, we propose the nPSO as a valid and efficient model for generating networks with communities in hyperbolic space, which can be adopted as a realistic benchmark for tasks such as community detection and link prediction.
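For intuition, here is a heavily simplified sketch of the nPSO idea (an illustrative reconstruction, not the authors' implementation): it covers only the zero-temperature case, in which each new node connects to the m hyperbolically closest existing nodes, and it draws angular coordinates from a fixed mixture of Gaussians so that each mixture component seeds one community:

```python
import numpy as np

rng = np.random.default_rng(4)

def npso_network(n=500, m=4, beta=0.5, centers=(0.0, 2.1, 4.2), sigma=0.25):
    """Simplified nPSO generator (T = 0 case only). Angular coordinates are
    sampled from a mixture of Gaussians, one component per community, instead
    of uniformly as in the plain PSO model."""
    k = rng.integers(0, len(centers), n)                    # community assignment
    theta = (np.array(centers)[k] + sigma * rng.standard_normal(n)) % (2 * np.pi)
    r = np.zeros(n)
    edges = []
    for t in range(n):
        r_new = 2 * np.log(t + 1)
        # popularity fading: earlier nodes drift outward toward the new radius
        r[:t] = beta * 2 * np.log(np.arange(1, t + 1)) + (1 - beta) * r_new
        r[t] = r_new
        if t == 0:
            continue
        dtheta = np.pi - np.abs(np.pi - np.abs(theta[:t] - theta[t]) % (2 * np.pi))
        # hyperbolic distance on the disk (guarded against rounding below 1)
        d = np.arccosh(np.maximum(
            np.cosh(r[:t]) * np.cosh(r_new)
            - np.sinh(r[:t]) * np.sinh(r_new) * np.cos(dtheta), 1.0))
        for j in np.argsort(d)[:m]:                         # connect to m closest
            edges.append((int(j), t))
    return theta, k, edges

theta, communities, edges = npso_network()
print(len(edges), "edges among", len(theta), "nodes")
```

In the full model the temperature controls how sharply the connection probability decays with hyperbolic distance, which is what tunes the community mixing; this sketch omits that and always picks the closest nodes deterministically.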