Minimum Description Length codes are critical
In the Minimum Description Length (MDL) principle, learning from the data is
equivalent to an optimal coding problem. We show that the codes that achieve
optimal compression in MDL are critical in a very precise sense. First, when
they are taken as generative models of samples, they generate samples with
broad empirical distributions and with high relevance, defined
as the entropy of the empirical frequencies. These results are derived for
different statistical models (Dirichlet model, independent and pairwise
dependent spin models, and restricted Boltzmann machines). Second, MDL codes
sit precisely at a second-order phase transition point where the symmetry
between the sampled outcomes is spontaneously broken. The order parameter
controlling the phase transition is the coding cost of the samples. The phase
transition is a manifestation of the optimality of MDL codes, and it arises
because codes that achieve a higher compression do not exist. These results
suggest a clear interpretation of the widespread occurrence of statistical
criticality as a characterization of samples that are maximally informative about
the underlying generative process.
Comment: 23 pages, 5 figures; corrected the author name, revised Section 2.2
(Large Deviations of the Universal Codes Exhibit Phase Transitions),
corrected Eq. (89
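The relevance mentioned in this abstract can be computed directly from a sample. The sketch below assumes the reading common in the relevance literature: each of the M sample points is labeled by the number of times k its outcome occurs, the relevance H[K] is the entropy of that label, and the resolution H[s] is the entropy of the empirical distribution of outcomes. The helper name `resolution_relevance` and the use of natural logarithms are our own illustrative choices, not code from the paper.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Tuple


def resolution_relevance(sample: Iterable[Hashable]) -> Tuple[float, float]:
    """Return (resolution H[s], relevance H[K]) of a sample, in nats.

    Assumed definitions (hypothetical helper, not the paper's code):
      k_s  = number of times outcome s appears in the sample
      m_k  = number of distinct outcomes that appear exactly k times
      H[s] = -sum_s (k_s / M) log(k_s / M)
      H[K] = -sum_k (k m_k / M) log(k m_k / M)
    """
    counts = Counter(sample)          # k_s for each observed outcome s
    M = sum(counts.values())          # sample size
    if M == 0:
        raise ValueError("empty sample")
    h_s = -sum((k / M) * math.log(k / M) for k in counts.values())
    m = Counter(counts.values())      # m_k: how many outcomes occur exactly k times
    h_k = -sum((k * mk / M) * math.log(k * mk / M) for k, mk in m.items())
    return h_s, h_k


# Outcome "a" appears twice, "b" and "c" once each.
h_s, h_k = resolution_relevance(["a", "a", "b", "c"])
print(round(h_s / math.log(2), 6), round(h_k / math.log(2), 6))  # 1.5 1.0 (bits)
```

A sample in which every outcome is distinct, or in which a single outcome repeats M times, has zero relevance under these definitions; a broad spread of frequencies is what makes H[K] large.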
Multiscale relevance and informative encoding in neuronal spike trains
Neuronal responses to complex stimuli and tasks can encompass a wide range of
time scales. Understanding these responses requires measures that characterize
how the information carried by these response patterns is represented across
multiple temporal resolutions. In this paper we propose a metric, which we call
multiscale relevance (MSR), to capture the dynamical variability of the
activity of single neurons across different time scales. The MSR is a
non-parametric, fully featureless indicator in that it uses only the time
stamps of the firing activity without resorting to any a priori covariate or
invoking any specific structure in the tuning curve for neural activity.
Applying the MSR to neural data from the medial entorhinal cortex (mEC) and from
the anterodorsal thalamic nucleus (ADn) and post-subiculum (PoS) of
freely behaving rodents, we found that neurons with low MSR tend to have low
mutual information and low firing sparsity across the correlates that are
believed to be encoded by the brain region where the recordings were made. In
addition, neurons with high MSR carry significant information about spatial
navigation and allow spatial position or head direction to be decoded as
efficiently as neurons whose firing activity has high mutual information with
the covariate to be decoded, and significantly better than the set of neurons
with high local variation in their interspike intervals. Given these results,
we propose that the MSR can be used as a measure to rank and select neurons by
their information content without the need to appeal to any a priori covariate.
Comment: 38 pages, 16 figures
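A rough sketch of an MSR-style computation, under assumptions we state explicitly: for each bin width Δt the spike train is binned and each spike is labeled by the bin it falls into; the resolution H[s] is the entropy of the fraction of spikes per bin, and the relevance H[K] is the entropy of the label "number of spikes in my bin". The MSR is then taken as the area under the relevance-versus-resolution curve, with both axes normalized by log M (M = number of spikes). The normalization, the trapezoidal integration, the bin-width grid and the name `multiscale_relevance` are all assumptions of this sketch rather than the paper's specification.

```python
import math
from collections import Counter
from typing import Iterable, Sequence, Tuple


def _resolution_relevance(counts: Iterable[int], M: int) -> Tuple[float, float]:
    """Resolution H[s] and relevance H[K] (nats) from per-bin spike counts."""
    counts = list(counts)
    h_s = -sum((k / M) * math.log(k / M) for k in counts)
    m = Counter(counts)  # m_k: number of bins containing exactly k spikes
    h_k = -sum((k * mk / M) * math.log(k * mk / M) for k, mk in m.items())
    return h_s, h_k


def multiscale_relevance(spike_times: Sequence[float],
                         bin_widths: Sequence[float]) -> float:
    """Hypothetical MSR sketch: area under normalized H[K] vs H[s] across bin widths."""
    M = len(spike_times)
    if M < 2:
        raise ValueError("need at least two spikes")
    t0 = min(spike_times)
    points = []
    for dt in bin_widths:
        # Assign each spike to the bin index it falls into at this time scale.
        bins = Counter(int((t - t0) // dt) for t in spike_times)
        h_s, h_k = _resolution_relevance(bins.values(), M)
        points.append((h_s / math.log(M), h_k / math.log(M)))
    points.sort()  # order by resolution before integrating
    # Trapezoidal rule over the (resolution, relevance) curve.
    return sum(0.5 * (x1 - x0) * (y0 + y1)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy spike times, assumed to be in seconds.
spikes = [0.10, 0.15, 0.90, 1.00, 1.05, 1.10, 2.70, 3.30, 3.31, 5.00]
widths = [10 ** (e / 4) for e in range(-12, 5)]  # log-spaced, about 1 ms to 10 s
print(multiscale_relevance(spikes, widths))
```

Because the relevance can never exceed the resolution, the curve stays below the diagonal and the normalized area lies between 0 and 0.5; a finer grid of bin widths gives a smoother curve.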
Filtering Statistics on Networks
We explored the statistics of filtering of simple patterns on a number of
deterministic and random graphs as a tractable simple example of information
processing in complex systems. In this problem, multiple inputs map to the same
output, and the statistics of filtering is represented by the distribution of
this degeneracy. For a few simple filter patterns on a ring we obtained an
exact solution of the problem, and we studied more difficult filter setups
numerically. For each of the filter patterns and networks we identified a few
numbers that essentially describe the statistics of filtering, and compared them
across different networks. Our results for networks with diverse architectures appear
to be essentially determined by two factors: whether the graph's structure is
deterministic or random, and the vertex degree. We find that filtering in
random graphs produces much richer statistics than in deterministic graphs.
This statistical richness is reduced by increasing the graph's degree.
Comment: 21 pages, 8 figures, 3 tables
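To make the notion of degeneracy concrete, the sketch below enumerates all binary inputs on a ring of n nodes, applies a local filter and counts how many inputs produce each output. The filter rule (a node outputs 1 exactly when its left neighbor, itself and its right neighbor match a fixed three-bit pattern) and all function names are hypothetical illustrations, not the filter patterns studied in the paper. Exhaustive enumeration of the 2^n inputs is only feasible for small n.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple


def ring_neighbors(n: int) -> List[Tuple[int, int]]:
    """(left, right) neighbors of each node on a ring of n nodes."""
    return [((i - 1) % n, (i + 1) % n) for i in range(n)]


def filter_output(x: Tuple[int, ...], neighbors: Sequence[Tuple[int, int]],
                  pattern: Tuple[int, int, int]) -> Tuple[int, ...]:
    """Hypothetical filter: node i outputs 1 iff (x[left], x[i], x[right]) == pattern."""
    return tuple(int((x[l], x[i], x[r]) == pattern)
                 for i, (l, r) in enumerate(neighbors))


def degeneracy_distribution(n: int,
                            pattern: Tuple[int, int, int] = (1, 1, 1)) -> Dict[int, int]:
    """Map degeneracy d -> number of outputs produced by exactly d inputs."""
    nbrs = ring_neighbors(n)
    # Count the preimages of every output over all 2**n binary inputs.
    preimages = Counter(filter_output(x, nbrs, pattern)
                        for x in product((0, 1), repeat=n))
    return dict(Counter(preimages.values()))


print(degeneracy_distribution(3))  # {7: 1, 1: 1}
```

On a ring of three nodes every neighborhood covers the whole ring, so only the all-ones input fires the filter and the other seven inputs collapse onto the all-zeros output. Summing d times the number of outputs with degeneracy d always recovers the 2^n inputs, which is a quick sanity check.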
Quality assessment and community detection methods for anonymized mobility data in the Italian Covid context
We discuss how to assess the reliability of partial, anonymized mobility data and compare two different methods for identifying spatial communities based on movements: Greedy Modularity Clustering (GMC) and the novel Critical Variable Selection (CVS). These capture different aspects of mobility: direct population fluxes (GMC) and the probability for individuals to move between two nodes (CVS). As a test case, we consider movements of Italians before and during the SARS-CoV-2 pandemic, using Facebook users’ data and publicly available information from the Italian National Institute of Statistics (Istat) to construct daily mobility networks at the interprovincial level. Using the Perron-Frobenius (PF) theorem, we show that the stationary population density of the mean stochastic network is comparable with Istat data, and that this ceases to be the case if even a moderate amount of pruning is applied to the network. We then identify the first two national lockdowns through temporal clustering of the mobility networks, define two representative graphs for the lockdown and non-lockdown conditions, and perform optimal spatial community identification on both graphs using the GMC and CVS approaches. Despite the fundamental differences between the methods, the variation of information (VI) between them shows that they return similar partitions of the Italian provincial networks in both situations. This information can be used to inform policy, for example to define an optimal scale for lockdown measures. Our approach is general and can be applied to other countries or geographical scales.
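Two of the quantitative tools named in this abstract can be sketched in a few lines. The first function assumes the mobility network is given as a matrix of fluxes `flux[i][j]` (people moving from province i to province j, with the diagonal counting those who stay), normalizes each row into transition probabilities, and finds the stationary population density by power iteration; this converges to the Perron-Frobenius eigenvector when the chain is irreducible and aperiodic. The second computes the variation of information VI(A, B) = H(A) + H(B) - 2I(A; B) between two partitions given as per-node labels. Function names, the row normalization and the stopping rule are assumptions of this sketch, not the paper's pipeline, and the toy numbers are made up.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def stationary_distribution(flux: List[List[float]], tol: float = 1e-12,
                            max_iter: int = 100_000) -> List[float]:
    """Stationary state pi = pi P of the row-normalized flux matrix (power iteration)."""
    n = len(flux)
    P = []
    for row in flux:
        out = sum(row)
        if out <= 0:
            raise ValueError("every node needs positive outgoing flux")
        P.append([f / out for f in row])  # row-stochastic transition matrix
    pi = [1.0 / n] * n  # start from a uniform population density
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


def variation_of_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """VI = H(A) + H(B) - 2 I(A; B) in nats, for two labelings of the same nodes."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("labelings must be non-empty and equally long")
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))

    def entropy(c: Counter) -> float:
        return -sum((v / n) * math.log(v / n) for v in c.values())

    # Mutual information from the joint and marginal label counts.
    mi = sum((v / n) * math.log(v * n / (ca[x] * cb[y])) for (x, y), v in cab.items())
    return entropy(ca) + entropy(cb) - 2 * mi


# Toy two-province example: 80% of province 0 stays, 90% of province 1 stays.
print(stationary_distribution([[8, 2], [1, 9]]))  # approximately [1/3, 2/3]
print(variation_of_information([0, 0, 1, 1], ["x", "x", "y", "y"]))  # close to 0
```

VI is zero when two partitions agree up to relabeling and grows as they diverge, which is why it is a natural way to compare community assignments produced by methods as different as GMC and CVS.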