Learning and Communications Co-Design for Remote Inference Systems: Feature Length Selection and Transmission Scheduling
In this paper, we consider a remote inference system, where a neural network
is used to infer a time-varying target (e.g., robot movement), based on
features (e.g., video clips) that are progressively received from a sensing
node (e.g., a camera). Each feature is a temporal sequence of sensory data. The
learning performance of the system is determined by (i) the timeliness and (ii)
the temporal sequence length of the features, where we use Age of Information
(AoI) as a metric for timeliness. While a longer feature can typically provide
better learning performance, it often requires more channel resources for
sending the feature. To minimize the time-averaged inference error, we study a
learning and communication co-design problem that jointly optimizes feature
length selection and transmission scheduling. When there is a single
sensor-predictor pair and a single channel, we develop low-complexity optimal
co-designs for both the cases of time-invariant and time-variant feature
length. When there are multiple sensor-predictor pairs and multiple channels,
the co-design problem becomes a restless multi-armed, multi-action bandit problem
that is PSPACE-hard. For this setting, we design a low-complexity algorithm to
solve the problem. Trace-driven evaluations suggest that the proposed
co-designs can significantly reduce the time-averaged inference error of remote
inference systems.
Comment: 41 pages, 8 figures. The manuscript has been submitted to IEEE Journal on Selected Areas in Information Theory.
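As an illustration of the trade-off described in this abstract, the following Python sketch considers a single sensor-predictor pair and a single channel, where a feature of length l occupies l time slots. The error function err(aoi, length) and the send-as-soon-as-delivered schedule are illustrative assumptions, not the paper's optimal co-design.

```python
# Minimal sketch (illustrative assumptions, not the paper's algorithm):
# a single sensor-predictor pair over a single channel. A feature of
# length l occupies l time slots, so fresher updates and longer features
# compete for the same channel resource. err(aoi, length) is a
# hypothetical stand-in for the trained predictor's inference-error curve.

def err(aoi, length):
    # Assumed error model: error decreases with feature length and
    # increases with the Age of Information (AoI) of the delivered feature.
    return 1.0 / length + 0.1 * aoi

def average_error(length, horizon=10_000):
    """Time-averaged inference error when every feature has the given
    length and a new feature is sent as soon as the previous one arrives."""
    total, aoi = 0.0, length        # start right after a delivery
    for t in range(1, horizon + 1):
        total += err(aoi, length)
        # A delivery completes every `length` slots and resets the AoI
        # to the transmission delay; otherwise the AoI keeps growing.
        aoi = length if t % length == 0 else aoi + 1
    return total / horizon

best = min(range(1, 11), key=average_error)
print("feature length minimizing the time-averaged inference error:", best)
```

With this assumed error model the minimizer is an intermediate feature length, reflecting the tension between longer features (lower per-feature error) and fresher features (lower AoI).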
A Logical Characterization of Constraint-Based Causal Discovery
We present a novel approach to constraint-based causal discovery, that takes
the form of straightforward logical inference, applied to a list of simple,
logical statements about causal relations that are derived directly from
observed (in)dependencies. It is both sound and complete, in the sense that all
invariant features of the corresponding partial ancestral graph (PAG) are
identified, even in the presence of latent variables and selection bias. The
approach shows that every identifiable causal relation corresponds to one of
just two fundamental forms. More importantly, as the basic building blocks of
the method do not rely on the detailed (graphical) structure of the
corresponding PAG, it opens up a range of new opportunities, including more
robust inference, detailed accountability, and application to large models.
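The sketch below shows the basic constraint-based step of turning observed (in)dependencies into simple logical statements about causal relations. It uses the textbook collider rule under faithfulness and, for simplicity, no selection bias; it illustrates the style of reasoning only and is not necessarily one of the two fundamental forms identified in this paper.

```python
# Minimal sketch of the constraint-based idea (a textbook rule, not
# necessarily the paper's logical forms): observed (in)dependencies are
# recorded as facts, and a rule derives causal statements from them.
# Collider rule: if X _||_ Z | S but X is dependent on Z given S ∪ {Y},
# then Y is not a cause (ancestor) of X or of Z.

from itertools import combinations

# Hypothetical oracle answers for conditional independence tests over
# {X, Y, Z}; keys are ordered pairs plus a conditioning set.
indep = {
    ("X", "Z", frozenset()): True,          # X _||_ Z
    ("X", "Z", frozenset({"Y"})): False,    # X not _||_ Z given Y
}

def derive_noncause_statements(variables, indep):
    """Return statements 'Y does not cause A' implied by the collider
    pattern among the recorded independence facts."""
    statements = []
    for x, z in combinations(variables, 2):
        for y in variables:
            if y in (x, z):
                continue
            sep = indep.get((x, z, frozenset()))
            dep_given_y = indep.get((x, z, frozenset({y})))
            if sep is True and dep_given_y is False:
                statements += [f"{y} does not cause {x}",
                               f"{y} does not cause {z}"]
    return statements

print(derive_noncause_statements(["X", "Y", "Z"], indep))
```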
The Stochastic Complexity of Spin Models: Are Pairwise Models Really Simple?
Models can be simple for different reasons: because they yield a simple and
computationally efficient interpretation of a generic dataset (e.g. in terms of
pairwise dependences) - as in statistical learning - or because they capture
the essential ingredients of a specific phenomenon - as e.g. in physics -
leading to non-trivial falsifiable predictions. In information theory and
Bayesian inference, the simplicity of a model is precisely quantified in the
stochastic complexity, which measures the number of bits needed to encode its
parameters. To understand what simple models look like, we study the
stochastic complexity of spin models with interactions of arbitrary order. We
highlight the existence of invariances with respect to bijections within the
space of operators, which allow us to partition the space of all models into
equivalence classes, in which models share the same complexity. We thus find
that the complexity (or simplicity) of a model is not determined by the order
of the interactions, but rather by their mutual arrangements. Models where
statistical dependencies are localized on non-overlapping groups of few
variables (and that afford predictions on independencies that are easy to
falsify) are simple. On the contrary, fully connected pairwise models, which
are often used in statistical learning, appear to be highly complex, because of
their extended set of interactions.
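To make the operator picture concrete, the sketch below encodes operators (products of spins) as bitmasks and applies an invertible GF(2) substitution of the spin variables, which acts as a bijection on the space of operators. The encoding and the example matrix are illustrative choices of mine, not the paper's complexity computation.

```python
# Minimal sketch of the operator picture (illustration only): an operator
# on n spins is a product of a subset of spins, encoded as a bitmask, and
# a model is a set of operators. A GF(2)-linear substitution of the spins
# maps every operator to another operator, so it maps models to models.

from itertools import combinations

def operators(n, order):
    """All operators (spin products) of a given interaction order, as bitmasks."""
    return [sum(1 << i for i in subset) for subset in combinations(range(n), order)]

def transform_operator(mu, T_rows):
    """Image of operator bitmask mu under s_i -> prod_j s_j**T[i][j],
    where T_rows[i] is row i of an invertible GF(2) matrix as a bitmask."""
    out = 0
    for i, row in enumerate(T_rows):
        if (mu >> i) & 1:
            out ^= row          # exponents add modulo 2
    return out

n = 3
pairwise = operators(n, 2)                 # {s0 s1, s0 s2, s1 s2}
# Example invertible substitution: s0 -> s0, s1 -> s0 s1, s2 -> s2.
T = [0b001, 0b011, 0b100]
image = [transform_operator(mu, T) for mu in pairwise]
print([bin(m) for m in pairwise], "->", [bin(m) for m in image])
```

Under this particular substitution, the fully connected pairwise model on three spins maps to operators of orders 1, 2, and 3; if such substitutions are among the invariances the abstract refers to, the two models share the same complexity even though their interaction orders differ, illustrating the point that order alone does not determine simplicity.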
Emergence of Invariance and Disentanglement in Deep Representations
Using established principles from Statistics and Information Theory, we show
that invariance to nuisance factors in a deep neural network is equivalent to
information minimality of the learned representation, and that stacking layers
and injecting noise during training naturally bias the network towards learning
invariant representations. We then decompose the cross-entropy loss used during
training and highlight the presence of an inherent overfitting term. We propose
regularizing the loss by bounding such a term in two equivalent ways: One with
a Kullback-Leibler term, which relates to a PAC-Bayes perspective; the other
using the information in the weights as a measure of complexity of a learned
model, yielding a novel Information Bottleneck for the weights. Finally, we
show that invariance and independence of the components of the representation
learned by the network are bounded above and below by the information in the
weights, and therefore are implicitly optimized during training. The theory
enables us to quantify and predict sharp phase transitions between underfitting
and overfitting of random labels when using our regularized loss, which we
verify in experiments, and sheds light on the relation between the geometry of
the loss function, invariance properties of the learned representation, and
generalization error.
Comment: Deep learning, neural network, representation, flat minima, information bottleneck, overfitting, generalization, sufficiency, minimality, sensitivity, information complexity, stochastic gradient descent, regularization, total correlation, PAC-Bayes.
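A minimal sketch of the kind of regularized objective described above, assuming a factorized Gaussian distribution over the weights with a standard normal prior; the KL term stands in for the information in the weights, and the layer sizes and beta value are illustrative, not the authors' setup.

```python
# Minimal sketch (a simplification, not the authors' implementation):
# cross-entropy plus beta times KL(q(w) || p(w)), with a factorized
# Gaussian q over the weights and a standard normal prior p. Sampling
# the weights injects noise during training.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianLinear(nn.Module):
    """Linear layer whose weights are sampled from q(w) = N(mu, sigma^2)."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(d_out, d_in) * 0.01)
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):
        sigma = self.log_sigma.exp()
        w = self.mu + sigma * torch.randn_like(sigma)   # reparameterized noise
        return F.linear(x, w)

    def kl(self):
        # KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights.
        sigma2 = (2 * self.log_sigma).exp()
        return 0.5 * (sigma2 + self.mu ** 2 - 1.0 - 2 * self.log_sigma).sum()

def regularized_loss(layer, x, y, beta=1e-3):
    logits = layer(x)
    return F.cross_entropy(logits, y) + beta * layer.kl()

# Toy usage on random data.
layer = GaussianLinear(20, 5)
x, y = torch.randn(64, 20), torch.randint(0, 5, (64,))
loss = regularized_loss(layer, x, y)
loss.backward()
print(float(loss))
```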