Hierarchical Temporal Representation in Linear Reservoir Computing
Recently, studies on deep Reservoir Computing (RC) highlighted the role of
layering in deep recurrent neural networks (RNNs). In this paper, the use of
linear recurrent units allows us to provide further evidence of the intrinsic
hierarchical temporal representation in deep RNNs through frequency analysis
applied to the state signals. The potential of our approach is assessed on
the class of Multiple Superimposed Oscillator tasks. Furthermore, our
investigation provides useful insights to open a discussion on the main aspects
that characterize the deep learning framework in the temporal domain.
Comment: This is a pre-print of the paper submitted to the 27th Italian
Workshop on Neural Networks, WIRN 201
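A minimal sketch of the kind of analysis this abstract describes: a stack of linear reservoirs driven by a Multiple Superimposed Oscillator input, with a Fourier transform applied to each layer's state signals. The layer sizes, spectral radius, input frequencies, and function names are all assumptions chosen for illustration, not values taken from the paper.

```python
import numpy as np

def mso_signal(n_steps, freqs):
    """Sum of sinusoids: a Multiple Superimposed Oscillator style input."""
    t = np.arange(n_steps)
    return sum(np.sin(f * t) for f in freqs)

def deep_linear_reservoir(u, n_layers=3, n_units=10, rho=0.9, seed=0):
    """Run a stack of linear reservoirs; layer l > 1 is fed by layer l - 1.

    All hyperparameters here are hypothetical defaults.
    """
    rng = np.random.default_rng(seed)
    layer_input = u.reshape(-1, 1)
    all_states = []
    for _ in range(n_layers):
        # Random recurrent matrix rescaled to spectral radius rho (< 1).
        W = rng.uniform(-1, 1, (n_units, n_units))
        W *= rho / np.max(np.abs(np.linalg.eigvals(W)))
        W_in = rng.uniform(-1, 1, (n_units, layer_input.shape[1]))
        x = np.zeros(n_units)
        states = np.empty((len(u), n_units))
        for k in range(len(u)):
            # Linear unit: no tanh or other nonlinearity is applied.
            x = W @ x + W_in @ layer_input[k]
            states[k] = x
        all_states.append(states)
        layer_input = states
    return all_states

u = mso_signal(1000, freqs=[0.2, 0.311])  # assumed example frequencies
layer_states = deep_linear_reservoir(u)
# Mean magnitude spectrum of the state signals, one spectrum per layer.
spectra = [np.abs(np.fft.rfft(s, axis=0)).mean(axis=1) for s in layer_states]
```

Comparing the entries of `spectra` across layers is one way to inspect how the frequency content of the states changes with depth.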
Learning activation functions from data using cubic spline interpolation
Neural networks require a careful design in order to perform properly on a
given task. In particular, selecting a good activation function (possibly in a
data-dependent fashion) is a crucial step, which remains an open problem in the
research community. Despite a large number of investigations, most current
implementations simply select one fixed function from a small set of
candidates, which is not adapted during training, and is shared among all
neurons throughout the different layers. However, neither of these two
assumptions can be assumed optimal in practice. In this paper, we present a
principled way to have data-dependent adaptation of the activation functions,
which is performed independently for each neuron. This is achieved by
leveraging past and present advances in cubic spline interpolation,
allowing for local adaptation of the functions around their regions of use. The
resulting algorithm is relatively cheap to implement, and overfitting is
counterbalanced by the inclusion of a novel damping criterion, which penalizes
unwanted oscillations from a predefined shape. Experimental results validate
the proposal over two well-known benchmarks.
Comment: Submitted to the 27th Italian Workshop on Neural Networks (WIRN 2017
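As a rough illustration of the idea, the sketch below evaluates a per-neuron activation defined by control points on a uniform grid, interpolated with a cubic spline, plus a penalty on how far the control points drift from their initial shape. The Catmull-Rom basis, the tanh initialization, the grid, and the squared-distance penalty are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np

# Catmull-Rom basis matrix (an assumed choice of cubic spline basis).
CATMULL_ROM = 0.5 * np.array([
    [-1.0,  3.0, -3.0,  1.0],
    [ 2.0, -5.0,  4.0, -1.0],
    [-1.0,  0.0,  1.0,  0.0],
    [ 0.0,  2.0,  0.0,  0.0],
])

def spline_activation(s, q, x_min, dx):
    """Evaluate one neuron's spline activation at pre-activation(s) s.

    q holds the neuron's control points on a uniform grid that starts at
    x_min with spacing dx. Inputs outside the interpolable range are clipped.
    """
    s = np.asarray(s, dtype=float)
    n = len(q)
    s = np.clip(s, x_min + dx, x_min + (n - 2) * dx)
    z = (s - x_min) / dx
    i = np.clip(np.floor(z).astype(int), 1, n - 3)  # index of the left knot
    u = z - i                                       # local coordinate in [0, 1]
    U = np.stack([u**3, u**2, u, np.ones_like(u)], axis=-1)
    Q = np.stack([q[i - 1], q[i], q[i + 1], q[i + 2]], axis=-1)
    return np.einsum('...a,ab,...b->...', U, CATMULL_ROM, Q)

def damping_penalty(q, q0):
    """Hypothetical damping term: squared drift from the initial shape q0."""
    return float(np.sum((q - q0) ** 2))

grid = np.linspace(-2.0, 2.0, 21)
dx = grid[1] - grid[0]
q0 = np.tanh(grid)   # control points initialised by sampling tanh
q = q0.copy()        # in training, q would be adapted per neuron
y = spline_activation(np.array([-0.5, 0.0, 0.33]), q, grid[0], dx)
```

Because only the four control points around an input affect the output, a gradient step on `q` changes the function locally, which is the "local adaptation" the abstract refers to.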
Learning to Learn with Variational Information Bottleneck for Domain Generalization
Domain generalization models learn to generalize to previously unseen
domains, but suffer from prediction uncertainty and domain shift. In this
paper, we address both problems. We introduce a probabilistic meta-learning
model for domain generalization, in which classifier parameters shared across
domains are modeled as distributions. This enables better handling of
prediction uncertainty on unseen domains. To deal with domain shift, we learn
domain-invariant representations using the proposed principle of a meta variational
information bottleneck, which we call MetaVIB. MetaVIB is derived from novel
variational bounds of mutual information, by leveraging the meta-learning
setting of domain generalization. Through episodic training, MetaVIB learns to
gradually narrow domain gaps to establish domain-invariant representations,
while simultaneously maximizing prediction accuracy. We conduct experiments on
three benchmarks for cross-domain visual recognition. Comprehensive ablation
studies validate the benefits of MetaVIB for domain generalization. The
comparison results demonstrate that our method consistently outperforms previous
approaches.
Comment: 15 pages, 4 figures, ECCV202
Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decision Theory
Decision theory formally solves the problem of rational agents in uncertain
worlds if the true environmental probability distribution is known.
Solomonoff's theory of universal induction formally solves the problem of
sequence prediction for an unknown distribution. We unify both theories and give
strong arguments that the resulting universal AIXI model behaves optimally in any
computable environment. The major drawback of the AIXI model is that it is
uncomputable. To overcome this problem, we construct a modified algorithm
AIXI^tl, which is still superior to any other time t and space l bounded agent.
The computation time of AIXI^tl is of the order t x 2^l.
Comment: 8 two-column pages, latex2e, 1 figure, submitted to ijca
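To give a feel for the quoted bound, the short sketch below evaluates t x 2^l for a few values of l. The function name and the example values of t and l are hypothetical; the only thing taken from the abstract is the order of growth itself.

```python
def aixitl_time_order(t: int, l: int) -> int:
    """Order-of-magnitude cost t * 2**l quoted in the abstract."""
    return t * 2 ** l

# Each extra unit of the space bound l doubles the cost for a fixed t.
for l in (10, 20, 30):
    print(l, aixitl_time_order(t=1000, l=l))
```

The cost is linear in the time bound t but exponential in the space bound l, so l dominates in practice.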
Meta-Tracker: Fast and Robust Online Adaptation for Visual Object Trackers
This paper improves state-of-the-art visual object trackers that use online
adaptation. Our core contribution is an offline meta-learning-based method to
adjust the initial deep networks used in online adaptation-based tracking. The
meta learning is driven by the goal of deep networks that can quickly be
adapted to robustly model a particular target in future frames. Ideally the
resulting models focus on features that are useful for future frames, and avoid
overfitting to background clutter, small parts of the target, or noise. By
enforcing a small number of update iterations during meta-learning, the
resulting networks train significantly faster. We demonstrate this approach on
top of two high-performance tracking approaches: the tracking-by-detection based
MDNet and the correlation-based CREST. Experimental results on standard
benchmarks, OTB2015 and VOT2016, show that our meta-learned versions of both
trackers improve speed, accuracy, and robustness.
Comment: Code: https://github.com/silverbottlep/meta_tracker
Evaluating Two-Stream CNN for Video Classification
Videos contain very rich semantic information. Traditional hand-crafted
features are known to be inadequate in analyzing complex video semantics.
Inspired by the huge success of the deep learning methods in analyzing image,
audio and text data, significant efforts are recently being devoted to the
design of deep nets for video analytics. Among the many practical needs,
classifying videos (or video clips) based on their major semantic categories
(e.g., "skiing") is useful in many applications. In this paper, we conduct an
in-depth study to investigate important implementation options that may affect
the performance of deep nets on video classification. Our evaluations are
conducted on top of a recent two-stream convolutional neural network (CNN)
pipeline, which uses both static frames and motion optical flows, and has
demonstrated competitive performance against the state-of-the-art methods. In
order to gain insights and to arrive at a practical guideline, many important
options are studied, including network architectures, model fusion, learning
parameters and the final prediction methods. Based on the evaluations, very
competitive results are attained on two popular video classification
benchmarks. We hope that the discussions and conclusions from this work can
help researchers in related fields to quickly set up a good basis for further
investigations along this very promising direction.
Comment: ACM ICMR'1
Self-Modification of Policy and Utility Function in Rational Agents
Any agent that is part of the environment it interacts with and has versatile
actuators (such as arms and fingers) will in principle have the ability to
self-modify -- for example by changing its own source code. As we continue to
create more and more intelligent agents, chances increase that they will learn
about this ability. The question is: will they want to use it? For example,
highly intelligent systems may find ways to change their goals to something
more easily achievable, thereby 'escaping' the control of their designers. In
an important paper, Omohundro (2008) argued that goal preservation is a
fundamental drive of any intelligent system, since a goal is more likely to be
achieved if future versions of the agent strive towards the same goal. In this
paper, we formalise this argument in general reinforcement learning, and
explore situations where it fails. Our conclusion is that the self-modification
possibility is harmless if and only if the value function of the agent
anticipates the consequences of self-modifications and uses the current utility
function when evaluating the future.
Comment: Artificial General Intelligence (AGI) 201
Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition
Good old on-line back-propagation for plain multi-layer perceptrons yields a
very low 0.35% error rate on the famous MNIST handwritten digits benchmark. All
we need to achieve this best result so far are many hidden layers, many neurons
per layer, numerous deformed training images, and graphics cards to greatly
speed up learning.
Comment: 14 pages, 2 figures, 4 listing
Information theoretic approach to interactive learning
The principles of statistical mechanics and information theory play an
important role in learning and have inspired both theory and the design of
numerous machine learning algorithms. The new aspect in this paper is a focus
on integrating feedback from the learner. A quantitative approach to
interactive learning and adaptive behavior is proposed, integrating model- and
decision-making into one theoretical framework. This paper follows simple
principles by requiring that the observer's world model and action policy
should result in maximal predictive power at minimal complexity. Classes of
optimal action policies and of optimal models are derived from an objective
function that reflects this trade-off between prediction and complexity. The
resulting optimal models then summarize, at different levels of abstraction,
the process's causal organization in the presence of the learner's actions. A
fundamental consequence of the proposed principle is that the learner's optimal
action policies balance exploration and control as an emerging property.
Interestingly, the explorative component is present in the absence of policy
randomness, i.e. in the optimal deterministic behavior. This is a direct result
of requiring maximal predictive power in the presence of feedback.
Comment: 6 page
Fuzzy Gravitons From Uncertain Spacetime
The recently proposed remarkable mechanism explaining the "stringy exclusion
principle" on an anti-de Sitter space is shown to be another beautiful
manifestation of the spacetime uncertainty principle in string theory as well as in
M theory. Put in another way, once it is realized that the graviton of a given
angular momentum is represented by a spherical brane, we deduce the maximal
angular momentum directly from either the relation
in M theory or \Delta t\Delta x>\alpha' in string theory. We also show that the
result of hep-th/0003075 is similar to results on D2-branes in the SU(2) WZW model.
Using the dual D2-brane representation of a membrane, we obtain the
quantization condition for the size of the membrane.
Comment: 10 pages, harvmac. v2: a ref. and a note added; v3: A remark and one
more ref. adde