Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
In meta-learning, an agent extracts knowledge from observed tasks, aiming to
facilitate learning of novel future tasks. Under the assumption that future
tasks are 'related' to previous tasks, the accumulated knowledge should be
learned in a way which captures the common structure across learned tasks,
while allowing the learner sufficient flexibility to adapt to novel aspects of
new tasks. We present a framework for meta-learning that is based on
generalization error bounds, allowing us to extend various PAC-Bayes bounds to
meta-learning. Learning takes place through the construction of a distribution
over hypotheses based on the observed tasks, and its utilization for learning a
new task. Thus, prior knowledge is incorporated through setting an
experience-dependent prior for novel tasks. We develop a gradient-based
algorithm which minimizes an objective function derived from the bounds and
demonstrate its effectiveness numerically with deep neural networks. In
addition to establishing the improved performance available through
meta-learning, we demonstrate the intuitive way by which prior information is
manifested at different levels of the network. Comment: Accepted to ICML 2018
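For intuition, here is a minimal sketch of what a PAC-Bayes-style meta-learning objective can look like: each task's empirical loss is penalized by the KL divergence between that task's posterior and a shared, experience-dependent Gaussian prior. The single-level McAllester-style bound and all function names below are illustrative assumptions, not the paper's exact two-level construction.

```python
import numpy as np

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) between diagonal Gaussians, summed over dimensions."""
    return np.sum(np.log(sigma_p / sigma_q)
                  + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
                  - 0.5)

def pac_bayes_meta_objective(task_losses, posteriors, prior, n_samples, delta=0.05):
    """Average over tasks of (empirical loss + PAC-Bayes complexity term).

    posteriors : list of (mu_q, sigma_q) per-task Gaussian posteriors
    prior      : (mu_p, sigma_p) shared prior, set from the observed tasks
    """
    mu_p, sigma_p = prior
    total = 0.0
    for loss, (mu_q, sigma_q), n in zip(task_losses, posteriors, n_samples):
        kl = kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)
        # McAllester-style complexity term; the exact form depends on the bound used.
        complexity = np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))
        total += loss + complexity
    return total / len(task_losses)
```

Minimizing such an objective jointly over the prior and the per-task posteriors by gradient descent mirrors, in simplified form, the gradient-based algorithm the abstract describes.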
A neural network walks into a lab: towards using deep nets as models for human behavior
What might sound like the beginning of a joke has become an attractive
prospect for many cognitive scientists: the use of deep neural network models
(DNNs) as models of human behavior in perceptual and cognitive tasks. Although
DNNs have taken over machine learning, attempts to use them as models of human
behavior are still in the early stages. Can they become a versatile model class
in the cognitive scientist's toolbox? We first argue why DNNs have the
potential to be interesting models of human behavior. We then discuss how that
potential can be more fully realized. On the one hand, we argue that the cycle
of training, testing, and revising DNNs needs to be revisited through the lens
of the cognitive scientist's goals. Specifically, we argue that methods for
assessing the goodness of fit between DNN models and human behavior have to
date been impoverished. On the other hand, cognitive science might have to
start using more complex tasks (including richer stimulus spaces), but doing so
might be beneficial for DNN-independent reasons as well. Finally, we highlight
avenues where traditional cognitive process models and DNNs may show productive
synergy.
The Open World of Micro-Videos
Micro-videos are six-second videos popular on social media networks with
several unique properties. Firstly, because of the authoring process, they
contain significantly more diversity and narrative structure than existing
collections of video "snippets". Secondly, because they are often captured by
hand-held mobile cameras, they contain specialized viewpoints including
third-person, egocentric, and self-facing views seldom seen in traditionally
produced video. Thirdly, due to their continuous production and publication
on social networks, aggregate micro-video content contains interesting
open-world dynamics that reflect the temporal evolution of tag topics. These
aspects make micro-videos an appealing well of visual data for developing
large-scale models for video understanding. We analyze a novel dataset of
micro-videos labeled with 58 thousand tags. To analyze this data, we introduce
viewpoint-specific and temporally-evolving models for video understanding,
defined over state-of-the-art motion and deep visual features. We conclude that
our dataset opens up new research opportunities for large-scale video analysis,
novel viewpoints, and open-world dynamics.
An Adaptive Online HDP-HMM for Segmentation and Classification of Sequential Data
In recent years, the desire and need to understand sequential data have
been increasing, with particular interest in sequential contexts such as
patient monitoring, understanding daily activities, video surveillance, the
stock market, and the like. Along with the constant flow of data, it is critical to
classify and segment the observations on-the-fly, without being limited to a
rigid number of classes. In addition, the model needs to be capable of updating
its parameters to accommodate possible evolutions in the data. This interesting problem,
however, is not adequately addressed in the literature since many studies focus
on offline classification over a pre-defined class set. In this paper, we
address this gap in a principled manner by introducing an adaptive online
system based on Markov switching models with hierarchical Dirichlet process
priors. This infinite adaptive online approach is capable of segmenting and
classifying sequential data over an unlimited number of classes, while meeting
the memory and delay constraints of streaming contexts. The model is further
enhanced by introducing a learning rate, responsible for balancing the extent
to which the model sustains its previous learning (parameters) or adapts to the
new streaming observations. Experimental results on several variants of
stationary and evolving synthetic data and two video datasets, TUM Assistive
Kitchen and collated Weizmann, show remarkable performance in segmentation and
classification, particularly for evolutionary sequences with changing
distributions and/or containing new, unseen classes. Comment: 23 pages, 9 figures and 4 tables
Uncertainty-based Modulation for Lifelong Learning
The creation of machine learning algorithms for intelligent agents capable of
continuous, lifelong learning is a critical objective for systems deployed
in dynamic real-world environments. Here we present an
algorithm inspired by neuromodulatory mechanisms in the human brain that
integrates and expands upon Stephen Grossberg's ground-breaking Adaptive
Resonance Theory proposals. Specifically, it builds on the concept of
uncertainty, and employs a series of neuromodulatory mechanisms to enable
continuous learning, including self-supervised and one-shot learning. Algorithm
components were evaluated in a series of benchmark experiments that demonstrate
stable learning without catastrophic forgetting. We also demonstrate the
critical role of developing these systems in a closed-loop manner where the
environment and the agent's behaviors constrain and guide the learning
process. To this end, we integrated the algorithm into an embodied simulated
drone agent. The experiments show that the algorithm is capable of continuously
learning new tasks and adapting to changed conditions, with high classification
accuracy (greater than 94 percent) in a virtual environment, without
catastrophic forgetting. The algorithm accepts high-dimensional inputs from any
state-of-the-art detection and feature extraction algorithms, making it a
flexible addition to existing systems. We also describe future development
efforts focused on imbuing the algorithm with mechanisms to seek out new
knowledge as well as employ a broader range of neuromodulatory processes.
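As a rough illustration of the ART-style mechanism the algorithm builds on, the fuzzy-ART-like sketch below treats a poor match (below a vigilance threshold) as high uncertainty and responds by creating a new category in one shot rather than overwriting an old one, which is how ART avoids catastrophic forgetting. The match rule, thresholds, and names here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def art_step(x, prototypes, vigilance=0.75, lr=0.5):
    """Resonance check: refine the best-matching prototype, or grow a new one."""
    if prototypes:
        # Fuzzy-ART-style match score: |x AND w| / |x|.
        scores = [np.minimum(x, p).sum() / (x.sum() + 1e-9) for p in prototypes]
        j = int(np.argmax(scores))
        if scores[j] >= vigilance:  # resonance: refine existing category j
            prototypes[j] = (1 - lr) * prototypes[j] + lr * np.minimum(x, prototypes[j])
            return j
    prototypes.append(x.copy())    # high uncertainty: one-shot new category
    return len(prototypes) - 1
```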
Meta-learners' learning dynamics are unlike learners'
Meta-learning is a tool that allows us to build sample-efficient learning
systems. Here we show that, once meta-trained, LSTM Meta-Learners aren't just
faster learners than their sample-inefficient deep learning (DL) and
reinforcement learning (RL) brethren, but that they actually pursue
fundamentally different learning trajectories. We study their learning dynamics
on three sets of structured tasks for which the corresponding learning dynamics
of DL and RL systems have been previously described: linear regression (Saxe et
al., 2013), nonlinear regression (Rahaman et al., 2018; Xu et al., 2018), and
contextual bandits (Schaul et al., 2019). In each case, while
sample-inefficient DL and RL Learners uncover the task structure in a staggered
manner, meta-trained LSTM Meta-Learners uncover almost all task structure
concurrently, congruent with the patterns expected from Bayes-optimal inference
algorithms. This has implications for research areas where the learning
behaviour itself is of interest, such as safety, curriculum design, and
human-in-the-loop machine learning. Comment: 26 pages, 23 figures
Nested LSTMs
We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple
levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to
stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell,
which has its own inner memory cell. Specifically, instead of computing the
value of the (outer) memory cell as $c_t^{outer} = f_t \odot c_{t-1} + i_t \odot g_t$,
NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input
to an inner LSTM (or NLSTM) memory cell, and set
$c_t^{outer} = h_t^{inner}$. Nested LSTMs outperform both stacked and
single-layer LSTMs with similar numbers of parameters in our experiments on
various character-level language modeling tasks, and the inner memories of an
NLSTM learn longer-term dependencies compared with the higher-level units of a
stacked LSTM. Comment: Accepted at ACML 2017
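A minimal NumPy sketch of the nesting step described above, with biases and the depth-greater-than-two recursion omitted: the outer cell's forget-gated memory becomes the inner cell's hidden-state input, the input-gated candidate becomes the inner cell's input, and the inner cell's output replaces the usual additive memory update. Function names and weight layouts are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_gates(x, h, W):
    """Compute the four LSTM gate activations from input x and hidden state h."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    return sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)

def nlstm_step(x, h, c_outer, c_inner, W_outer, W_inner):
    """One step of a depth-2 Nested LSTM cell (biases omitted)."""
    i, f, o, g = lstm_gates(x, h, W_outer)
    h_tilde = f * c_outer                 # inner hidden-state input: f_t * c_{t-1}
    x_tilde = i * g                       # inner input: i_t * g_t
    ii, fi, oi, gi = lstm_gates(x_tilde, h_tilde, W_inner)
    c_inner_new = fi * c_inner + ii * gi  # ordinary LSTM update, one level down
    h_inner = oi * np.tanh(c_inner_new)
    c_outer_new = h_inner                 # the nesting step: c_t = h_t^{inner}
    h_new = o * np.tanh(c_outer_new)
    return h_new, c_outer_new, c_inner_new
```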
Detection and Tracking of General Movable Objects in Large 3D Maps
This paper studies the problem of detection and tracking of general objects
with long-term dynamics, observed by a mobile robot moving in a large
environment. A key problem is that due to the environment scale, it can only
observe a subset of the objects at any given time. Since some time passes
between observations of objects in different places, the objects might be moved
when the robot is not there. We propose a model for this movement in which the
objects typically only move locally, but with some small probability they jump
longer distances, through what we call global motion. For filtering, we
decompose the posterior over local and global movements into two linked
processes. The posterior over the global movements and measurement associations
is sampled, while we track the local movement analytically using Kalman
filters. This novel filter is evaluated on point cloud data gathered
autonomously by a mobile robot over an extended period of time. We show that
tracking jumping objects is feasible, and that the proposed probabilistic
treatment outperforms previous methods when applied to real world data. The key
to efficient probabilistic tracking in this scenario is focused sampling of the
object posteriors. Comment: Submitted for peer review
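For concreteness, here is a minimal sketch of the analytic half of such a decomposition: conditioned on sampled global movements and measurement associations, each object's local motion is tracked with a standard Kalman predict-update cycle. The linear-Gaussian model below is generic, not the paper's specific motion model.

```python
import numpy as np

def kalman_step(mu, P, z, F, Q, H, R):
    """One predict-update cycle for a local (linear-Gaussian) object track."""
    # Predict under the local-motion model.
    mu_pred = F @ mu
    P_pred = F @ P @ F.T + Q
    # Update with the associated observation z.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    mu_new = mu_pred + K @ (z - H @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ H) @ P_pred
    return mu_new, P_new
```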
Induction Networks for Few-Shot Text Classification
Text classification tends to struggle when data is deficient or when it needs
to adapt to unseen classes. In such challenging scenarios, recent studies have
used meta-learning to simulate the few-shot task, in which new queries are
compared to a small support set at the sample-wise level. However, this
sample-wise comparison may be severely disturbed by the varied expressions
within the same class. Therefore, we should be able to learn a general representation
of each class in the support set and then compare it to new queries. In this
paper, we propose a novel Induction Network to learn such a generalized
class-wise representation, by innovatively leveraging the dynamic routing
algorithm in meta-learning. In this way, we find the model is able to induce
and generalize better. We evaluate the proposed model on a well-studied
sentiment classification dataset (English) and a real-world dialogue intent
classification dataset (Chinese). Experiment results show that on both
datasets, the proposed model significantly outperforms the existing
state-of-the-art approaches, proving the effectiveness of class-wise
generalization in few-shot text classification. Comment: 7 pages, 3 figures
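The induction step can be pictured as capsule-style dynamic routing over the support set: coupling coefficients decide how much each support embedding contributes to the class vector and are refined by agreement. The sketch below illustrates the routing loop only and omits the learned transformation applied before routing; names and the iteration count are assumptions.

```python
import numpy as np

def squash(v):
    """Capsule-style nonlinearity: preserves direction, bounds the norm below 1."""
    n2 = np.sum(v * v)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + 1e-9)

def induce_class_vector(support_embeddings, n_iters=3):
    """Induce one class representation from k support embeddings via routing."""
    k, d = support_embeddings.shape
    b = np.zeros(k)                        # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.sum(np.exp(b))  # coupling coefficients (softmax)
        class_vec = squash(c @ support_embeddings)
        b = b + support_embeddings @ class_vec  # agreement update
    return class_vec
```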
Provable Guarantees for Gradient-Based Meta-Learning
We study the problem of meta-learning through the lens of online convex
optimization, developing a meta-algorithm bridging the gap between popular
gradient-based meta-learning and classical regularization-based multi-task
transfer methods. Our method is the first to simultaneously satisfy good sample
efficiency guarantees in the convex setting, with generalization bounds that
improve with task-similarity, while also being computationally scalable to
modern deep learning architectures and the many-task setting. Despite its
simplicity, the algorithm matches, up to a constant factor, a lower bound on
the performance of any such parameter-transfer method under natural task
similarity assumptions. We use experiments in both convex and deep learning
settings to verify and demonstrate the applicability of our theory. Comment: ICML 2019
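The gradient-based meta-learning being bridged here is, in its simplest form, a Reptile-like outer loop: run a few within-task gradient steps, then move the shared initialization toward the task's solution. The sketch below is that simplification, not the paper's online-convex-optimization meta-algorithm; grad_fn and all hyperparameters are hypothetical.

```python
import numpy as np

def meta_train(tasks, inner_steps=5, inner_lr=0.1, meta_lr=0.05):
    """Reptile-style sketch: adapt to each task, then update the initialization.

    tasks yields (grad_fn, dim) pairs, where grad_fn(w) returns the gradient
    of that task's loss at w.
    """
    phi = None
    for grad_fn, dim in tasks:
        if phi is None:
            phi = np.zeros(dim)           # shared initialization
        w = phi.copy()
        for _ in range(inner_steps):      # within-task gradient descent
            w -= inner_lr * grad_fn(w)
        phi += meta_lr * (w - phi)        # pull the initialization toward w
    return phi
```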
- …