Pattern Anomaly Detection based on Sequence-to-Sequence Regularity Learning
Anomaly detection in traffic surveillance videos is a challenging task due to the ambiguity of anomaly definition and the complexity of scenes. In this paper, we propose to detect anomalous trajectories for vehicle behavior analysis via learning regularities in data. First, we train a sequence-to-sequence model under the autoencoder architecture and propose a new reconstruction error function for model optimization and anomaly evaluation. As such, the model is forced to learn the regular trajectory patterns in an unsupervised manner. Then, at the inference stage, we use the learned model to encode the test trajectory sample into a compact representation and generate a new trajectory sequence in the learned regular pattern. An anomaly score is computed based on the deviation of the generated trajectory from the test sample. Finally, we identify the anomalous trajectories with an adaptive threshold. We evaluate the proposed method on two real-world traffic datasets and the experiments show favorable results against state-of-the-art algorithms. This paper's research on sequence-to-sequence regularity learning can provide theoretical and practical support for pattern anomaly detection.
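A minimal PyTorch-style sketch of the pipeline this abstract describes: a GRU encoder-decoder compresses a trajectory into a compact code, regenerates a sequence from it, and scores deviation. The layer sizes, the plain-MSE stand-in for the paper's custom reconstruction error, and the mean-plus-k-sigma threshold rule are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class Seq2SeqAutoencoder(nn.Module):
    """GRU encoder-decoder: encode a trajectory into a compact code, then
    regenerate a sequence from that code in the learned regular pattern."""
    def __init__(self, point_dim=2, hidden_dim=64):
        super().__init__()
        self.encoder = nn.GRU(point_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(point_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, point_dim)

    def forward(self, traj):                       # traj: (batch, T, 2)
        _, h = self.encoder(traj)                  # h: compact representation
        # Shift inputs by one step so the decoder regenerates the sequence.
        dec_in = torch.cat([torch.zeros_like(traj[:, :1]), traj[:, :-1]], 1)
        dec_out, _ = self.decoder(dec_in, h)
        return self.out(dec_out)

def anomaly_score(model, traj):
    """Deviation of the regenerated trajectory from the test sample
    (plain MSE here; the paper proposes its own reconstruction error)."""
    with torch.no_grad():
        recon = model(traj)
    return ((recon - traj) ** 2).mean(dim=(1, 2))  # one score per trajectory

def adaptive_threshold(train_scores, k=3.0):
    """Assumed mean-plus-k-sigma rule; the paper's rule may differ."""
    return train_scores.mean() + k * train_scores.std()
```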
Backward Feature Correction: How Deep Learning Performs Deep Learning
How does a 110-layer ResNet learn a high-complexity classifier using
relatively few training examples and short training time? We present a theory
towards explaining this in terms of hierarchical learning. By hierarchical
learning we mean that the learner represents a complicated target function by
decomposing it into a sequence of simpler functions, reducing sample and time
complexity. This paper formally analyzes how multi-layer neural
networks can perform such hierarchical learning efficiently and automatically
by applying SGD.
On the conceptual side, we present, to the best of our knowledge, the FIRST
theory result indicating how deep neural networks can be sample and time
efficient on certain hierarchical learning tasks, when NO KNOWN
non-hierarchical algorithms (such as kernel method, linear regression over
feature mappings, tensor decomposition, sparse coding, and their simple
combinations) are efficient. We establish a principle called "backward feature
correction", where training higher layers in the network can improve the
features of lower-level ones. We believe this is the key to understanding the
deep learning process in multi-layer neural networks.
On the technical side, we show for every input dimension $d > 0$, there is a
concept class consisting of degree-$\omega(1)$ multi-variate polynomials so
that, using $\omega(1)$-layer neural networks as learners, SGD can learn any
target function from this class in $\mathrm{poly}(d)$ time using
$\mathrm{poly}(d)$ samples to any $\frac{1}{\mathrm{poly}(d)}$ error, through
learning to represent it as a composition of $\omega(1)$ layers of quadratic
functions. In contrast, we present lower bounds stating that several
non-hierarchical learners, including any kernel methods and neural tangent
kernels, must suffer from $d^{\omega(1)}$ sample or time complexity to learn
this concept class even to $d^{-0.01}$ error.
Comment: V2 adds more experiments, V3 polishes writing and improves experiments, V4 makes minor fixes to the figure
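A toy PyTorch sketch of the setting the abstract analyzes: a network built from quadratic layers trained end-to-end by SGD, so that gradients flowing back from higher layers keep correcting lower-layer features. The parameterization, widths, and target below are illustrative assumptions, not the paper's construction.

```python
import torch
import torch.nn as nn

class QuadraticLayer(nn.Module):
    """A quadratic function of the input (linear map, then squaring), so a
    stack of these layers represents a composition of quadratic functions."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, x):
        z = self.lin(x)
        return z * z

d, width, depth = 16, 64, 3
net = nn.Sequential(
    *[QuadraticLayer(d if i == 0 else width, width) for i in range(depth)],
    nn.Linear(width, 1),
)

# End-to-end SGD: gradients flowing back from higher layers keep adjusting
# ("correcting") the lower layers' features. Layer-wise training, which
# freezes lower layers once trained, has no such correction.
opt = torch.optim.SGD(net.parameters(), lr=1e-3)
x = torch.randn(512, d)
y = (x[:, 0] * x[:, 1] + x[:, 2] ** 2).unsqueeze(1)  # toy hierarchical target
for _ in range(200):
    opt.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
```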
What can a Single Attention Layer Learn? A Study Through the Random Features Lens
Attention layers -- which map a sequence of inputs to a sequence of outputs
-- are core building blocks of the Transformer architecture which has achieved
significant breakthroughs in modern artificial intelligence. This paper
presents a rigorous theoretical study on the learning and generalization of a
single multi-head attention layer, with a sequence of key vectors and a
separate query vector as input. We consider the random feature setting where
the attention layer has a large number of heads, with randomly sampled frozen
query and key matrices, and trainable value matrices. We show that such a
random-feature attention layer can express a broad class of target functions
that are permutation invariant to the key vectors. We further provide
quantitative excess risk bounds for learning these target functions from finite
samples, using random feature attention with finitely many heads.
Our results feature several implications unique to the attention structure
compared with existing random features theory for neural networks, such as (1)
Advantages in the sample complexity over standard two-layer random-feature
networks; (2) Concrete and natural classes of functions that can be learned
efficiently by a random-feature attention layer; and (3) The effect of the
sampling distribution of the query-key weight matrix (the product of the query
and key matrix), where Gaussian random weights with a non-zero mean result in
better sample complexities over the zero-mean counterpart for learning certain
natural target functions. Experiments on simulated data corroborate our
theoretical findings and further illustrate the interplay between the sample
size and the complexity of the target function.
Comment: 41 pages, 5 figures
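A small NumPy sketch of the random-features setting the abstract studies: frozen random query-key matrices per head, attention-weighted key averages as features, and trainable value weights fit in closed form (the output is linear in them). The dimensions, the non-zero mean of 0.1, and the toy target are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, N = 8, 256, 10   # input dim, number of heads, keys per example

# Frozen query-key matrices, one per head. The non-zero mean (0.1 here is
# a guess) reflects the paper's finding that it can help for some targets.
W = rng.normal(loc=0.1, scale=1.0 / np.sqrt(d), size=(M, d, d))

def features(q, K):
    """Random-feature map: each head attends over the keys with its frozen
    W_m and returns the attention-weighted key average; shape (M * d,)."""
    feats = []
    for Wm in W:
        scores = K @ (Wm @ q)                     # (N,)
        attn = np.exp(scores - scores.max())
        attn /= attn.sum()
        feats.append(attn @ K)                    # (d,)
    return np.concatenate(feats)

def target(q, K):
    """Toy target, permutation invariant to the key vectors."""
    return np.mean((K @ q) ** 2)

X, y = [], []
for _ in range(400):
    q, K = rng.normal(size=d), rng.normal(size=(N, d))
    X.append(features(q, K))
    y.append(target(q, K))
X, y = np.stack(X), np.array(y)

# The output is linear in the trainable value weights, so ridge regression
# fits them in closed form.
lam = 1e-3
v = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print("train MSE:", np.mean((X @ v - y) ** 2))
```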
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Model-free deep reinforcement learning algorithms have been shown to be
capable of learning a wide range of robotic skills, but typically require a
very large number of samples to achieve good performance. Model-based
algorithms, in principle, can provide for much more efficient learning, but
have proven difficult to extend to expressive, high-capacity models such as
deep neural networks. In this work, we demonstrate that medium-sized neural
network models can in fact be combined with model predictive control (MPC) to
achieve excellent sample complexity in a model-based reinforcement learning
algorithm, producing stable and plausible gaits to accomplish various complex
locomotion tasks. We also propose using deep neural network dynamics models to
initialize a model-free learner, in order to combine the sample efficiency of
model-based approaches with the high task-specific performance of model-free
methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure
model-based approach trained on just random action data can follow arbitrary
trajectories with excellent sample efficiency, and that our hybrid algorithm
can accelerate model-free learning on high-speed benchmark tasks, achieving
sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents.
Videos can be found at https://sites.google.com/view/mbm
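A compact NumPy sketch of the model-based control loop the abstract describes: random-shooting MPC over a learned dynamics model that predicts state changes. The toy dynamics and reward below stand in for the paper's trained neural network and task reward; the horizon and candidate count are illustrative.

```python
import numpy as np

def mpc_action(dynamics, reward, state, action_dim,
               horizon=10, n_candidates=1000, rng=None):
    """Random-shooting MPC: sample candidate action sequences, roll each out
    through the learned dynamics model, and execute the first action of the
    candidate with the highest predicted return."""
    rng = rng or np.random.default_rng()
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    states = np.repeat(state[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        # The learned model predicts the state *change*, as in the paper.
        states = states + dynamics(states, actions[:, t])
        returns += reward(states, actions[:, t])
    return actions[np.argmax(returns), 0]

# Toy stand-ins (assumed, for illustration): in practice `dynamics` is a
# neural network trained on observed (s, a, s') transitions and `reward`
# is the task reward. Here state and action dimensions must match.
dyn = lambda s, a: 0.1 * a
rew = lambda s, a: -np.sum(s ** 2, axis=1)
print(mpc_action(dyn, rew, np.zeros(4), action_dim=4))
```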
Learning by stochastic serializations
Complex structures are typical in machine learning. Tailoring learning
algorithms for every structure requires an effort that may be saved by defining
a generic learning procedure adaptive to any complex structure. In this paper,
we propose to map any complex structure onto a generic form, called
serialization, over which we can apply any sequence-based density estimator. We
then show how to transfer the learned density back onto the space of original
structures. To expose the learning procedure to the structural particularities
of the original structures, we take care that the serializations reflect
accurately the structures' properties. Enumerating all serializations is
infeasible. We propose an effective way to sample representative serializations
from the complete set of serializations which preserves the statistics of the
complete set. Our method is competitive with or better than state-of-the-art
learning algorithms that have been specifically designed for given structures.
In addition, since the serialization involves sampling from a combinatorial
process, it provides considerable protection from overfitting, which we
clearly demonstrate in a number of experiments.
Comment: Submission to NeurIPS 201
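A self-contained Python sketch of the idea: stochastic serializations of a structure (random DFS orderings of a labeled graph here) feed a generic sequence density estimator (a smoothed bigram model as a stand-in), and the density transfers back to the structure by averaging over sampled serializations. The graph encoding, traversal, and estimator are illustrative choices, not the paper's exact procedure.

```python
import math
import random
from collections import defaultdict

def serialize(graph, rng):
    """One stochastic serialization: a random DFS over the graph emitting
    node labels, so sampled sequences reflect the structure's connectivity."""
    start = rng.choice(sorted(graph))
    seq, seen, stack = [], set(), [start]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        seq.append(graph[v]["label"])
        nbrs = list(graph[v]["nbrs"])
        rng.shuffle(nbrs)
        stack.extend(nbrs)
    return ["<s>"] + seq + ["</s>"]

class BigramModel:
    """Stand-in sequence density estimator; any sequence model would do."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, seqs):
        for s in seqs:
            for a, b in zip(s, s[1:]):
                self.counts[a][b] += 1

    def logprob(self, s):
        lp = 0.0
        for a, b in zip(s, s[1:]):
            tot = sum(self.counts[a].values())
            lp += math.log((self.counts[a][b] + 1) / (tot + 50))  # add-one smoothing; 50 is an assumed vocab bound
        return lp

def structure_logdensity(model, graph, rng, n_samples=32):
    """Transfer the sequence density back to the structure by averaging the
    likelihood over sampled serializations (log-mean-exp for stability)."""
    lps = [model.logprob(serialize(graph, rng)) for _ in range(n_samples)]
    m = max(lps)
    return m + math.log(sum(math.exp(lp - m) for lp in lps) / n_samples)

# Tiny labeled graph; train the estimator on sampled serializations.
rng = random.Random(0)
g = {0: {"label": "A", "nbrs": [1, 2]},
     1: {"label": "B", "nbrs": [0]},
     2: {"label": "B", "nbrs": [0]}}
model = BigramModel()
model.fit([serialize(g, rng) for _ in range(200)])
print(structure_logdensity(model, g, rng))
```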