Learning noise-induced transitions by multi-scaling reservoir computing
Noise is usually regarded as adversarial to extracting the effective dynamics
from time series, so conventional data-driven approaches typically aim to
learn the dynamics while mitigating the effect of noise. However, noise can
have a functional role of driving transitions between stable states underlying
many natural and engineered stochastic dynamics. To capture such stochastic
transitions from data, we show that reservoir computing, a type of recurrent
neural network, can learn noise-induced transitions. We develop a concise
training protocol for tuning
hyperparameters, with a focus on a pivotal hyperparameter controlling the time
scale of the reservoir dynamics. The trained model generates accurate
statistics of transition time and the number of transitions. The approach is
applicable to a wide class of systems, including a bistable system under a
double-well potential, with either white noise or colored noise. The model
also captures the asymmetry of the double-well potential, the rotational
dynamics caused by the absence of detailed balance, and transitions in
multi-stable systems. For experimental protein-folding data, it learns the
transition time between folded states, suggesting the possibility of
predicting transition statistics from a small dataset. These results
demonstrate the capability of machine-learning methods to capture
noise-induced phenomena.
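As a rough illustration of the idea, the following is a minimal echo-state-network sketch, not the paper's code: a bistable double-well SDE is simulated, a leaky reservoir is driven by the trajectory, and a ridge-regression readout is trained for one-step prediction. The leak rate plays the role of the pivotal time-scale hyperparameter mentioned above; all parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a bistable double-well SDE: dx = (x - x^3) dt + sigma dW
dt, sigma, n_steps = 0.01, 0.35, 5000
x = np.empty(n_steps)
x[0] = 1.0
for t in range(n_steps - 1):
    x[t + 1] = x[t] + (x[t] - x[t] ** 3) * dt + sigma * np.sqrt(dt) * rng.standard_normal()

# Leaky echo-state network; the leak rate sets the reservoir time scale
n_res, leak, ridge = 300, 0.2, 1e-6
W_in = rng.uniform(-0.5, 0.5, n_res)
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale to spectral radius 0.9

r = np.zeros(n_res)
states = np.empty((n_steps - 1, n_res))
for t in range(n_steps - 1):
    r = (1 - leak) * r + leak * np.tanh(W @ r + W_in * x[t])
    states[t] = r

# Ridge-regression readout trained for one-step prediction x[t] -> x[t+1]
washout = 200
S, y = states[washout:], x[washout + 1:]
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)

pred = states @ W_out
rmse = np.sqrt(np.mean((pred[washout:] - y) ** 2))
print("one-step RMSE:", rmse)
```

Running the trained readout in closed loop (feeding predictions back as input) would then generate surrogate trajectories whose transition statistics can be compared with the data.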
SeCo: Exploring Sequence Supervision for Unsupervised Representation Learning
A steady momentum of innovations and breakthroughs has convincingly pushed
the limits of unsupervised image representation learning. Compared to static 2D
images, video has one more dimension (time). The inherent supervision existing
in such sequential structure offers a fertile ground for building unsupervised
learning models. In this paper, we explore the basic and generic supervision
in the sequence from three perspectives: spatial, spatiotemporal, and
sequential. We materialize the supervisory signals by determining
whether a pair of samples is from one frame or from one video, and whether a
triplet of samples is in the correct temporal order. We uniquely regard the
signals as the foundation in contrastive learning and derive a particular form
named Sequence Contrastive Learning (SeCo). SeCo shows superior results under
the linear protocol on action recognition (Kinetics), untrimmed activity
recognition (ActivityNet) and object tracking (OTB-100). More remarkably, SeCo
demonstrates considerable improvements over recent unsupervised pre-training
techniques, and surpasses fully supervised ImageNet pre-training in accuracy
by 2.96% and 6.47% on the action recognition task on UCF101 and HMDB51,
respectively. Source code is available at
\url{https://github.com/YihengZhang-CV/SeCo-Sequence-Contrastive-Learning}.
Comment: AAAI 2021.
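The three supervisory signals described in the abstract can be sketched as label constructions over sampled clips, paired with a contrastive objective. The following is a toy illustration, not the official implementation; the metadata format `(video_id, frame_id)` and all function names are assumptions for the sketch.

```python
import numpy as np

def pair_labels(meta_a, meta_b):
    # meta = (video_id, frame_id).
    # Spatial signal: are the two samples from one frame?
    # Spatiotemporal signal: are they from one video?
    same_video = meta_a[0] == meta_b[0]
    same_frame = same_video and (meta_a[1] == meta_b[1])
    return same_frame, same_video

def order_label(t1, t2, t3):
    # Sequential signal: is the triplet of timestamps in correct temporal order?
    return t1 < t2 < t3

def info_nce(anchor, positive, negatives, tau=0.1):
    # Minimal InfoNCE loss used to materialize a signal as contrastive learning:
    # pull the anchor toward the positive, push it from the negatives.
    a = anchor / np.linalg.norm(anchor)
    p = positive / np.linalg.norm(positive)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    logits = np.concatenate([[a @ p], n @ a]) / tau
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

# Toy check of the pairwise signals
print(pair_labels((0, 3), (0, 3)), pair_labels((0, 3), (1, 3)))  # (True, True) (False, False)
```

In SeCo these binary and ordering decisions are made on learned clip embeddings rather than metadata; the sketch only shows how the three signals are defined and how a contrastive loss can consume them.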
Learning Spatio-Temporal Representation with Local and Global Diffusion
Convolutional Neural Networks (CNN) have been regarded as a powerful class of
models for visual recognition problems. Nevertheless, the convolutional
filters in these networks are local operations that ignore long-range
dependencies. This drawback becomes even more severe for video
recognition, since video is an information-intensive medium with complex
temporal variations. In this paper, we present a novel framework to boost the
spatio-temporal representation learning by Local and Global Diffusion (LGD).
Specifically, we construct a novel neural network architecture that learns the
local and global representations in parallel. The architecture is composed of
LGD blocks, where each block updates local and global features by modeling the
diffusions between these two representations. The diffusions couple the two
aspects of information, localized and holistic, for more powerful
representation learning. Furthermore, a kernelized classifier is introduced
to combine the representations from two aspects for video recognition. Our LGD
networks outperform the best competitors by 3.5% and 0.7% on the large-scale
Kinetics-400 and Kinetics-600 video classification datasets, respectively. We
further examine the generalization of the global and local
representations produced by our pre-trained LGD networks on four different
benchmarks for video action recognition and spatio-temporal action detection
tasks. Superior performances over several state-of-the-art techniques on these
benchmarks are reported. Code is available at:
https://github.com/ZhaofanQiu/local-and-global-diffusion-networks.
Comment: CVPR 201
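The coupling pattern of an LGD block, where a local feature map and a global representation update each other through diffusion, can be illustrated in a few lines. This NumPy sketch only shows the interaction structure; the actual blocks use learned transformations, and the mixing weights and shapes here are hypothetical.

```python
import numpy as np

def lgd_block(local_feat, global_feat, w_local=0.8, w_global=0.8):
    """Toy LGD-style update: a local map of shape (C, T, H, W) and a global
    vector of shape (C,) exchange information by diffusing between the two
    representations. Mixing weights are illustrative, not learned."""
    # Holistic summary of the local map (global average pooling over T, H, W)
    pooled = local_feat.mean(axis=(1, 2, 3))
    # Local branch receives the global context, broadcast over space-time
    new_local = w_local * local_feat + (1 - w_local) * global_feat[:, None, None, None]
    # Global branch absorbs the pooled local evidence
    new_global = w_global * global_feat + (1 - w_global) * pooled
    return new_local, new_global

# Stack a few blocks, as an LGD network stacks its blocks
local = np.random.default_rng(0).standard_normal((64, 4, 7, 7))
glob = local.mean(axis=(1, 2, 3))
for _ in range(3):
    local, glob = lgd_block(local, glob)
print(local.shape, glob.shape)  # (64, 4, 7, 7) (64,)
```

Each block preserves both shapes, so the local and global paths can be stacked in parallel and finally combined by a classifier, mirroring the architecture described above.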