Weakly Supervised Action Localization by Sparse Temporal Pooling Network
We propose a weakly supervised temporal action localization algorithm on
untrimmed videos using convolutional neural networks. Our algorithm learns from
video-level class labels and predicts temporal intervals of human actions with
no requirement of temporal localization annotations. We design our network to
identify a sparse subset of key segments associated with target actions in a
video using an attention module and fuse the key segments through adaptive
temporal pooling. Our loss function consists of two terms: one minimizes the
video-level action classification error and the other enforces the sparsity of
the segment selection. At inference time, we extract and score temporal proposals using
temporal class activations and class-agnostic attentions to estimate the time
intervals that correspond to target actions. The proposed algorithm attains
state-of-the-art results on the THUMOS14 dataset and outstanding performance on
ActivityNet1.3 even with its weak supervision.
Comment: Accepted to CVPR 201
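The two-term objective described in this abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract does not give the exact formulation, so the sigmoid attention, the attention-weighted sum used for pooling, the binary cross-entropy classification term, the L1 sparsity penalty, and the names `w_att`, `w_cls`, and `beta` are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def sparse_pooling_loss(features, w_att, w_cls, labels, beta=1e-4):
    """Hypothetical two-term loss for weakly supervised localization.

    features: (T, D) array, one feature vector per video segment
    w_att:    (D,) attention weights (assumed linear attention module)
    w_cls:    (D, C) classifier weights (assumed linear classifier)
    labels:   (C,) multi-hot video-level class labels
    beta:     weight of the sparsity term (assumed hyperparameter)
    """
    # Class-agnostic attention: one weight in (0, 1) per segment.
    attention = sigmoid(features @ w_att)              # shape (T,)

    # Temporal pooling: attention-weighted sum of segment features.
    pooled = attention @ features                      # shape (D,)

    # Video-level classification term (binary cross-entropy per class).
    probs = sigmoid(pooled @ w_cls)                    # shape (C,)
    eps = 1e-12
    cls_loss = -np.mean(
        labels * np.log(probs + eps)
        + (1.0 - labels) * np.log(1.0 - probs + eps)
    )

    # Sparsity term: L1 norm of the attention weights, which pushes
    # the model to select only a few key segments.
    sparsity_loss = np.sum(np.abs(attention))

    return cls_loss + beta * sparsity_loss, attention


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, D, C = 20, 8, 3                                 # segments, feature dim, classes
    feats = rng.normal(size=(T, D))
    loss, att = sparse_pooling_loss(
        feats,
        rng.normal(size=D),
        rng.normal(size=(D, C)),
        np.array([1.0, 0.0, 0.0]),
    )
    print(loss, att.shape)
```

Setting `beta=0` leaves only the classification term, which makes it easy to see how much the sparsity penalty contributes for a given video.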
Deep learning for time series classification: a review
Time Series Classification (TSC) is an important and challenging problem in
data mining. With the increase of time series data availability, hundreds of
TSC algorithms have been proposed. Among these methods, only a few have
considered Deep Neural Networks (DNNs) to perform this task. This is surprising
as deep learning has seen very successful applications in recent years. DNNs
have indeed revolutionized the field of computer vision, especially with the
advent of novel deeper architectures such as Residual and Convolutional Neural
Networks. Apart from images, sequential data such as text and audio can also be
processed with DNNs to reach state-of-the-art performance for document
classification and speech recognition. In this article, we study the current
state-of-the-art performance of deep learning algorithms for TSC by presenting
an empirical study of the most recent DNN architectures for TSC. We give an
overview of the most successful deep learning applications in various time
series domains under a unified taxonomy of DNNs for TSC. We also provide an
open source deep learning framework to the TSC community where we implemented
each of the compared approaches and evaluated them on a univariate TSC
benchmark (the UCR/UEA archive) and 12 multivariate time series datasets. By
training 8,730 deep learning models on 97 time series datasets, we propose the
most exhaustive study of DNNs for TSC to date.
Comment: Accepted at Data Mining and Knowledge Discover
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.
Comment: 15 pages, 2 pdf figure