Exploring Temporal Preservation Networks for Precise Temporal Action Localization
Temporal action localization is an important task in computer vision. Although
a variety of methods have been proposed, how to predict the temporal boundaries
of action segments precisely remains an open question. Most works use
segment-level classifiers to select video segments pre-determined by action
proposal or dense sliding windows. However, in order to achieve more precise
action boundaries, a temporal localization system should make dense predictions
at a fine granularity. A recently proposed method exploits
Convolutional-Deconvolutional-Convolutional (CDC) filters to upsample the
predictions of 3D ConvNets, making it possible to perform per-frame action
predictions and achieving promising performance in terms of temporal action
localization. However, the CDC network partially loses temporal information
due to its temporal downsampling operation. In this paper, we propose an elegant and
powerful Temporal Preservation Convolutional (TPC) Network that equips 3D
ConvNets with TPC filters. The TPC network fully preserves temporal resolution
while simultaneously downsampling the spatial resolution, enabling
frame-level-granularity action localization, and it can be trained in an
end-to-end manner. Experimental results on public datasets show that the TPC
network achieves significant improvements in per-frame action prediction and
competitive results in segment-level temporal action localization.
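The core idea above can be illustrated with a minimal sketch (not the authors' code): a feature map is downsampled along the spatial axis only, so the temporal length is fully preserved for frame-level predictions. The toy shapes and the use of max-pooling are assumptions for illustration.

```python
# Minimal sketch of temporal-preserving downsampling (illustrative only):
# a feature map of shape (T, S) is max-pooled with stride 2 along the
# spatial axis, while the temporal axis is left untouched.

def temporal_preserving_pool(feat):
    """Max-pool each frame spatially (stride 2); keep every time step."""
    return [
        [max(row[s], row[s + 1]) for s in range(0, len(row) - 1, 2)]
        for row in feat  # one row per time step: temporal axis untouched
    ]

feat = [[1, 3, 2, 0],
        [4, 1, 5, 2],
        [0, 2, 1, 6]]          # T=3 frames, S=4 spatial positions
out = temporal_preserving_pool(feat)
print(len(out), len(out[0]))   # 3 2 -> T preserved, S halved
```

Because every time step survives the downsampling, a per-frame classifier can be attached directly to the output, which is what enables frame-level granularity.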
Convolutional Drift Networks for Video Classification
Analyzing spatio-temporal data like video is a challenging task that requires
processing visual and temporal information effectively. Convolutional Neural
Networks have shown promise as baseline fixed feature extractors through
transfer learning, a technique that helps minimize the training cost on visual
information. Temporal information is often handled using hand-crafted features
or Recurrent Neural Networks, but this can be overly specific or prohibitively
complex. Building a fully trainable system that can efficiently analyze
spatio-temporal data without hand-crafted features or complex training is an
open challenge. We present a new neural network architecture to address this
challenge, the Convolutional Drift Network (CDN). Our CDN architecture combines
the visual feature extraction power of deep Convolutional Neural Networks with
the intrinsically efficient temporal processing provided by Reservoir
Computing. In this introductory paper on the CDN, we provide a very simple
baseline implementation tested on two egocentric (first-person) video activity
datasets. We achieve video-level activity classification results on par with
state-of-the-art methods. Notably, performance on this complex spatio-temporal
task was produced by training only a single feed-forward layer in the CDN.
Comment: Published in IEEE Rebooting Computing
Stable Electromyographic Sequence Prediction During Movement Transitions using Temporal Convolutional Networks
Transient muscle movements influence the temporal structure of myoelectric
signal patterns, often leading to unstable prediction behavior from
movement-pattern classification methods. We show that temporal convolutional
network sequential models leverage the myoelectric signal's history to discover
contextual temporal features that aid in correctly predicting movement
intentions, especially during interclass transitions. We demonstrate
myoelectric classification using temporal convolutional networks to effect 3
simultaneous hand and wrist degrees of freedom in an experiment involving nine
human subjects. Temporal convolutional networks yield significant
performance improvements over other state-of-the-art methods in terms of both
classification accuracy and stability.
Comment: 4 pages, 5 figures, accepted for the Neural Engineering (NER) 2019
Conference
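The causal convolution at the heart of a temporal convolutional network can be sketched in a few lines (illustrative, not the paper's architecture): each output depends only on current and past samples, so a prediction during a movement transition never peeks at future EMG values. The 2-tap kernel is an arbitrary example.

```python
# Minimal sketch of a causal 1D convolution: left zero-padding ensures
# output t is computed only from inputs at times <= t.

def causal_conv(signal, kernel, dilation=1):
    """1D causal convolution with left zero-padding."""
    pad = (len(kernel) - 1) * dilation
    padded = [0.0] * pad + list(signal)
    return [
        sum(kernel[k] * padded[t + k * dilation] for k in range(len(kernel)))
        for t in range(len(signal))
    ]

emg = [0.0, 1.0, 2.0, 3.0]
out = causal_conv(emg, [0.5, 0.5])   # 2-tap causal smoothing
print(out)                           # [0.0, 0.5, 1.5, 2.5]
```

Stacking such layers with increasing dilation (1, 2, 4, ...) grows the receptive field exponentially, which is how TCNs capture long myoelectric signal history at fixed cost per step.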
Topical Behavior Prediction from Massive Logs
In this paper, we study topical behavior at a large scale. We use
network logs where each entry contains the entity ID, the timestamp, and the
meta data about the activity. Both the temporal and the spatial relationships
of the behavior are explored with deep learning architectures combining the
recurrent neural network (RNN) and the convolutional neural network (CNN). To
make the behavioral data appropriate for the spatial learning in the CNN, we
propose several reduction steps to form the topical metrics and to place them
homogeneously like pixels in the images. The experimental result shows both
temporal and spatial gains when compared against a multilayer perceptron (MLP)
network. A new learning framework called the spatially connected convolutional
networks (SCCN) is introduced to predict the topical metrics more efficiently.
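The reduction step described above can be sketched as follows (a hedged illustration; the paper's exact layout is not specified here): per-topic activity counts from the logs are flattened in a fixed topic order and reshaped into a 2D grid, so a CNN can treat them homogeneously like image pixels. The topic names and grid width are invented for the example.

```python
# Illustrative sketch: place topical metrics homogeneously into a pixel
# grid for spatial learning in a CNN.

def metrics_to_grid(metrics, topic_order, width):
    """Arrange topical metrics into rows of `width` 'pixels'."""
    flat = [metrics.get(t, 0) for t in topic_order]   # fixed ordering
    flat += [0] * (-len(flat) % width)                # zero-pad last row
    return [flat[i:i + width] for i in range(0, len(flat), width)]

# Hypothetical activity counts for one entity over one time window.
log_window = {"login": 4, "search": 7, "upload": 1, "share": 2, "edit": 5}
grid = metrics_to_grid(log_window,
                       ["login", "search", "upload", "share", "edit"], 3)
print(grid)   # [[4, 7, 1], [2, 5, 0]]
```

Keeping the topic ordering fixed across time windows matters: it gives the convolution a stable notion of locality, analogous to fixed pixel positions in an image.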
Short-Term Forecasting of Passenger Demand under On-Demand Ride Services: A Spatio-Temporal Deep Learning Approach
Short-term passenger demand forecasting is of great importance to on-demand
ride service platforms, as it can incentivize vacant cars to move from
over-supply regions to over-demand regions. However, spatial dependences,
temporal dependences, and exogenous dependences must all be considered
simultaneously, which makes short-term passenger demand forecasting
challenging. We
propose a novel deep learning (DL) approach, named the fusion convolutional
long short-term memory network (FCL-Net), to address these three dependences
within one end-to-end learning architecture. The model is stacked and fused by
multiple convolutional long short-term memory (LSTM) layers, standard LSTM
layers, and convolutional layers. The fusion of convolutional techniques and
the LSTM network enables the proposed DL approach to better capture the
spatio-temporal characteristics and correlations of explanatory variables. A
tailored spatially aggregated random forest is employed to rank the importance
of the explanatory variables. The ranking is then used for feature selection.
The proposed DL approach is applied to the short-term forecasting of passenger
demand under an on-demand ride service platform in Hangzhou, China.
Experimental results, validated on real-world data provided by DiDi Chuxing,
show that the FCL-Net achieves better predictive performance than traditional
approaches including both classical time-series prediction models and neural
network based algorithms (e.g., artificial neural network and LSTM). This paper
is one of the first DL studies to forecast the short-term passenger demand of
an on-demand ride service platform by examining the spatio-temporal
correlations.
Comment: 39 pages, 10 figures
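The fusion idea can be sketched at a very coarse level (a hedged toy, not the FCL-Net itself): a spatial demand map is reduced to a compact feature, then blended into a recurrent state together with an exogenous input such as weather. Real conv-LSTM layers keep the convolution inside the gates; the weights and inputs below are arbitrary.

```python
import math

# Toy sketch of fusing spatial, temporal, and exogenous information.

def spatial_feature(demand_grid):
    """Crude spatial summary: mean demand over the region grid."""
    cells = [v for row in demand_grid for v in row]
    return sum(cells) / len(cells)

def gated_step(h, spatial, exog, w=(0.6, 0.3, 0.1)):
    """Blend the previous state with new spatial and exogenous evidence."""
    gate = 1 / (1 + math.exp(-(w[0] * spatial + w[1] * exog)))  # sigmoid
    return gate * math.tanh(w[2] * spatial) + (1 - gate) * h

# Two hypothetical time steps: (2x2 demand grid, weather indicator).
h = 0.0
for grid, weather in [([[2, 4], [1, 3]], 0.5), ([[5, 7], [6, 8]], 1.0)]:
    h = gated_step(h, spatial_feature(grid), weather)
```

The gate decides how much the new spatio-temporal evidence overwrites the carried state, which is the same role the input and forget gates play in a full convolutional LSTM.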
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection
We address the problem of activity detection in continuous, untrimmed video
streams. This is a difficult task that requires extracting meaningful
spatio-temporal features to capture activities and accurately localizing the start
and end times of each activity. We introduce a new model, Region Convolutional
3D Network (R-C3D), which encodes the video streams using a three-dimensional
fully convolutional network, then generates candidate temporal regions
containing activities, and finally classifies selected regions into specific
activities. Computation is saved due to the sharing of convolutional features
between the proposal and the classification pipelines. The entire model is
trained end-to-end with jointly optimized localization and classification
losses. R-C3D is faster than existing methods (569 frames per second on a
single Titan X Maxwell GPU) and achieves state-of-the-art results on THUMOS'14.
We further demonstrate that our model is a general activity detection framework
that does not rely on assumptions about particular dataset properties by
evaluating our approach on ActivityNet and Charades. Our code is available at
http://ai.bu.edu/r-c3d/.
Comment: ICCV 2017 Camera-Ready Version
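The proposal stage can be illustrated with a short sketch (not the released R-C3D code): anchor segments of several scales are placed at regular positions along the video, and each candidate region is then scored and refined by the classifier. The stride and scales below are arbitrary choices for the example.

```python
# Illustrative sketch of candidate temporal-region generation.

def temporal_anchors(num_frames, stride, scales):
    """Return (start, end) candidate segments clipped to the video."""
    anchors = []
    for center in range(stride // 2, num_frames, stride):
        for s in scales:
            start, end = center - s // 2, center + s // 2
            anchors.append((max(0, start), min(num_frames, end)))
    return anchors

props = temporal_anchors(num_frames=32, stride=16, scales=[8, 16])
print(props)   # [(4, 12), (0, 16), (20, 28), (16, 32)]
```

Because the same 3D convolutional features are scored for every anchor, proposal generation adds little cost on top of the shared backbone, which is the source of the speed reported above.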
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment
In this work, we address the task of weakly-supervised human action
segmentation in long, untrimmed videos. Recent methods have relied on expensive
learning models, such as Recurrent Neural Networks (RNN) and Hidden Markov
Models (HMM). However, these methods suffer from high computational cost and
thus cannot be deployed at large scale. To overcome these limitations, the
keys to our design are efficiency and scalability. We propose a novel action
modeling framework, which consists of a new temporal convolutional network,
named Temporal Convolutional Feature Pyramid Network (TCFPN), for predicting
frame-wise action labels, and a novel training strategy for weakly-supervised
sequence modeling, named Iterative Soft Boundary Assignment (ISBA), to align
action sequences and update the network in an iterative fashion. The proposed
framework is evaluated on two benchmark datasets, Breakfast and Hollywood
Extended, with four different evaluation metrics. Extensive experimental
results show that our methods achieve competitive or superior performance to
state-of-the-art methods.
Comment: CVPR 2018
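The weak-supervision setup can be sketched as follows (a hedged illustration of the usual initialization that a strategy like ISBA would then iteratively refine; not the authors' code): the ordered action transcript is aligned to frames by splitting the video uniformly, giving an initial frame-wise labeling to train on. The action names are invented for the example.

```python
# Illustrative sketch: uniform alignment of a weak transcript to frames.

def uniform_alignment(transcript, num_frames):
    """Assign each frame the action from an evenly split transcript."""
    seg = num_frames / len(transcript)
    return [transcript[min(int(t / seg), len(transcript) - 1)]
            for t in range(num_frames)]

labels = uniform_alignment(["pour", "stir", "serve"], 6)
print(labels)   # ['pour', 'pour', 'stir', 'stir', 'serve', 'serve']
```

An iterative scheme then alternates between training the frame-wise predictor on these labels and softly shifting the boundaries toward where the predictor is most confident, so the alignment and the network improve together.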