Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks
Skeleton based action recognition distinguishes human actions using the
trajectories of skeleton joints, which provide a very good representation for
describing actions. Considering that recurrent neural networks (RNNs) with Long
Short-Term Memory (LSTM) can learn feature representations and model long-term
temporal dependencies automatically, we propose an end-to-end fully connected
deep LSTM network for skeleton based action recognition. Inspired by the
observation that the co-occurrences of the joints intrinsically characterize
human actions, we take the skeleton as the input at each time slot and
introduce a novel regularization scheme to learn the co-occurrence features of
skeleton joints. To train the deep LSTM network effectively, we propose a new
dropout algorithm which simultaneously operates on the gates, cells, and output
responses of the LSTM neurons. Experimental results on three human action
recognition datasets consistently demonstrate the effectiveness of the proposed
model.
Comment: AAAI 2016 conference
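The dropout idea above can be sketched as a single LSTM time step in plain Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not say how the masks are sampled or scaled, so the sketch assumes standard inverted dropout applied independently to the input, forget, and output gate activations, the candidate, the cell state, and the hidden output. The names `lstm_step` and `dropout`, the stacked weight layout, and the toy sizes are hypothetical, and the co-occurrence regularization scheme is not shown.

```python
import math
import random

def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def dropout(values, p, rng, training):
    """Inverted dropout: zero each entry with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def lstm_step(x, h_prev, c_prev, W, U, b, p=0.0, rng=None, training=False):
    """One LSTM step with (assumed) dropout on gates, cell state and output.

    W: 4H x D input weights, U: 4H x H recurrent weights, b: 4H biases, stacked
    as [input gate, forget gate, output gate, candidate] (hypothetical layout).
    """
    rng = rng if rng is not None else random.Random(0)
    H = len(h_prev)
    # Pre-activations for all four blocks.
    z = [sum(w * xi for w, xi in zip(W[k], x))
         + sum(u * hj for u, hj in zip(U[k], h_prev))
         + b[k]
         for k in range(4 * H)]
    i = dropout([sigmoid(v) for v in z[0:H]], p, rng, training)          # input gate
    f = dropout([sigmoid(v) for v in z[H:2 * H]], p, rng, training)      # forget gate
    o = dropout([sigmoid(v) for v in z[2 * H:3 * H]], p, rng, training)  # output gate
    g = dropout([math.tanh(v) for v in z[3 * H:4 * H]], p, rng, training)  # candidate
    # Cell update, then dropout on the cell state itself.
    c = dropout([fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)],
                p, rng, training)
    # Output response, then dropout on the hidden output.
    h = dropout([ok * math.tanh(ck) for ok, ck in zip(o, c)], p, rng, training)
    return h, c

# Toy sizes: a real skeleton frame would concatenate all joints' coordinates.
rng = random.Random(42)
D, H = 3, 2
W = [[rng.uniform(-0.1, 0.1) for _ in range(D)] for _ in range(4 * H)]
U = [[rng.uniform(-0.1, 0.1) for _ in range(H)] for _ in range(4 * H)]
b = [0.0] * (4 * H)

h, c = [0.0] * H, [0.0] * H
for frame in [[0.1, 0.2, 0.3], [0.2, 0.1, 0.0]]:
    h, c = lstm_step(frame, h, c, W, U, b, p=0.5, rng=rng, training=True)
```

At inference time, calling `lstm_step` with `training=False` (the default) disables every mask, so the same weights are used without rescaling.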
Patent Citation Dynamics Modeling via Multi-Attention Recurrent Networks
Modeling and forecasting forward citations to a patent is a central task for
the discovery of emerging technologies and for measuring the pulse of inventive
progress. Conventional methods for forecasting these forward citations cast the
problem as the analysis of temporal point processes, which rely on the conditional
intensity of previously received citations. Recent approaches model the
conditional intensity as a chain of recurrent neural networks to capture memory
dependencies, in the hope of relaxing the restrictions imposed by the parametric form of the
intensity function. For the problem of patent citations, we observe that
forecasting a patent's chain of citations benefits not only from the patent's
own history but also from the historical citations of assignees and
inventors associated with that patent. In this paper, we propose a
sequence-to-sequence model which employs an attention-of-attention mechanism to
capture the dependencies of these multiple time sequences. Furthermore, the
proposed model is able to forecast both the timestamp and the category of a
patent's next citation. Extensive experiments on a large patent citation
dataset collected from the USPTO demonstrate that the proposed model outperforms
state-of-the-art models at forward citation forecasting.
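One way to picture an attention-of-attention mechanism is as two stacked attention steps: first attend over the time steps within each history (the patent, its assignees, its inventors), then attend over the resulting per-history summaries. The sketch below assumes unparameterized dot-product scoring against a single query vector; the paper's actual scoring functions, the encoder that produces the hidden states, and the timestamp and category output heads are not reproduced, and all names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, states):
    """Dot-product attention: weight each state by its similarity to the query."""
    weights = softmax([dot(query, s) for s in states])
    dim = len(states[0])
    context = [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]
    return context, weights

def attention_of_attention(query, sequences):
    """Attend within each sequence, then attend across the per-sequence contexts."""
    contexts = [attend(query, seq)[0] for seq in sequences]
    final_context, seq_weights = attend(query, contexts)
    return final_context, seq_weights

# Hypothetical encoder states for three related histories.
query = [0.5, -0.2]  # e.g. a decoder state
patent_seq = [[0.1, 0.3], [0.4, 0.0], [0.2, 0.2]]
assignee_seq = [[0.0, 1.0], [0.3, 0.1]]
inventor_seq = [[0.9, -0.5]]
context, seq_weights = attention_of_attention(
    query, [patent_seq, assignee_seq, inventor_seq])
```

The second-level weights `seq_weights` indicate how much each history contributes to the final context, which is the kind of cross-sequence dependency the abstract describes.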
Interacting Attention-gated Recurrent Networks for Recommendation
Capturing the temporal dynamics of user preferences over items is important
for recommendation. Existing methods mainly assume that all time steps in the
user-item interaction history are equally relevant to recommendation; however, this
assumption does not hold in real-world scenarios, where user-item interactions
often happen by accident. More importantly, they learn user and item dynamics
separately, thus failing to capture their joint effects on user-item
interactions. To better model user and item dynamics, we present the
Interacting Attention-gated Recurrent Network (IARN) which adopts the attention
model to measure the relevance of each time step. In particular, we propose a
novel attention scheme to learn the attention scores of the user and item histories
in an interacting way, so as to account for the dependencies between user and
item dynamics in shaping user-item interactions. By doing so, IARN can
selectively memorize different time steps of a user's history when predicting
her preferences over different items. Our model can therefore provide
meaningful interpretations for recommendation results, which could be further
enhanced by auxiliary features. Extensive validation on real-world datasets
shows that IARN consistently outperforms state-of-the-art methods.
Comment: Accepted by ACM International Conference on Information and Knowledge
Management (CIKM), 201
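A rough sketch of the two ideas in the abstract, attention-gated recurrence and interacting attention, follows. It assumes that a step's attention score is the sigmoid of its dot product with the mean of the other side's history, and that the score interpolates between keeping the previous state and taking a tanh candidate update. Both choices, the scalar weights `w_in`/`w_rec`, and the dot-product preference score are illustrative assumptions, not IARN's published equations.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def interacting_scores(own_history, other_history):
    """Score each step of one history against a summary of the *other* history.

    Assumption: relevance = sigmoid(x_t . mean(other_history)).
    """
    summary = mean_vector(other_history)
    return [sigmoid(dot(x, summary)) for x in own_history]

def attention_gated_rnn(history, scores, w_in=1.0, w_rec=0.5):
    """h_t = (1 - a_t) * h_{t-1} + a_t * tanh(w_in * x_t + w_rec * h_{t-1}).

    A score near 0 lets the state skip that step; near 1 lets the step update it.
    """
    h = [0.0] * len(history[0])
    for x, a in zip(history, scores):
        cand = [math.tanh(w_in * xd + w_rec * hd) for xd, hd in zip(x, h)]
        h = [(1.0 - a) * hd + a * cd for hd, cd in zip(h, cand)]
    return h

# Hypothetical 2-d embeddings of a user's and an item's interaction histories.
user_hist = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
item_hist = [[1.0, 0.0], [0.9, 0.1]]

user_scores = interacting_scores(user_hist, item_hist)  # user steps scored by item history
item_scores = interacting_scores(item_hist, user_hist)  # item steps scored by user history
u = attention_gated_rnn(user_hist, user_scores)
v = attention_gated_rnn(item_hist, item_scores)
preference = dot(u, v)  # assumed preference score
```

Because each side's scores depend on the other side's history, the same user history is summarized differently for different items, which is the intuition behind selectively memorizing time steps per item.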
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Vocal tract configurations play a vital role in generating distinguishable
speech sounds, by modulating the airflow and creating different resonant
cavities in speech production. They contain abundant information that can be
utilized to better understand the underlying speech production mechanism. As a
step towards automatic mapping of vocal tract shape geometry to acoustics, this
paper employs effective video action recognition techniques, such as Long-term
Recurrent Convolutional Network (LRCN) models, to identify different
vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract.
Such a model typically combines a CNN-based deep hierarchical visual feature
extractor with recurrent networks, which ideally makes the network
spatio-temporally deep enough to learn the sequential dynamics of a short video
clip for video classification tasks. We use a database consisting of 2D
real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The
comparative performances of this class of algorithms under various parameter
settings and for various classification tasks are discussed. Interestingly, the
results show a marked difference in model performance on speech
classification compared with generic sequence or video classification tasks.
Comment: To appear in the INTERSPEECH 2018 Proceedings
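The LRCN pattern described above, a per-frame convolutional feature extractor feeding a recurrent network, can be shown in miniature. This sketch swaps in a single hand-set convolution layer with ReLU and global average pooling for the deep CNN, and a plain tanh RNN for the LSTM, purely to keep it short and dependency-free. The toy 5x5 frames, kernels, weights, and three classes are hypothetical stand-ins for real-time MRI frames and VCV labels.

```python
import math

def conv2d_valid(frame, kernel):
    """2-D cross-correlation with no padding ('valid' output size)."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(frame) - kh + 1, len(frame[0]) - kw + 1
    return [[sum(kernel[a][b] * frame[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(cols)]
            for r in range(rows)]

def frame_features(frame, kernels):
    """CNN stand-in: one conv layer, ReLU, then global average pooling per kernel."""
    feats = []
    for k in kernels:
        activations = [max(0.0, v) for row in conv2d_valid(frame, k) for v in row]
        feats.append(sum(activations) / len(activations))
    return feats

def lrcn_logits(clip, kernels, W_in, W_rec, W_out):
    """Feed per-frame features through a tanh RNN and classify from the final state."""
    H = len(W_rec)
    h = [0.0] * H
    for frame in clip:
        x = frame_features(frame, kernels)
        h = [math.tanh(sum(W_in[j][i] * x[i] for i in range(len(x)))
                       + sum(W_rec[j][k] * h[k] for k in range(H)))
             for j in range(H)]
    return [sum(w * hj for w, hj in zip(row, h)) for row in W_out]

kernels = [
    [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]],  # intensity rising left to right
    [[-1.0, -1.0, -1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],  # intensity rising top to bottom
]
# Toy clip: 3 frames of 5x5 pixels with a bright column moving rightwards.
clip = [[[1.0 if c == t + 1 else 0.0 for c in range(5)] for _ in range(5)]
        for t in range(3)]
W_in = [[0.8, -0.3], [0.2, 0.5]]
W_rec = [[0.1, 0.0], [0.0, 0.1]]
W_out = [[1.0, -1.0], [-0.5, 0.5], [0.3, 0.3]]  # 3 hypothetical classes

logits = lrcn_logits(clip, kernels, W_in, W_rec, W_out)
predicted = max(range(len(logits)), key=logits.__getitem__)
```

The split keeps spatial processing (per frame) separate from temporal processing (across frames), which is what lets an LRCN-style model classify a whole clip rather than a single image.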
Modeling The Intensity Function Of Point Process Via Recurrent Neural Networks
Event sequences, asynchronously generated with random timestamps, are ubiquitous
across applications. The precise and arbitrary timestamps can carry important
clues about the underlying dynamics, and make event data fundamentally
different from time series, which are indexed at fixed and equal
time intervals. One expressive mathematical tool for modeling events is the point
process. The intensity functions of many point processes involve two
components: the background and the effect of the history. Due to its inherent
spontaneousness, the background can be treated as a time series, while the other
component needs to handle the history events. In this paper, we model the background by a
Recurrent Neural Network (RNN) with its units aligned with time series indexes
while the history effect is modeled by another RNN whose units are aligned with
asynchronous events to capture the long-range dynamics. The whole model with
event type and timestamp prediction output layers can be trained end-to-end.
Our approach takes an RNN perspective on point processes, and models both their
background and history effect. In terms of utility, our method allows a black-box
treatment of the intensity, which is often a pre-defined parametric
form in point processes. Meanwhile, end-to-end training opens the avenue for
reusing the rich existing techniques of deep networks for point process modeling. We
apply our model to the predictive maintenance problem using a log dataset from
more than 1000 ATMs of a global bank headquartered in North America.
Comment: Accepted at the Thirty-First AAAI Conference on Artificial Intelligence
(AAAI17)
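To make the background/history decomposition concrete, the sketch below uses the classic Hawkes process with an exponential kernel, a common example of the pre-defined parametric intensity that the abstract says the RNN approach replaces: a constant background rate `mu` plus a history effect summed over past events. It is not the paper's RNN model; `mu`, `alpha`, and `beta` are hypothetical parameters, and the log-likelihood relies on the closed-form integral of this particular kernel.

```python
import math

def hawkes_intensity(t, history, mu, alpha, beta):
    """lambda(t) = mu + sum over past events t_i < t of alpha * beta * exp(-beta * (t - t_i))."""
    background = mu  # constant background rate
    history_effect = sum(alpha * beta * math.exp(-beta * (t - ti))
                         for ti in history if ti < t)
    return background + history_effect

def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """Sum of log-intensities at the events minus the integral of lambda over [0, T]."""
    log_term = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta))
                   for ti in events)
    # Closed form of the integral for this kernel (events assumed to lie in [0, T]):
    # mu * T + sum_i alpha * (1 - exp(-beta * (T - t_i)))
    compensator = mu * T + sum(alpha * (1.0 - math.exp(-beta * (T - ti)))
                               for ti in events)
    return log_term - compensator

# Hypothetical failure-log timestamps (e.g. hours) observed over [0, 10].
events = [1.0, 1.5, 4.0, 4.2, 9.0]
rate_after_burst = hawkes_intensity(4.3, events, mu=0.2, alpha=0.5, beta=2.0)
ll = hawkes_log_likelihood(events, T=10.0, mu=0.2, alpha=0.5, beta=2.0)
```

Loosely, replacing the constant `mu` with the output of a time-series RNN and the summed kernel with an event-aligned RNN gives the two-part split the abstract describes, without fixing the kernel's functional form in advance.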