Learning Rich Event Representations and Interactions for Temporal Relation Classification
Most existing systems for identifying temporal relations between events rely heavily on hand-crafted features derived from event words and explicit temporal markers. In addition, little attention has been given to automatically learning contextualized event representations or to finding complex interactions between events. This paper fills this gap by showing that a combination of rich event representations and interaction learning is essential to more accurate temporal relation classification. Specifically, we propose a method in which i) Recurrent Neural Networks (RNN) extract contextual information, ii) character embeddings capture morpho-semantic features (e.g. tense, mood, aspect), and iii) a deep Convolutional Neural Network (CNN) captures intricate interactions between events. We show that the proposed approach outperforms most existing systems on the commonly used dataset while using fully automatic feature extraction and simple local inference.
A Trio Neural Model for Dynamic Entity Relatedness Ranking
Measuring entity relatedness is a fundamental task for many natural language
processing and information retrieval applications. Prior work often studies
entity relatedness in static settings and in an unsupervised manner. However,
entities in the real world are often involved in many different relationships,
so entity relations change considerably over time. In this work, we
propose a neural network-based approach for dynamic entity relatedness,
leveraging the collective attention as supervision. Our model is capable of
learning rich and different entity representations in a joint framework.
Through extensive experiments on large-scale datasets, we demonstrate that our
method achieves better results than competitive baselines.
Comment: In Proceedings of CoNLL 201
Appearance-and-Relation Networks for Video Classification
Spatiotemporal feature learning in videos is a fundamental problem in
computer vision. This paper presents a new architecture, termed the
Appearance-and-Relation Network (ARTNet), to learn video representations in an
end-to-end manner. ARTNets are constructed by stacking multiple generic
building blocks, called SMART, whose goal is to simultaneously model
appearance and relation from RGB input in a separate and explicit manner.
Specifically, SMART blocks decouple the spatiotemporal learning module into an
appearance branch for spatial modeling and a relation branch for temporal
modeling. The appearance branch is implemented based on the linear combination
of pixels or filter responses in each frame, while the relation branch is
designed based on the multiplicative interactions between pixels or filter
responses across multiple frames. We perform experiments on three action
recognition benchmarks: Kinetics, UCF101, and HMDB51, demonstrating that SMART
blocks obtain an evident improvement over 3D convolutions for spatiotemporal
feature learning. Under the same training setting, ARTNets achieve superior
performance on these three datasets compared with existing state-of-the-art methods.
Comment: CVPR18 camera-ready version. Code & models available at
https://github.com/wanglimin/ARTNe
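The two-branch idea in the ARTNet abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the flattened-list frame format, and the choice of squaring a cross-frame projection to obtain multiplicative interactions are all assumptions made for illustration only.

```python
def dot(weights, pixels):
    """Linear combination of pixel values with the given weights."""
    return sum(w * p for w, p in zip(weights, pixels))


def appearance_branch(frames, weights):
    """Hypothetical appearance branch: one linear response per frame.

    Each output depends on a single frame only, so it models spatial
    appearance and ignores how frames relate to each other.
    """
    return [dot(weights, frame) for frame in frames]


def relation_branch(frames, weights_per_frame):
    """Hypothetical relation branch: square of a projection spanning all frames.

    Expanding the square produces products of pixels taken from different
    frames, which is one simple way to get multiplicative interactions
    across frames (an assumption, not a detail stated in the abstract).
    """
    projection = sum(dot(w, f) for w, f in zip(weights_per_frame, frames))
    return projection ** 2


def smart_block(frames, appearance_weights, relation_weights):
    """Run both branches separately and return their outputs side by side."""
    return (
        appearance_branch(frames, appearance_weights),
        relation_branch(frames, relation_weights),
    )


# Two tiny "frames" of two pixels each (illustrative values only).
frames = [[1.0, 2.0], [3.0, 4.0]]
appearance, relation = smart_block(
    frames,
    appearance_weights=[0.5, 0.5],
    relation_weights=[[1.0, 0.0], [0.0, -1.0]],
)
```

With these weights the relation output is the squared difference between the first pixel of frame one and the second pixel of frame two, so it responds to change across frames, while the appearance output is simply a per-frame average.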
Action Recognition by Hierarchical Mid-level Action Elements
Realistic videos of human actions exhibit rich spatiotemporal structures at
multiple levels of granularity: an action can always be decomposed into
multiple finer-grained elements in both space and time. To capture this
intuition, we propose to represent videos by a hierarchy of mid-level action
elements (MAEs), where each MAE corresponds to an action-related spatiotemporal
segment in the video. We introduce an unsupervised method to generate this
representation from videos. Our method is capable of distinguishing
action-related segments from background segments and representing actions at
multiple spatiotemporal resolutions. Given a set of spatiotemporal segments
generated from the training data, we introduce a discriminative clustering
algorithm that automatically discovers MAEs at multiple levels of granularity.
We develop structured models that capture a rich set of spatial, temporal and
hierarchical relations among the segments, where the action label and multiple
levels of MAE labels are jointly inferred. The proposed model achieves
state-of-the-art performance in multiple action recognition benchmarks.
Moreover, we demonstrate the effectiveness of our model in real-world
applications such as action recognition in large-scale untrimmed videos and
action parsing.