263 research outputs found
Integrating Semantic Knowledge to Tackle Zero-shot Text Classification
Insufficient or even unavailable training data for emerging classes is a major
challenge in many classification tasks, including text classification.
Recognising text documents of classes that have never been seen in the learning
stage, so-called zero-shot text classification, is therefore difficult, and only
a limited number of previous works have tackled this problem. In this paper, we propose a
two-phase framework together with data augmentation and feature augmentation to
solve this problem. Four kinds of semantic knowledge (word embeddings, class
descriptions, class hierarchy, and a general knowledge graph) are incorporated
into the proposed framework to deal with instances of unseen classes
effectively. Experimental results show that each phase on its own, as well as
the combination of the two, achieves the best overall accuracy compared with
baselines and recent approaches in classifying real-world texts under the
zero-shot scenario.
Comment: Accepted NAACL-HLT 201
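As an illustration of how word embeddings alone can assign a document to a class with no training examples, here is a minimal sketch. It is not the paper's two-phase framework: it only averages word vectors and picks the class label with the highest cosine similarity. The names `TOY_VECTORS`, `embed`, and `zero_shot_predict`, and the 3-dimensional vectors, are hypothetical and chosen for this example.

```python
import math

# Hypothetical toy embeddings; real systems would load pretrained vectors.
TOY_VECTORS = {
    "football": [0.9, 0.1, 0.0],
    "match": [0.8, 0.2, 0.1],
    "election": [0.1, 0.9, 0.1],
    "vote": [0.0, 0.8, 0.2],
    "sports": [0.85, 0.15, 0.05],
    "politics": [0.05, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed(tokens, vectors):
    """Average the vectors of the known tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def zero_shot_predict(doc_tokens, class_labels, vectors):
    """Return the label whose embedding is closest to the document embedding."""
    doc_vec = embed(doc_tokens, vectors)
    if doc_vec is None:
        return None
    best_label, best_score = None, -math.inf
    for label in class_labels:
        label_vec = embed(label.split(), vectors)
        if label_vec is None:
            continue  # label has no known words, so it cannot be scored
        score = cosine(doc_vec, label_vec)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(zero_shot_predict(["football", "match"], ["sports", "politics"], TOY_VECTORS))
```

Neither label needs to appear in any training set, which is what makes the setting zero-shot. The abstract's other knowledge sources (class descriptions, class hierarchy, knowledge graph) are not modelled here.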
Semantic Image Synthesis via Adversarial Learning
In this paper, we propose a way of synthesizing realistic images directly
from natural language descriptions, which has many useful applications, e.g.
intelligent image manipulation. Specifically, given a source image and a
target text description, our model synthesizes images that
meet two requirements: 1) being realistic while matching the target text
description; 2) maintaining other image features that are irrelevant to the
text description. The model should be able to disentangle the semantic
information from the two modalities (image and text), and generate new images
from the combined semantics. To achieve this, we propose an end-to-end neural
architecture that leverages adversarial learning to automatically learn
implicit loss functions, which are optimized to fulfill the aforementioned two
requirements. We have evaluated our model by conducting experiments on the
Caltech-200 bird dataset and the Oxford-102 flower dataset, and have demonstrated
that our model is capable of synthesizing realistic images that match the given
descriptions while still maintaining the other features of the original images.
Comment: Accepted to ICCV 201
A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion
Multi-person motion capture can be challenging due to ambiguities caused by
severe occlusion, fast body movement, and complex interactions. Existing
frameworks build on 2D pose estimations and triangulate to 3D coordinates by
reasoning about the appearance, trajectory, and geometric consistencies among
multi-camera observations. However, 2D joint detection is usually incomplete
and suffers from wrong identity assignments due to limited observation angles, which
leads to noisy 3D triangulation results. To overcome this issue, we propose to
explore the short-range autoregressive characteristics of skeletal motion using
a transformer. First, we propose an adaptive, identity-aware triangulation module
to reconstruct 3D joints and identify the missing joints for each identity. To
generate complete 3D skeletal motion, we then propose a Dual-Masked
Auto-Encoder (D-MAE) which encodes the joint status with both
skeletal-structural and temporal position encoding for trajectory completion.
D-MAE's flexible masking and encoding mechanism enables arbitrary skeleton
definitions to be conveniently deployed under the same framework. In order to
demonstrate the proposed model's capability in dealing with severe data loss
scenarios, we contribute a high-accuracy and challenging motion capture dataset
of multi-person interactions with severe occlusion. Evaluations on both
benchmark datasets and our new dataset demonstrate the efficiency of our proposed model,
as well as its advantage over other state-of-the-art methods.
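To make the idea of flagging missing joints and completing their trajectories concrete, here is a minimal sketch. It is not D-MAE: the learned transformer is replaced by plain linear interpolation over time, used only as a naive stand-in. The data layout (a list of frames, each a list of `(x, y, z)` tuples or `None`) and the names `missing_mask`, `interpolate_joint`, and `complete_track` are assumptions made for this example.

```python
def missing_mask(track):
    """Return a frames-by-joints grid that is True where a joint is missing."""
    return [[joint is None for joint in frame] for frame in track]


def interpolate_joint(series):
    """Fill None entries in one joint's time series.

    Gaps between two observations are linearly interpolated; gaps at the
    start or end copy the nearest observed position. A series with no
    observations at all is returned unchanged.
    """
    observed = [i for i, p in enumerate(series) if p is not None]
    filled = list(series)
    if not observed:
        return filled
    for t, point in enumerate(series):
        if point is not None:
            continue
        prev = max((i for i in observed if i < t), default=None)
        nxt = min((i for i in observed if i > t), default=None)
        if prev is None:
            filled[t] = series[nxt]
        elif nxt is None:
            filled[t] = series[prev]
        else:
            w = (t - prev) / (nxt - prev)
            filled[t] = tuple(
                a + w * (b - a) for a, b in zip(series[prev], series[nxt])
            )
    return filled


def complete_track(track):
    """Complete every joint of a frames-by-joints track independently."""
    if not track:
        return []
    n_joints = len(track[0])
    columns = [
        interpolate_joint([frame[j] for frame in track]) for j in range(n_joints)
    ]
    return [[columns[j][t] for j in range(n_joints)] for t in range(len(track))]


track = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [None, (1.0, 1.0, 1.0)],
    [(2.0, 2.0, 2.0), None],
]
print(missing_mask(track))
print(complete_track(track))
```

Interpolation treats each joint independently and ignores skeletal structure, which is exactly the information the abstract says D-MAE encodes alongside temporal position.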