Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding
Abstractive community detection is an important spoken language understanding
task, whose goal is to group utterances in a conversation according to whether
they can be jointly summarized by a common abstractive sentence. This paper
provides a novel approach to this task. We first introduce a neural contextual
utterance encoder featuring three types of self-attention mechanisms. We then
train it using Siamese and triplet energy-based meta-architectures.
Experiments on the AMI corpus show that our system outperforms multiple
energy-based and non-energy-based state-of-the-art baselines. Code and
data are publicly available.
Comment: Update baseline
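The triplet energy-based objective mentioned above can be illustrated with a minimal sketch: the anchor and a positive utterance (same abstractive community) are pulled together while a negative utterance (different community) is pushed apart by a margin. This is a generic triplet loss, not the paper's implementation; the embeddings and the margin value are assumptions.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two utterance embeddings
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Energy-based triplet objective: the loss is zero once the
    # negative is farther from the anchor than the positive by
    # at least `margin`, otherwise it penalizes the violation.
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

With toy 2-D embeddings, a well-separated triplet yields zero loss, while a triplet whose negative is closer than its positive is penalized.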
Self-Supervised One-Shot Learning for Automatic Segmentation of StyleGAN Images
We propose a framework for the automatic one-shot segmentation of synthetic
images generated by a StyleGAN. Our framework is based on the observation that
the multi-scale hidden features in the GAN generator hold useful semantic
information that can be utilized for automatic on-the-fly segmentation of the
generated images. Using these features, our framework learns to segment
synthetic images using a self-supervised contrastive clustering algorithm that
projects the hidden features into a compact space for per-pixel classification.
This contrastive learner is based on using a novel data augmentation strategy
and a pixel-wise swapped prediction loss that leads to faster learning of the
feature vectors for one-shot segmentation. We have tested our implementation on
five standard benchmarks to yield a segmentation performance that not only
outperforms the semi-supervised baselines by an average wIoU margin of 1.02%
but also improves inference speed by a factor of 4.5. Finally, we also
show the results of using the proposed one-shot learner in implementing BagGAN,
a framework for producing annotated synthetic baggage X-ray scans for threat
detection. This framework was trained and tested on the PIDRay baggage
benchmark to yield a performance comparable to its baseline segmenter based on
manual annotations.
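The per-pixel classification step described above can be sketched as nearest-prototype assignment in the projected feature space: each pixel's projected feature vector is assigned to the closest cluster prototype. This is a simplified illustration under assumed inputs, not the paper's contrastive clustering algorithm; the feature vectors and prototypes here are hypothetical.

```python
def assign_pixels(features, prototypes):
    # Per-pixel classification: assign each projected feature
    # vector to the nearest cluster prototype, using squared
    # Euclidean distance in the compact projection space.
    labels = []
    for f in features:
        dists = [sum((a - b) ** 2 for a, b in zip(f, p)) for p in prototypes]
        labels.append(dists.index(min(dists)))
    return labels
```

Given two pixel features and two prototypes, each pixel receives the index of its closest prototype as a segmentation label.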
Sketch-based Video Object Segmentation: Benchmark and Analysis
Reference-based video object segmentation is an emerging task that aims to
segment the target object in each video frame as specified by a given
reference, such as a language expression or a photo mask. However, language
expressions can sometimes be vague in conveying an intended concept and
ambiguous when similar objects in one frame are hard to distinguish by
language. Meanwhile, photo masks are costly to annotate and less practical to
provide in a real application. This paper introduces a new task of sketch-based
video object segmentation, an associated benchmark, and a strong baseline. Our
benchmark includes three datasets, Sketch-DAVIS16, Sketch-DAVIS17 and
Sketch-YouTube-VOS, which exploit human-drawn sketches as an informative yet
low-cost reference for video object segmentation. We build on STCN, a
popular baseline for the semi-supervised VOS task, and evaluate the most
effective design for incorporating a sketch reference. Experimental results
show that sketches are more effective and more annotation-efficient than other
references, such as photo masks, language expressions and scribbles.
Comment: BMVC 202