XL-NBT: A Cross-lingual Neural Belief Tracking Framework
Task-oriented dialog systems are becoming pervasive, and many companies
heavily rely on them to complement human agents for customer service in call
centers. With globalization, the need for providing cross-lingual customer
support becomes more urgent than ever. However, cross-lingual support poses
great challenges: it requires a large amount of additional annotated data from
native speakers. In order to bypass the expensive human annotation and achieve
the first step towards the ultimate goal of building a universal dialog system,
we set out to build a cross-lingual state tracking framework. Specifically, we
assume that there exists a source language with dialog belief tracking
annotations while the target languages have no annotated dialog data of any
form. Then, we pre-train a state tracker for the source language as a teacher,
which is able to exploit easy-to-access parallel data. We then distill and
transfer its own knowledge to the student state tracker in target languages. We
specifically discuss two types of common parallel resources: bilingual corpus
and bilingual dictionary, and design different transfer learning strategies
accordingly. Experimentally, we successfully use an English state tracker as the
teacher to transfer its knowledge to both Italian and German trackers and
achieve promising results.
Comment: 13 pages, 5 figures, 3 tables, accepted to EMNLP 2018 conference
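The teacher-to-student transfer described in this abstract can be illustrated with a generic knowledge-distillation loss. This is a minimal sketch, not the paper's objective: the abstract does not state the loss, so the KL divergence, the `temperature` parameter, and the function names `softmax` and `distillation_loss` are all assumptions. The idea is that the teacher scores a source-language utterance, the student scores its parallel target-language utterance, and the student is trained to match the teacher's distribution over slot values.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over candidate slot values.

    Assumption: both trackers score the same list of slot values, the
    teacher on a source sentence and the student on its translation.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores for three candidate values of one slot.
teacher = [2.0, 0.5, -1.0]
student = [0.1, 0.2, 0.3]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so no target-language annotation is needed, only parallel sentences.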
Coarse-to-Fine Adaptive People Detection for Video Sequences by Maximizing Mutual Information
Applying people detectors to unseen data is challenging since pattern distributions, such
as viewpoints, motion, poses, backgrounds, occlusions and people sizes, may significantly differ
from the ones of the training dataset. In this paper, we propose a coarse-to-fine framework to adapt
frame by frame people detectors during runtime classification, without requiring any additional
manually labeled ground truth apart from the offline training of the detection model. Such adaptation
makes use of the mutual information of multiple detectors, i.e., similarities and dissimilarities of detectors
estimated and agreed by pair-wise correlating their outputs. Globally, the proposed adaptation
discriminates between relevant instants in a video sequence, i.e., identifies the representative frames
for an adaptation of the system. Locally, the proposed adaptation identifies the best configuration
(i.e., detection threshold) of each detector under analysis, maximizing the mutual information to
obtain the detection threshold of each detector. The proposed coarse-to-fine approach does not
require training the detectors for each new scenario and uses standard people detector outputs, i.e.,
bounding boxes. The experimental results demonstrate that the proposed approach outperforms
state-of-the-art detectors whose optimal threshold configurations are previously determined and
fixed from offline training data.
This work has been partially supported by the Spanish government under the project TEC2014-53176-R
(HAVideo
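The local step of this abstract, choosing each detector's threshold by maximizing mutual information, can be sketched for two detectors as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: it assumes both detectors score the same candidate windows, binarizes the scores at each candidate threshold, and uses an exhaustive grid search. The names `mutual_information` and `best_thresholds` are hypothetical.

```python
import math
from itertools import product


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length binary sequences."""
    n = len(a)
    mi = 0.0
    for x, y in product((0, 1), repeat=2):
        p_xy = sum(1 for i, j in zip(a, b) if i == x and j == y) / n
        p_x = sum(1 for i in a if i == x) / n
        p_y = sum(1 for j in b if j == y) / n
        if p_xy > 0:
            mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def best_thresholds(scores_a, scores_b, candidates):
    """Grid-search the threshold pair whose binarized outputs share the most information.

    Returns (mutual information, threshold for A, threshold for B).
    """
    best = None
    for t_a, t_b in product(candidates, repeat=2):
        a = [int(s >= t_a) for s in scores_a]
        b = [int(s >= t_b) for s in scores_b]
        mi = mutual_information(a, b)
        if best is None or mi > best[0]:
            best = (mi, t_a, t_b)
    return best


# Hypothetical detector scores on four shared candidate windows.
result = best_thresholds([0.9, 0.8, 0.2, 0.1], [0.7, 0.6, 0.3, 0.2], [0.0, 0.5, 1.0])
```

Thresholds that accept everything or reject everything give zero mutual information, so the search favors settings where the detectors make informative, mutually consistent decisions without any labeled ground truth.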
Online Domain Adaptation for Multi-Object Tracking
Automatically detecting, labeling, and tracking objects in videos depends
first and foremost on accurate category-level object detectors. These might,
however, not always be available in practice, as acquiring high-quality large
scale labeled training datasets is either too costly or impractical for all
possible real-world application scenarios. A scalable solution consists in
re-using object detectors pre-trained on generic datasets. This work is the
first to investigate the problem of on-line domain adaptation of object
detectors for causal multi-object tracking (MOT). We propose to alleviate the
dataset bias by adapting detectors from category to instances, and back: (i) we
jointly learn all target models by adapting them from the pre-trained one, and
(ii) we also adapt the pre-trained model on-line. We introduce an on-line
multi-task learning algorithm to efficiently share parameters and reduce drift,
while gradually improving recall. Our approach is applicable to any linear
object detector, and we evaluate both cheap "mini-Fisher Vectors" and expensive
"off-the-shelf" ConvNet features. We quantitatively measure the benefit of our
domain adaptation strategy on the KITTI tracking benchmark and on a new dataset
(PASCAL-to-KITTI) we introduce to study the domain mismatch problem in MOT.
Comment: To appear at BMVC 201
MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation
We address the problem of semi-supervised video object segmentation (VOS),
where the masks of objects of interest are given in the first frame of an
input video. To deal with challenging cases where objects are occluded or
missing, previous work relies on greedy data association strategies that make
decisions for each frame individually. In this paper, we propose a novel
approach to defer the decision making for a target object in each frame, until
a global view can be established with the entire video being taken into
consideration. Our approach is in the same spirit as Multiple Hypotheses
Tracking (MHT) methods, making several critical adaptations for the VOS
problem. We employ the bounding box (bbox) hypothesis for tracking tree
formation, and the multiple hypotheses are spawned by propagating the preceding
bbox into the detected bbox proposals within a gated region starting from the
initial object mask in the first frame. The gated region is determined by a
gating scheme which takes into account a more comprehensive motion model rather
than the simple Kalman filtering model in traditional MHT. To further design
more customized algorithms tailored for VOS, we develop a novel mask
propagation score instead of the appearance similarity score that could be
brittle due to large deformations. The mask propagation score, together with
the motion score, determines the affinity between the hypotheses during tree
pruning. Finally, a novel mask merging strategy is employed to handle mask
conflicts between objects. Extensive experiments on challenging datasets
demonstrate the effectiveness of the proposed method, especially when objects
are missing.
Comment: accepted to CVPR 2019 as oral presentation
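The hypothesis-spawning step of this abstract, propagating the preceding bbox into proposals inside a gated region, can be sketched as below. This is an illustration under simplifying assumptions: the abstract says the gate uses a more comprehensive motion model, whereas this sketch uses a plain center-distance gate, and the explicit "missing" branch (`None`) is an assumed way to defer decisions for occluded objects. The names `center`, `in_gate`, and `spawn_hypotheses` are hypothetical.

```python
def center(box):
    """Center of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def in_gate(reference_box, proposal, radius):
    """True if the proposal's center lies within `radius` of the reference center."""
    (rx, ry), (px, py) = center(reference_box), center(proposal)
    return (rx - px) ** 2 + (ry - py) ** 2 <= radius ** 2


def spawn_hypotheses(track, proposals, radius):
    """Extend one track (a list of boxes, None = missing) into child hypotheses.

    Each gated proposal spawns one child; an extra child marks the object
    as missing in this frame so the decision can be deferred.
    """
    # Gate around the most recent frame in which the object was observed.
    last_seen = next(box for box in reversed(track) if box is not None)
    children = [track + [p] for p in proposals if in_gate(last_seen, p, radius)]
    children.append(track + [None])
    return children


# Hypothetical frame: one nearby proposal and one far-away proposal.
children = spawn_hypotheses(
    [(0, 0, 10, 10)],
    [(1, 1, 11, 11), (100, 100, 110, 110)],
    radius=5,
)
```

In a full tracker, each child would then be scored (the abstract names a mask propagation score and a motion score) and the tree pruned; that scoring is not shown here because the abstract does not define it.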