Simple Online and Realtime Tracking with a Deep Association Metric
Simple Online and Realtime Tracking (SORT) is a pragmatic approach to
multiple object tracking with a focus on simple, effective algorithms. In this
paper, we integrate appearance information to improve the performance of SORT.
Due to this extension we are able to track objects through longer periods of
occlusions, effectively reducing the number of identity switches. In the spirit of
the original framework we place much of the computational complexity into an
offline pre-training stage where we learn a deep association metric on a
large-scale person re-identification dataset. During online application, we
establish measurement-to-track associations using nearest neighbor queries in
visual appearance space. Experimental evaluation shows that our extensions
reduce the number of identity switches by 45%, achieving overall competitive
performance at high frame rates.
Comment: 5 pages, 1 figure
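The association step described above (nearest neighbor queries in visual appearance space) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each track keeps a small gallery of L2-normalized appearance vectors, scores a detection by its smallest cosine distance to that gallery, and rejects pairs above a hypothetical gating threshold `max_distance`. A greedy matcher stands in for an optimal (Hungarian) assignment, and SORT's Kalman-filter motion term is omitted.

```python
import math


def normalize(v):
    # Scale a feature vector to unit L2 norm (assumes v is not all zeros).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_distance(a, b):
    # For unit-norm inputs, 1 - dot product is the cosine distance.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def appearance_cost(gallery, detection):
    # Smallest distance between the detection and any stored feature of the track.
    return min(cosine_distance(g, detection) for g in gallery)


def associate(track_galleries, detections, max_distance=0.2):
    """Greedy nearest-neighbor matching in appearance space.

    Returns {track_index: detection_index}. Pairs whose cost exceeds the
    hypothetical gate max_distance are never matched.
    """
    pairs = []
    for t, gallery in enumerate(track_galleries):
        for d, det in enumerate(detections):
            cost = appearance_cost(gallery, det)
            if cost <= max_distance:
                pairs.append((cost, t, d))
    pairs.sort()  # cheapest pairs first
    matches, used_t, used_d = {}, set(), set()
    for _, t, d in pairs:
        if t not in used_t and d not in used_d:
            matches[t] = d
            used_t.add(t)
            used_d.add(d)
    return matches


tracks = [[normalize([1, 0, 0])], [normalize([0, 1, 0]), normalize([0, 1, 0.1])]]
dets = [normalize([0.05, 1, 0]), normalize([1, 0.05, 0])]
print(associate(tracks, dets))  # {0: 1, 1: 0}
```

Keeping a gallery per track, rather than a single feature, is what lets a track be re-identified after an occlusion even if its most recent appearance was atypical.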
A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects
Tracking humans who are interacting with other subjects or with the environment
remains an unsolved problem in visual tracking, because the visibility of the human of
interest in videos is unknown and may vary over time. In particular, it is
still difficult for state-of-the-art human trackers to recover complete human
trajectories in crowded scenes with frequent human interactions. In this work,
we consider the visibility status of a subject as a fluent variable, whose
change is mostly attributed to the subject's interaction with the surroundings,
e.g., crossing behind another object, entering a building, or getting into a
vehicle. We introduce a Causal And-Or Graph (C-AOG) to represent the
cause-effect relations between an object's visibility fluent and its
activities, and develop a probabilistic graph model to jointly reason about the
visibility fluent change (e.g., from visible to invisible) and track humans in
videos. We formulate this joint task as an iterative search for a feasible
causal graph structure, which enables fast search algorithms such as dynamic
programming. We apply the proposed method to challenging video sequences
to evaluate its ability to estimate visibility fluent changes of
subjects and to track subjects of interest over time. Comparative results
demonstrate that our method outperforms the alternative trackers and can
recover complete trajectories of humans in complicated scenarios with frequent
human interactions.
Comment: accepted by CVPR 201
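To illustrate how dynamic programming can recover a visibility fluent over time, here is a deliberately simplified stand-in: a two-state chain (visible / invisible) decoded with the Viterbi algorithm. It is not the C-AOG model from the paper; the state names, per-frame likelihoods and transition probabilities below are hypothetical values chosen only for the example.

```python
import math

STATES = ("visible", "invisible")


def _log(p):
    # Log-probability with log(0) mapped to -inf.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(emission, transition, initial):
    """Most likely visibility sequence via dynamic programming.

    emission[t][s]: likelihood of frame t's observation under state s
    transition[a][b]: probability of moving from state a to state b
    initial[s]: prior probability of state s at frame 0
    """
    score = [{s: _log(initial[s]) + _log(emission[0][s]) for s in STATES}]
    back = [{}]
    for t in range(1, len(emission)):
        score.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for ending in s at frame t.
            prev, best = max(
                ((p, score[t - 1][p] + _log(transition[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            score[t][s] = best + _log(emission[t][s])
            back[t][s] = prev
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for t in range(len(emission) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return path[::-1]


# Hypothetical likelihoods: D = detector fired, N = no detection.
D = {"visible": 0.9, "invisible": 0.1}
N = {"visible": 0.1, "invisible": 0.9}
T = {"visible": {"visible": 0.8, "invisible": 0.2},
     "invisible": {"visible": 0.2, "invisible": 0.8}}
prior = {"visible": 0.5, "invisible": 0.5}
print(viterbi([D, D, N, N, D], T, prior))
# ['visible', 'visible', 'invisible', 'invisible', 'visible']
```

The transition probabilities play the role of a smoothness prior: a single missed detection is less likely to flip the estimated fluent than a sustained run of them.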
MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation
We address the problem of semi-supervised video object segmentation (VOS),
where the masks of the objects of interest are given in the first frame of an
input video. To deal with challenging cases where objects are occluded or
missing, previous work relies on greedy data association strategies that make
decisions for each frame individually. In this paper, we propose a novel
approach that defers the decision making for a target object in each frame until
a global view can be established that takes the entire video into
consideration. Our approach is in the same spirit as Multiple Hypotheses
Tracking (MHT) methods, but makes several critical adaptations for the VOS
problem. We employ the bounding box (bbox) hypothesis for tracking tree
formation, and the multiple hypotheses are spawned by propagating the preceding
bbox into the detected bbox proposals within a gated region starting from the
initial object mask in the first frame. The gated region is determined by a
gating scheme which takes into account a more comprehensive motion model rather
than the simple Kalman filtering model in traditional MHT. To tailor the
algorithm further to VOS, we develop a novel mask
propagation score to replace the appearance similarity score, which can be
brittle under large deformations. The mask propagation score, together with
the motion score, determines the affinity between the hypotheses during tree
pruning. Finally, a novel mask merging strategy is employed to handle mask
conflicts between objects. Extensive experiments on challenging datasets
demonstrate the effectiveness of the proposed method, especially when objects
go missing.
Comment: accepted to CVPR 2019 as an oral presentation
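The gating and hypothesis-spawning step can be sketched as a small tree structure. This sketch assumes boxes in (x, y, w, h) form and uses a hypothetical circular gate around a constant-velocity prediction; the paper describes its motion model as more comprehensive than this, and the mask propagation and motion scores used for pruning are not reproduced. `Hypothesis`, `in_gate`, `spawn`, `velocity` and `gate_scale` are illustrative names, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): top-left corner and size


@dataclass
class Hypothesis:
    box: Box
    parent: Optional["Hypothesis"] = None
    children: List["Hypothesis"] = field(default_factory=list)


def center(box: Box) -> Tuple[float, float]:
    x, y, w, h = box
    return x + w / 2.0, y + h / 2.0


def in_gate(parent_box: Box, proposal: Box,
            velocity=(0.0, 0.0), gate_scale=1.0) -> bool:
    """Hypothetical gate: the proposal's center must lie within
    gate_scale * max(w, h) of the parent's center shifted by velocity."""
    px, py = center(parent_box)
    px, py = px + velocity[0], py + velocity[1]
    qx, qy = center(proposal)
    radius = gate_scale * max(parent_box[2], parent_box[3])
    return (qx - px) ** 2 + (qy - py) ** 2 <= radius ** 2


def spawn(leaf: Hypothesis, proposals: List[Box], **gate_kwargs) -> List[Hypothesis]:
    """Propagate one leaf hypothesis into every detected proposal inside its gate."""
    for box in proposals:
        if in_gate(leaf.box, box, **gate_kwargs):
            leaf.children.append(Hypothesis(box=box, parent=leaf))
    return leaf.children


root = Hypothesis(box=(0, 0, 10, 10))
kids = spawn(root, [(2, 1, 10, 10), (40, 40, 10, 10)])
print([k.box for k in kids])  # [(2, 1, 10, 10)]
```

Each call to `spawn` on a leaf grows the tree by one frame; deferring the decision then amounts to keeping several such branches alive until pruning.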
Poisson multi-Bernoulli mixture trackers: continuity through random finite sets of trajectories
The Poisson multi-Bernoulli mixture (PMBM) is an unlabelled multi-target
distribution whose form is closed under prediction and update. It has a Poisson
birth process, and a new Bernoulli component is generated for each new
measurement as part of the Bayesian measurement update. The PMBM filter is
similar to the multiple hypothesis tracker (MHT), but seemingly does not
provide explicit continuity between time steps. This paper considers a recently
developed formulation of the multi-target tracking problem as a random finite
set (RFS) of trajectories, and derives two trajectory RFS filters, called PMBM
trackers. The PMBM trackers efficiently estimate the set of trajectories, and
share hypothesis structure with the PMBM filter. By showing that the prediction
and update in the PMBM filter can be viewed as an efficient method for
calculating the time marginals of the RFS of trajectories, continuity in the
same sense as MHT is established for the PMBM filter.
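The measurement update summarized above can be made concrete with the scalar parts of a standard PMBM update, under the simplifying assumption of a state-independent detection probability `p_d`. The sketch tracks only existence probabilities and hypothesis weights, not single-target densities, and treats the integral of the undetected intensity against the measurement likelihood as a precomputed number. The function names are hypothetical, and the trajectory-set extension that the paper derives is not shown.

```python
def misdetect_bernoulli(r, p_d):
    """Missed-detection update of a Bernoulli with existence probability r.

    Returns (updated existence probability, hypothesis weight factor).
    """
    weight = 1.0 - r + r * (1.0 - p_d)  # equals 1 - r * p_d
    return r * (1.0 - p_d) / weight, weight


def thin_poisson(mu_undetected, p_d):
    # Undetected targets that remain undetected: intensity scaled by (1 - p_d).
    return mu_undetected * (1.0 - p_d)


def new_bernoulli(detected_mass, clutter_intensity):
    """Bernoulli created by a measurement that is either clutter or the first
    detection of a previously undetected target.

    detected_mass: p_d times the integral of the undetected (Poisson) intensity
    against the measurement likelihood, assumed precomputed here.
    """
    weight = clutter_intensity + detected_mass
    return detected_mass / weight, weight


print(new_bernoulli(3.0, 1.0))  # (0.75, 4.0)
```

The new-Bernoulli step is where the Poisson part feeds the multi-Bernoulli part: each measurement spawns exactly one candidate target whose existence probability reflects how well undetected targets, rather than clutter, explain it.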
Presenting GECO: an eyetracking corpus of monolingual and bilingual sentence reading
This paper introduces GECO, the Ghent Eye-tracking Corpus, a monolingual and bilingual corpus of eye-tracking data from participants reading a complete novel. English monolinguals and Dutch-English bilinguals read an entire novel, which was presented in paragraphs on the screen. The bilinguals read half of the novel in their first language and the other half in their second language. In this paper we describe the distributions and descriptive statistics of the most important reading time measures for the two groups of participants. This large eye-tracking corpus is perfectly suited both for exploratory purposes and for more directed hypothesis testing, and it can guide the formulation of ideas and theories about naturalistic reading processes in a meaningful context. Most importantly, this corpus has the potential to evaluate the generalizability of monolingual and bilingual language theories and models to the reading of long texts and narratives.
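The abstract refers to reading time measures without defining them. The sketch below computes three measures commonly used in eye-tracking reading research (first fixation duration, gaze duration and total reading time), assuming a hypothetical input of (word_index, duration_ms) fixations in chronological order. GECO's actual file format and exact measure definitions may differ.

```python
from statistics import mean


def reading_measures(fixations):
    """Per-word measures from chronological (word_index, duration_ms) fixations.

    first_fix: duration of the first fixation during first-pass reading
    gaze: sum of first-pass fixations before the eyes leave the word
    total: sum of all fixations on the word, including rereading
    A word first fixated only after a later word was read is treated as
    skipped in first pass, so it gets no first_fix or gaze entry.
    """
    first_fix, gaze, total = {}, {}, {}
    furthest = -1     # right-most word index reached so far
    open_word = None  # word whose first-pass reading is still in progress
    for word, dur in fixations:
        total[word] = total.get(word, 0) + dur
        if word == open_word:
            gaze[word] += dur  # refixation before leaving the word
            continue
        open_word = None  # moving to another word ends any open first pass
        if word > furthest:
            # First entry into a word not yet passed: first pass starts here.
            first_fix[word] = dur
            gaze[word] = dur
            open_word = word
        furthest = max(furthest, word)
    return first_fix, gaze, total


fixations = [(0, 200), (1, 180), (1, 120), (3, 250), (2, 150), (3, 100)]
first_fix, gaze, total = reading_measures(fixations)
print(mean(list(gaze.values())))  # 250
```

Word 2 is only reached by a regression after word 3, so it contributes to total reading time but not to the first-pass measures.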
Online Visual Robot Tracking and Identification using Deep LSTM Networks
Collaborative robots working on a common task are necessary for many
applications. One of the challenges for achieving collaboration in a team of
robots is mutual tracking and identification. We present a novel pipeline for
online vision-based detection, tracking and identification of robots with a
known and identical appearance. Our method runs in real time on the limited
hardware of the observer robot. Unlike previous works addressing robot tracking
and identification, we use a data-driven approach based on recurrent neural
networks to learn relations between sequential inputs and outputs. We formulate
the data association problem as multiple classification problems. A deep LSTM
network was trained on a simulated dataset and fine-tuned on a small set of real
data. Experiments on two challenging datasets, one synthetic and one real,
which include long-term occlusions, show promising results.
Comment: IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), Vancouver, Canada, 2017. IROS RoboCup Best Paper Award
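Formulating data association as multiple classification problems can be sketched by decoding one classifier output per tracked robot. This assumes a hypothetical output layout in which each track receives one logit per current detection plus a final "not visible" class; the deep LSTM that would produce these logits is not shown, and `decode_associations` is an illustrative name.

```python
import math


def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def decode_associations(logits_per_track):
    """Treat each track as its own classification problem.

    logits_per_track[i] holds one score per current detection followed by a
    final 'not visible' class. Returns (detection index or None, probability)
    for each track.
    """
    result = []
    for logits in logits_per_track:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        choice = None if best == len(probs) - 1 else best
        result.append((choice, probs[best]))
    return result


scores = [[2.0, 0.1, -1.0],   # track 0 prefers detection 0
          [0.0, 3.0, 0.5],    # track 1 prefers detection 1
          [-1.0, -1.0, 2.0]]  # track 2 is most likely not visible
print([c for c, _ in decode_associations(scores)])  # [0, 1, None]
```

Because each track is classified independently, two tracks can select the same detection; a practical system would need a rule to resolve such conflicts.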
Mediating between AI and highly specialized users
We report part of the design experience gained in X-Media, a system for knowledge management and sharing. Established interaction design techniques (scenario-based design) had to be revisited to capture the richness and complexity of intelligent interactive systems. We show that the design of intelligent systems requires methodologies (faceted scenarios) that support the investigation of intelligent features and usability factors simultaneously. Interaction designers become mediators between intelligent technology and users, and have to facilitate reciprocal understanding.