Evaluation of trackers for Pan-Tilt-Zoom Scenarios
Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in
computer vision for many years. Compared to tracking with a still camera, the
images captured with a PTZ camera are highly dynamic in nature because the
camera can perform large motions, resulting in quickly changing capture
conditions. Furthermore, tracking with a PTZ camera involves camera control to
position the camera on the target. For successful tracking and camera control,
the tracker must be fast enough, or must be able to accurately predict the
next position of the target. Therefore, standard benchmarks do not allow a
proper assessment of the quality of a tracker for the PTZ scenario. In this work, we
use a virtual PTZ framework to evaluate different tracking algorithms and
compare their performances. We also extend the framework to add target position
prediction for the next frame, accounting for camera motion and processing
delays. By doing this, we can assess whether prediction can make long-term
tracking more robust, as it may help slower algorithms keep the target in the
field of view of the camera. Results confirm that both speed and robustness are
required for tracking under the PTZ scenario.
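The abstract does not specify the prediction model; as a minimal sketch, a constant-velocity extrapolation that accounts for a known processing delay could look like the following (the function name and delay handling are illustrative, not taken from the framework):

```python
import numpy as np

def predict_target(positions, timestamps, delay):
    """Constant-velocity prediction of the target position at time
    t_last + delay, where delay covers tracker processing time and
    camera repositioning. positions is an (N, 2) array of past image
    coordinates; timestamps is a length-N array of seconds."""
    positions = np.asarray(positions, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)
    if len(positions) < 2:
        return positions[-1]
    # Estimate velocity from the two most recent observations.
    dt = timestamps[-1] - timestamps[-2]
    velocity = (positions[-1] - positions[-2]) / dt
    # Extrapolate to the moment the camera command takes effect.
    return positions[-1] + velocity * delay
```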
Improving Multiple Object Tracking with Optical Flow and Edge Preprocessing
In this paper, we present a new method for detecting road users in an urban
environment which leads to an improvement in multiple object tracking. Our
method takes a foreground image as input and improves the object detection
and segmentation. This new image can be used as an input to trackers that use
foreground blobs from background subtraction. The first step is to create
foreground images for all the frames in an urban video. Then, starting from the
original blobs of the foreground image, we merge the blobs that are close to
one another and that have similar optical flow. The next step is extracting the
edges of the different objects to detect multiple objects that might be very
close (and be merged in the same blob) and to adjust the size of the original
blobs. At the same time, we use the optical flow to detect occlusion of objects
that are moving in opposite directions. Finally, we make a decision on which
information we keep in order to construct a new foreground image with blobs
that can be used for tracking. The system is validated on four videos of an
urban traffic dataset. Our method improves the recall and precision metrics for
the object detection task compared to the vanilla background subtraction method
and improves the CLEAR MOT metrics in the tracking tasks for most videos.
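The exact merging criterion is not given in the abstract; a rough sketch of the core idea, merging nearby blobs whose mean optical-flow vectors agree, follows (thresholds and blob field names are hypothetical; the dense flow itself could come from, e.g., cv2.calcOpticalFlowFarneback):

```python
import numpy as np

def mean_flow(flow, mask):
    """Mean optical-flow vector over a blob's binary mask;
    flow is an (H, W, 2) dense flow field."""
    return flow[mask > 0].mean(axis=0)

def should_merge(blob_a, blob_b, flow, dist_thresh=20.0, cos_thresh=0.9):
    """Merge two blobs when their centres are close and their mean
    flow vectors point in similar directions (cosine similarity).
    Thresholds are illustrative, not taken from the paper."""
    (xa, ya, wa, ha) = blob_a["bbox"]
    (xb, yb, wb, hb) = blob_b["bbox"]
    dist = np.hypot((xa + wa / 2) - (xb + wb / 2),
                    (ya + ha / 2) - (yb + hb / 2))
    fa = mean_flow(flow, blob_a["mask"])
    fb = mean_flow(flow, blob_b["mask"])
    cos = fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8)
    return dist < dist_thresh and cos > cos_thresh
```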
Automatic Image Registration in Infrared-Visible Videos using Polygon Vertices
In this paper, an automatic method is proposed to perform image registration
in visible and infrared pair of video sequences for multiple targets. In
multimodal image analysis like image fusion systems, color and IR sensors are
placed close to each other and capture the same scene simultaneously, but the
videos are not properly aligned by default because of different fields of view,
image-capture characteristics, working principles, and other camera specifications.
Because the scenes are usually not planar, alignment needs to be performed
continuously by extracting relevant common information. In this paper, we
approximate the shape of the targets by polygons and use affine transformation
for aligning the two video sequences. After background subtraction, keypoints
on the contour of the foreground blobs are detected using the DCE (Discrete
Curve Evolution) technique. These keypoints are then described by the local shape at
each point of the obtained polygon. The keypoints are matched based on the
convexity of the polygons' vertices and the Euclidean distance between them.
Only good matches for each local shape polygon in a frame are kept. To achieve
a global affine transformation that maximises the overlap of infrared and
visible foreground pixels, the matched keypoints of each local shape polygon
are stored temporally in a buffer for a few frames. The transformation matrix
is evaluated at each frame using the temporal buffer, and the best matrix is selected based on
an overlapping ratio criterion. Our experimental results demonstrate that this
method can provide highly accurate registered images and that we outperform a
previous related method.
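As a loose illustration of the final step, the matched keypoints buffered over a few frames can be fed to an affine estimator, keeping the transformation with the best foreground overlap; the buffer structure and function name below are assumptions, with OpenCV's estimator standing in:

```python
import cv2
import numpy as np

def best_affine(buffered_matches, ir_fg, vis_fg):
    """Estimate an affine transform from buffered keypoint matches and
    keep the one maximising IR/visible foreground overlap.
    buffered_matches holds (ir_pts, vis_pts) float32 arrays of shape
    (N, 2) from the last few frames; ir_fg and vis_fg are binary masks."""
    best_ratio, best_M = -1.0, None
    for ir_pts, vis_pts in buffered_matches:
        # Needs at least 3 point pairs; RANSAC discards bad matches.
        M, _ = cv2.estimateAffine2D(ir_pts, vis_pts, method=cv2.RANSAC)
        if M is None:
            continue
        # Warp the IR foreground into the visible frame and measure overlap.
        warped = cv2.warpAffine(ir_fg, M, vis_fg.shape[::-1])
        inter = np.logical_and(warped > 0, vis_fg > 0).sum()
        union = np.logical_or(warped > 0, vis_fg > 0).sum()
        ratio = inter / max(union, 1)  # the overlapping ratio criterion
        if ratio > best_ratio:
            best_ratio, best_M = ratio, M
    return best_M, best_ratio
```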
Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling
Conventional methods of estimating latent behaviour generally use attitudinal
questions, which are subjective, and such survey questions may not always be
available. We hypothesize that an alternative approach to latent variable
estimation can be used through undirected graphical models, for instance,
non-parametric artificial neural networks. In this study, we explore the use of
generative non-parametric modelling methods to estimate latent variables from
the prior choice distribution without the conventional use of measurement
indicators. A restricted Boltzmann machine is used to represent latent
behaviour factors by analyzing the relationship information between the
observed choices and explanatory variables. The algorithm is adapted for latent
behaviour analysis in a discrete choice scenario, and we use a graphical
approach to evaluate and understand the semantic meaning of the estimated parameter vector
values. We illustrate our methodology on a financial instrument choice dataset
and perform statistical analysis on parameter sensitivity and stability. Our
findings show that, through non-parametric statistical tests, machine learning
methods can extract useful information about the behaviour of latent
constructs, and that these constructs have a strong and significant influence
on the choice process. Furthermore, our modelling framework shows robustness
to input variability through sampling and validation.
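No off-the-shelf discriminative conditional RBM exists in common libraries; as a rough stand-in, scikit-learn's BernoulliRBM chained with a classifier can illustrate extracting latent factors from binary-coded explanatory variables and relating them to observed choices. The data here is synthetic and the pipeline only approximates the model described in the abstract:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic stand-in data: X holds binary-coded explanatory variables,
# y the observed choices; a real study would use survey/choice data.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(500, 20)).astype(float)
y = rng.randint(0, 2, size=500)

model = Pipeline([
    ("rbm", BernoulliRBM(n_components=8, learning_rate=0.05,
                         n_iter=30, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

# The RBM weight matrix links inputs to latent factors; inspecting it
# is one graphical way to interpret the estimated latent behaviour.
latent_weights = model.named_steps["rbm"].components_  # shape (8, 20)
```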
Tracking in Urban Traffic Scenes from Background Subtraction and Object Detection
In this paper, we propose to combine detections from background subtraction
and from a multiclass object detector for multiple object tracking (MOT) in
urban traffic scenes. These objects are associated across frames using spatial,
colour and class label information, and trajectory prediction is evaluated to
yield the final MOT outputs. The proposed method was tested on the Urban
Tracker dataset and shows competitive performance compared to state-of-the-art
approaches. Results show that the integration of different detection inputs
remains a challenging task that greatly affects the MOT performance.
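The abstract lists spatial, colour, and class label cues for association; a generic sketch of such an association step using the Hungarian algorithm follows (cost weights and the track/detection fields are illustrative, not the paper's):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, w_spatial=1.0, w_colour=0.5, w_class=2.0):
    """Associate tracks to detections with a cost built from centre
    distance, colour-histogram distance, and a class-mismatch penalty.
    Each track/detection is a dict with `centre`, `hist`, and `label`;
    the weights are placeholders, not taken from the paper."""
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            spatial = np.linalg.norm(np.subtract(t["centre"], d["centre"]))
            colour = np.linalg.norm(np.subtract(t["hist"], d["hist"]))
            mismatch = 0.0 if t["label"] == d["label"] else 1.0
            cost[i, j] = (w_spatial * spatial + w_colour * colour
                          + w_class * mismatch)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one matching
    return list(zip(rows, cols))
```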
Background subtraction based on Local Shape
We present a novel approach to background subtraction that is based on the
local shape of small image regions. In our approach, an image region centered
on a pixel is modeled using the local self-similarity descriptor. We aim at
obtaining a reliable change detection based on local shape change in an image
when foreground objects are moving. The method first builds a background model
and compares the local self-similarities between the background model and the
subsequent frames to distinguish background and foreground objects.
Post-processing is then used to refine the boundaries of moving objects.
Results show that this approach is promising as the foregrounds obtained are
complete, although they often include shadows.
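For intuition, a simplified version of the local self-similarity computation follows: it builds an SSD-based similarity surface around a pixel, omitting the log-polar binning of the full descriptor (patch sizes, the scale constant, and the threshold are illustrative):

```python
import numpy as np

def local_self_similarity(img, x, y, patch=2, region=10):
    """Simplified self-similarity surface at (x, y): SSD between the
    centre patch and every patch in the surrounding region, mapped to
    similarities. Assumes (x, y) lies far enough from the image border;
    the log-polar binning of the full descriptor is omitted."""
    centre = img[y - patch:y + patch + 1,
                 x - patch:x + patch + 1].astype(float)
    sims = np.zeros((2 * region + 1, 2 * region + 1))
    for dy in range(-region, region + 1):
        for dx in range(-region, region + 1):
            cand = img[y + dy - patch:y + dy + patch + 1,
                       x + dx - patch:x + dx + patch + 1].astype(float)
            ssd = ((centre - cand) ** 2).sum()
            sims[dy + region, dx + region] = np.exp(-ssd / 1000.0)
    return sims

def is_foreground(desc_bg, desc_cur, thresh=0.5):
    """A pixel is labelled foreground when its current descriptor
    differs enough from the background model's (threshold illustrative)."""
    return np.linalg.norm(desc_bg - desc_cur) > thresh
```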
Video Prediction by Efficient Transformers
Video prediction is a challenging computer vision task that has a wide range
of applications. In this work, we present a new family of Transformer-based
models for video prediction. Firstly, an efficient local spatial-temporal
separation attention mechanism is proposed to reduce the complexity of standard
Transformers. Then, a full autoregressive model, a partial autoregressive model
and a non-autoregressive model are developed based on the new efficient
Transformer. The partial autoregressive model performs similarly to the full
autoregressive model but with a faster inference speed. The non-autoregressive
model not only achieves a faster inference speed but also mitigates the quality
degradation problem of its autoregressive counterparts, although it requires
additional parameters and an extra loss function for learning. Given the same
attention mechanism, we conducted a comprehensive study to compare the three
proposed video prediction variants. Experiments show that the proposed
video prediction models are competitive with more complex state-of-the-art
convolutional-LSTM based models. The source code is available at
https://github.com/XiYe20/VPTR.
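A generic factorised spatial-temporal attention block conveys the idea behind the complexity reduction: attending over space within each frame and over time at each location costs O(T·N² + N·T²) instead of O((T·N)²) for full attention. This PyTorch sketch is a plain factorisation, not the exact VPTR block:

```python
import torch
import torch.nn as nn

class SeparatedSTAttention(nn.Module):
    """Sketch of factorised spatial-temporal attention: attend over
    tokens within each frame, then over time at each spatial location.
    A generic factorisation for illustration only."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, time, tokens, dim)
        b, t, n, d = x.shape
        s = x.reshape(b * t, n, d)            # spatial attention per frame
        s, _ = self.spatial(s, s, s)
        s = s.reshape(b, t, n, d)
        v = s.permute(0, 2, 1, 3).reshape(b * n, t, d)  # temporal per token
        v, _ = self.temporal(v, v, v)
        return v.reshape(b, n, t, d).permute(0, 2, 1, 3)
```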
TopTrack: Tracking Objects By Their Top
In recent years, the joint detection-and-tracking paradigm has been a very
popular way of tackling the multi-object tracking (MOT) task. Many of the
methods following this paradigm use the object center keypoint for detection.
However, we argue that the center point is not optimal since it is often not
visible in crowded scenarios, which results in many missed detections when the
objects are partially occluded. We propose TopTrack, a joint
detection-and-tracking method that uses the top of the object as a keypoint for
detection instead of the center because it is more often visible. Furthermore,
TopTrack processes consecutive frames in separate streams in order to
facilitate training. We performed experiments to show that using the object top
as a keypoint for detection can reduce the amount of missed detections, which
in turn leads to more complete trajectories and fewer lost trajectories.
TopTrack manages to achieve competitive results with other state-of-the-art
trackers on two MOT benchmarks.
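For illustration, the training target for such a detector could be derived from box annotations by taking the midpoint of the top edge and rendering it as a CenterNet-style Gaussian heatmap; the helpers below are hypothetical, not TopTrack's code:

```python
import numpy as np

def top_keypoint(bbox):
    """Midpoint of the top edge of a box (x1, y1, x2, y2). TopTrack
    predicts such points directly; this helper only shows how the
    target keypoint could be derived from box labels."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2.0, y1)

def render_heatmap(shape, keypoints, sigma=2.0):
    """CenterNet-style Gaussian heatmap target for the top keypoints;
    sigma is illustrative."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros(shape)
    for kx, ky in keypoints:
        g = np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)  # keep the strongest peak per pixel
    return heat
```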
Road User Detection in Videos
Successive frames of a video are highly redundant, and the most popular
object detection methods do not take advantage of this fact. Using multiple
consecutive frames can improve detection of small objects or difficult examples
and can improve speed and detection consistency in a video sequence, for
instance by interpolating features between frames. In this work, a novel
approach is introduced to perform online video object detection using two
consecutive frames of video sequences involving road users. Two new models,
RetinaNet-Double and RetinaNet-Flow, are proposed, based respectively on the
concatenation of a target frame with a preceding frame, and the concatenation
of the optical flow with the target frame. The models are trained and evaluated
on three public datasets. Experiments show that using a preceding frame
improves performance over single frame detectors, but using explicit optical
flow usually does not.
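The abstract describes the two inputs as concatenations; a minimal sketch of how such inputs could be assembled channel-wise follows (how the frames are actually fed to the network is assumed here, and the detector's first convolution would need to accept the widened channel count):

```python
import torch

def double_input(frame_t, frame_prev):
    """RetinaNet-Double-style input: the target frame concatenated
    channel-wise with the preceding frame, giving a 6-channel tensor.
    Frames are (3, H, W) float tensors."""
    return torch.cat([frame_t, frame_prev], dim=0)  # (6, H, W)

def flow_input(frame_t, flow):
    """RetinaNet-Flow-style input: the target frame concatenated with
    a 2-channel optical-flow field, giving a 5-channel tensor."""
    return torch.cat([frame_t, flow], dim=0)  # (5, H, W)
```

The detector's stem convolution would then take 6 (or 5) input channels instead of the usual 3, with the rest of the architecture unchanged.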