1,520 research outputs found
MOT16: A Benchmark for Multi-Object Tracking
Standardized benchmarks are crucial for the majority of computer vision
applications. Although leaderboards and ranking tables should not be
over-claimed, benchmarks often provide the most objective measure of
performance and are therefore important guides for reseach.
Recently, a new benchmark for Multiple Object Tracking, MOTChallenge, was
launched with the goal of collecting existing and new data and creating a
framework for the standardized evaluation of multiple object tracking methods.
The first release of the benchmark focuses on multiple people tracking, since
pedestrians are by far the most studied object in the tracking community. This
paper accompanies a new release of the MOTChallenge benchmark. Unlike the
initial release, all videos of MOT16 have been carefully annotated following a
consistent protocol. Moreover, it not only offers a significant increase in the
number of labeled boxes, but also provides multiple object classes beside
pedestrians and the level of visibility for every single object of interest.Comment: arXiv admin note: substantial text overlap with arXiv:1504.0194
Review on Computer Vision Techniques in Emergency Situation
In emergency situations, actions that save lives and limit the impact of
hazards are crucial. In order to act, situational awareness is needed to decide
what to do. Geolocalized photos and video of the situations as they evolve can
be crucial in better understanding them and making decisions faster. Cameras
are almost everywhere these days, either in terms of smartphones, installed
CCTV cameras, UAVs or others. However, this poses challenges in big data and
information overflow. Moreover, most of the time there are no disasters at any
given location, so humans aiming to detect sudden situations may not be as
alert as needed at any point in time. Consequently, computer vision tools can
be an excellent decision support. The number of emergencies where computer
vision tools has been considered or used is very wide, and there is a great
overlap across related emergency research. Researchers tend to focus on
state-of-the-art systems that cover the same emergency as they are studying,
obviating important research in other fields. In order to unveil this overlap,
the survey is divided along four main axes: the types of emergencies that have
been studied in computer vision, the objective that the algorithms can address,
the type of hardware needed and the algorithms used. Therefore, this review
provides a broad overview of the progress of computer vision covering all sorts
of emergencies.Comment: 25 page
Multiple Object Tracking: A Literature Review
Multiple Object Tracking (MOT) is an important computer vision problem which
has gained increasing attention due to its academic and commercial potential.
Although different kinds of approaches have been proposed to tackle this
problem, it still remains challenging due to factors like abrupt appearance
changes and severe object occlusions. In this work, we contribute the first
comprehensive and most recent review on this problem. We inspect the recent
advances in various aspects and propose some interesting directions for future
research. To the best of our knowledge, there has not been any extensive review
on this topic in the community. We endeavor to provide a thorough review on the
development of this problem in recent decades. The main contributions of this
review are fourfold: 1) Key aspects in a multiple object tracking system,
including formulation, categorization, key principles, evaluation of an MOT are
discussed. 2) Instead of enumerating individual works, we discuss existing
approaches according to various aspects, in each of which methods are divided
into different groups and each group is discussed in detail for the principles,
advances and drawbacks. 3) We examine experiments of existing publications and
summarize results on popular datasets to provide quantitative comparisons. We
also point to some interesting discoveries by analyzing these results. 4) We
provide a discussion about issues of MOT research, as well as some interesting
directions which could possibly become potential research effort in the future
Tracking the Trackers: An Analysis of the State of the Art in Multiple Object Tracking
Standardized benchmarks are crucial for the majority of computer vision
applications. Although leaderboards and ranking tables should not be
over-claimed, benchmarks often provide the most objective measure of
performance and are therefore important guides for research. We present a
benchmark for Multiple Object Tracking launched in the late 2014, with the goal
of creating a framework for the standardized evaluation of multiple object
tracking methods. This paper collects the two releases of the benchmark made so
far, and provides an in-depth analysis of almost 50 state-of-the-art trackers
that were tested on over 11000 frames. We show the current trends and
weaknesses of multiple people tracking methods, and provide pointers of what
researchers should be focusing on to push the field forward
MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking
In the recent past, the computer vision community has developed centralized
benchmarks for the performance evaluation of a variety of tasks, including
generic object and pedestrian detection, 3D reconstruction, optical flow,
single-object short-term tracking, and stereo estimation. Despite potential
pitfalls of such benchmarks, they have proved to be extremely helpful to
advance the state of the art in the respective area. Interestingly, there has
been rather limited work on the standardization of quantitative benchmarks for
multiple target tracking. One of the few exceptions is the well-known PETS
dataset, targeted primarily at surveillance applications. Despite being widely
used, it is often applied inconsistently, for example involving using different
subsets of the available data, different ways of training the models, or
differing evaluation scripts. This paper describes our work toward a novel
multiple object tracking benchmark aimed to address such issues. We discuss the
challenges of creating such a framework, collecting existing and new data,
gathering state-of-the-art methods to be tested on the datasets, and finally
creating a unified evaluation system. With MOTChallenge we aim to pave the way
toward a unified evaluation framework for a more meaningful quantification of
multi-target tracking
An equalised global graphical model-based approach for multi-camera object tracking
Non-overlapping multi-camera visual object tracking typically consists of two
steps: single camera object tracking and inter-camera object tracking. Most of
tracking methods focus on single camera object tracking, which happens in the
same scene, while for real surveillance scenes, inter-camera object tracking is
needed and single camera tracking methods can not work effectively. In this
paper, we try to improve the overall multi-camera object tracking performance
by a global graph model with an improved similarity metric. Our method treats
the similarities of single camera tracking and inter-camera tracking
differently and obtains the optimization in a global graph model. The results
show that our method can work better even in the condition of poor single
camera object tracking.Comment: 13 pages, 17 figure
Play and Learn: Using Video Games to Train Computer Vision Models
Video games are a compelling source of annotated data as they can readily
provide fine-grained groundtruth for diverse tasks. However, it is not clear
whether the synthetically generated data has enough resemblance to the
real-world images to improve the performance of computer vision models in
practice. We present experiments assessing the effectiveness on real-world data
of systems trained on synthetic RGB images that are extracted from a video
game. We collected over 60000 synthetic samples from a modern video game with
similar conditions to the real-world CamVid and Cityscapes datasets. We provide
several experiments to demonstrate that the synthetically generated RGB images
can be used to improve the performance of deep neural networks on both image
segmentation and depth estimation. These results show that a convolutional
network trained on synthetic data achieves a similar test error to a network
that is trained on real-world data for dense image classification. Furthermore,
the synthetically generated RGB images can provide similar or better results
compared to the real-world datasets if a simple domain adaptation technique is
applied. Our results suggest that collaboration with game developers for an
accessible interface to gather data is potentially a fruitful direction for
future work in computer vision.Comment: To appear in the British Machine Vision Conference (BMVC), September
2016. -v2: fixed a typo in the reference
Crowd Management in Open Spaces
Crowd analysis and management is a challenging problem to ensure public
safety and security. For this purpose, many techniques have been proposed to
cope with various problems. However, the generalization capabilities of these
techniques is limited due to ignoring the fact that the density of crowd
changes from low to extreme high depending on the scene under observation. We
propose robust feature based approach to deal with the problem of crowd
management for people safety and security. We have evaluated our method using a
benchmark dataset and have presented details analysis
Crowded Scene Analysis: A Survey
Automated scene analysis has been a topic of great interest in computer
vision and cognitive science. Recently, with the growth of crowd phenomena in
the real world, crowded scene analysis has attracted much attention. However,
the visual occlusions and ambiguities in crowded scenes, as well as the complex
behaviors and scene semantics, make the analysis a challenging task. In the
past few years, an increasing number of works on crowded scene analysis have
been reported, covering different aspects including crowd motion pattern
learning, crowd behavior and activity analysis, and anomaly detection in
crowds. This paper surveys the state-of-the-art techniques on this topic. We
first provide the background knowledge and the available features related to
crowded scenes. Then, existing models, popular algorithms, evaluation
protocols, as well as system performance are provided corresponding to
different aspects of crowded scene analysis. We also outline the available
datasets for performance evaluation. Finally, some research problems and
promising future directions are presented with discussions.Comment: 20 pages in IEEE Transactions on Circuits and Systems for Video
Technology, 201
Learning Discriminative Features for Person Re-Identification
For fulfilling the requirements of public safety in modern cities, more and more large-scale surveillance camera systems are deployed, resulting in an enormous amount of visual data. Automatically processing and interpreting these data promote the development and application of visual data analytic technologies. As one of the important research topics in surveillance systems, person re-identification (re-id) aims at retrieving the target person across non-overlapping camera-views that are implemented in a number of distributed space-time locations. It is a fundamental problem for many practical surveillance applications, eg, person search, cross-camera tracking, multi-camera human behavior analysis and prediction, and it received considerable attentions nowadays from both academic and industrial domains.
Learning discriminative feature representation is an essential task in person re-id. Although many methodologies have been proposed, discriminative re-id feature extraction is still a challenging problem due to: (1) Intra- and inter-personal variations. The intrinsic properties of the camera deployment in surveillance system lead to various changes in person poses, view-points, illumination conditions etc. This may result in the large intra-personal variations and/or small inter-personal variations, thus incurring problems in matching person images. (2) Domain variations. The domain variations between different datasets give rise to the problem of generalization capability of re-id model. Directly applying a re-id model trained on one dataset to another one usually causes a large performance degradation. (3) Difficulties in data creation and annotation. Existing person re-id methods, especially deep re-id methods, rely mostly on a large set of inter-camera identity labelled training data, requiring a tedious data collection and annotation process. This leads to poor scalability in practical person re-id applications.
Corresponding to the challenges in learning discriminative re-id features, this thesis contributes to the re-id domain by proposing three related methodologies and one new re-id setting:
(1) Gaussian mixture importance estimation. Handcrafted features are usually not discriminative enough for person re-id because of noisy information, such as background clutters. To precisely evaluate the similarities between person images, the main task of distance metric learning is to filter out the noisy information. Keep It Simple and Straightforward MEtric (KISSME) is an effective method in person re-id. However, it is sensitive to the feature dimensionality and cannot capture the multi-modes in dataset. To this end, a Gaussian Mixture Importance Estimation re-id approach is proposed, which exploits the Gaussian Mixture Models for estimating the observed commonalities of similar and dissimilar person pairs in the feature space.
(2) Unsupervised domain-adaptive person re-id based on pedestrian attributes. In person re-id, person identities are usually not overlapped among different domains (or datasets) and this raises the difficulties in generalizing re-id models. Different from person identity, pedestrian attributes, eg., hair length, clothes type and color, are consistent across different domains (or datasets). However, most of re-id datasets lack attribute annotations. On the other hand, in the field of pedestrian attribute recognition, there is a number of datasets labeled with attributes. Exploiting such data for re-id purpose can alleviate the shortage of attribute annotations in re-id domain and improve the generalization capability of re-id model. To this end, an unsupervised domain-adaptive re-id feature learning framework is proposed to make full use of attribute annotations. Specifically, an existing unsupervised domain adaptation method has been extended to transfer attribute-based features from attribute recognition domain to the re-id domain. With the proposed re-id feature learning framework, the domain invariant feature representations can be effectively extracted.
(3) Intra-camera supervised person re-id. Annotating the large-scale re-id datasets requires a tedious data collection and annotation process and therefore leads to poor scalability in practical person re-id applications. To overcome this fundamental limitation, a new person re-id setting is considered without inter-camera identity association but only with identity labels independently annotated within each camera-view. This eliminates the most time-consuming and tedious inter-camera identity association annotating process and thus significantly reduces the amount of human efforts required during annotation. It hence gives rise to a more scalable and more feasible learning scenario, which is named as Intra-Camera Supervised (ICS) person re-id. Under this ICS setting, a new re-id method, i.e., Multi-task Mulit-label (MATE) learning method, is formulated. Given no inter-camera association,
MATE is specially designed for self-discovering the inter-camera identity correspondence. This is achieved by inter-camera multi-label learning under a joint multi-task inference framework. In addition, MATE can also efficiently learn the discriminative re-id feature representations using the available identity labels within each camera-view
- …