Full Reference Objective Quality Assessment for Reconstructed Background Images
With an increased interest in applications that require a clean background
image, such as video surveillance, object tracking, street view imaging and
location-based services on web-based maps, multiple algorithms have been
developed to reconstruct a background image from cluttered scenes.
Traditionally, statistical measures and existing image quality techniques have
been applied for evaluating the quality of the reconstructed background images.
Though these quality assessment methods have been widely used in the past,
their performance in evaluating the perceived quality of the reconstructed
background image has not been verified. In this work, we discuss the
shortcomings in existing metrics and propose a full reference Reconstructed
Background image Quality Index (RBQI) that combines color and structural
information at multiple scales using a probability summation model to predict
the perceived quality in the reconstructed background image given a reference
image. To compare the performance of the proposed quality index with existing
image quality assessment measures, we construct two different datasets
consisting of reconstructed background images and corresponding subjective
scores. The quality assessment measures are evaluated by correlating their
objective scores with human subjective ratings. The correlation results show
that the proposed RBQI outperforms all the existing approaches. Additionally,
the constructed datasets and the corresponding subjective scores provide a
benchmark to evaluate the performance of future metrics that are developed to
evaluate the perceived quality of reconstructed background images.
Comment: Associated source code: https://github.com/ashrotre/RBQI. Associated
database:
https://drive.google.com/drive/folders/1bg8YRPIBcxpKIF9BIPisULPBPcA5x-Bk?usp=sharing
(email for permissions at: ashrotreasuedu)
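The evaluation step described in this abstract, correlating objective scores with human subjective ratings, can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the sample scores are hypothetical, and Pearson and Spearman coefficients are assumed here as the correlation measures because the abstract does not say which ones were used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: one objective score and one mean opinion score per image.
objective_scores = [0.2, 0.4, 0.5, 0.9]
subjective_scores = [1.0, 2.5, 2.0, 4.5]
print(round(spearman(objective_scores, subjective_scores), 3))
```

A metric whose rank correlation with the subjective scores is closer to 1 orders the images more like human observers do, which is the sense in which one quality index can be said to outperform another.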
A fast and robust hand-driven 3D mouse
The development of new interaction paradigms requires natural interaction: people should be able to interact with technology using the same models they use in everyday life, that is, through gestures, expressions and voice. Following this idea, in this paper we propose a non-intrusive, vision-based tracking system able to capture hand motion and simple hand gestures. The proposed device allows the hand to be used as a "natural" 3D mouse, where the forefinger tip or the palm centre is used to identify a 3D marker and hand gestures can be used to simulate the mouse buttons. The approach is based on a monoscopic tracking algorithm that is computationally fast and robust against noise and cluttered backgrounds. Two image streams are processed in parallel, exploiting multi-core architectures, and their results are combined to obtain a constrained stereoscopic problem. The system has been implemented and thoroughly tested in an experimental environment where the 3D hand mouse has been used to interact with objects in a virtual reality application. We also provide results on the performance of the tracker, which demonstrate the precision and robustness of the proposed system.
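The idea of combining two independently tracked 2D positions into one 3D marker can be illustrated with textbook triangulation. The sketch below is not the paper's algorithm: it assumes an idealised, rectified pinhole stereo pair, and the function name, parameters and numeric values are all hypothetical.

```python
def triangulate_rectified(x_left, y_left, x_right, focal_px, baseline_m, cx, cy):
    """Recover a 3D point from matching pixels in a rectified stereo pair.

    Assumes both cameras share the focal length focal_px (in pixels) and the
    principal point (cx, cy), and are separated horizontally by baseline_m.
    The result is expressed in the left camera frame, in metres.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity   # depth from similar triangles
    x = (x_left - cx) * z / focal_px        # back-project the left pixel
    y = (y_left - cy) * z / focal_px
    return x, y, z


# Hypothetical fingertip detections from the two monoscopic trackers.
print(triangulate_rectified(340.0, 240.0, 300.0,
                            focal_px=800.0, baseline_m=0.1, cx=320.0, cy=240.0))
```

Under these assumptions each stream only needs to report a 2D position, so the two trackers can run in parallel and the 3D step reduces to one division per frame.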
Towards Benchmarking Scene Background Initialization
Given a set of images of a scene taken at different times, the availability
of an initial background model that describes the scene without foreground
objects is the prerequisite for a wide range of applications, ranging from
video surveillance to computational photography. Even though several methods
have been proposed for scene background initialization, the lack of a common
groundtruthed dataset and of a common set of metrics makes it difficult to
compare their performance. As a first step towards an easy and fair
comparison of these methods, we assembled a dataset of sequences frequently
adopted for background initialization, selected or created ground truths for
quantitative evaluation through a selected suite of metrics, and compared
results obtained by some existing methods, making all the material publicly
available.
Comment: 6 pages, SBI dataset, SBMI2015 Workshop
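To make the task concrete, the sketch below shows one widely used baseline for background initialization, the per-pixel temporal median, together with a simple error against a ground truth. The abstract does not name the compared methods or the metric suite, so both functions are illustrative assumptions, and the tiny grayscale frames are made-up data.

```python
from statistics import median


def temporal_median_background(frames):
    """Estimate a background as the per-pixel median over time.

    frames: list of equally sized grayscale images, each a list of rows.
    Works when each pixel shows the background in more than half the frames.
    """
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        [median(frame[r][c] for frame in frames) for c in range(width)]
        for r in range(height)
    ]


def mean_absolute_error(estimate, ground_truth):
    """Average absolute gray-level difference between two images."""
    diffs = [
        abs(e - g)
        for est_row, gt_row in zip(estimate, ground_truth)
        for e, g in zip(est_row, gt_row)
    ]
    return sum(diffs) / len(diffs)


# Hypothetical 2x2 sequence: a bright object (255) crosses the scene.
frames = [
    [[10, 20], [30, 40]],
    [[255, 20], [30, 40]],
    [[10, 20], [30, 255]],
]
background = temporal_median_background(frames)
print(background)
print(mean_absolute_error(background, [[10, 20], [30, 40]]))
```

Scoring every method's output against the same ground truth with the same function is what makes the comparison fair, which is the gap the dataset is meant to fill.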
Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval
Humans use context and scene knowledge to easily localize moving objects in
conditions of complex illumination changes, scene clutter and occlusions. In
this paper, we present a method to leverage human knowledge in the form of
annotated video libraries in a novel search and retrieval based setting to
track objects in unseen video sequences. For every video sequence, a document
that represents motion information is generated. Documents of the unseen video
are queried against the library at multiple scales to find videos with similar
motion characteristics. This provides us with coarse localization of objects in
the unseen video. We further adapt these retrieved object locations to the new
video using an efficient warping scheme. The proposed method is validated on
in-the-wild video surveillance datasets where we outperform state-of-the-art
appearance-based trackers. We also introduce a new challenging dataset with
complex object appearance changes.
Comment: Under review with the IEEE Transactions on Circuits and Systems for
Video Technology
A Framework for Symmetric Part Detection in Cluttered Scenes
The role of symmetry in computer vision has waxed and waned in importance
during the evolution of the field from its earliest days. At first figuring
prominently in support of bottom-up indexing, it fell out of favor as shape
gave way to appearance and recognition gave way to detection. With a strong
prior in the form of a target object, the role of the weaker priors offered by
perceptual grouping was greatly diminished. However, as the field returns to
the problem of recognition from a large database, the bottom-up recovery of the
parts that make up the objects in a cluttered scene is critical for their
recognition. The medial axis community has long exploited the ubiquitous
regularity of symmetry as a basis for the decomposition of a closed contour
into medial parts. However, today's recognition systems are faced with
cluttered scenes, and the assumption that a closed contour exists, i.e. that
figure-ground segmentation has been solved, renders much of the medial axis
community's work inapplicable. In this article, we review a computational
framework, previously reported in Lee et al. (2013), Levinshtein et al. (2009,
2013), that bridges the representation power of the medial axis and the need to
recover and group an object's parts in a cluttered scene. Our framework is
rooted in the idea that a maximally inscribed disc, the building block of a
medial axis, can be modeled as a compact superpixel in the image. We evaluate
the method on images of cluttered scenes.
Comment: 10 pages, 8 figures
General Dynamic Scene Reconstruction from Multiple View Video
This paper introduces a general approach to dynamic scene reconstruction from
multiple moving cameras without prior knowledge or limiting constraints on the
scene structure, appearance, or illumination. Existing techniques for dynamic
scene reconstruction from multiple wide-baseline camera views primarily focus
on accurate reconstruction in controlled environments, where the cameras are
fixed and calibrated and the background is known. These approaches are not robust
for general dynamic scenes captured with sparse moving cameras. Previous
approaches for outdoor dynamic scene reconstruction assume prior knowledge of
the static background appearance and structure. The primary contributions of
this paper are twofold: an automatic method for initial coarse dynamic scene
segmentation and reconstruction without prior knowledge of background
appearance or structure; and a general robust approach for joint segmentation
refinement and dense reconstruction of dynamic scenes from multiple
wide-baseline static or moving cameras. Evaluation is performed on a variety of
indoor and outdoor scenes with cluttered backgrounds and multiple dynamic
non-rigid objects such as people. Comparison with state-of-the-art approaches
demonstrates improved accuracy in both multiple view segmentation and dense
reconstruction. The proposed approach also eliminates the requirement for prior
knowledge of scene structure and appearance.