A Fusion Framework for Camouflaged Moving Foreground Detection in the Wavelet Domain
Detecting camouflaged moving foreground objects has been known to be
difficult due to the similarity between the foreground objects and the
background. Conventional methods cannot distinguish the foreground from
background due to the small differences between them and thus suffer from
under-detection of the camouflaged foreground objects. In this paper, we
present a fusion framework to address this problem in the wavelet domain. We
first show that the small differences in the image domain can be highlighted in
certain wavelet bands. Then the likelihood of each wavelet coefficient being
foreground is estimated by formulating foreground and background models for
each wavelet band. The proposed framework effectively aggregates the
likelihoods from different wavelet bands based on the characteristics of the
wavelet transform. Experimental results demonstrated that the proposed method
significantly outperformed existing methods in detecting camouflaged foreground
objects. Specifically, the average F-measure for the proposed algorithm was
0.87, compared to 0.71 to 0.8 for the other state-of-the-art methods.
Comment: 13 pages, accepted by IEEE TI
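For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F-measure is typically computed for a binary foreground mask. It is not the paper's evaluation code: the function names `confusion_counts` and `f_measure` and the toy masks are assumptions made for illustration, and the standard F1 definition (harmonic mean of precision and recall) is assumed.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives.

    `pred` and `truth` are flat sequences of 0/1 pixel labels
    (1 = foreground). Both names are hypothetical.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def f_measure(tp, fp, fn):
    """Standard F1 score: harmonic mean of precision and recall."""
    if tp == 0:
        return 0.0  # no correct foreground pixels at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: one missed foreground pixel and one false alarm.
predicted = [1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 1, 1, 0]
score = f_measure(*confusion_counts(predicted, ground_truth))
```

Under-detection of camouflaged objects shows up as false negatives, which lower recall and therefore the F-measure.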
ReFu: Refine and Fuse the Unobserved View for Detail-Preserving Single-Image 3D Human Reconstruction
Single-image 3D human reconstruction aims to reconstruct the 3D textured
surface of the human body given a single image. While implicit function-based
methods recently achieved reasonable reconstruction performance, they still
bear limitations showing degraded quality in both surface geometry and texture
from an unobserved view. In response, to generate a realistic textured surface,
we propose ReFu, a coarse-to-fine approach that refines the projected backside
view image and fuses the refined image to predict the final human body. To
suppress the diffused occupancy that causes noise in projection images and
reconstructed meshes, we propose to train occupancy probability by
simultaneously utilizing 2D and 3D supervisions with occupancy-based volume
rendering. We also introduce a refinement architecture that generates
detail-preserving backside-view images with front-to-back warping. Extensive
experiments demonstrate that our method achieves state-of-the-art performance
in 3D human reconstruction from a single image, showing enhanced geometry and
texture quality from an unobserved view.
Comment: Accepted at ACM MM 202
Visual Perception of Garments for their Robotic Manipulation
The presented work addresses the visual perception of garments applied to their robotic manipulation. Various types of garments are considered in typical perception and manipulation tasks, including their classification, folding, or unfolding. Our work is motivated by the possibility of having humanoid household robots perform these tasks for us in the future, as well as by industrial applications. The main challenge is the high deformability of garments, which can be posed in infinitely many configurations with a significantly varying appearance.
Re-identification and semantic retrieval of pedestrians in video surveillance scenarios
Person re-identification consists of recognizing individuals across different sensors of a camera
network. Whereas clothing appearance cues are widely used, other modalities could
be exploited as additional information sources, like anthropometric measures and gait. In
this work we investigate whether the re-identification accuracy of clothing appearance descriptors
can be improved by fusing them with anthropometric measures extracted from
depth data, using RGB-D sensors, in unconstrained settings. We also propose a dissimilarity-based
framework for building and fusing multi-modal descriptors of pedestrian images for
re-identification tasks, as an alternative to the widely used score-level fusion. The experimental
evaluation is carried out on two data sets including RGB-D data, one of which is a
novel, publicly available data set that we acquired using Kinect sensors.
In this dissertation we also consider a related task, named semantic retrieval of pedestrians
in video surveillance scenarios, which consists of searching images of individuals using
a textual description of clothing appearance as a query, given by a Boolean combination of
predefined attributes. This can be useful in applications like forensic video analysis, where
the query can be obtained from an eyewitness report. We propose a general method for implementing
semantic retrieval as an extension of a given re-identification system that uses any
multiple part-multiple component appearance descriptor. Additionally, we investigate
deep learning techniques to improve both the accuracy of attribute detectors and generalization
capabilities. Finally, we experimentally evaluate our methods on several benchmark
datasets originally built for re-identification tasks.
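As an illustration of what a query given by a Boolean combination of predefined attributes might look like in practice, here is a minimal sketch. It is not the dissertation's method: the attribute names, the nested-tuple query format, the gallery scores, and the use of fuzzy min/max/complement to combine detector scores are all assumptions chosen for brevity.

```python
def query_score(query, attrs):
    """Evaluate a Boolean query against one image's attribute scores.

    `query` is either an attribute name or a tuple such as
    ("and", q1, q2), ("or", q1, q2) or ("not", q). `attrs` maps
    attribute names to detector scores in [0, 1]. Combining scores
    with min/max/complement is an assumption, not the source's rule.
    """
    if isinstance(query, str):
        return attrs.get(query, 0.0)
    op, *args = query
    if op == "not":
        return 1.0 - query_score(args[0], attrs)
    values = [query_score(a, attrs) for a in args]
    if op == "and":
        return min(values)
    if op == "or":
        return max(values)
    raise ValueError(f"unknown operator: {op}")


def retrieve(query, gallery):
    """Rank image identifiers by decreasing query score."""
    ranked = sorted(gallery, key=lambda name: query_score(query, gallery[name]),
                    reverse=True)
    return ranked


# Hypothetical gallery of pedestrian images with attribute detector scores.
gallery = {
    "img_a": {"red_shirt": 0.9, "long_trousers": 0.9},
    "img_b": {"red_shirt": 0.7, "long_trousers": 0.1},
    "img_c": {"red_shirt": 0.2, "long_trousers": 0.3},
}
# "red shirt AND NOT long trousers", e.g. taken from an eyewitness report.
query = ("and", "red_shirt", ("not", "long_trousers"))
ranking = retrieve(query, gallery)
```

Here `img_b` ranks first because it is the only image that scores well on both parts of the query.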