Analysis of Hand Segmentation in the Wild
A large number of works in egocentric vision have concentrated on action and
object recognition. Detection and segmentation of hands in first-person videos,
however, has been less explored. For many applications in this domain, it is
necessary to accurately segment not only the hands of the camera wearer but also
the hands of others with whom they are interacting. Here, we take an in-depth look
at the hand segmentation problem. In the quest for robust hand segmentation
methods, we evaluate the performance of state-of-the-art semantic
segmentation methods, both off the shelf and fine-tuned, on existing datasets. We
fine-tune RefineNet, a leading semantic segmentation method, for hand
segmentation and find that it significantly outperforms the best contenders.
Existing hand segmentation datasets were collected in laboratory settings.
To overcome this limitation, we contribute two new datasets: a)
EgoYouTubeHands, comprising egocentric videos of hands in the wild, and
b) HandOverFace, for analyzing the performance of our models in the presence of
similar-appearance occlusions. We further explore whether conditional random
fields can help refine the generated hand segmentations. To demonstrate the
benefit of accurate hand maps, we train a CNN for hand-based activity
recognition and achieve higher accuracy when the CNN is trained on hand maps
produced by the fine-tuned RefineNet. Finally, we annotate a subset of the
EgoHands dataset for fine-grained action recognition and show that an accuracy
of 58.6% can be achieved by looking at just a single hand pose, which is much better than
chance level (12.5%).
Comment: Accepted at CVPR 2018
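
As a concrete illustration of the fine-tuning step described above, here is a
minimal sketch of adapting a pretrained semantic segmentation network to binary
hand/background segmentation in PyTorch. The paper fine-tunes RefineNet; since
RefineNet does not ship with torchvision, DeepLabv3 stands in below purely for
illustration, and the optimizer settings, data shapes, and training loop are
assumptions rather than the authors' recipe.

import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet50

# Stand-in backbone (the paper uses RefineNet, which torchvision lacks).
model = deeplabv3_resnet50(weights="DEFAULT")
# Swap the 21-class VOC head for a 2-class head: background vs. hand.
model.classifier[4] = nn.Conv2d(256, 2, kernel_size=1)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()  # per-pixel cross-entropy

def train_step(images, masks):
    # images: [B, 3, H, W] float tensors; masks: [B, H, W] long in {0, 1}.
    model.train()
    optimizer.zero_grad()
    logits = model(images)["out"]  # [B, 2, H, W], upsampled to input size
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()

At inference time, logits.argmax(dim=1) yields the hand map that downstream
components, such as the CRF refinement or the activity recognition CNN
mentioned above, would consume.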
Transparent Object Tracking with Enhanced Fusion Module
Accurate tracking of transparent objects, such as glasses, plays a critical
role in many robotic tasks, such as robot-assisted living. Due to the adaptive
and often reflective texture of such objects, traditional tracking algorithms
that rely on general-purpose learned features suffer from reduced performance.
Recent research has proposed instilling transparency awareness into existing
general object trackers by fusing purpose-built features. However, with
existing fusion techniques, the addition of new features changes the
latent space, making it impossible to incorporate transparency awareness into
trackers with fixed latent spaces. For example, many of today's
transformer-based trackers are fully pre-trained and sensitive to any
latent-space perturbation. In this paper, we present a new feature fusion
technique that integrates transparency information into a fixed feature space,
enabling its use in a broader range of trackers. Our proposed fusion module,
composed of a transformer encoder and an MLP module, leverages key/query-based
transformations to embed the transparency information into the tracking
pipeline. We also present a new two-step training strategy that lets our fusion
module effectively merge transparency features. We propose a new tracker
architecture that uses our fusion techniques to achieve superior results for
transparent object tracking. Our proposed method achieves competitive results
with state-of-the-art trackers on the recently released TOTB, the largest
transparent object tracking benchmark. Our results and code will be made
publicly available at https://github.com/kalyan0510/TOTEM.
Comment: IEEE IROS 2023
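
The abstract specifies the fusion module only at a high level: a transformer
encoder plus an MLP, with key/query-based attention injecting transparency
features into a tracker's fixed latent space. Below is a minimal sketch of that
idea in PyTorch; the dimensions, layer layout, and residual connection are
assumptions, and the authors' actual design should be taken from the linked
repository.

import torch
import torch.nn as nn

class TransparencyFusion(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        # Queries come from the frozen tracker features; keys/values come
        # from the purpose-built transparency features.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model))

    def forward(self, track_feats, transp_feats):
        # Both inputs: [B, N, d_model] token sequences.
        fused, _ = self.attn(track_feats, transp_feats, transp_feats)
        fused = self.encoder(fused)
        # A residual keeps the output in the tracker's original latent
        # space, so a frozen, fully pre-trained head can still consume it.
        return track_feats + self.mlp(fused)

The residual formulation is one plausible way to honor the paper's constraint
that the tracker's latent space must not shift; the two-step training strategy
would then train this module while the backbone stays frozen.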
TRansPose: Large-Scale Multispectral Dataset for Transparent Object
Transparent objects are encountered frequently in our daily lives, yet
recognizing them poses challenges for conventional vision sensors because of
their unique material properties, which are not well perceived by RGB or depth
cameras. To overcome this limitation, thermal infrared cameras have emerged as a
solution, offering improved visibility and shape information for transparent
objects. In this paper, we present TRansPose, the first large-scale
multispectral dataset that combines stereo RGB-D images, thermal infrared (TIR)
images, and object poses to promote transparent object research. The dataset
includes 99 transparent objects, encompassing 43 household items, 27 recyclable
trash items, and 29 pieces of chemical laboratory equipment, along with 12
non-transparent objects. It
comprises a vast collection of 333,819 images and 4,000,056 annotations,
providing instance-level segmentation masks, ground-truth poses, and completed
depth information. The data was acquired using a FLIR A65 thermal infrared
(TIR) camera, two Intel RealSense L515 RGB-D cameras, and a Franka Emika Panda
robot manipulator. Spanning 87 sequences, TRansPose covers various challenging
real-life scenarios, including objects filled with water, diverse lighting
conditions, heavy clutter, non-transparent or translucent containers, objects
in plastic bags, and multi-stacked objects. The TRansPose dataset can be
accessed at the following link: https://sites.google.com/view/transpose-dataset
Comment: Under review
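
For readers who want to experiment with data of this shape, here is a minimal
PyTorch Dataset sketch pairing RGB, depth, and TIR frames with pose
annotations. The directory layout, file names, and JSON pose format below are
entirely hypothetical; the actual structure is documented on the dataset site
linked above.

import json
from pathlib import Path

import numpy as np
from PIL import Image
from torch.utils.data import Dataset

class TransposeLikeDataset(Dataset):
    def __init__(self, root):
        self.root = Path(root)
        # Assumed layout: <root>/<sequence>/{rgb,depth,tir,poses}/<frame>.*
        self.frames = sorted(self.root.glob("*/rgb/*.png"))

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        rgb_path = self.frames[idx]
        seq, stem = rgb_path.parent.parent, rgb_path.stem
        rgb = np.asarray(Image.open(rgb_path).convert("RGB"))
        depth = np.asarray(Image.open(seq / "depth" / f"{stem}.png"))
        tir = np.asarray(Image.open(seq / "tir" / f"{stem}.png"))
        with open(seq / "poses" / f"{stem}.json") as f:
            poses = json.load(f)  # assumed per-object 6-DoF poses
        return {"rgb": rgb, "depth": depth, "tir": tir, "poses": poses}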