Alice Benchmarks: Connecting Real World Object Re-Identification with the Synthetic
For object re-identification (re-ID), learning from synthetic data has become
a promising strategy to cheaply acquire large-scale annotated datasets and
effective models, with few privacy concerns. Many interesting research problems
arise from this strategy, e.g., how to reduce the domain gap between the
synthetic source and the real-world target. To facilitate the development of
new approaches to learning from synthetic data, we introduce the Alice
benchmarks: large-scale
datasets providing benchmarks as well as evaluation protocols to the research
community. Within the Alice benchmarks, two object re-ID tasks are offered:
person and vehicle re-ID. We collected and annotated two challenging real-world
target datasets: AlicePerson and AliceVehicle, captured under various
illuminations, image resolutions, etc. An important feature of our real
targets is that the clusterability of their training sets is not manually
guaranteed, which makes the benchmarks closer to a real domain adaptation
test scenario. Correspondingly, we
reuse existing PersonX and VehicleX as synthetic source domains. The primary
goal is to train models from synthetic data that can work effectively in the
real world. In this paper, we detail the settings of Alice benchmarks, provide
an analysis of existing commonly-used domain adaptation methods, and discuss
some interesting future directions. An online server will be set up for the
community to evaluate methods conveniently and fairly.
Comment: 9 pages, 4 figures, 4 tables
Deep Feature Learning and Adaptation for Computer Vision
We are living in times when a deep learning revolution is taking place. In general, deep learning models have a backbone that extracts features from the input data, followed by task-specific layers, e.g., for classification. This dissertation proposes various deep feature extraction and adaptation methods to improve task-specific learning, such as visual re-identification, tracking, and domain adaptation.

The vehicle re-identification (VRID) task requires identifying a given vehicle among a set of vehicles under variations in viewpoint, illumination, partial occlusion, and background clutter. We propose a novel local graph aggregation module for feature extraction to improve VRID performance. We also utilize a class-balanced loss to compensate for the unbalanced class distribution in the training dataset. Overall, our framework achieves state-of-the-art (SOTA) performance on multiple VRID benchmarks.

We further extend our VRID method to visual object tracking under occlusion. We motivate visual object tracking from aerial platforms by benchmarking tracking methods on aerial datasets. Our study reveals that current techniques have limited capability to re-identify objects that are fully occluded or out of view, and that Siamese-network-based trackers perform well relative to others in overall tracking performance. Building on our VRID work, we propose Siam-ReID, a novel tracking method that combines a Siamese network with a VRID technique. In another approach, we propose SiamGauss, a novel Siamese network with a Gaussian head for improved confuser suppression and real-time performance. Our approach achieves SOTA performance on aerial visual object tracking datasets.

A related area of research is developing deep learning based domain adaptation techniques. We propose continual unsupervised domain adaptation, a novel paradigm for domain adaptation in data-constrained environments. We show that existing works fail to generalize when the target domain data are acquired in small batches. We propose to use a buffer to store samples previously seen by the network, together with a novel loss function, to improve the performance of continual domain adaptation. We further extend our continual unsupervised domain adaptation research to gradually varying domains. Our method outperforms several SOTA methods even though they have the entire target domain data available during adaptation.
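The class-balanced loss mentioned in the abstract can be sketched as effective-number re-weighting in the style of Cui et al.; the dissertation does not specify its exact variant here, so the `beta` value and helper names below are illustrative assumptions, not the author's implementation:

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    # Effective number of samples per class: (1 - beta^n) / (1 - beta).
    # Rare classes get larger weights, compensating for class imbalance.
    effective_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes.
    return weights * len(samples_per_class) / weights.sum()

def class_balanced_ce(logits, label, weights):
    # Weighted cross-entropy for one example (hypothetical helper).
    logits = logits - logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax
    return -weights[label] * np.log(probs[label])
```

In this scheme a class with 10 training images receives a substantially larger weight than one with 100, so minority vehicle identities contribute more to the gradient.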
Cascaded Regression Tracking: Towards Online Hard Distractor Discrimination
Visual tracking can be easily disturbed by similar surrounding objects. Such
objects, known as hard distractors, though a minority among negative samples,
increase the risk of target drift and model corruption, and therefore deserve
additional attention in online tracking and model update. To enhance the
tracking robustness, in this paper, we propose a cascaded regression tracker
with two sequential stages. In the first stage, we filter out abundant
easily-identified negative candidates via an efficient convolutional
regression. In the second stage, a discrete sampling based ridge regression is
designed to double-check the remaining ambiguous hard samples, which serves as
an alternative of fully-connected layers and benefits from the closed-form
solver for efficient learning. Extensive experiments are conducted on 11
challenging tracking benchmarks including OTB-2013, OTB-2015, VOT2018, VOT2019,
UAV123, Temple-Color, NfS, TrackingNet, LaSOT, UAV20L, and OxUvA. The proposed
method achieves state-of-the-art performance on prevalent benchmarks, while
running at real-time speed.
Comment: Accepted by IEEE TCSVT
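The closed-form ridge regression solver referenced in the second stage can be sketched as follows; the feature matrix, targets, and regularization strength are illustrative assumptions rather than the tracker's actual configuration:

```python
import numpy as np

def ridge_regression(X, y, lam=1.0):
    # Closed-form ridge solution: w = (X^T X + lam * I)^{-1} X^T y.
    # Solving the linear system avoids explicit matrix inversion and is
    # efficient for the low-dimensional features used in online tracking.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Illustrative usage: fit on sampled candidate features, then score
# new (ambiguous) candidates by their regression response X_new @ w.
```

Because the solution is closed-form, the second-stage model can be re-learned at every frame without iterative optimization, which is what makes it a practical alternative to fully-connected layers.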
Visual Contact Pressure Estimation for Grippers in the Wild
Sensing contact pressure applied by a gripper can benefit autonomous and
teleoperated robotic manipulation, but adding tactile sensors to a gripper's
surface can be difficult or impractical. If a gripper visibly deforms, contact
pressure can be visually estimated using images from an external camera that
observes the gripper. While researchers have demonstrated this capability in
controlled laboratory settings, prior work has not addressed challenges
associated with visual pressure estimation in the wild, where lighting,
surfaces, and other factors vary widely. We present a model and associated
methods that enable visual pressure estimation under widely varying conditions.
Our model, Visual Pressure Estimation for Robots (ViPER), takes an image from
an eye-in-hand camera as input and outputs an image representing the pressure
applied by a soft gripper. Our key insight is that force/torque sensing can be
used as a weak label to efficiently collect training data in settings where
pressure measurements would be difficult to obtain. When trained on this weakly
labeled data combined with fully labeled data that includes pressure
measurements, ViPER outperforms prior methods, enables precision manipulation
in cluttered settings, and provides accurate estimates for unseen conditions
relevant to in-home use.
Comment: Accepted for presentation at the 2023 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS 2023)
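One way a force/torque reading could act as a weak label for a predicted pressure map is via a binary contact loss: the sensor only says whether the gripper is in contact, not where or how hard. This is a hedged sketch; the threshold, helper names, and loss form are assumptions, not ViPER's actual training objective:

```python
import numpy as np

def contact_weak_label(force_magnitude, threshold=0.5):
    # A force/torque reading above a threshold implies the gripper is in
    # contact with something (weak, image-level label; assumed threshold).
    return float(force_magnitude > threshold)

def weak_contact_loss(pred_pressure_map, contact):
    # Squash the map's peak pressure into a (0, 1) "any contact" probability,
    # then apply binary cross-entropy against the weak label.
    p = 1.0 - np.exp(-np.maximum(pred_pressure_map, 0.0).max())
    p = np.clip(p, 1e-6, 1.0 - 1e-6)
    return -(contact * np.log(p) + (1.0 - contact) * np.log(1.0 - p))
```

A loss of this shape penalizes predicted pressure anywhere in the image when the sensor reports no contact, and penalizes an all-zero map when contact is detected, without requiring per-pixel pressure ground truth.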