Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification
Recent work on scene classification still makes use of generic CNN features
in a rudimentary manner. In this ICCV 2015 paper, we present a novel pipeline
built upon deep CNN features to harvest discriminative visual objects and parts
for scene classification. We first use a region proposal technique to generate
a set of high-quality patches potentially containing objects, and apply a
pre-trained CNN to extract generic deep features from these patches. Then we
perform both unsupervised and weakly supervised learning to screen these
patches and discover discriminative ones representing category-specific objects
and parts. We further apply discriminative clustering enhanced with local CNN
fine-tuning to aggregate similar objects and parts into groups, called meta
objects. A scene image representation is constructed by pooling the feature
response maps of all the learned meta objects at multiple spatial scales. We
have confirmed that the scene image representation obtained using this new
pipeline is capable of delivering state-of-the-art performance on two popular
scene benchmark datasets, MIT Indoor 67~\cite{MITIndoor67} and
Sun397~\cite{Sun397}.
Comment: To appear in ICCV 2015
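The final step of the pipeline, pooling meta-object response maps at multiple spatial scales, can be illustrated with a minimal sketch. The abstract does not state the pooling operator or the grid sizes, so max pooling and the 1x1 and 2x2 grids are assumptions, and the function names `pool_response_map` and `image_representation` are hypothetical.

```python
def pool_response_map(response, levels=(1, 2)):
    """Max-pool one 2D response map over an n x n grid for each n in levels.

    Assumption: max pooling and these grid sizes are illustrative choices,
    not taken from the paper. The map must have at least max(levels) rows
    and columns so that every grid cell is non-empty.
    """
    rows, cols = len(response), len(response[0])
    pooled = []
    for n in levels:
        for i in range(n):
            r0, r1 = i * rows // n, (i + 1) * rows // n
            for j in range(n):
                c0, c1 = j * cols // n, (j + 1) * cols // n
                cell = [response[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1)]
                pooled.append(max(cell))
    return pooled


def image_representation(response_maps, levels=(1, 2)):
    """Concatenate the pooled responses of every meta object's map."""
    features = []
    for response in response_maps:
        features.extend(pool_response_map(response, levels))
    return features


# Two toy 2x2 response maps, one per hypothetical meta object.
maps = [
    [[0.1, 0.9], [0.4, 0.2]],
    [[0.0, 0.3], [0.7, 0.5]],
]
print(image_representation(maps))
```

Each meta object contributes one value per grid cell, so with grids of 1x1 and 2x2 the representation has 5 entries per meta object.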
Triple Regression for Camera Agnostic Sim2Real Robot Grasping and Manipulation Tasks
Sim2Real (Simulation to Reality) techniques have gained prominence in robotic
manipulation and motion planning due to their ability to enhance success rates
by enabling agents to test and evaluate various policies and trajectories. In
this paper, we investigate the advantages of integrating Sim2Real into robotic
frameworks. We introduce the Triple Regression Sim2Real framework, which
constructs a real-time digital twin. This twin serves as a replica of reality
to simulate and evaluate multiple plans before their execution in real-world
scenarios. Our triple regression approach addresses the reality gap by: (1)
mitigating projection errors between real and simulated camera perspectives
through the first two regression models, and (2) detecting discrepancies in
robot control using the third regression model. Experiments on 6-DoF grasp and
manipulation tasks (where the gripper can approach from any direction)
highlight the effectiveness of our framework. Remarkably, with only RGB input
images, our method achieves state-of-the-art success rates. This research
advances efficient robot training methods and sets the stage for rapid
advancements in robotics and automation.
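As a rough illustration of how a regression model could reduce projection error between a simulated and a real camera view, the sketch below fits one least-squares line per image axis, mapping simulated pixel coordinates to real ones. The abstract does not specify the form, inputs, or outputs of the three regression models, so the per-axis linear model, the function names, and the sample points are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_pixel_correction(sim_points, real_points):
    """Fit independent linear corrections for the u and v pixel axes.

    Assumption: a per-axis linear map stands in for the paper's
    (unspecified) projection-error regression models.
    """
    u_model = fit_line([p[0] for p in sim_points], [p[0] for p in real_points])
    v_model = fit_line([p[1] for p in sim_points], [p[1] for p in real_points])
    return u_model, v_model


def correct(point, models):
    """Map a simulated pixel coordinate to its predicted real coordinate."""
    (su, bu), (sv, bv) = models
    return su * point[0] + bu, sv * point[1] + bv


# Hypothetical matched keypoints seen in the simulated and real images.
sim = [(0, 0), (100, 50), (200, 100)]
real = [(5, -3), (107, 46), (209, 95)]
models = fit_pixel_correction(sim, real)
print(correct((150, 75), models))
```

In practice the matched points would come from calibration targets or tracked features visible in both views; a richer model (for example a homography) may be needed when the two camera poses differ substantially.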