Segmentation via Manipulation
The motivation for this paper is the observation that a static scene containing more than one object or part usually cannot be segmented by vision alone, or more generally by any non-contact sensing. The only exceptions are when the objects are physically separated so that a non-contact sensor can measure the separation, or when a great deal of a priori knowledge about the objects (their geometry, material, etc.) is available. We assume no such knowledge is available. Instead, we assume that the scene is reachable with a manipulator. The problem therefore represents a class of segmentation problems that occur on an assembly line, in bin picking, in organizing a desktop, and the like. What are the typical properties of this class of problems?
1. The objects are rigid. Their size and weight are such that they are manipulable with a suitable end effector. Their number in the scene is such that each piece can be examined and manipulated in a reasonable time, i.e., the complexity of the scene is bounded.
2. The scene is accessible to the sensors, i.e., the whole scene is visible (although some parts may be occluded) and reachable by the manipulator.
3. There is a well-defined goal that is detectable by the available sensors. Specifically, the goal may be an empty scene or an organized/ordered scene.
GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts
For years, researchers have been devoted to generalizable object perception
and manipulation, where cross-category generalizability is highly desired yet
underexplored. In this work, we propose to learn such cross-category skills via
Generalizable and Actionable Parts (GAParts). By identifying and defining 9
GAPart classes (lids, handles, etc.) in 27 object categories, we construct a
large-scale part-centric interactive dataset, GAPartNet, where we provide rich,
part-level annotations (semantics, poses) for 8,489 part instances on 1,166
objects. Based on GAPartNet, we investigate three cross-category tasks: part
segmentation, part pose estimation, and part-based object manipulation. Given
the significant domain gaps between seen and unseen object categories, we
propose a robust 3D segmentation method from the perspective of domain
generalization by integrating adversarial learning techniques. Our method
outperforms all existing methods by a large margin on both seen and unseen
categories. Furthermore, with part segmentation and pose estimation results, we
leverage the GAPart pose definition to design part-based manipulation
heuristics that can generalize well to unseen object categories in both the
simulator and the real world. Our dataset, code, and demos are available on our
project page.
Comment: To appear in CVPR 2023 (Highlight).
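The abstract does not spell out the adversarial objective. A common instantiation of adversarial domain generalization is a gradient-reversal-style loss, in which a domain classifier (here, one that predicts the source object category) is trained normally while its loss is subtracted from the segmentation loss from the feature extractor's point of view, pushing features to become category-invariant. A minimal NumPy sketch under that assumption; the function names, the lambda weight, and the use of object categories as domains are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # mean negative log-likelihood of the true class
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def domain_adversarial_loss(seg_logits, seg_labels, dom_logits, dom_labels, lam=0.1):
    """Combined objective: minimize the part-segmentation loss while making
    the features uninformative about the source object category (the domain).
    With gradient reversal this reads, from the feature extractor's view,
    L_total = L_seg - lam * L_domain."""
    l_seg = cross_entropy(softmax(seg_logits), seg_labels)
    l_dom = cross_entropy(softmax(dom_logits), dom_labels)
    return l_seg - lam * l_dom

rng = np.random.default_rng(0)
seg_logits = rng.normal(size=(4, 9))   # 4 points, 9 GAPart classes
dom_logits = rng.normal(size=(4, 27))  # 27 object categories as "domains"
loss = domain_adversarial_loss(seg_logits, np.array([0, 1, 2, 3]),
                               dom_logits, np.array([0, 5, 10, 20]))
```

Raising `lam` trades segmentation accuracy against domain invariance; the paper's actual architecture and weighting may differ.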
Hybrid LSTM and Encoder-Decoder Architecture for Detection of Image Forgeries
With advanced image journaling tools, one can easily alter the semantic
meaning of an image by exploiting certain manipulation techniques such as
copy-clone, object splicing, and removal, which can mislead viewers.
Identifying these manipulations, by contrast, is a very challenging task, as
manipulated regions are not visually apparent. This paper proposes a
high-confidence manipulation localization architecture which utilizes
resampling features, Long Short-Term Memory (LSTM) cells, and an encoder-decoder
network to segment out manipulated regions from non-manipulated ones.
Resampling features are used to capture artifacts like JPEG quality loss,
upsampling, downsampling, rotation, and shearing. The proposed network exploits
larger receptive fields (spatial maps) and frequency domain correlation to
analyze the discriminative characteristics between manipulated and
non-manipulated regions by incorporating the encoder and LSTM networks. Finally,
the decoder network learns the mapping from low-resolution feature maps to
pixel-wise predictions for image tamper localization. With the predicted mask
provided by the final (softmax) layer of the proposed architecture, end-to-end
training is performed to learn the network parameters through back-propagation
using ground-truth masks. Furthermore, a large image splicing dataset is
introduced to guide the training process. The proposed method is capable of
localizing image manipulations at pixel level with high precision, which is
demonstrated through rigorous experimentation on three diverse datasets.
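The training described above, pixel-wise predictions from a final softmax layer matched against ground-truth masks, boils down to per-pixel cross-entropy. A minimal NumPy sketch of that loss under the assumption of a two-class (pristine/manipulated) output; the network itself is omitted, and the shapes are illustrative rather than the paper's:

```python
import numpy as np

def pixel_softmax(logits):
    # logits: (H, W, 2) -> per-pixel class probabilities
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def tamper_loss(logits, gt_mask):
    """Per-pixel cross-entropy between the softmax output and the
    ground-truth tamper mask (1 = manipulated, 0 = pristine)."""
    probs = pixel_softmax(logits)
    p_true = np.where(gt_mask == 1, probs[..., 1], probs[..., 0])
    return -np.mean(np.log(p_true + 1e-12))

H, W = 8, 8
logits = np.zeros((H, W, 2))
logits[2:6, 2:6, 1] = 4.0          # strong "manipulated" evidence in a patch
gt = np.zeros((H, W), dtype=int)
gt[2:6, 2:6] = 1                   # ground-truth tampered region
loss = tamper_loss(logits, gt)
pred = pixel_softmax(logits).argmax(axis=-1)
```

Back-propagating this loss through the decoder, LSTM, and encoder is what the end-to-end training in the abstract refers to.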
Supervised Autonomous Locomotion and Manipulation for Disaster Response with a Centaur-like Robot
Mobile manipulation tasks are one of the key challenges in the field of
search and rescue (SAR) robotics requiring robots with flexible locomotion and
manipulation abilities. Since the tasks are mostly unknown in advance, the
robot has to adapt to a wide variety of terrains and workspaces during a
mission. The centaur-like robot Centauro has a hybrid legged-wheeled base and
an anthropomorphic upper body to carry out complex tasks in environments too
dangerous for humans. Due to its high number of degrees of freedom, controlling
the robot with direct teleoperation approaches is challenging and exhausting.
Supervised autonomy approaches are promising for increasing the quality and
speed of control while keeping the flexibility to solve unknown tasks. We developed a
set of operator assistance functionalities with different levels of autonomy to
control the robot for challenging locomotion and manipulation tasks. The
integrated system was evaluated in disaster response scenarios and showed
promising performance.
Comment: In Proceedings of the IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), Madrid, Spain, October 2018.
Improvised Salient Object Detection and Manipulation
For salient-subject recognition, computer algorithms have heavily relied on
scanning images systematically from top-left to bottom-right and applying
brute force to locate objects of interest, which makes the process quite time
consuming. This paper discusses a novel approach and a simple solution to this
problem. We implement
an approach to object manipulation and detection through segmentation map,
which would help to desaturate or, in other words, wash out the background of
the image. Evaluation for the performance is carried out using the Jaccard
index against the well-known ground-truth target box technique.
Comment: 7 pages.
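The two operations described above, washing out the background via a segmentation map and scoring the result with the Jaccard index against a ground-truth box, can be sketched directly. This is an illustrative NumPy sketch, not the authors' implementation; the simple channel-mean grayscale is an assumption:

```python
import numpy as np

def desaturate_background(img, mask):
    """Wash out everything outside the salient-object mask: keep the
    object in color, replace the background with its grayscale value."""
    gray = img.mean(axis=-1, keepdims=True)          # simple luminance proxy
    gray3 = np.repeat(gray, 3, axis=-1)
    return np.where(mask[..., None].astype(bool), img, gray3)

def jaccard(mask_a, mask_b):
    """Intersection-over-union between a predicted mask and a
    ground-truth target box rendered as a mask."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

img = np.zeros((4, 4, 3))
img[0, 0] = [0.3, 0.6, 0.9]          # a colored background pixel
img[1:3, 1:3] = [1.0, 0.0, 0.0]      # red "object"
pred = np.zeros((4, 4)); pred[1:3, 1:3] = 1      # predicted salient mask
gt_box = np.zeros((4, 4)); gt_box[1:3, 1:4] = 1  # ground-truth target box
out = desaturate_background(img, pred)
iou = jaccard(pred, gt_box)
```

Here the predicted mask covers 4 of the 6 box pixels and nothing outside it, giving a Jaccard index of 4/6.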
LabelFusion: A Pipeline for Generating Ground Truth Labels for Real RGBD Data of Cluttered Scenes
Deep neural network (DNN) architectures have been shown to outperform
traditional pipelines for object segmentation and pose estimation using RGBD
data, but the performance of these DNN pipelines is directly tied to how
representative the training data is of the true data. Hence a key requirement
for employing these methods in practice is to have a large set of labeled data
for your specific robotic manipulation task, a requirement that is not
generally satisfied by existing datasets. In this paper we develop a pipeline
to rapidly generate high quality RGBD data with pixelwise labels and object
poses. We use an RGBD camera to collect video of a scene from multiple
viewpoints and leverage existing reconstruction techniques to produce a dense
3D reconstruction. We label the 3D reconstruction using human-assisted
ICP fitting of object meshes. By reprojecting the results of labeling the 3D
scene we can produce labels for each RGBD image of the scene. This pipeline
enabled us to collect over 1,000,000 labeled object instances in just a few
days. We use this dataset to answer questions related to how much training data
is required, and of what quality the data must be, to achieve high performance
from a DNN architecture.
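The reprojection step above, turning a labeled 3D reconstruction into per-pixel labels for each RGBD frame, is a standard pinhole-camera projection. A minimal NumPy sketch assuming known camera intrinsics `K` and a world-to-camera transform `T_cam_world` (both names and the point-cloud representation are illustrative assumptions, not LabelFusion's actual code):

```python
import numpy as np

def project_labels(points, labels, K, T_cam_world, h, w):
    """Reproject labeled 3D points from the reconstruction into one camera
    view, producing a per-pixel label image (0 = background)."""
    # world -> camera frame (homogeneous coordinates)
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_cam_world @ pts_h.T).T[:, :3]
    # pinhole projection onto the image plane
    uv = (K @ cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    label_img = np.zeros((h, w), dtype=int)
    # keep only points in front of the camera and inside the image
    ok = (cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    label_img[v[ok], u[ok]] = labels[ok]
    return label_img

K = np.array([[50.0, 0, 32], [0, 50.0, 24], [0, 0, 1]])
T = np.eye(4)                                  # camera at the world origin
pts = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]])
lbl = np.array([1, 2])                         # two object instance ids
img = project_labels(pts, lbl, K, T, 48, 64)
```

A real pipeline would also use the depth image for occlusion checks (a reprojected point should be dropped if the measured depth at its pixel is closer than the point), which this sketch omits.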