1,745 research outputs found
Occlusion-Aware Multi-View Reconstruction of Articulated Objects for Manipulation
The goal of this research is to develop algorithms that use multiple views to automatically recover complete 3D models of articulated objects in unstructured environments, thereby enabling a robotic system to manipulate those objects. First, an algorithm called Procrustes-Lo-RANSAC (PLR) is presented. Structure-from-motion techniques are used to capture 3D point cloud models of an articulated object in two different configurations. Procrustes analysis, combined with a locally optimized RANSAC sampling strategy, provides a straightforward geometric approach to recovering the joint axes and automatically classifying them as either revolute or prismatic. The algorithm requires no prior knowledge of the object, nor does it make any assumptions about the planarity of the object or scene. Second, with the resulting articulated model, a robotic system is able to manipulate the object along its joint axes at a specified grasp point in order to exercise its degrees of freedom, or to move its end effector to a particular position even if that point is not visible in the current view. This is one of the main advantages of the occlusion-aware approach: because the models capture all sides of the object, the robot has knowledge of parts of the object that are not visible in the current view. Experiments with a PUMA 500 robotic arm demonstrate the effectiveness of the approach on a variety of real-world objects containing both revolute and prismatic joints. Third, we improve the proposed approach by using an RGB-D sensor (Microsoft Kinect) that yields a depth value for each pixel directly, rather than requiring correspondences to establish depth. The KinectFusion algorithm is applied to produce a single high-quality, geometrically accurate 3D model from which the rigid links of the object are segmented and aligned, allowing the joint axes to be estimated using the geometric approach. The improved algorithm does not require artificial markers to be attached to objects, yields much denser 3D models, and reduces the computation time.
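The geometric core of the approach — rigidly aligning a link's point clouds from two configurations via Procrustes analysis, then reading the joint axis off the recovered transform — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the simple angle threshold for the revolute/prismatic decision are assumptions, and PLR additionally uses locally optimized RANSAC for robust sampling.

```python
import numpy as np

def procrustes_rigid(P, Q):
    """Kabsch/Procrustes alignment: find R, t minimizing ||R @ P + t - Q||
    for corresponding 3xN point sets P and Q."""
    cp, cq = P.mean(1, keepdims=True), Q.mean(1, keepdims=True)
    H = (P - cp) @ (Q - cq).T
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t

def joint_axis(R, t, angle_eps=1e-3):
    """Classify a link's relative motion as revolute or prismatic and
    return the joint axis direction (illustrative threshold)."""
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if angle < angle_eps:                       # no rotation -> prismatic
        return "prismatic", t.ravel() / np.linalg.norm(t)
    w, V = np.linalg.eig(R)                     # rotation axis: eigenvector of R
    axis = np.real(V[:, np.argmin(np.abs(w - 1))])  # ...with eigenvalue 1
    return "revolute", axis / np.linalg.norm(axis)
```

For a link rotated about a hinge, the recovered rotation's unit eigenvector gives the hinge direction; a near-identity rotation with nonzero translation indicates a prismatic joint along the translation direction.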
CHORD: Category-level Hand-held Object Reconstruction via Shape Deformation
In daily life, humans utilize hands to manipulate objects. Modeling the shape
of objects that are manipulated by the hand is essential for AI to comprehend
daily tasks and to learn manipulation skills. However, previous approaches have
encountered difficulties in reconstructing the precise shapes of hand-held
objects, primarily owing to a deficiency in prior shape knowledge and
inadequate data for training. For instance, given a particular type of tool,
such as a mug, despite its infinite variations in shape and appearance, humans
have a limited number of 'effective' modes and poses for its manipulation. This
can be attributed to the fact that humans have mastered the shape prior of the
'mug' category, and can quickly establish the corresponding relations between
different mug instances and the prior, such as where the rim and handle are
located. In light of this, we propose a new method, CHORD, for Category-level
Hand-held Object Reconstruction via shape Deformation. CHORD deforms a
categorical shape prior for reconstructing the intra-class objects. To ensure
accurate reconstruction, we empower CHORD with three types of awareness:
appearance, shape, and interacting pose. In addition, we have constructed a new
dataset, COMIC, of category-level hand-object interaction. COMIC contains a
rich array of object instances, materials, hand interactions, and viewing
directions. Extensive evaluation shows that CHORD outperforms state-of-the-art
approaches in both quantitative and qualitative measures. Code, model, and
datasets are available at https://kailinli.github.io/CHORD.
Comment: To be presented at ICCV 2023, Paris.
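The deform-a-prior idea can be illustrated with a toy stand-in. CHORD predicts per-point deformation offsets with a network conditioned on appearance, shape, and interacting pose; the sketch below, which is only an assumption-laden substitute, instead snaps each point of a category-level template to its nearest observed point to show the mechanics of deforming a shape prior toward an instance.

```python
import numpy as np

def deform_prior(template, observed):
    """Illustrative shape deformation: move each template point (Tx3) to
    its nearest observed point (Ox3). A learned model would predict these
    offsets instead of using nearest neighbors."""
    # pairwise squared distances between template and observed points, (T, O)
    d2 = ((template[:, None, :] - observed[None, :, :]) ** 2).sum(-1)
    offsets = observed[d2.argmin(1)] - template
    return template + offsets
```

Because the template carries the category's structure (e.g. where a mug's rim and handle are), correspondences between the prior and the instance come for free after deformation.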
The Treachery of Images: Bayesian Scene Keypoints for Deep Policy Learning in Robotic Manipulation
In policy learning for robotic manipulation, sample efficiency is of
paramount importance. Thus, learning and extracting more compact
representations from camera observations is a promising avenue. However,
current methods often assume full observability of the scene and struggle with
scale invariance. In many tasks and settings, this assumption does not hold as
objects in the scene are often occluded or lie outside the field of view of the
camera, rendering the camera observation ambiguous with regard to their
location. To tackle this problem, we present BASK, a Bayesian approach to
tracking scale-invariant keypoints over time. Our approach successfully
resolves inherent ambiguities in images, enabling keypoint tracking on
symmetrical objects and occluded and out-of-view objects. We employ our method
to learn challenging multi-object robot manipulation tasks from wrist camera
observations and demonstrate superior utility for policy learning compared to
other representation learning techniques. Furthermore, we show outstanding
robustness towards disturbances such as clutter, occlusions, and noisy depth
measurements, as well as generalization to unseen objects both in simulation
and real-world robotic experiments.
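The way Bayesian accumulation resolves single-view ambiguity can be illustrated with a discrete filter over candidate keypoint locations. This is not BASK's actual formulation (which tracks scale-invariant keypoints from learned descriptors), only a sketch of the underlying recursion: a symmetric object makes one frame's likelihood multimodal, but multiplying evidence across frames concentrates the posterior on the true location.

```python
import numpy as np

def bayes_update(belief, likelihood):
    """One discrete Bayes-filter step over candidate keypoint locations:
    posterior is proportional to prior times the per-frame likelihood."""
    post = belief * likelihood
    return post / post.sum()
```

With a uniform prior and two frames — the first perfectly ambiguous between two symmetric hypotheses, the second only slightly favoring one — the posterior already singles out the correct cell, which a per-frame (memoryless) estimator could not do reliably.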
Doduo: Learning Dense Visual Correspondence from Unsupervised Semantic-Aware Flow
Dense visual correspondence plays a vital role in robotic perception. This
work focuses on establishing the dense correspondence between a pair of images
that captures dynamic scenes undergoing substantial transformations. We
introduce Doduo to learn general dense visual correspondence from in-the-wild
images and videos without ground truth supervision. Given a pair of images, it
estimates the dense flow field encoding the displacement of each pixel in one
image to its corresponding pixel in the other image. Doduo uses flow-based
warping to acquire supervisory signals for the training. Incorporating semantic
priors with self-supervised flow training, Doduo produces accurate dense
correspondence robust to the dynamic changes of the scenes. Trained on an
in-the-wild video dataset, Doduo illustrates superior performance on
point-level correspondence estimation over existing self-supervised
correspondence learning baselines. We also apply Doduo to articulation
estimation and zero-shot goal-conditioned manipulation, underlining its
practical applications in robotics. Code and additional visualizations are
available at https://ut-austin-rpl.github.io/Doduo.
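The flow-based warping supervision can be sketched as follows. Doduo trains a network with differentiable (bilinear) sampling; this NumPy stand-in uses nearest-neighbor sampling purely to show how warping one image with the predicted flow yields a photometric training signal without ground-truth correspondence — the function names are illustrative, not Doduo's API.

```python
import numpy as np

def warp(image, flow):
    """Backward-warp a 2D image with a dense (H, W, 2) flow field, sampling
    at nearest pixels (a trained model would use bilinear sampling)."""
    H, W = image.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    yw = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, H - 1)
    xw = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, W - 1)
    return image[yw, xw]

def photometric_loss(a, b, flow):
    """Self-supervised signal: warp b toward a with the predicted flow and
    penalize the mean per-pixel difference."""
    return np.abs(a - warp(b, flow)).mean()
```

A flow field matching the true displacement reconstructs the source image (up to boundary clipping) and so drives the loss below that of a zero-flow baseline, which is the gradient signal the network learns from.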
Affect-Preserving Visual Privacy Protection
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications beyond security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasingly used for observation in various educational and medical settings. Videos collected for such applications are considered protected health information under the privacy laws of many countries. Visual privacy protection techniques, such as blurring or object removal, can mitigate privacy concerns, but they also obliterate important visual cues of affect and social behavior that are crucial for the target applications. In this dissertation, we propose to balance privacy protection and the utility of the data by preserving privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding.
The Intellectual Merits of the dissertation include a novel framework for visual privacy protection that manipulates the facial image and body shape of individuals, and thereby: (1) conceals the identity of individuals; (2) provides a way to preserve the utility of the data, such as expression and pose information; and (3) balances the utility of the data against the capacity of the privacy protection.
The Broader Impacts of the dissertation concern the significance of privacy protection for visual data and the inadequacy of current privacy-enhancing technologies in preserving the affect and behavioral attributes of visual content, which are highly useful for behavior observation in educational and medical settings. The work in this dissertation represents one of the first attempts to achieve both goals simultaneously.
- …