CARTO: Category and Joint Agnostic Reconstruction of ARTiculated Objects
We present CARTO, a novel approach for reconstructing multiple articulated
objects from a single stereo RGB observation. We use implicit object-centric
representations and learn a single geometry and articulation decoder for
multiple object categories. Despite training on multiple categories, our
decoder achieves a comparable reconstruction accuracy to methods that train
bespoke decoders separately for each category. Combined with our stereo image
encoder we infer the 3D shape, 6D pose, size, joint type, and the joint state
of multiple unknown objects in a single forward pass. Our method achieves a
20.4% absolute improvement in mAP 3D IOU50 for novel instances when compared to
a two-stage pipeline. Inference is fast, running at 1 Hz on an NVIDIA TITAN XP
GPU when eight or fewer objects are present. Although trained only on simulated
data, CARTO transfers to real-world object instances. Code and evaluation data
are available at: http://carto.cs.uni-freiburg.de
Comment: 20 pages, 11 figures, accepted at CVPR 202
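The shared geometry-and-articulation decoder described above can be pictured as a single latent code driving both an SDF head and an articulation head. The following is a minimal, hypothetical numpy sketch of that structure; the random weights stand in for a trained network, and all names and dimensions are illustrative assumptions, not CARTO's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch: one latent code per object feeds both an SDF head
# (geometry) and an articulation head (joint-type logits + joint state).
# The random weights below are placeholders for a trained decoder.
LATENT_DIM, HIDDEN = 16, 32
W1 = rng.normal(size=(LATENT_DIM + 3, HIDDEN)) * 0.1   # shared trunk
W_sdf = rng.normal(size=(HIDDEN, 1)) * 0.1             # geometry head
W_joint = rng.normal(size=(LATENT_DIM, 3)) * 0.1       # joint-type logits (revolute/prismatic/fixed)
W_state = rng.normal(size=(LATENT_DIM, 1)) * 0.1       # scalar joint state (angle or offset)

def decode_sdf(z, xyz):
    """Signed distance at query points xyz for latent code z."""
    inp = np.concatenate([np.broadcast_to(z, (len(xyz), len(z))), xyz], axis=1)
    return (np.tanh(inp @ W1) @ W_sdf).ravel()

def decode_articulation(z):
    """Joint-type distribution and joint state from the same latent code."""
    logits = z @ W_joint
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs, (z @ W_state).item()

z = rng.normal(size=LATENT_DIM)
pts = rng.uniform(-1, 1, size=(5, 3))
sdf = decode_sdf(z, pts)
probs, state = decode_articulation(z)
print(sdf.shape, probs.shape)
```

Because both heads read the same latent code, a single decoder can serve multiple categories, which is the property the abstract highlights.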
Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery
Articulated objects are abundant in daily life. Discovering their parts,
joints, and kinematics is crucial for robots to interact with these objects. We
introduce Structure from Action (SfA), a framework that discovers the 3D part
geometry and joint parameters of unseen articulated objects via a sequence of
inferred interactions. Our key insight is that 3D interaction and perception
should be considered in conjunction to construct 3D articulated CAD models,
especially in the case of categories not seen during training. By selecting
informative interactions, SfA discovers parts and reveals initially occluded
surfaces, like the inside of a closed drawer. By aggregating visual
observations in 3D, SfA accurately segments multiple parts, reconstructs part
geometry, and infers all joint parameters in a canonical coordinate frame. Our
experiments demonstrate that a single SfA model trained in simulation can
generalize to many unseen object categories with unknown kinematic structures
and to real-world objects. Code and data will be publicly available.
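The "selecting informative interactions" idea above can be sketched with a simple uncertainty heuristic: act where the current part segmentation is most ambiguous. This is an illustrative stand-in, not SfA's actual selection policy; the entropy criterion and all names are assumptions.

```python
import numpy as np

# Hypothetical sketch: given per-point part-assignment probabilities from
# the current model, pick the candidate interaction whose target region has
# the highest mean entropy, i.e. where acting is expected to reveal the most.

def entropy(p, eps=1e-12):
    """Shannon entropy of each row of a probability matrix."""
    p = np.clip(p, eps, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

def select_interaction(part_probs, candidate_regions):
    """part_probs: (N, K) per-point part probabilities.
    candidate_regions: list of point-index arrays, one per candidate action."""
    scores = [entropy(part_probs[idx]).mean() for idx in candidate_regions]
    return int(np.argmax(scores)), scores

# Two candidate pushes: one on confidently-segmented points, one on ambiguous points.
probs = np.array([[0.98, 0.01, 0.01],
                  [0.97, 0.02, 0.01],
                  [0.34, 0.33, 0.33],
                  [0.40, 0.30, 0.30]])
best, scores = select_interaction(probs, [np.array([0, 1]), np.array([2, 3])])
print(best)  # selects the ambiguous region (index 1)
```

Choosing the high-entropy region mirrors the abstract's point that informative interactions reveal initially occluded or uncertain structure.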
NARF22: Neural Articulated Radiance Fields for Configuration-Aware Rendering
Articulated objects pose a unique challenge for robotic perception and
manipulation. Their increased number of degrees-of-freedom makes tasks such as
localization computationally difficult, while also making the process of
real-world dataset collection unscalable. With the aim of addressing these
scalability issues, we propose Neural Articulated Radiance Fields (NARF22), a
pipeline which uses a fully-differentiable, configuration-parameterized Neural
Radiance Field (NeRF) as a means of providing high quality renderings of
articulated objects. NARF22 requires no explicit knowledge of the object
structure at inference time. We propose a two-stage parts-based training
mechanism which allows the object rendering models to generalize well across
the configuration space even if the underlying training data has as few as one
configuration represented. We demonstrate the efficacy of NARF22 by training
configurable renderers on a real-world articulated tool dataset collected via a
Fetch mobile manipulation robot. We show the applicability of the model to
gradient-based inference methods through a configuration estimation and 6
degree-of-freedom pose refinement task. The project webpage is available at:
https://progress.eecs.umich.edu/projects/narf/.
Comment: Accepted to the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Contact: Stanley Lewis, [email protected]
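The core idea of a configuration-parameterized radiance field can be sketched by augmenting each 3D query point with the articulation configuration q before the usual NeRF-style sinusoidal encoding. The tiny random MLP below is a placeholder for a trained network; everything except the standard positional-encoding scheme is an illustrative assumption.

```python
import numpy as np

# Sketch: a radiance field over (x, y, z, q), where q is the articulation
# configuration, so renders can be driven across the configuration space.

def positional_encoding(x, n_freq=4):
    """Map each coordinate to sin/cos features at octave frequencies."""
    freqs = 2.0 ** np.arange(n_freq) * np.pi
    ang = x[..., None] * freqs                         # (..., D, n_freq)
    enc = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)

rng = np.random.default_rng(0)
D_IN = 4 * 2 * 4                                       # (x, y, z, q) * sin/cos * 4 freqs
W1 = rng.normal(size=(D_IN, 32)) * 0.1
W2 = rng.normal(size=(32, 4)) * 0.1                    # outputs (r, g, b, sigma)

def field(xyz, q):
    """Query color and density at points xyz under configuration q."""
    inp = np.concatenate([xyz, np.full((len(xyz), 1), q)], axis=1)
    out = np.tanh(positional_encoding(inp) @ W1) @ W2
    rgb = 1.0 / (1.0 + np.exp(-out[:, :3]))            # colors in [0, 1]
    sigma = np.maximum(out[:, 3], 0.0)                 # non-negative density
    return rgb, sigma

pts = rng.uniform(-1, 1, size=(6, 3))
rgb_closed, sigma_closed = field(pts, q=0.0)           # e.g. drawer closed
rgb_open, _ = field(pts, q=1.0)                        # e.g. drawer open
print(rgb_closed.shape)
```

Because q enters the field like any other coordinate, the same trained weights render every articulation state, which is what makes gradient-based configuration estimation possible.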
NAISR: A 3D Neural Additive Model for Interpretable Shape Representation
Deep implicit functions (DIFs) have emerged as a powerful paradigm for many
computer vision tasks such as 3D shape reconstruction, generation,
registration, completion, editing, and understanding. However, given a set of
3D shapes with associated covariates, there is at present no shape
representation method that can precisely represent the shapes while
capturing the individual dependency on each covariate. Such a method would be
of high utility to researchers to discover knowledge hidden in a population of
shapes. We propose a 3D Neural Additive Model for Interpretable Shape
Representation (NAISR) which describes individual shapes by deforming a shape
atlas in accordance with the effects of disentangled covariates. Our approach
captures shape population trends and allows for patient-specific predictions
through shape transfer. NAISR is the first approach to combine the benefits of
deep implicit shape representations with an atlas deforming according to
specified covariates. Although our driving problem is the construction of an
airway atlas, NAISR is a general approach for modeling, representing, and
investigating shape populations. We evaluate NAISR with respect to shape
reconstruction, shape disentanglement, shape evolution, and shape transfer for
the pediatric upper airway. Our experiments demonstrate that NAISR achieves
competitive shape reconstruction performance while retaining interpretability.
Comment: 20 page
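The additive structure described above, a shape atlas deformed by a sum of per-covariate displacement fields, can be sketched in a few lines. The linear fields and covariate names below are illustrative placeholders, not NAISR's learned implicit deformations.

```python
import numpy as np

# Sketch of an additive deformation model: a template (atlas) point set is
# deformed by a sum of per-covariate displacement fields, so each covariate's
# effect on the shape stays individually inspectable.
rng = np.random.default_rng(0)
atlas = rng.uniform(-1, 1, size=(100, 3))      # template shape points

# One simple (linear) displacement field per hypothetical covariate.
A_age = rng.normal(size=(3, 3)) * 0.05
A_weight = rng.normal(size=(3, 3)) * 0.05

def deform(points, age, weight):
    """Additive model: x + age * f_age(x) + weight * f_weight(x)."""
    return points + age * points @ A_age.T + weight * points @ A_weight.T

shape_a = deform(atlas, age=0.5, weight=0.0)   # isolate the age effect
shape_b = deform(atlas, age=0.5, weight=1.0)

# Additivity makes each covariate's contribution separable by subtraction.
weight_effect = shape_b - shape_a
print(weight_effect.shape)
```

Separability is exactly what makes the model interpretable: subtracting two deformed shapes that differ in one covariate isolates that covariate's displacement field.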
Robust Scene Estimation for Goal-directed Robotic Manipulation in Unstructured Environments
To make autonomous robots "taskable" so that they function properly and interact fluently with human partners, they must be able to perceive and understand the semantic aspects of their environments. More specifically, they must know what objects exist and where they are in the unstructured human world. Progress in robot perception, especially through deep learning, has greatly improved object detection and localization. However, it remains a challenge for robots to perform highly reliable scene estimation in unstructured environments, as determined by robustness, adaptability, and scale. In this dissertation, we address the scene estimation problem under uncertainty, especially in unstructured environments. We enable robots to build a reliable object-oriented representation that describes the objects present in the environment as well as inter-object spatial relations. Specifically, we focus on the following challenges for reliable scene estimation: 1) robust perception under uncertainty arising from noisy sensors, objects in clutter, and perceptual aliasing; 2) adaptable perception in adverse conditions by combining deep learning with probabilistic generative methods; and 3) scalable perception as the number of objects grows and the structure of objects becomes more complex (e.g., objects in dense clutter).
Towards realizing robust perception, our objective is to ground raw sensor observations into scene states while dealing with uncertainty from sensor measurements and actuator control. Scene states are represented as scene graphs, where scene graphs denote parameterized axiomatic statements that assert relationships between objects and their poses. To deal with this uncertainty, we present a purely generative approach, Axiomatic Scene Estimation (AxScEs). AxScEs estimates a probabilistic distribution across plausible scene graph hypotheses describing the configuration of objects. By maintaining a diverse set of possible states, the proposed approach demonstrates robustness to local minima in the scene graph state space and effectiveness for manipulation-quality perception, evaluated via edit distance on scene graphs.
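The edit-distance comparison mentioned in this paragraph can be sketched by reducing a scene graph to a set of relational axioms and counting the insertions and deletions separating two hypotheses. This symmetric-difference form is a simplified stand-in for the AxScEs formulation, and the example relations are illustrative.

```python
# Sketch: scene graphs as sets of (relation, subject, object) triples; the
# distance between two hypotheses is the number of axiom insertions/deletions
# separating them (size of the symmetric set difference).

def scene_graph_distance(graph_a, graph_b):
    """Each graph is a set of (relation, subject, object) triples."""
    return len(graph_a ^ graph_b)   # symmetric difference = edits needed

estimated = {("on", "mug", "table"), ("on", "book", "table"), ("in", "pen", "mug")}
actual    = {("on", "mug", "table"), ("on", "book", "shelf"), ("in", "pen", "mug")}

# One triple must be removed and one added to turn `estimated` into `actual`.
print(scene_graph_distance(estimated, actual))  # 2
```

A distance of zero means the hypothesized configuration matches the ground-truth axioms exactly, which is why the metric suits manipulation-quality evaluation.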
To scale up to more unstructured scenarios and remain adaptable in adversarial ones, we present Sequential Scene Understanding and Manipulation (SUM), which estimates the scene as a collection of objects in cluttered environments. SUM is a two-stage method that combines the accuracy and efficiency of convolutional neural networks (CNNs) with probabilistic inference methods. Despite their strengths, CNNs are opaque about how their decisions are made and fragile when generalizing beyond overfit training samples in adverse conditions (e.g., changes in illumination). The probabilistic generative method complements these weaknesses and provides an avenue for adaptable perception.
To scale up to densely cluttered environments where objects are in physical contact with severe occlusions, we present GeoFusion, which fuses noisy observations from multiple frames by exploiting geometric consistency at the object level. Geometric consistency characterizes geometric compatibility between objects and geometric similarity between observations and objects. Reasoning about geometry at the object level offers a fast and reliable way to be robust to semantic perceptual aliasing. The proposed approach demonstrates greater robustness and accuracy than the state-of-the-art pose estimation approach.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163060/1/zsui_1.pd
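The two ingredients of geometric consistency named above, compatibility between objects and similarity between observations and objects, can be sketched with a simple scoring function. The bounding-sphere approximation, thresholds, and names below are illustrative assumptions, not GeoFusion's actual formulation.

```python
import numpy as np

# Sketch: score a set of object hypotheses by (a) object-object compatibility
# (bounding spheres should not interpenetrate) and (b) observation-object
# similarity (hypothesized centers should be near their per-frame observations).

def interpenetration(c1, r1, c2, r2):
    """Overlap depth of two bounding spheres (0 if they do not touch)."""
    return max(0.0, (r1 + r2) - np.linalg.norm(np.asarray(c1) - np.asarray(c2)))

def consistency_score(hypotheses, observations, overlap_tol=0.01):
    """hypotheses: {name: (center, radius)}; observations: {name: [centers]}."""
    score = 0.0
    names = list(hypotheses)
    for i, a in enumerate(names):               # object-object compatibility
        for b in names[i + 1:]:
            ca, ra = hypotheses[a]
            cb, rb = hypotheses[b]
            if interpenetration(ca, ra, cb, rb) > overlap_tol:
                score -= 1.0                    # penalize interpenetration
    for name, (c, _) in hypotheses.items():     # observation-object similarity
        errs = [np.linalg.norm(np.asarray(c) - np.asarray(o)) for o in observations[name]]
        score -= float(np.mean(errs))
    return score

good = {"mug": ((0.0, 0.0, 0.0), 0.05), "box": ((0.2, 0.0, 0.0), 0.08)}
bad  = {"mug": ((0.0, 0.0, 0.0), 0.05), "box": ((0.04, 0.0, 0.0), 0.08)}  # interpenetrating
obs = {"mug": [(0.0, 0.0, 0.01)], "box": [(0.2, 0.0, 0.0)]}
print(consistency_score(good, obs) > consistency_score(bad, obs))
```

Because the check operates on whole objects rather than pixels or points, it stays fast and is unaffected by semantically similar-looking objects, which matches the abstract's claim about perceptual aliasing.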
Advanced grasping with the Pisa/IIT softHand
This chapter presents the hardware, software, and overall strategy used by the team UNIPI-IIT-QB to participate in the Robotic Grasping and Manipulation Competition. The approach relies on the Pisa/IIT SoftHand, an underactuated soft robotic hand that can adapt to the shape of the grasped object and is compliant with the environment. It was used for the hand-in-hand and simulation tracks, where the team reached first and third place, respectively.
Robust Recommender System: A Survey and Future Directions
With the rapid growth of information, recommender systems have become
integral for providing personalized suggestions and overcoming information
overload. However, their practical deployment often encounters "dirty" data,
where noise or malicious information can lead to abnormal recommendations.
Research on improving recommender systems' robustness against such dirty data
has thus gained significant attention. This survey provides a comprehensive
review of recent work on recommender systems' robustness. We first present a
taxonomy to organize current techniques for withstanding malicious attacks and
natural noise. We then explore state-of-the-art methods in each category,
including fraudster detection, adversarial training, and certifiably robust
training against malicious attacks, as well as regularization, purification,
and self-supervised learning against natural noise. Additionally, we summarize
evaluation metrics and common datasets used to assess robustness. We discuss
robustness across varying recommendation scenarios and its interplay with other
properties like accuracy, interpretability, privacy, and fairness. Finally, we
delve into open issues and future research directions in this emerging field.
Our goal is to equip readers with a holistic understanding of robust
recommender systems and spotlight pathways for future research and development.
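One of the techniques surveyed above, adversarial training, can be illustrated on a toy matrix-factorization recommender: at each step, embeddings are perturbed in the gradient direction (FGSM-style) and the update is computed against that worst-case neighbor. All dimensions and hyperparameters below are illustrative assumptions and not drawn from any specific paper in the survey.

```python
import numpy as np

# Toy adversarial training for matrix factorization on implicit feedback.
rng = np.random.default_rng(0)
n_users, n_items, k = 8, 10, 4
R = (rng.random((n_users, n_items)) > 0.6).astype(float)   # 0/1 feedback matrix
U = rng.normal(scale=0.1, size=(n_users, k))               # user embeddings
V = rng.normal(scale=0.1, size=(n_items, k))               # item embeddings

def loss(U, V):
    """Sum-of-squares reconstruction error."""
    return float(((U @ V.T - R) ** 2).sum())

init_loss = loss(U, V)
lr, eps = 0.01, 0.01
for _ in range(300):
    err = U @ V.T - R
    gU, gV = 2 * err @ V, 2 * err.T @ U
    # FGSM-style perturbation: train against embeddings nudged uphill.
    U_adv, V_adv = U + eps * np.sign(gU), V + eps * np.sign(gV)
    err_adv = U_adv @ V_adv.T - R
    U = U - lr * 2 * err_adv @ V_adv
    V = V - lr * 2 * err_adv.T @ U_adv
final_loss = loss(U, V)
print(final_loss < init_loss)
```

Training against perturbed embeddings flattens the loss around the learned factors, which is the intuition behind using adversarial training for robustness to small malicious distortions of the data.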