Teaching humanoid robotics by means of human teleoperation through RGB-D sensors
This paper presents a graduate course project on humanoid robotics offered by the University of Padova. The goal is to safely lift an object by teleoperating a small humanoid. Students have to map human limbs to robot joints, guarantee the robot's stability during motion, and teleoperate the robot to perform the correct movement. We introduce the following innovative aspects with respect to classical robotics classes: i) the use of humanoid robots as teaching tools; ii) the simplification of the stable locomotion problem by exploiting the potential of teleoperation; iii) the adoption of a Project-Based Learning constructivist approach as the teaching methodology. The learning objectives of both the course and the project are introduced and compared with the students' background. The design choices and constraints the students have to deal with are reported, together with the amount of time they and their instructors dedicated to solving the tasks. A set of evaluation results, including the students' personal feedback, is provided to validate the authors' purpose. A discussion of possible future improvements is reported, in the hope of encouraging the further spread of educational robotics in schools at all levels.
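The abstract does not spell out the limb-to-joint mapping the students implement; as a minimal illustrative sketch, the snippet below computes one elbow angle from three RGB-D skeleton keypoints and clamps it to an assumed joint range (the keypoint values and the joint limits are hypothetical, not the course's actual setup).

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b between segments b->a and b->c, in radians."""
    u = np.asarray(a, float) - np.asarray(b, float)
    v = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

# Hypothetical 3D keypoints (metres) from an RGB-D skeleton tracker.
shoulder, elbow, wrist = [0.0, 0.4, 2.0], [0.1, 0.1, 2.0], [0.3, 0.1, 1.8]

# Map the human elbow angle onto the robot's elbow joint, clamped to an
# assumed joint range so the command stays within the robot's limits.
cmd = np.clip(joint_angle(shoulder, elbow, wrist), 0.0, np.deg2rad(150))
print(f"elbow command: {np.degrees(cmd):.1f} deg")
```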
Video-driven Neural Physically-based Facial Asset for Production
Production-level workflows for producing convincing 3D dynamic human faces
have long relied on an assortment of labor-intensive tools for geometry and
texture generation, motion capture and rigging, and expression synthesis.
Recent neural approaches automate individual components but the corresponding
latent representations cannot provide artists with explicit controls as in
conventional tools. In this paper, we present a new learning-based,
video-driven approach for generating dynamic facial geometries with
high-quality physically-based assets. For data collection, we construct a hybrid multiview-photometric capture stage, coupled with ultra-fast video cameras, to obtain raw 3D facial assets. We then model the facial expression, geometry, and physically-based textures using separate VAEs, imposing a global MLP-based expression mapping across the networks' latent spaces to preserve the characteristics of each attribute.
We also model the delta information as wrinkle maps for the physically-based
textures, achieving high-quality 4K dynamic textures. We demonstrate our
approach in high-fidelity performer-specific facial capture and cross-identity
facial motion retargeting. In addition, our multi-VAE-based neural asset, along
with the fast adaptation schemes, can also be deployed to handle in-the-wild
videos. Moreover, we demonstrate the utility of our explicit facial disentangling strategy by providing a variety of promising physically-based editing results with
high realism. Comprehensive experiments show that our technique provides higher
accuracy and visual fidelity than previous video-driven facial reconstruction
and animation methods. Project page: https://sites.google.com/view/npfa/
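The multi-VAE design with a global expression-mapping MLP can be sketched as follows; the layer sizes, latent dimensions, and variable names are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AttributeVAE(nn.Module):
    """Tiny VAE for one facial attribute (e.g., geometry or texture features)."""
    def __init__(self, dim_in, dim_z):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * dim_z))
        self.dec = nn.Sequential(nn.Linear(dim_z, 256), nn.ReLU(),
                                 nn.Linear(256, dim_in))

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize

# Separate VAEs per attribute (sizes are placeholders).
geom_vae = AttributeVAE(dim_in=3 * 5000, dim_z=64)   # flattened vertex offsets
tex_vae = AttributeVAE(dim_in=1024, dim_z=64)        # compressed texture features

# A global MLP maps one expression code into each attribute's latent space,
# so a single video-derived expression drives all attributes coherently.
expr_to_geom = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 64))
expr_to_tex = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 64))

expr = torch.randn(1, 32)                      # expression code from the video branch
geometry = geom_vae.dec(expr_to_geom(expr))    # dynamic facial geometry
texture = tex_vae.dec(expr_to_tex(expr))       # physically-based texture features
```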
Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior
Creating believable motions for various characters has long been a goal in
computer graphics. Current learning-based motion synthesis methods depend on
extensive motion datasets, which are often challenging, if not impossible, to
obtain. On the other hand, pose data is more accessible, since static posed
characters are easier to create and can even be extracted from images using
recent advancements in computer vision. In this paper, we utilize this
alternative data source and introduce a neural motion synthesis approach
through retargeting. Our method generates plausible motions for characters that
have only pose data by transferring motion from an existing motion capture
dataset of another character, which can have drastically different skeletons.
Our experiments show that our method effectively combines the motion features
of the source character with the pose features of the target character, and
performs robustly with small or noisy pose data sets, ranging from a few
artist-created poses to noisy poses estimated directly from images.
Additionally, a user study indicated that a majority of participants found our retargeted motion more enjoyable to watch, more lifelike in appearance, and less prone to artifacts. Project page:
https://cyanzhao42.github.io/pose2motion
MoDA: Modeling Deformable 3D Objects from Casual Videos
In this paper, we focus on the challenges of modeling deformable 3D objects
from casual videos. With the popularity of neural radiance fields (NeRF), many
works extend it to dynamic scenes with a canonical NeRF and a deformation model
that achieves 3D point transformation between the observation space and the
canonical space. Recent works rely on linear blend skinning (LBS) to achieve
the canonical-observation transformation. However, the linearly weighted
combination of rigid transformation matrices is not guaranteed to be rigid. As
a matter of fact, unexpected scale and shear factors often appear. In practice, using LBS as the deformation model often leads to skin-collapsing artifacts for bending or twisting motions. To solve this problem, we propose neural dual
quaternion blend skinning (NeuDBS) to achieve 3D point deformation, which can
perform rigid transformations without skin-collapsing artifacts. To register 2D pixels across different frames, we establish a correspondence between canonical feature embeddings, which encode 3D points within the canonical space, and 2D image features by solving an optimal transport problem. In addition, we introduce a texture filtering approach for texture rendering that
effectively minimizes the impact of noisy colors outside target deformable
objects. Extensive experiments on real and synthetic datasets show that our
approach can reconstruct 3D models for humans and animals with better
qualitative and quantitative performance than state-of-the-art methods.
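The rigidity argument behind NeuDBS can be illustrated with plain (non-neural) dual quaternion blend skinning: blended unit dual quaternions can be renormalized back to a rigid transform, whereas linearly blended matrices cannot. A minimal sketch with hand-picked bones and weights (NeuDBS itself learns the skinning; this is only the classic blending step):

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def dual_quat(q, t):
    """Dual quaternion (real, dual) for rotation q followed by translation t."""
    return q, 0.5 * qmul(np.array([0.0, *t]), q)

def dqs_point(p, bones, weights):
    """Blend per-bone dual quaternions, renormalize, and transform point p.
    Renormalizing keeps the blend a rigid transform, which is exactly what
    a linear blend of matrices (LBS) fails to guarantee. Assumes the real
    parts already lie in the same quaternion hemisphere."""
    real = sum(w * b[0] for w, b in zip(weights, bones))
    dual = sum(w * b[1] for w, b in zip(weights, bones))
    n = np.linalg.norm(real)
    real, dual = real / n, dual / n
    conj = real * np.array([1.0, -1.0, -1.0, -1.0])
    rotated = qmul(qmul(real, np.array([0.0, *p])), conj)[1:]
    return rotated + 2.0 * qmul(dual, conj)[1:]

# Two bones: identity, and a 90-degree twist about x with a small offset.
q_id = np.array([1.0, 0.0, 0.0, 0.0])
q_x90 = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4), 0.0, 0.0])
bones = [dual_quat(q_id, [0.0, 0.0, 0.0]), dual_quat(q_x90, [0.0, 0.1, 0.0])]
print(dqs_point(np.array([0.0, 1.0, 0.0]), bones, [0.5, 0.5]))
```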
Robot Learning by observing human actions
Nowadays, robotics is entering our everyday lives. Robots can be seen in industries, offices, and even homes. The more robots come into contact with people, the more the demand grows for new capabilities and features, so that robots can act in case of need, help humans, or serve as companions. It therefore becomes essential to have a quick and easy way to teach robots new skills. That is the aim of Robot Learning from Demonstration: a paradigm that allows new tasks to be programmed into a robot directly through demonstrations.
This thesis proposes a novel approach to Robot Learning from Demonstration that learns new skills from natural demonstrations carried out by naive users. To this aim, we introduce a novel Robot Learning from Demonstration framework, proposing new approaches in all functional sub-units: from data acquisition to motion elaboration, from information modeling to robot control.
A novel method is presented to extract 3D motion flow information from both RGB and depth data acquired with recently introduced consumer RGB-D cameras.
The motion data are computed over time to recognize and classify human actions.
We also describe new techniques to remap human motion to robotic joints. Our methods allow people to interact naturally with robots by re-targeting whole-body movements in an intuitive way. We develop algorithms for both humanoid and manipulator motion and test them in different situations.
Finally, we improve the modeling techniques by using a probabilistic method: the Donut Mixture Model. This model is able to manage the several interpretations that different people can produce when performing a task. The estimated model can also be updated directly using new attempts carried out by the robot. This feature is very important for rapidly obtaining correct robot trajectories from only a few human demonstrations.
A further contribution of this thesis is the creation of a number of new virtual models for the different robots used to test our algorithms. All the developed models are ROS-compliant, so they can be used to foster research in the field across the entire community of this widespread robotics framework. Moreover, a new 3D dataset was collected to compare different action recognition algorithms. The dataset contains both RGB-D information coming directly from the sensor and skeleton data provided by a skeleton tracker.
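The Donut Mixture Model itself is not detailed in the abstract; as a simpler, commonly used stand-in for probabilistic modeling of multiple demonstrations, the sketch below fits a Gaussian mixture over (time, position) samples from several noisy demonstrations and reproduces a trajectory by Gaussian mixture regression (synthetic data; plainly not the thesis' method).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Three noisy demonstrations of the same 1-D reaching motion (synthetic stand-ins).
t = np.linspace(0.0, 1.0, 100)
demos = [np.sin(np.pi * t) + 0.05 * np.random.randn(t.size) for _ in range(3)]
data = np.column_stack([np.tile(t, 3), np.concatenate(demos)])  # (time, position)

gmm = GaussianMixture(n_components=5, covariance_type="full").fit(data)

def position_at(gmm, tq):
    """Gaussian mixture regression: E[position | time = tq]."""
    w, means, covs = gmm.weights_, gmm.means_, gmm.covariances_
    # Responsibility of each component for the query time (constants cancel).
    resp = np.array([wk * np.exp(-0.5 * (tq - mk[0])**2 / ck[0, 0]) / np.sqrt(ck[0, 0])
                     for wk, mk, ck in zip(w, means, covs)])
    resp /= resp.sum()
    cond = [mk[1] + ck[1, 0] / ck[0, 0] * (tq - mk[0]) for mk, ck in zip(means, covs)]
    return float(np.dot(resp, cond))

trajectory = [position_at(gmm, tq) for tq in t]  # reproduced robot trajectory
```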
Adaptive motion synthesis and motor invariant theory
Generating natural-looking motion for virtual characters is a challenging research topic. It becomes even harder when adapting synthesized motion to interact with the environment. Current methods are tedious to use, computationally expensive, and fail to capture natural-looking features. These difficulties seem to suggest that artificial control techniques are inferior to their natural counterparts. Recent advances in biology research point to a new motor control principle: utilizing the natural dynamics. The interaction of body and environment forms patterns which work as primary elements of the motion repertoire: motion primitives. These elements serve as templates, tweaked by the neural system to satisfy environmental constraints or motion purposes. Complex motions are synthesized by connecting motion primitives together, just like connecting letters to form sentences. Based on these ideas, this thesis proposes a new dynamic motion synthesis method. A key contribution is the insight into the dynamic reason behind motion primitives: template motions are stable and energy efficient. When synthesizing motions from templates, valuable properties like stability and efficiency should be perfectly preserved. The mathematical formalization of this idea is the Motor Invariant Theory, and the preserved properties are the motor invariants.
In the process of conceptualization, new mathematical tools are introduced to the research topic. Invariant Theory, especially the mathematical concepts of equivalence and symmetry, plays a crucial role. Motion adaptation is mathematically modelled as topological conjugacy: a transformation which maintains the topology and results in an analogous system. The Neural Oscillator and Symmetry Preserving Transformations are proposed for their computational efficiency. Even without reference motion data, this approach produces natural-looking motion in real time. The new motor invariant theory might also shed light on the long-standing perception problem in biological research.
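The Neural Oscillator the thesis proposes is not specified in the abstract; the sketch below integrates a classic Matsuoka-style two-neuron mutually inhibiting oscillator, a common model of this kind (all parameter values are illustrative, not the thesis').

```python
import numpy as np

def matsuoka_step(state, dt, tau=0.1, tau_v=0.2, beta=2.5, w=2.5, u=1.0):
    """One Euler step of a two-neuron mutually inhibiting Matsuoka oscillator."""
    x1, x2, v1, v2 = state
    y1, y2 = max(0.0, x1), max(0.0, x2)    # rectified firing rates
    dx1 = (-x1 - beta * v1 - w * y2 + u) / tau
    dx2 = (-x2 - beta * v2 - w * y1 + u) / tau
    dv1 = (-v1 + y1) / tau_v               # self-inhibition (adaptation)
    dv2 = (-v2 + y2) / tau_v
    return (x1 + dt * dx1, x2 + dt * dx2, v1 + dt * dv1, v2 + dt * dv2)

state = (0.1, 0.0, 0.0, 0.0)               # small asymmetry kicks off the limit cycle
outputs = []
for _ in range(2000):
    state = matsuoka_step(state, dt=0.001)
    outputs.append(max(0.0, state[0]) - max(0.0, state[1]))  # oscillator output
```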
Unsupervised human-to-robot motion retargeting via expressive latent space
This paper introduces a novel approach for human-to-robot motion retargeting,
enabling robots to mimic human motion with precision while preserving the
semantics of the motion. For that, we propose a deep learning method for direct
translation from human to robot motion. Our method does not require annotated
paired human-to-robot motion data, which reduces the effort when adopting new
robots. To this end, we first propose a cross-domain similarity metric to
compare poses from the two domains (i.e., human and robot). Then, our method constructs a shared latent space via contrastive learning and decodes latent representations into robot motion control commands.
The learned latent space exhibits expressiveness as it captures the motions
precisely and allows direct motion control in the latent space. We showcase how
to generate in-between motion through simple linear interpolation in the latent
space between two projected human poses. Additionally, we conduct a comprehensive evaluation of robot control with diverse modality inputs, such as text, RGB videos, and key-poses, which makes robot control easier for users of all backgrounds. Finally, we compare our model with existing works and demonstrate the effectiveness of our approach both quantitatively and qualitatively, enhancing natural human-robot communication and fostering trust in integrating robots into daily life.
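A minimal sketch of the contrastive latent-space idea follows. It assumes matched human/robot poses within a batch for the loss (the paper's cross-domain similarity metric removes the need for annotated pairs), and all dimensions, architectures, and names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical encoders/decoder; the dimensions are assumptions.
human_enc = nn.Sequential(nn.Linear(51, 128), nn.ReLU(), nn.Linear(128, 32))  # 17 joints x 3
robot_enc = nn.Sequential(nn.Linear(12, 128), nn.ReLU(), nn.Linear(128, 32))  # 12 robot DoF
robot_dec = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 12))  # latent -> commands

def info_nce(z_a, z_b, temperature=0.1):
    """Contrastive loss: pull matching cross-domain latents together,
    push the rest of the batch apart."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.T / temperature
    return F.cross_entropy(logits, torch.arange(z_a.size(0)))

human_batch, robot_batch = torch.randn(8, 51), torch.randn(8, 12)  # stand-in poses
loss = info_nce(human_enc(human_batch), robot_enc(robot_batch))

# In-between motion: linearly interpolate between two projected human poses
# in the shared latent space, then decode to robot control commands.
z0, z1 = human_enc(human_batch[:1]), human_enc(human_batch[1:2])
frames = [robot_dec((1 - a) * z0 + a * z1) for a in torch.linspace(0, 1, 5)]
```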