3LP: a linear 3D-walking model including torso and swing dynamics
In this paper, we present a new model of biped locomotion which is composed
of three linear pendulums (one per leg and one for the whole upper body) to
describe stance, swing and torso dynamics. In addition to double support, this
model has different actuation possibilities in the swing hip and stance ankle
which can be used to produce a wide variety of walking gaits. Without the need
for numerical time integration, closed-form solutions help find periodic
gaits, which can simply be scaled in certain dimensions to modulate the motion
online. Thanks to its linearity, the proposed model can provide a
computationally fast platform for model predictive controllers to predict the
future and consider meaningful inequality constraints to ensure feasibility of
the motion. This property comes from describing the dynamics directly with
joint torques, thereby reflecting hardware limitations more precisely, even in
the very abstract high-level template space. The proposed model produces
human-like torque and ground reaction force profiles and thus, compared to
point-mass models, it is more promising for precise control of humanoid robots.
Despite being linear and lacking many other features of human walking like CoM
excursion, knee flexion and ground clearance, we show that the proposed model
can predict one of the main optimality trends in human walking, i.e. the
nonlinear speed-frequency relationship. In this paper, we mainly focus on describing the
model and its capabilities, comparing it with human data and calculating
optimal human gait variables. Setting up control problems and advanced
biomechanical analyses remain for future work.
Comment: Journal paper under review.
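To make the closed-form idea concrete, here is a minimal sketch for the simplest related template, a single linear inverted pendulum, rather than the three coupled pendulums of 3LP; the dynamics, parameter values, and function names below are illustrative assumptions, not the paper's equations.

```python
import numpy as np

def lip_closed_form(x0, v0, p, omega, t):
    """Closed-form CoM trajectory of a linear inverted pendulum.

    Dynamics: xdd = omega**2 * (x - p), with the stance foot at p.
    Solution: x(t) = p + (x0 - p)*cosh(omega t) + (v0/omega)*sinh(omega t).
    No numerical time integration is needed.
    """
    c, s = np.cosh(omega * t), np.sinh(omega * t)
    x = p + (x0 - p) * c + (v0 / omega) * s
    v = omega * (x0 - p) * s + v0 * c
    return x, v

# Example: CoM starting 0.1 m behind the stance foot, moving forward.
g, z0 = 9.81, 0.8                # gravity, constant CoM height (assumed)
omega = np.sqrt(g / z0)          # natural frequency of the pendulum
x, v = lip_closed_form(x0=-0.1, v0=0.4, p=0.0, omega=omega, t=0.3)
print(f"CoM position {x:.3f} m, velocity {v:.3f} m/s after 0.3 s")
```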
Push recovery with stepping strategy based on time-projection control
In this paper, we present a simple control framework for on-line push
recovery with dynamic stepping properties. Because our robot has relatively
heavy legs, we need to take swing dynamics into account; we thus use a linear model
called 3LP which is composed of three pendulums to simulate swing and torso
dynamics. Based on the 3LP equations, we formulate discrete LQR controllers and
use a time-projection method to continuously adjust the next footstep location
online during the motion. This adjustment, computed from both pelvis and
swing-foot tracking errors, naturally takes the swing dynamics into account.
The suggested adjustments are added to the Cartesian 3LP
gaits and converted to joint-space trajectories through inverse kinematics.
Fixed and adaptive foot lift strategies also ensure enough ground clearance in
perturbed walking conditions. The proposed structure is robust, yet uses very
simple state estimation and basic position tracking. We rely on the physical
series elastic actuators to absorb impacts while introducing simple laws to
compensate for their tracking bias. Extensive experiments demonstrate the
functionality of different control blocks and prove the effectiveness of
time-projection in extreme push recovery scenarios. We also show self-produced
and emergent walking gaits when the robot is subject to continuous dragging
forces. These gaits owe their dynamic robustness to the relatively soft ankle
springs and to the absence of any Zero Moment Point (ZMP) control in our
proposed architecture.
Comment: 20 pages, journal paper.
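The discrete LQR ingredient can be sketched generically; the following uses a standard Riccati-iteration LQR on a toy double-integrator stand-in, not the paper's 3LP-based formulation or its time-projection step.

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Discrete-time LQR gain via fixed-point iteration of the
    Riccati equation; returns K such that u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Toy double-integrator stand-in for a footstep-adjustment model (not 3LP).
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
K = dlqr(A, B, Q=np.diag([10.0, 1.0]), R=np.array([[0.1]]))
x = np.array([0.05, -0.2])       # tracking-error state (position, velocity)
u = -K @ x                       # corrective input, e.g. a footstep offset
print("LQR gain:", K, "correction:", u)
```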
Variational Methods for Biomolecular Modeling
Structure, function and dynamics of many biomolecular systems can be
characterized by the energetic variational principle and the corresponding
systems of partial differential equations (PDEs). This principle allows us to
focus on the identification of essential energetic components, the optimal
parametrization of energies, and the efficient computational implementation of
energy variation or minimization. Given the fact that complex biomolecular
systems are structurally non-uniform and their interactions occur through
contact interfaces, their free energies are associated with various interfaces
as well, such as the solute-solvent interface, molecular binding interface, lipid
domain interface, and membrane surfaces. This fact motivates the inclusion of
interface geometry, particularly its curvatures, in the parametrization of free
energies. Applications of such interface-geometry-based energetic variational
principles are illustrated through three concrete topics: the multiscale
modeling of biomolecular electrostatics and solvation that includes the
curvature energy of the molecular surface, the formation of microdomains on
lipid membrane due to the geometric and molecular mechanics at the lipid
interface, and the mean curvature driven protein localization on membrane
surfaces. By further implicitly representing the interface using a phase field
function over the entire domain, one can simulate the dynamics of the interface
and the corresponding energy variation by evolving the phase field function,
achieving significant reduction of the number of degrees of freedom and
computational complexity. Strategies for improving the efficiency of
computational implementations and for extending applications to coarse-graining
or multiscale molecular simulations are outlined.
Comment: 36 pages.
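The phase-field strategy mentioned above can be illustrated with a standard 1D Allen-Cahn gradient flow; the energy and parameters below are textbook choices, not the paper's biomolecular functionals.

```python
import numpy as np

# Minimal 1D Allen-Cahn gradient flow: evolve a phase field phi in [-1, 1]
# whose zero level set is the implicit interface. Energy:
#   E[phi] = integral of (eps/2)|phi'|^2 + (1/eps) W(phi),
#   with double-well W(phi) = (1 - phi^2)^2 / 4.
# Gradient flow: dphi/dt = eps * phi'' - (1/eps) * (phi^3 - phi).
n, eps, dt = 200, 0.05, 1e-4
x = np.linspace(-1.0, 1.0, n)
phi = np.tanh(x / eps)           # initial diffuse interface at x = 0
dx2 = (x[1] - x[0]) ** 2
for _ in range(2000):
    lap = np.zeros_like(phi)
    lap[1:-1] = (phi[2:] - 2 * phi[1:-1] + phi[:-2]) / dx2
    phi += dt * (eps * lap - (phi**3 - phi) / eps)
print("interface location ~", x[np.argmin(np.abs(phi))])
```

Evolving the scalar field phi on a fixed grid, instead of tracking the interface explicitly, is what yields the reduction in degrees of freedom the abstract refers to.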
MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation
Audio-Driven Face Animation is an eagerly anticipated technique for
applications such as VR/AR, games, and movie making. With the rapid development
of 3D engines, there is an increasing demand for driving 3D faces with audio.
However, currently available 3D face animation datasets are either limited in
scale or unsatisfactory in quality, which hampers further development of
audio-driven 3D face animation. To address this challenge, we propose MMFace4D,
a large-scale multi-modal 4D (3D sequence) face dataset consisting of 431
identities, 35,904 sequences, and 3.9 million frames. MMFace4D exhibits two
compelling characteristics: 1) a remarkably diverse set of subjects and corpus,
encompassing actors spanning ages 15 to 68 and recorded sentences with
durations ranging from 0.7 to 11.4 seconds; and 2) synchronized audio
and 3D mesh sequences with high-resolution face details. To capture the subtle
nuances of 3D facial expressions, we leverage three synchronized RGBD cameras
during the recording process. Upon MMFace4D, we construct a non-autoregressive
framework for audio-driven 3D face animation. Our framework considers the
regional and composite natures of facial animations, and surpasses contemporary
state-of-the-art approaches both qualitatively and quantitatively. The code,
model, and dataset will be publicly available.
Comment: 10 pages, 8 figures. This paper has been submitted to IEEE
Transactions on Multimedia and is an extension of our MM 2023 paper
arXiv:2308.05428. The dataset is now publicly available; see the project page
at https://wuhaozhe.github.io/mmface4d
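For illustration, a synchronized audio/mesh sample might be organized as below; the field names, shapes, and sampling rates are hypothetical and do not reflect MMFace4D's actual schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Face4DSample:
    """Hypothetical record for one synchronized audio/mesh sequence.

    Field names and shapes are illustrative assumptions only,
    not the actual MMFace4D schema.
    """
    audio: np.ndarray      # (num_audio_samples,) waveform, e.g. at 16 kHz
    vertices: np.ndarray   # (num_frames, num_vertices, 3) mesh sequence
    fps: float             # mesh frame rate aligning audio and frames
    speaker_id: int
    transcript: str

def audio_window_for_frame(sample: Face4DSample, frame: int, sr: int = 16000):
    """Slice the audio samples that overlap one mesh frame."""
    start = int(frame / sample.fps * sr)
    stop = int((frame + 1) / sample.fps * sr)
    return sample.audio[start:stop]
```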
Speech-Driven 3D Face Animation with Composite and Regional Facial Movements
Speech-driven 3D face animation poses significant challenges due to the
intricacy and variability inherent in human facial movements. This paper
emphasizes the importance of considering both the composite and regional
natures of facial movements in speech-driven 3D face animation. The composite
nature pertains to how speech-independent factors globally modulate
speech-driven facial movements along the temporal dimension. Meanwhile, the
regional nature refers to the fact that facial movements are not globally
correlated but are actuated by local musculature along the spatial dimension.
Incorporating both natures is thus indispensable for generating vivid
animation. To address the composite nature, we introduce an adaptive modulation
module that employs arbitrary facial movements to dynamically adjust
speech-driven facial movements across frames on a global scale. To accommodate
the regional nature, our approach ensures that each constituent of the facial
features for every frame focuses on the local spatial movements of 3D faces.
Moreover, we present a non-autoregressive backbone for translating audio to 3D
facial movements, which maintains high-frequency nuances of facial movements
and facilitates efficient inference. Comprehensive experiments and user studies
demonstrate that our method surpasses contemporary state-of-the-art approaches
both qualitatively and quantitatively.
Comment: Accepted by MM 2023, 9 pages, 7 figures. arXiv admin note: text
overlap with arXiv:2303.0979
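The two ideas can be sketched as follows: a FiLM-style global modulation for the composite nature and per-region output heads for the regional nature. Module names, layer sizes, and the exact mechanism are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GlobalModulation(nn.Module):
    """'Composite' sketch: a reference motion produces per-frame
    scale/shift (FiLM-style) applied to speech-driven features."""
    def __init__(self, dim):
        super().__init__()
        self.to_scale_shift = nn.Linear(dim, 2 * dim)

    def forward(self, speech_feat, ref_motion_feat):
        scale, shift = self.to_scale_shift(ref_motion_feat).chunk(2, dim=-1)
        return speech_feat * (1 + scale) + shift   # (batch, frames, dim)

class RegionalHeads(nn.Module):
    """'Regional' sketch: separate linear heads predict vertex offsets
    for disjoint face regions (mouth, eyes, ...) from shared features."""
    def __init__(self, dim, region_sizes):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(dim, 3 * n) for n in region_sizes)

    def forward(self, feat):
        return torch.cat([h(feat) for h in self.heads], dim=-1)

feat = torch.randn(2, 100, 64)        # (batch, frames, feature dim)
ref = torch.randn(2, 100, 64)         # features from a reference motion
mod = GlobalModulation(64)(feat, ref)
offsets = RegionalHeads(64, region_sizes=[500, 300])(mod)
print(offsets.shape)                  # (2, 100, 2400) vertex-offset channels
```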
DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
Diffusion models have shown remarkable success in a variety of downstream
generative tasks, yet remain under-explored in the important and challenging
expressive talking head generation. In this work, we propose DreamTalk, a
framework that fills this gap with a meticulous design that unlocks the
potential of diffusion models for generating expressive talking heads.
Specifically, DreamTalk consists of three crucial components: a denoising
network, a style-aware lip expert, and a style predictor. The diffusion-based
denoising network is able to consistently synthesize high-quality audio-driven
face motions across diverse expressions. To enhance the expressiveness and
accuracy of lip motions, we introduce a style-aware lip expert that can guide
lip-sync while being mindful of the speaking styles. To eliminate the need for
expression reference video or text, an extra diffusion-based style predictor is
utilized to predict the target expression directly from the audio. By this
means, DreamTalk can harness powerful diffusion models to generate expressive
faces effectively and reduce the reliance on expensive style references.
Experimental results demonstrate that DreamTalk is capable of generating
photo-realistic talking faces with diverse speaking styles and achieving
accurate lip motions, surpassing existing state-of-the-art counterparts.
Comment: Project page: https://dreamtalk-project.github.io
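For orientation, the diffusion-based denoising can be sketched with a generic DDPM ancestral sampling step conditioned on audio features; the stub model, tensor shapes, and noise schedule below are assumptions, not DreamTalk's actual network.

```python
import torch

def ddpm_reverse_step(model, x_t, t, audio, betas):
    """One generic DDPM ancestral sampling step for motion x_t,
    conditioned on audio (illustrative; not DreamTalk's exact model)."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    eps = model(x_t, t, audio)                      # predicted noise
    mean = (x_t - betas[t] / torch.sqrt(1 - alpha_bar[t]) * eps) \
           / torch.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)

# Stub denoiser: a real one would be a network over motion frames.
model = lambda x, t, a: torch.zeros_like(x)
betas = torch.linspace(1e-4, 0.02, 1000)
x = torch.randn(1, 100, 64)                         # (batch, frames, motion dim)
audio = torch.randn(1, 100, 80)                     # e.g. mel-spectrogram features
for t in reversed(range(1000)):
    x = ddpm_reverse_step(model, x, t, audio, betas)
```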
Learning hybrid locomotion skills—Learn to exploit residual actions and modulate model-based gait control
This work develops a hybrid framework that combines machine learning and
control approaches to give legged robots new capabilities for balancing
against external perturbations. The framework embeds a kernel, a model-based,
fully parametric, closed-loop, analytical controller, as the gait pattern
generator. On top of that, a neural network with symmetric partial data
augmentation learns to automatically adjust the parameters of the gait kernel
and to generate compensatory actions for all joints, significantly augmenting
stability under unexpected perturbations. Seven neural network policies with
different configurations were optimized to validate the effectiveness of the
combined use of kernel-parameter modulation and residual-action compensation
for the arms and legs. The results confirmed that modulating the kernel
parameters alongside the residual actions improves stability significantly.
The performance of the proposed framework was evaluated across a set of
challenging simulated scenarios and showed considerable improvements over the
baseline in recovering from large external forces (up to 118%). The robustness
of the framework to measurement noise and model inaccuracies was also assessed
through simulations. Finally, the trained policies were validated across a set
of unseen scenarios and generalized to dynamic walking.
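The hybrid idea, modulating kernel parameters while adding residual joint actions, can be sketched as below; the sinusoidal kernel, parameter layout, and dimensions are placeholders, not the paper's analytical closed-loop controller.

```python
import numpy as np

def gait_kernel(phase, params):
    """Stand-in for the model-based pattern generator: sinusoidal joint
    targets whose amplitude/frequency are the modulated parameters.
    (Illustrative only; the paper's kernel is closed-loop and analytic.)"""
    amp, freq = params
    return amp * np.sin(2 * np.pi * freq * phase + np.linspace(0, np.pi, 12))

def hybrid_action(phase, base_params, policy_output):
    """Policy output = [delta_amp, delta_freq, 12 residual joint actions]."""
    d_params, residual = policy_output[:2], policy_output[2:]
    joints = gait_kernel(phase, base_params + d_params)  # modulated kernel
    return joints + residual                             # residual correction

policy_output = np.zeros(14)          # e.g. from a trained neural network
action = hybrid_action(phase=0.25, base_params=np.array([0.3, 1.5]),
                       policy_output=policy_output)
print(action.shape)                   # (12,) joint commands
```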
Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI
Vocal tract configurations play a vital role in generating distinguishable
speech sounds, by modulating the airflow and creating different resonant
cavities in speech production. They contain abundant information that can be
utilized to better understand the underlying speech production mechanism. As a
step towards automatic mapping of vocal tract shape geometry to acoustics, this
paper employs effective video action recognition techniques, like Long-term
Recurrent Convolutional Networks (LRCN) models, to identify different
vowel-consonant-vowel (VCV) sequences from dynamic shaping of the vocal tract.
Such a model combines a CNN-based deep hierarchical visual feature
extractor with recurrent networks, making the network
spatio-temporally deep enough to learn the sequential dynamics of a short video
clip for video classification tasks. We use a database consisting of 2D
real-time MRI of vocal tract shaping during VCV utterances by 17 speakers. The
comparative performances of this class of algorithms under various parameter
settings and for various classification tasks are discussed. Interestingly, the
results show a marked difference in model performance on speech classification
compared to generic sequence or video classification tasks.
Comment: To appear in the INTERSPEECH 2018 proceedings.
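An LRCN-style classifier of the kind described can be sketched minimally: a per-frame CNN feeds an LSTM whose final hidden state classifies the clip. Layer sizes, input resolution, and the number of VCV classes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MiniLRCN(nn.Module):
    """Minimal LRCN sketch: per-frame CNN features go through an LSTM,
    and the final hidden state classifies the clip."""
    def __init__(self, n_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())     # -> 32-dim per frame
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, clips):                 # (batch, frames, 1, H, W)
        b, f = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, f, -1)
        _, (h, _) = self.lstm(feats)
        return self.fc(h[-1])                 # logits per VCV class

logits = MiniLRCN(n_classes=9)(torch.randn(2, 20, 1, 64, 64))
print(logits.shape)                           # (2, 9)
```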