Estimation of Human Body Shape and Posture Under Clothing
Estimating the body shape and posture of a dressed human subject in motion
represented as a sequence of (possibly incomplete) 3D meshes is important for
virtual change rooms and security. To solve this problem, statistical shape
spaces encoding human body shape and posture variations are commonly used to
constrain the search space for the shape estimate. In this work, we propose a
novel method that uses a posture-invariant shape space to model body shape
variation combined with a skeleton-based deformation to model posture
variation. Our method can estimate the body shape and posture of both static
scans and motion sequences of dressed human body scans. For motion
sequences, our method takes advantage of motion cues to solve for a single body
shape estimate along with a sequence of posture estimates. We apply our
approach to both static scans and motion sequences and demonstrate that using
our method, higher fitting accuracy is achieved than when using a variant of
the popular SCAPE model as the statistical model.
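As a rough illustration of this kind of pipeline, the sketch below fits a single shared shape vector plus a per-frame posture parameter to several noisy synthetic scans. The linear shape space, the one-parameter "skeleton", and the data are toy stand-ins, not the authors' model.

```python
# Minimal sketch (not the authors' code): fit one shared shape vector plus
# per-frame posture to a sequence of scans. Shape space and pose are toys.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N_VERTS, N_SHAPE, N_FRAMES = 50, 4, 3

mean_shape = rng.normal(size=(N_VERTS, 3))             # template vertices
shape_basis = rng.normal(size=(N_SHAPE, N_VERTS, 3))   # PCA-style basis

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def posed_vertices(beta, theta):
    """Shape in a posture-invariant space, then a toy 1-DoF 'skeleton' pose."""
    shaped = mean_shape + np.tensordot(beta, shape_basis, axes=1)
    return shaped @ rot_z(theta).T

# Synthetic "scans": one true shape observed in three postures, with noise.
true_beta = np.array([0.5, -0.3, 0.1, 0.0])
true_thetas = [0.1, 0.4, 0.8]
scans = [posed_vertices(true_beta, t) + 0.01 * rng.normal(size=(N_VERTS, 3))
         for t in true_thetas]

def objective(x):
    # One shape vector is shared across all frames; posture varies per frame.
    beta, thetas = x[:N_SHAPE], x[N_SHAPE:]
    return sum(np.sum((posed_vertices(beta, th) - scan) ** 2)
               for th, scan in zip(thetas, scans))

res = minimize(objective, np.zeros(N_SHAPE + N_FRAMES))
print("estimated shape:   ", np.round(res.x[:N_SHAPE], 2))
print("estimated postures:", np.round(res.x[N_SHAPE:], 2))
```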
Unsupervised Video Understanding by Reconciliation of Posture Similarities
Understanding human activity and being able to explain it in detail surpasses
mere action classification by far in both complexity and value. The challenge
is thus to describe an activity on the basis of its most fundamental
constituents, the individual postures and their distinctive transitions.
Supervised learning of such a fine-grained representation based on elementary
poses is very tedious and does not scale. Therefore, we propose a completely
unsupervised deep learning procedure based solely on video sequences, which
starts from scratch without requiring pre-trained networks, predefined body
models, or keypoints. A combinatorial sequence matching algorithm proposes
relations between frames from subsets of the training data, while a CNN
reconciles the transitivity conflicts of the different subsets to learn a
single concerted pose embedding despite changes in appearance across sequences.
Without any manual annotation, the model learns a structured representation of
postures and their temporal development. The model not only enables retrieval
of similar postures but also temporal super-resolution. Additionally, based on
a recurrent formulation, next frames can be synthesized.
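The sketch below conveys the general flavour of unsupervised pose-embedding learning, using a plain triplet loss in place of the paper's combinatorial sequence matching: temporally nearby frames, which likely share a posture, are pulled together in embedding space. The tiny CNN, the random "video", and the sampling scheme are all placeholders.

```python
# Hedged sketch of unsupervised pose-embedding learning (a generic triplet-loss
# stand-in, not the paper's combinatorial matching algorithm).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseEmbed(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)   # unit-norm pose embedding

video = torch.randn(100, 3, 64, 64)   # stand-in for real video frames
model = PoseEmbed()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    i = torch.randint(0, 95, (16,))
    anchor, pos = video[i], video[i + 2]        # nearby frames: similar pose
    neg = video[torch.randint(0, 100, (16,))]   # random frames: likely different
    loss = F.triplet_margin_loss(model(anchor), model(pos), model(neg),
                                 margin=0.2)
    opt.zero_grad(); loss.backward(); opt.step()
```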
Anticipation in Human-Robot Cooperation: A Recurrent Neural Network Approach for Multiple Action Sequences Prediction
Close human-robot cooperation is a key enabler for new developments in
advanced manufacturing and assistive applications. Close cooperation requires
robots that can predict human actions and intent, and understand human
non-verbal cues. Recent approaches based on neural networks have led to
encouraging results in the human action prediction problem both in continuous
and discrete spaces. Our approach extends the research in this direction. Our
contributions are three-fold. First, we validate the use of gaze and body pose
cues as a means of predicting human action through a feature selection method.
Next, we address two shortcomings of existing literature: predicting multiple
and variable-length action sequences. This is achieved by introducing an
encoder-decoder recurrent neural network topology to the discrete action
prediction problem. In addition, we theoretically demonstrate the importance of
predicting multiple action sequences as a means of estimating the stochastic
reward in a human-robot cooperation scenario. Finally, we show the ability to
effectively train the prediction model on an action prediction dataset,
involving human motion data, and explore the influence of the model's
parameters on its performance. Source code repository:
https://github.com/pschydlo/ActionAnticipation
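A minimal sketch of such an encoder-decoder topology follows; it is illustrative only (the authors' implementation is in the linked repository), and the feature dimensionality, action vocabulary, and inputs are assumed values.

```python
# Hedged sketch of an encoder-decoder RNN for multi-step discrete action
# prediction; sizes and data are illustrative, not the released code.
import torch
import torch.nn as nn

N_ACTIONS, FEAT, HID = 10, 16, 64   # action vocab, cue features, hidden size

class ActionSeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(FEAT, HID, batch_first=True)
        self.embed = nn.Embedding(N_ACTIONS, HID)
        self.decoder = nn.GRU(HID, HID, batch_first=True)
        self.out = nn.Linear(HID, N_ACTIONS)

    def forward(self, cues, prev_actions):
        # cues: (B, T_obs, FEAT) observed gaze/body-pose features
        # prev_actions: (B, T_dec) action tokens for teacher forcing
        _, h = self.encoder(cues)                 # summarize observations
        dec_out, _ = self.decoder(self.embed(prev_actions), h)
        return self.out(dec_out)                  # (B, T_dec, N_ACTIONS) logits

model = ActionSeq2Seq()
cues = torch.randn(4, 20, FEAT)                   # 20 observed timesteps
prev = torch.randint(0, N_ACTIONS, (4, 5))        # 5-step decode horizon
print(model(cues, prev).shape)                    # torch.Size([4, 5, 10])
```

Because the decoder unrolls one token at a time from its own context, the horizon can vary per sequence, which is what makes variable-length action-sequence prediction possible with this topology.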
Wing and body motion during flight initiation in Drosophila revealed by automated visual tracking
The fruit fly Drosophila melanogaster is a widely used model organism in studies of genetics, developmental biology and biomechanics. One limitation for exploiting Drosophila as a model system for behavioral neurobiology is that measuring body kinematics during behavior is labor intensive and subjective. In order to quantify flight kinematics during different types of maneuvers, we have developed a visual tracking system that estimates the posture of the fly from multiple calibrated cameras. An accurate geometric fly model is designed using unit quaternions to capture complex body and wing rotations, which are automatically fitted to the images in each time frame. Our approach works across a range of flight behaviors, while also being robust to common environmental clutter. The tracking system is used in this paper to compare wing and body motion during both voluntary and escape take-offs. Using our automated algorithms, we are able to measure stroke amplitude, geometric angle of attack and other parameters important to a mechanistic understanding of flapping flight. When compared with manual tracking methods, the algorithm estimates body position within 4.4±1.3% of the body length, while body orientation is measured within 6.5±1.9 deg. (roll), 3.2±1.3 deg. (pitch) and 3.4±1.6 deg. (yaw) on average across six videos. Similarly, stroke amplitude and deviation are estimated within 3.3 deg. and 2.1 deg., while angle of attack is typically measured within 8.8 deg. comparing against a human digitizer. Using our automated tracker, we analyzed a total of eight voluntary and two escape take-offs. These sequences show that Drosophila melanogaster do not utilize clap and fling during take-off and are able to modify their wing kinematics from one wingstroke to the next. Our approach should enable biomechanists and ethologists to process much larger datasets than possible at present and, therefore, accelerate insight into the mechanisms of free-flight maneuvers of flying insects.
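As a small aside on the rotation parameterization mentioned above, the snippet below shows how a unit quaternion rotates a 3D body axis; it is a generic illustration of quaternion rotation, not the published tracker.

```python
# Generic illustration of unit-quaternion rotation (not the tracker's code).
import numpy as np

def quat_rotate(q, v):
    """Rotate 3D vector(s) v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    return v @ R.T

# A 90-degree yaw about the z-axis, encoded as a quaternion.
theta = np.pi / 2
q = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
body_axis = np.array([1.0, 0.0, 0.0])
print(np.round(quat_rotate(q, body_axis), 3))   # -> [0. 1. 0.]
```

Unlike Euler angles, this representation has no gimbal-lock singularities, which is one reason it suits the large, compounded body and wing rotations of flapping flight.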
Deep Haptic Model Predictive Control for Robot-Assisted Dressing
Robot-assisted dressing offers an opportunity to benefit the lives of many
people with disabilities, such as some older adults. However, robots currently
lack common sense about the physical implications of their actions on people.
The physical implications of dressing are complicated by non-rigid garments,
which can result in a robot indirectly applying high forces to a person's body.
We present a deep recurrent model that, when given a proposed action by the
robot, predicts the forces a garment will apply to a person's body. We also
show that a robot can provide better dressing assistance by using this model
with model predictive control. The predictions made by our model only use
haptic and kinematic observations from the robot's end effector, which are
readily attainable. Collecting training data from real world physical
human-robot interaction can be time consuming, costly, and put people at risk.
Instead, we train our predictive model using data collected in an entirely
self-supervised fashion from a physics-based simulation. We evaluated our
approach with a PR2 robot that attempted to pull a hospital gown onto the arms
of 10 human participants. With a 0.2s prediction horizon, our controller
succeeded at high rates and lowered applied force while navigating the garment
around a person's fist and elbow without getting caught. Shorter prediction
horizons resulted in significantly reduced performance with the sleeve catching
on the participants' fists and elbows, demonstrating the value of our model's
predictions. These behaviors of mitigating catches emerged from our deep
predictive model and the controller objective function, which primarily
penalizes high forces.
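The sketch below gives the general shape of such a force-aware model predictive control loop: sample candidate end-effector displacements, score each by predicted force plus progress toward a goal, and execute the best one. The force predictor, goal, and cost weights are toy stand-ins for the paper's learned recurrent model.

```python
# Hedged sketch of force-aware MPC; the predictor is a placeholder, not the
# paper's trained deep recurrent haptic model.
import numpy as np

rng = np.random.default_rng(1)
goal = np.array([0.5, 0.0, 0.2])   # toy target end-effector position

def predicted_force(state, action):
    """Placeholder for the learned haptic model: force-magnitude estimate."""
    return float(np.linalg.norm(action) * 10.0 + rng.normal(0.0, 0.1))

def mpc_step(state, n_candidates=32, force_weight=0.5):
    # Sample small candidate displacements over a short horizon.
    candidates = rng.normal(0.0, 0.02, size=(n_candidates, 3))
    costs = []
    for a in candidates:
        progress = np.linalg.norm(state + a - goal)   # distance-to-goal cost
        force = predicted_force(state, a)             # haptic cost
        costs.append(progress + force_weight * force)
    return candidates[int(np.argmin(costs))]

state = np.zeros(3)
for t in range(5):
    state = state + mpc_step(state)
    print(t, np.round(state, 3))
```

In this framing, catch-mitigating behaviors need not be hand-coded: they fall out of the force term in the objective, which is the design choice the abstract highlights.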
An original framework for understanding human actions and body language by using deep neural networks
The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour.
By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way.
These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively.
The processing of body movements plays a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements;
both are essential tasks in many computer vision applications, including event recognition and video surveillance.
In this Ph.D. thesis, an original framework for understanding actions and body language is presented. The framework is composed of three main modules: in the first, a method based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) for the recognition of sign language and semaphoric hand gestures is proposed; the second module presents a solution based on 2D skeletons and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition using 3D skeletons and Deep Neural Networks (DNNs) is provided.
The performance of LSTM-RNNs is explored in depth, due to their ability to model the long-term contextual information of temporal sequences, making them suitable for analysing body movements.
All the modules were tested using challenging datasets, well known in the state of the art, showing remarkable results compared to current methods in the literature.
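As a hedged illustration of the second module's design, the sketch below implements a generic two-branch stacked LSTM over 2D skeleton sequences, with joint positions in one branch and frame-to-frame motion in the other; the layer sizes, joint count, and class count are assumed, not taken from the thesis.

```python
# Hedged sketch of a two-branch stacked LSTM over 2D skeletons (illustrative
# architecture, not the thesis implementation).
import torch
import torch.nn as nn

N_JOINTS, N_CLASSES = 18, 8
IN = N_JOINTS * 2   # (x, y) per joint, flattened per frame

class TwoBranchLSTM(nn.Module):
    def __init__(self, hid=64):
        super().__init__()
        self.pos_branch = nn.LSTM(IN, hid, num_layers=2, batch_first=True)
        self.mot_branch = nn.LSTM(IN, hid, num_layers=2, batch_first=True)
        self.cls = nn.Linear(2 * hid, N_CLASSES)

    def forward(self, skel):                  # skel: (B, T, IN)
        motion = skel[:, 1:] - skel[:, :-1]   # frame-to-frame differences
        _, (hp, _) = self.pos_branch(skel)    # static posture cues
        _, (hm, _) = self.mot_branch(motion)  # dynamic movement cues
        feats = torch.cat([hp[-1], hm[-1]], dim=1)
        return self.cls(feats)                # (B, N_CLASSES) logits

model = TwoBranchLSTM()
print(model(torch.randn(4, 30, IN)).shape)    # torch.Size([4, 8])
```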
Real-Time Human Motion Capture with Multiple Depth Cameras
Commonly used human motion capture systems require intrusive attachment of
markers that are visually tracked with multiple cameras. In this work we
present an efficient and inexpensive solution to markerless motion capture
using only a few Kinect sensors. Unlike previous work on 3D pose estimation
using a single depth camera, we relax constraints on the camera location and do
not assume a co-operative user. We apply recent image segmentation techniques
to depth images and use curriculum learning to train our system on purely
synthetic data. Our method accurately localizes body parts without requiring an
explicit shape model. The body joint locations are then recovered by combining
evidence from multiple views in real-time. We also introduce a dataset of ~6
million synthetic depth frames for pose estimation from multiple cameras and
exceed state-of-the-art results on the Berkeley MHAD dataset.
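A minimal sketch of the final multi-view fusion step (illustrative, not the paper's method): each camera's per-joint estimate is mapped into a common frame through its calibration, and the estimates are combined by a confidence-weighted average. The transforms and confidences below are toy values.

```python
# Hedged sketch of multi-view joint fusion; calibration and confidences
# are toy stand-ins for a real multi-Kinect setup.
import numpy as np

def fuse_joint(estimates):
    """estimates: list of (R, t, p_cam, conf) per camera for one joint."""
    num, den = np.zeros(3), 0.0
    for R, t, p_cam, conf in estimates:
        p_world = R @ p_cam + t    # camera frame -> common world frame
        num += conf * p_world      # weight each view by its confidence
        den += conf
    return num / den

I = np.eye(3)
views = [
    (I, np.zeros(3),               np.array([0.1, 1.0, 2.0]), 0.9),
    (I, np.array([0.0, 0.0, 0.1]), np.array([0.1, 1.0, 1.9]), 0.5),
]
print(np.round(fuse_joint(views), 3))   # fused 3D joint location
```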