A Three-level Motion Texture for Human Motion Modeling
Abstract: A three-level motion texture is proposed to model complex human motion and to synthesize motion that is statistically similar to the original motion data. The three levels, namely moton index, moton, and moton distribution, are defined to synthesize motions. To describe the continuous and non-linear dynamics of human motion, the motion texture is modeled by a Non-Stationary Switching Linear Dynamic System (NS-SLDS), which extends the Switching Linear Dynamic System (SLDS) with non-stationary functions. A BSK-tree (Binary Key-pose Splitting Tree) retrieval method applied to motons provides frame-level access to the data. The motion texture can thus be manipulated at three different levels: by retrieving a key frame in a specific moton, by changing the details of a specific motion at the moton level, and by designing a new choreography at the distribution level. In motion synthesis experiments, the proposed approach proved flexible and effective. Index Terms: motion texture, moton, NS-SLDS, KBS-tree.
Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making these hypotheses and
constraints explicit makes the framework particularly useful for selecting a method, given
an application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
tables
Automatic facial expression tracking for 4D range scans
This paper presents a fully automatic approach to spatio-temporal facial expression tracking for 4D range scans that requires no manual intervention (such as specifying landmarks). The approach consists of three steps: rigid registration, facial model reconstruction, and facial expression tracking. A Scaling Iterative Closest Points (SICP) algorithm is introduced to compute the optimal rigid registration between a template facial model and a range scan while accounting for differences in scale. A deformable model, physically based on thin shells, is proposed to faithfully reconstruct the facial surface and texture from the range data. The reconstructed facial model is then used to track the facial expressions presented in a sequence of range scans by the deformable model.
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing it is implicitly learning a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation.
Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.ed
Full Reference Objective Quality Assessment for Reconstructed Background Images
With an increased interest in applications that require a clean background
image, such as video surveillance, object tracking, street view imaging and
location-based services on web-based maps, multiple algorithms have been
developed to reconstruct a background image from cluttered scenes.
Traditionally, statistical measures and existing image quality techniques have
been applied for evaluating the quality of the reconstructed background images.
Though these quality assessment methods have been widely used in the past,
their performance in evaluating the perceived quality of the reconstructed
background image has not been verified. In this work, we discuss the
shortcomings in existing metrics and propose a full reference Reconstructed
Background image Quality Index (RBQI) that combines color and structural
information at multiple scales using a probability summation model to predict
the perceived quality in the reconstructed background image given a reference
image. To compare the performance of the proposed quality index with existing
image quality assessment measures, we construct two different datasets
consisting of reconstructed background images and corresponding subjective
scores. The quality assessment measures are evaluated by correlating their
objective scores with human subjective ratings. The correlation results show
that the proposed RBQI outperforms all the existing approaches. Additionally,
the constructed datasets and the corresponding subjective scores provide a
benchmark to evaluate the performance of future metrics that are developed to
evaluate the perceived quality of reconstructed background images.
Comment: Associated source code: https://github.com/ashrotre/RBQI, Associated
Database:
https://drive.google.com/drive/folders/1bg8YRPIBcxpKIF9BIPisULPBPcA5x-Bk?usp=sharing
(Email for permissions at: ashrotreasuedu
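The evaluation protocol described in this abstract, correlating a metric's objective scores with human subjective ratings, can be illustrated with a minimal sketch. This is not the authors' code: the function names and any score lists passed to them are hypothetical, and only the Pearson and Spearman coefficients are shown, assuming these are among the correlation measures of interest.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

A metric whose scores rise monotonically with the subjective ratings obtains a Spearman coefficient of 1 even when the relationship is not linear, which is why rank correlation is commonly reported alongside the linear coefficient.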
A dynamic texture based approach to recognition of facial actions and their temporal models
In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domain. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set
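For readers unfamiliar with Motion History Images, the following is a minimal sketch of the basic, textbook MHI update, not the extended version used in the paper. Frames are assumed to be grayscale images stored as nested lists, and the names `update_mhi`, `tau`, and `threshold` are hypothetical choices made for illustration.

```python
def update_mhi(mhi, prev_frame, frame, tau, threshold):
    """Return the motion history image after observing one new frame.

    A pixel whose intensity changed by at least `threshold` is set to `tau`;
    every other pixel decays by 1, never dropping below 0.
    """
    updated = []
    for mhi_row, prev_row, cur_row in zip(mhi, prev_frame, frame):
        row = []
        for history, previous, current in zip(mhi_row, prev_row, cur_row):
            if abs(current - previous) >= threshold:
                row.append(tau)  # motion detected: reset to the newest value
            else:
                row.append(max(0, history - 1))  # no motion: fade older motion
        updated.append(row)
    return updated
```

Applied frame by frame, recently moving pixels hold larger values than pixels that moved earlier, so a single image encodes both where and how recently motion occurred.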