Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos
In recent years, heatmap regression based models have shown their
effectiveness in face alignment and pose estimation. However, Conventional
Heatmap Regression (CHR) is neither accurate nor stable when dealing with
high-resolution facial videos, since it finds the maximum activated location in
heatmaps that are generated from rounded coordinates, and thus incurs
quantization errors when scaling back to the original high-resolution space. In
this paper, we propose a Fractional Heatmap Regression (FHR) for
high-resolution video-based face alignment. The proposed FHR can accurately
estimate the fractional part according to the 2D Gaussian function by sampling
three points in heatmaps. To further stabilize the landmarks across continuous
video frames while maintaining precision, we propose a novel
stabilization loss that contains two terms to address time delay and non-smooth
issues, respectively. Experiments on 300W, 300-VW and Talking Face datasets
clearly demonstrate that the proposed method is more accurate and stable than
the state-of-the-art models.
Comment: Accepted to AAAI 2019. 8 pages, 7 figures.
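The fractional-peak idea described above can be sketched numerically: for a 2D Gaussian heatmap, the log of the activations is quadratic in each coordinate, so three samples along an axis determine the sub-pixel peak location exactly. The following is a minimal illustrative sketch under that Gaussian assumption, not the paper's implementation (function names are hypothetical):

```python
import numpy as np

def fractional_peak_1d(profile, p):
    """Sub-pixel peak offset along one axis, assuming a Gaussian profile.

    With h(x) = A * exp(-(x - mu)^2 / (2 sigma^2)), log h is quadratic in x,
    so mu can be solved exactly from three samples around the integer peak p.
    """
    l = np.log(profile[p - 1])
    c = np.log(profile[p])
    r = np.log(profile[p + 1])
    # Vertex of the parabola fit through the three log-samples.
    return 0.5 * (l - r) / (l - 2.0 * c + r)

def fractional_argmax(heatmap):
    """Integer argmax refined with a fractional offset per axis."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    dy = fractional_peak_1d(heatmap[:, x], y)
    dx = fractional_peak_1d(heatmap[y, :], x)
    return y + dy, x + dx
```

For an ideal Gaussian heatmap this recovers the fractional coordinates exactly, whereas a plain argmax is off by up to half a pixel, which is then amplified when scaling back to high-resolution frames.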
HumanTOMATO: Text-aligned Whole-body Motion Generation
This work targets a novel text-driven whole-body motion generation task,
which takes a given textual description as input and aims at generating
high-quality, diverse, and coherent facial expressions, hand gestures, and body
motions simultaneously. Previous works on text-driven motion generation tasks
mainly have two limitations: they ignore the key role of fine-grained hand and
face control in vivid whole-body motion generation, and lack good
alignment between text and motion. To address such limitations, we propose a
Text-aligned whOle-body Motion generATiOn framework, named HumanTOMATO, which
is, to our knowledge, the first attempt towards applicable holistic motion
generation in this research area. To tackle this challenging task, our solution
includes two key designs: (1) a Holistic Hierarchical VQ-VAE (aka HVQ) and
a Hierarchical-GPT for fine-grained body and hand motion reconstruction and
generation with two structured codebooks; and (2) a pre-trained
text-motion-alignment model to help generated motion align with the input
textual description explicitly. Comprehensive experiments verify that our model
has significant advantages in both the quality of generated motions and their
alignment with text.
Comment: 31 pages, 15 figures, 16 tables. Project page:
https://lhchen.top/HumanTOMAT
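The codebook lookup at the heart of any VQ-VAE can be sketched as nearest-neighbour quantization; a hierarchical design like the one described above would apply it with separate structured codebooks (e.g. one for body motion, one for hands). This is an illustrative sketch of the generic quantization step only, not the authors' model:

```python
import numpy as np

def quantize(latents, codebook):
    """Nearest-neighbour vector quantization.

    Maps each latent vector (row of `latents`, shape (N, D)) to the index of,
    and the entry in, the closest codebook row (shape (K, D)) under squared
    Euclidean distance.
    """
    # (N, 1, D) - (1, K, D) -> (N, K) matrix of squared distances.
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]
```

A hierarchical variant would quantize body latents with one codebook and hand latents with a second, letting the downstream GPT predict the discrete code indices autoregressively.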
Teaching Introductory Programming Concepts through a Gesture-Based Interface
Computer programming is an integral part of a technology driven society, so there is a tremendous need to teach programming to a wider audience. One of the challenges in meeting this demand for programmers is that most traditional computer programming classes are targeted to university/college students with strong math backgrounds. To expand the computer programming workforce, we need to encourage a wider range of students to learn about programming.
The goal of this research is to design and implement a gesture-driven interface to teach computer programming to young and non-traditional students. We designed our user interface based on the feedback from students attending the College of Engineering summer camps at the University of Arkansas. Our system uses the Microsoft Xbox Kinect to capture the movements of new programmers as they use our system. Our software then tracks and interprets student hand movements in order to recognize specific gestures which correspond to different programming constructs, and uses this information to create and execute programs using the Google Blockly visual programming framework.
We focus on various gesture recognition algorithms to interpret user data as specific gestures, including template matching, sector quantization, and supervised machine learning clustering algorithms.
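Template matching, the first of the algorithms listed above, can be sketched as resampling each captured stroke to a fixed number of points evenly spaced along its arc length, then picking the stored template with the smallest mean point-to-point distance. This is an illustrative sketch under those assumptions, not the system's actual code:

```python
import math

def resample(stroke, n=16):
    """Resample a stroke (list of (x, y) points) to n points evenly
    spaced along its cumulative arc length."""
    if len(stroke) < 2:
        return [stroke[0]] * n
    cum = [0.0]
    for a, b in zip(stroke, stroke[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    out, j = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment containing the target arc length.
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0 else (target - cum[j]) / seg
        (ax, ay), (bx, by) = stroke[j], stroke[j + 1]
        out.append((ax + t * (bx - ax), ay + t * (by - ay)))
    return out

def classify(stroke, templates, n=16):
    """Return the template name with the smallest mean point-to-point
    distance to the resampled input stroke."""
    pts = resample(stroke, n)
    best, best_d = None, float("inf")
    for name, tmpl in templates.items():
        tpts = resample(tmpl, n)
        d = sum(math.dist(p, q) for p, q in zip(pts, tpts)) / n
        if d < best_d:
            best, best_d = name, d
    return best
```

A production recognizer would typically also normalize for translation, scale, and rotation before matching, so that the same gesture drawn anywhere in the Kinect's field of view matches the same template.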