Multi-view Face Detection Using Deep Convolutional Neural Networks
In this paper we consider the problem of multi-view face detection. While
there has been significant research on this problem, current state-of-the-art
approaches for this task require annotation of facial landmarks, e.g. TSM [25],
or annotation of face poses [28, 22]. They also require training dozens of
models to fully capture faces in all orientations, e.g. 22 models in HeadHunter
method [22]. In this paper we propose Deep Dense Face Detector (DDFD), a method
that does not require pose/landmark annotation and is able to detect faces in a
wide range of orientations using a single model based on deep convolutional
neural networks. The proposed method has minimal complexity; unlike other
recent deep learning object detection methods [9], it does not require
additional components such as segmentation, bounding-box regression, or SVM
classifiers. Furthermore, we analyzed the scores of the proposed face detector
for faces in different orientations and found that 1) the proposed method is
able to detect faces from different angles and can handle occlusion to some
extent, and 2) there seems to be a correlation between the distribution of
positive examples in the training set and the scores of the proposed face
detector. The latter suggests that the proposed method's performance can be
further improved by using better sampling strategies and more sophisticated
data augmentation techniques.
Evaluations on popular face detection benchmark datasets show that our
single-model face detector algorithm has similar or better performance compared
to the previous methods, which are more complex and require annotations of
either different poses or facial landmarks.
Comment: in International Conference on Multimedia Retrieval 2015 (ICMR
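Since a single-model dense detector like the one described produces many overlapping detection windows per face, the remaining post-processing step is non-maximum suppression over the scored windows. A minimal pure-Python sketch of that step; the 0.3 overlap threshold and box format are assumptions for illustration, not values from the paper:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, thresh=0.3):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop any remaining box overlapping it by more than `thresh`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < thresh]
    return keep
```

For example, two heavily overlapping face windows collapse to the higher-scoring one, while a distant window survives.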
Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (e.g. onset and offset phases for "smile", running and jumping for
"highjump"). The proposed model is inspired by the recent works on Multiple
Instance Learning and latent SVM/HCRF -- it extends such frameworks to
approximately model the ordinal aspect of the videos. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.
Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text
overlap with arXiv:1604.0150
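The ordinal aspect above — sub-events must occur in temporal order — can be illustrated with a small dynamic program that places one frame per sub-event, each strictly after the previous one, and maximizes the total score. This is an illustrative sketch, not the paper's latent SVM/HCRF formulation; `frame_scores` stands in for hypothetical per-sub-event detector responses:

```python
def ordinal_score(frame_scores, n_events):
    """Best total score choosing one frame per sub-event such that the
    chosen frames are in strictly increasing temporal order.
    frame_scores[e][t]: response of sub-event detector e at frame t."""
    T = len(frame_scores[0])
    NEG = float("-inf")
    # best[t]: best score of events 0..e with event e placed exactly at frame t
    best = frame_scores[0][:]
    for e in range(1, n_events):
        # prefix[t]: best placement of the previous event at any frame <= t
        prefix = [best[0]]
        for t in range(1, T):
            prefix.append(max(prefix[-1], best[t]))
        best = [frame_scores[e][t] + (prefix[t - 1] if t > 0 else NEG)
                for t in range(T)]
    return max(best)
```

With two sub-events, the second event is forced to fire after the first, which is exactly the ordering constraint that plain Multiple Instance Learning (an unordered max over instances) does not enforce.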
Yoga Pose Classification Using Deep Learning
Human pose estimation is a deep-rooted problem in computer vision that has exposed many challenges in the past. Analyzing human activities is beneficial in many fields, such as video surveillance, biometrics, assisted living, and at-home health monitoring. With today's fast-paced lives, people usually prefer exercising at home but feel the need for an instructor to evaluate their exercise form. As such resources are not always available, human pose recognition can be used to build a self-instruction exercise system that allows people to learn and practice exercises correctly by themselves. This project lays the foundation for building such a system by discussing various machine learning and deep learning approaches to accurately classify yoga poses in prerecorded videos and in real time. The project also discusses various pose estimation and keypoint detection methods in detail and explains the different deep learning models used for pose classification.
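A simple baseline for keypoint-based pose classification of the kind described is to compute joint angles from detected keypoints and match them against per-pose angle templates. A hedged sketch; the template names, angle choices, and nearest-template rule are illustrative assumptions, not the project's models:

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by points a-b-c, each an (x, y) pair."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

def classify_pose(angles, templates):
    """Nearest template by squared L2 distance over the joint-angle vector.
    templates: dict mapping a (hypothetical) pose name to its angle vector."""
    return min(templates,
               key=lambda name: sum((a - t) ** 2
                                    for a, t in zip(angles, templates[name])))
```

A real system would feed such angle (or raw keypoint) vectors into the learned classifiers the project discusses; the nearest-template rule here just makes the representation concrete.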
In-the-wild Facial Expression Recognition in Extreme Poses
Facial expression recognition is an active research problem in computer
vision. In recent years, the research has moved from the lab environment to
in-the-wild circumstances, which are challenging, especially under extreme
poses. Most current expression recognition systems try to avoid pose effects
in order to gain general applicability. In this work, we take the opposite
approach: we consider the head pose explicitly and detect expressions within
specific head poses. Our work includes two parts: detecting the head pose and
grouping it into one of several pre-defined head pose classes, and then
recognizing facial expressions within each pose class. Our experiments show
that recognition with pose-class grouping is much better than direct
recognition without considering poses. We combine hand-crafted features
(SIFT, LBP, and geometric features) with deep learning features as the
representation of the expressions; the hand-crafted features are added into
the deep learning framework along with the high-level deep learning features.
As a comparison, we implement SVM and random forest as the prediction models.
To train and test our methodology, we labeled the face dataset with the 6
basic expressions.
Comment: Published on ICGIP201
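The two-stage idea — bin the head pose, then dispatch to a per-pose expression classifier — can be sketched as follows. The yaw bin boundaries and pose-class names are assumptions for illustration; the paper defines its own pose classes:

```python
def pose_class(yaw_deg,
               bins=((-90, -30, "left"), (-30, 30, "frontal"), (30, 90, "right"))):
    """Map a head-yaw angle to a coarse pose class (bin edges are illustrative)."""
    for lo, hi, name in bins:
        if lo <= yaw_deg < hi:
            return name
    return "extreme"

class PoseGroupedRecognizer:
    """Dispatch expression recognition to a classifier trained per pose class."""
    def __init__(self, per_pose_models):
        # per_pose_models: dict mapping pose class -> model with .predict(features)
        self.models = per_pose_models

    def predict(self, yaw_deg, features):
        return self.models[pose_class(yaw_deg)].predict(features)
```

Each per-pose model here would be one of the SVM or random forest predictors the abstract mentions, trained only on faces falling in its pose bin.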
Pose-based Tremor Classification for Parkinson's Disease Diagnosis from Video
Parkinson's disease (PD) is a progressive neurodegenerative disorder that
results in a variety of motor dysfunction symptoms, including tremors,
bradykinesia, rigidity, and postural instability. The diagnosis of PD mainly
relies on clinical experience rather than a definitive medical test, and
diagnostic accuracy is only about 73-84%, as it is limited by the subjective
opinions and experience of different medical experts. Therefore, an
efficient and interpretable automatic PD diagnosis system is valuable for
supporting clinicians with more robust diagnostic decision-making. To this end,
we propose to classify Parkinson's tremor since it is one of the most
predominant symptoms of PD with strong generalizability. Unlike other
computer-aided Parkinson's Tremor (PT) classification systems, which are
time- and resource-consuming and rely on wearable sensors, we propose
SPAPNet, which
only requires consumer-grade non-intrusive video recording of camera-facing
human movements as input to provide undiagnosed patients with low-cost PT
classification results as a PD warning sign. For the first time, we propose to
use a novel attention module with a lightweight pyramidal
channel-squeezing-fusion architecture to extract relevant PT information and
filter the noise efficiently. This design aids in improving both classification
performance and system interpretability. Experimental results show that our
system outperforms the state of the art, achieving a balanced accuracy of 90.9%
and an F1-score of 90.6% in distinguishing PT from the non-PT class.
Comment: MICCAI 202
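The channel-attention intuition behind such a module can be illustrated with a generic squeeze-and-excite style gate: pool each channel to a scalar, pass it through a learned sigmoid gate, and reweight the channel. This is a generic stand-in for intuition only, not SPAPNet's pyramidal channel-squeezing-fusion architecture, and the gate parameters here are hypothetical:

```python
import math

def channel_attention(features, w, b):
    """Squeeze-and-excite style channel gating (generic stand-in).
    features: list of channels, each a list of activations;
    w, b: per-channel gate weight and bias (learned in a real network)."""
    squeezed = [sum(ch) / len(ch) for ch in features]        # global average pool
    gates = [1.0 / (1.0 + math.exp(-(wi * s + bi)))          # sigmoid gate
             for wi, bi, s in zip(w, b, squeezed)]
    # reweight each channel by its gate, attenuating noisy channels
    return [[g * v for v in ch] for g, ch in zip(gates, features)]
```

The design point the abstract makes — suppressing irrelevant channels cheaply — corresponds here to gates near zero zeroing out whole channels before classification.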
Vision-Based Assessment of Parkinsonism and Levodopa-Induced Dyskinesia with Deep Learning Pose Estimation
Objective: To apply deep learning pose estimation algorithms for vision-based
assessment of parkinsonism and levodopa-induced dyskinesia (LID). Methods: Nine
participants with Parkinson's disease (PD) and LID completed a levodopa
infusion protocol, where symptoms were assessed at regular intervals using the
Unified Dyskinesia Rating Scale (UDysRS) and Unified Parkinson's Disease Rating
Scale (UPDRS). A state-of-the-art deep learning pose estimation method was used
to extract movement trajectories from videos of PD assessments. Features of the
movement trajectories were used to detect and estimate the severity of
parkinsonism and LID using random forest. Communication and drinking tasks were
used to assess LID, while leg agility and toe tapping tasks were used to assess
parkinsonism. Feature sets from tasks were also combined to predict total
UDysRS and UPDRS Part III scores. Results: For LID, the communication task
yielded the best results for dyskinesia (severity estimation: r = 0.661,
detection: AUC = 0.930). For parkinsonism, leg agility had better results for
severity estimation (r = 0.618), while toe tapping was better for detection
(AUC = 0.773). UDysRS and UPDRS Part III scores were predicted with r = 0.741
and 0.530, respectively. Conclusion: This paper presents the first application
of deep learning for vision-based assessment of parkinsonism and LID and
demonstrates promising performance for the future translation of deep learning
to PD clinical practices. Significance: The proposed system provides insight
into the potential of computer vision and deep learning for clinical
application in PD.
Comment: 8 pages, 1 figure. Under revie
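Both vision-based PD papers above share a pipeline: pose-estimation trajectories in, simple kinematic features out, fed to a random forest. The feature-extraction step can be sketched as follows; the specific features (amplitude, mean speed, oscillation frequency from velocity zero-crossings) are illustrative assumptions, not the papers' exact feature sets:

```python
def trajectory_features(xs, fps=30.0):
    """Kinematic features from a 1-D joint trajectory sampled at `fps` frames/s.
    xs: joint coordinate per frame, e.g. a wrist x-position from pose estimation."""
    vel = [(b - a) * fps for a, b in zip(xs, xs[1:])]        # finite-difference velocity
    amplitude = max(xs) - min(xs)                            # movement range
    mean_speed = sum(abs(v) for v in vel) / len(vel)
    # velocity sign changes approximate oscillation (tremor-like) frequency
    crossings = sum(1 for u, v in zip(vel, vel[1:]) if u * v < 0)
    freq_hz = crossings * fps / (2.0 * len(xs))
    return {"amplitude": amplitude, "mean_speed": mean_speed, "freq_hz": freq_hz}
```

Feature dictionaries like this, computed per task and per joint, would then be stacked into the vectors a random forest classifies or regresses against the clinical scores.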