1,063 research outputs found

    Multi-view Face Detection Using Deep Convolutional Neural Networks

    Full text link
    In this paper we consider the problem of multi-view face detection. While there has been significant research on this problem, current state-of-the-art approaches for this task require annotation of facial landmarks, e.g. TSM [25], or annotation of face poses [28, 22]. They also require training dozens of models to fully capture faces in all orientations, e.g. 22 models in HeadHunter method [22]. In this paper we propose Deep Dense Face Detector (DDFD), a method that does not require pose/landmark annotation and is able to detect faces in a wide range of orientations using a single model based on deep convolutional neural networks. The proposed method has minimal complexity; unlike other recent deep learning object detection methods [9], it does not require additional components such as segmentation, bounding-box regression, or SVM classifiers. Furthermore, we analyzed scores of the proposed face detector for faces in different orientations and found that 1) the proposed method is able to detect faces from different angles and can handle occlusion to some extent, 2) there seems to be a correlation between dis- tribution of positive examples in the training set and scores of the proposed face detector. The latter suggests that the proposed methods performance can be further improved by using better sampling strategies and more sophisticated data augmentation techniques. Evaluations on popular face detection benchmark datasets show that our single-model face detector algorithm has similar or better performance compared to the previous methods, which are more complex and require annotations of either different poses or facial landmarks.Comment: in International Conference on Multimedia Retrieval 2015 (ICMR

    Discriminatively Trained Latent Ordinal Model for Video Classification

    Full text link
    We study the problem of video classification for facial analysis and human action recognition. We propose a novel weakly supervised learning method that models the video as a sequence of automatically mined, discriminative sub-events (eg. onset and offset phase for "smile", running and jumping for "highjump"). The proposed model is inspired by the recent works on Multiple Instance Learning and latent SVM/HCRF -- it extends such frameworks to model the ordinal aspect in the videos, approximately. We obtain consistent improvements over relevant competitive baselines on four challenging and publicly available video based facial analysis datasets for prediction of expression, clinical pain and intent in dyadic conversations and on three challenging human action datasets. We also validate the method with qualitative results and show that they largely support the intuitions behind the method.Comment: Paper accepted in IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1604.0150

    Yoga Pose Classification Using Deep Learning

    Get PDF
    Human pose estimation is a deep-rooted problem in computer vision that has exposed many challenges in the past. Analyzing human activities is beneficial in many fields like video- surveillance, biometrics, assisted living, at-home health monitoring etc. With our fast-paced lives these days, people usually prefer exercising at home but feel the need of an instructor to evaluate their exercise form. As these resources are not always available, human pose recognition can be used to build a self-instruction exercise system that allows people to learn and practice exercises correctly by themselves. This project lays the foundation for building such a system by discussing various machine learning and deep learning approaches to accurately classify yoga poses on prerecorded videos and also in real-time. The project also discusses various pose estimation and keypoint detection methods in detail and explains different deep learning models used for pose classification

    In-the-wild Facial Expression Recognition in Extreme Poses

    Full text link
    In the computer research area, facial expression recognition is a hot research problem. Recent years, the research has moved from the lab environment to in-the-wild circumstances. It is challenging, especially under extreme poses. But current expression detection systems are trying to avoid the pose effects and gain the general applicable ability. In this work, we solve the problem in the opposite approach. We consider the head poses and detect the expressions within special head poses. Our work includes two parts: detect the head pose and group it into one pre-defined head pose class; do facial expression recognize within each pose class. Our experiments show that the recognition results with pose class grouping are much better than that of direct recognition without considering poses. We combine the hand-crafted features, SIFT, LBP and geometric feature, with deep learning feature as the representation of the expressions. The handcrafted features are added into the deep learning framework along with the high level deep learning features. As a comparison, we implement SVM and random forest to as the prediction models. To train and test our methodology, we labeled the face dataset with 6 basic expressions.Comment: Published on ICGIP201

    Pose-based Tremor Classification for Parkinson's Disease Diagnosis from Video

    Full text link
    Parkinson's disease (PD) is a progressive neurodegenerative disorder that results in a variety of motor dysfunction symptoms, including tremors, bradykinesia, rigidity and postural instability. The diagnosis of PD mainly relies on clinical experience rather than a definite medical test, and the diagnostic accuracy is only about 73-84% since it is challenged by the subjective opinions or experiences of different medical experts. Therefore, an efficient and interpretable automatic PD diagnosis system is valuable for supporting clinicians with more robust diagnostic decision-making. To this end, we propose to classify Parkinson's tremor since it is one of the most predominant symptoms of PD with strong generalizability. Different from other computer-aided time and resource-consuming Parkinson's Tremor (PT) classification systems that rely on wearable sensors, we propose SPAPNet, which only requires consumer-grade non-intrusive video recording of camera-facing human movements as input to provide undiagnosed patients with low-cost PT classification results as a PD warning sign. For the first time, we propose to use a novel attention module with a lightweight pyramidal channel-squeezing-fusion architecture to extract relevant PT information and filter the noise efficiently. This design aids in improving both classification performance and system interpretability. Experimental results show that our system outperforms state-of-the-arts by achieving a balanced accuracy of 90.9% and an F1-score of 90.6% in classifying PT with the non-PT class.Comment: MICCAI 202

    Vision-Based Assessment of Parkinsonism and Levodopa-Induced Dyskinesia with Deep Learning Pose Estimation

    Full text link
    Objective: To apply deep learning pose estimation algorithms for vision-based assessment of parkinsonism and levodopa-induced dyskinesia (LID). Methods: Nine participants with Parkinson's disease (PD) and LID completed a levodopa infusion protocol, where symptoms were assessed at regular intervals using the Unified Dyskinesia Rating Scale (UDysRS) and Unified Parkinson's Disease Rating Scale (UPDRS). A state-of-the-art deep learning pose estimation method was used to extract movement trajectories from videos of PD assessments. Features of the movement trajectories were used to detect and estimate the severity of parkinsonism and LID using random forest. Communication and drinking tasks were used to assess LID, while leg agility and toe tapping tasks were used to assess parkinsonism. Feature sets from tasks were also combined to predict total UDysRS and UPDRS Part III scores. Results: For LID, the communication task yielded the best results for dyskinesia (severity estimation: r = 0.661, detection: AUC = 0.930). For parkinsonism, leg agility had better results for severity estimation (r = 0.618), while toe tapping was better for detection (AUC = 0.773). UDysRS and UPDRS Part III scores were predicted with r = 0.741 and 0.530, respectively. Conclusion: This paper presents the first application of deep learning for vision-based assessment of parkinsonism and LID and demonstrates promising performance for the future translation of deep learning to PD clinical practices. Significance: The proposed system provides insight into the potential of computer vision and deep learning for clinical application in PD.Comment: 8 pages, 1 figure. Under revie
    • …
    corecore