5,721 research outputs found

    Learning Human Pose Estimation Features with Convolutional Networks

    Full text link
    This paper introduces a new architecture for human pose estimation using a multi- layer convolutional network architecture and a modified learning technique that learns low-level features and higher-level weak spatial models. Unconstrained human pose estimation is one of the hardest problems in computer vision, and our new architecture and learning schema shows significant improvement over the current state-of-the-art results. The main contribution of this paper is showing, for the first time, that a specific variation of deep learning is able to outperform all existing traditional architectures on this task. The paper also discusses several lessons learned while researching alternatives, most notably, that it is possible to learn strong low-level feature detectors on features that might even just cover a few pixels in the image. Higher-level spatial models improve somewhat the overall result, but to a much lesser extent then expected. Many researchers previously argued that the kinematic structure and top-down information is crucial for this domain, but with our purely bottom up, and weak spatial model, we could improve other more complicated architectures that currently produce the best results. This mirrors what many other researchers, like those in the speech recognition, object recognition, and other domains have experienced

    A Differential Approach for Gaze Estimation

    Full text link
    Non-invasive gaze estimation methods usually regress gaze directions directly from a single face or eye image. However, due to important variabilities in eye shapes and inner eye structures amongst individuals, universal models obtain limited accuracies and their output usually exhibit high variance as well as biases which are subject dependent. Therefore, increasing accuracy is usually done through calibration, allowing gaze predictions for a subject to be mapped to his/her actual gaze. In this paper, we introduce a novel image differential method for gaze estimation. We propose to directly train a differential convolutional neural network to predict the gaze differences between two eye input images of the same subject. Then, given a set of subject specific calibration images, we can use the inferred differences to predict the gaze direction of a novel eye sample. The assumption is that by allowing the comparison between two eye images, annoyance factors (alignment, eyelid closing, illumination perturbations) which usually plague single image prediction methods can be much reduced, allowing better prediction altogether. Experiments on 3 public datasets validate our approach which constantly outperforms state-of-the-art methods even when using only one calibration sample or when the latter methods are followed by subject specific gaze adaptation.Comment: Extension to our paper A differential approach for gaze estimation with calibration (BMVC 2018) Submitted to PAMI on Aug. 7th, 2018 Accepted by PAMI short on Dec. 2019, in IEEE Transactions on Pattern Analysis and Machine Intelligenc

    Learning to Personalize in Appearance-Based Gaze Tracking

    Full text link
    Personal variations severely limit the performance of appearance-based gaze tracking. Adapting to these variations using standard neural network model adaptation methods is difficult. The problems range from overfitting, due to small amounts of training data, to underfitting, due to restrictive model architectures. We tackle these problems by introducing the SPatial Adaptive GaZe Estimator (SPAZE). By modeling personal variations as a low-dimensional latent parameter space, SPAZE provides just enough adaptability to capture the range of personal variations without being prone to overfitting. Calibrating SPAZE for a new person reduces to solving a small optimization problem. SPAZE achieves an error of 2.70 degrees with 9 calibration samples on MPIIGaze, improving on the state-of-the-art by 14 %. We contribute to gaze tracking research by empirically showing that personal variations are well-modeled as a 3-dimensional latent parameter space for each eye. We show that this low-dimensionality is expected by examining model-based approaches to gaze tracking. We also show that accurate head pose-free gaze tracking is possible
    corecore