    Unobtrusive and pervasive video-based eye-gaze tracking

    Eye-gaze tracking has long been considered a desktop technology that finds its use inside the traditional office setting, where the operating conditions may be controlled. Nonetheless, recent advancements in mobile technology and a growing interest in capturing natural human behaviour have motivated an emerging interest in tracking eye movements within unconstrained real-life conditions, referred to as pervasive eye-gaze tracking. This critical review focuses on emerging passive and unobtrusive video-based eye-gaze tracking methods in recent literature, with the aim of identifying the different research avenues being followed in response to the challenges of pervasive eye-gaze tracking. Different eye-gaze tracking approaches are discussed in order to bring out their strengths and weaknesses, and to identify any limitations, within the context of pervasive eye-gaze tracking, that have yet to be considered by the computer vision community. Peer-reviewed.

    Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds

    Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown to be useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose closed-form solutions for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis. Comment: Appearing in International Journal of Computer Vision.
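
The embedding step in the abstract above can be illustrated with a small sketch. The abstract does not name the mapping, so this assumes the commonly used projection mapping, which sends an orthonormal basis X of a subspace to the symmetric matrix X Xᵀ. The function names, dimensions, and random data are illustrative only and are not taken from the paper.

```python
import numpy as np

def orthonormal_basis(A):
    """Return an orthonormal basis (d x p) for the column span of A via reduced QR."""
    Q, _ = np.linalg.qr(A)
    return Q

def embed(X):
    """Assumed projection embedding: subspace with basis X -> symmetric matrix X X^T."""
    return X @ X.T

def projection_distance(X, Y):
    """Frobenius distance between embedded points, scaled by 1/sqrt(2)."""
    return np.linalg.norm(embed(X) - embed(Y), "fro") / np.sqrt(2.0)

rng = np.random.default_rng(0)
X = orthonormal_basis(rng.standard_normal((6, 2)))  # a 2-D subspace of R^6
Y = orthonormal_basis(rng.standard_normal((6, 2)))  # another 2-D subspace

# A different basis for the SAME subspace: rotate the columns of X.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X_rotated = X @ R

print(projection_distance(X, X_rotated))  # numerically zero: same subspace
print(projection_distance(X, Y))          # positive: different subspaces
```

The embedded point depends only on the subspace, not on the basis chosen for it, which is what lets Euclidean tools such as sparse coding operate on the symmetric matrices.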

    Robust pedestrian detection and tracking in crowded scenes

    In this paper, a robust computer vision approach to detecting and tracking pedestrians in unconstrained crowded scenes is presented. Pedestrian detection is performed via a 3D clustering process within a region-growing framework. The clustering process avoids using hard thresholds by using biometrically inspired constraints and a number of plan view statistics. Pedestrian tracking is achieved by formulating the track matching process as a weighted bipartite graph and using a Weighted Maximum Cardinality Matching scheme. The approach is evaluated using both indoor and outdoor sequences, captured using a variety of different camera placements and orientations, that feature significant challenges in terms of the number of pedestrians present, their interactions and scene lighting conditions. The evaluation is performed against a manually generated ground truth for all sequences. Results point to the extremely accurate performance of the proposed approach in all cases.
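
The track-matching idea in the abstract above (tracks and detections as the two sides of a weighted bipartite graph) can be sketched as follows. This is not the authors' Weighted Maximum Cardinality Matching scheme: it substitutes an exhaustive search over one-to-one assignments, which is only practical for a handful of tracks, and the cost values are hypothetical.

```python
from itertools import permutations

def match_tracks(cost):
    """Brute-force minimum-cost one-to-one assignment of tracks (rows) to detections (columns).

    Assumes a square cost matrix. Every track is matched, so the result is a
    maximum-cardinality matching with the smallest total weight.
    """
    n = len(cost)
    best_total, best_assignment = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_assignment = total, perm
    return list(enumerate(best_assignment)), best_total

# Hypothetical edge weights, e.g. distances between predicted track
# positions and new detections (lower means a better match).
cost = [
    [0.5, 4.0, 6.0],
    [3.5, 0.8, 5.0],
    [6.0, 4.5, 1.0],
]
pairs, total = match_tracks(cost)
print(pairs, total)
```

For realistic numbers of pedestrians, a polynomial-time assignment solver would replace the exhaustive loop; the bipartite formulation stays the same.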

    Height from Photometric Ratio with Model-based Light Source Selection

    In this paper, we present a photometric stereo algorithm for estimating surface height. We follow recent work that uses photometric ratios to obtain a linear formulation relating surface gradients and image intensity. Using smoothed finite difference approximations for the surface gradient, we are able to express surface height recovery as a linear least squares problem that is large but sparse. In order to make the method practically useful, we combine it with a model-based approach that excludes observations which deviate from the assumptions made by the image formation model. Despite its simplicity, we show that our algorithm provides surface height estimates of a high quality even for objects with highly non-Lambertian appearance. We evaluate the method on both synthetic images with ground truth and challenging real images that contain strong specular reflections and cast shadows.
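
The photometric-ratio idea in the abstract above can be made concrete at a single pixel. The sketch below assumes a Lambertian, unshadowed pixel with normal proportional to (-p, -q, 1); equating the albedo across two lights gives one equation that is linear in the gradient (p, q). It covers only this per-pixel step, not the paper's sparse height-from-finite-differences system or its light source selection, and the light directions and albedo are made up for the check.

```python
import numpy as np

def gradient_from_ratios(intensities, lights):
    """Least-squares estimate of the surface gradient (p, q) at one pixel.

    Assumes I_k = albedo * (n . l_k) with n proportional to (-p, -q, 1).
    """
    rows, rhs = [], []
    n_lights = len(lights)
    for i in range(n_lights):
        for j in range(i + 1, n_lights):
            # I_i (n . l_j) = I_j (n . l_i)  =>  a . n = 0, a = I_i l_j - I_j l_i
            a = intensities[i] * lights[j] - intensities[j] * lights[i]
            # With n ~ (-p, -q, 1):  a_x p + a_y q = a_z  (albedo has cancelled)
            rows.append([a[0], a[1]])
            rhs.append(a[2])
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return solution

# Synthetic check with hypothetical light directions and a known gradient.
p_true, q_true, albedo = 0.3, -0.2, 0.8
normal = np.array([-p_true, -q_true, 1.0])
normal /= np.linalg.norm(normal)
lights = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 1.0], [0.0, 0.5, 1.0]])
lights /= np.linalg.norm(lights, axis=1, keepdims=True)
intensities = (albedo * lights) @ normal

p_est, q_est = gradient_from_ratios(intensities, lights)
print(p_est, q_est)
```

Because the albedo cancels in each ratio, the unknowns are the gradients alone, which is what makes a purely linear formulation possible.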

    Face tracking and pose estimation with automatic three-dimensional model construction

    A method for robustly tracking and estimating the face pose of a person using stereo vision is presented. The method is invariant to identity and does not require previous training. A face model is automatically initialised and constructed online: a fixed point distribution is superposed over the face when it is frontal to the cameras, and several appropriate points close to those locations are chosen for tracking. Using the stereo correspondence of the cameras, the three-dimensional (3D) coordinates of these points are extracted, and the 3D model is created. The 2D projections of the model points are tracked separately on the left and right images using SMAT. RANSAC and POSIT are used for 3D pose estimation. Head rotations up to ±45° are correctly estimated. The approach runs in real time. The method is intended to serve as the basis of a driver monitoring system, and it has been tested on sequences recorded in a moving car. Ministerio de Educación y Ciencia; Comunidad de Madrid.

    3D Face Recognition

    VGGFace2: A dataset for recognising faces across pose and age

    In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians). The dataset was collected with three goals in mind: (i) to have both a large number of identities and also a large number of images for each identity; (ii) to cover a large range of pose, age and ethnicity; and (iii) to minimize the label noise. We describe how the dataset was collected, in particular the automated and manual filtering stages to ensure a high accuracy for the images of each identity. To assess face recognition performance using the new dataset, we train ResNet-50 (with and without Squeeze-and-Excitation blocks) Convolutional Neural Networks on VGGFace2, on MS-Celeb-1M, and on their union, and show that training on VGGFace2 leads to improved recognition performance over pose and age. Finally, using the models trained on these datasets, we demonstrate state-of-the-art performance on all the IARPA Janus face recognition benchmarks, e.g. IJB-A, IJB-B and IJB-C, exceeding the previous state-of-the-art by a large margin. Datasets and models are publicly available. Comment: This paper has been accepted by IEEE Conference on Automatic Face and Gesture Recognition (F&G), 2018 (Oral).