3,962 research outputs found

    Multimedia information technology and the annotation of video

    The state of the art in multimedia information technology has not progressed to the point where a single solution is available to meet all reasonable needs of documentalists and users of video archives. In general, we do not have an optimistic view of the usability of new technology in this domain, but digitization and digital power can be expected to cause a small revolution in the area of video archiving. The volume of data leads to two views of the future: on the pessimistic side, an overload of data will cause a lack of annotation capacity, and on the optimistic side, there will be enough data from which to learn selected concepts that can be deployed to support automatic annotation. At the threshold of this interesting era, we attempt to describe the state of the art in technology. We sample the progress in text, sound, and image processing, as well as in machine learning.

    The first Facial Landmark Tracking in-the-Wild Challenge: benchmark and results

    Detection and tracking of faces in image sequences is among the most well-studied problems at the intersection of statistical machine learning and computer vision. Often, tracking and detection methodologies use a rigid representation to describe the facial region, hence they can neither capture nor exploit the non-rigid facial deformations, which are crucial for countless applications (e.g., facial expression analysis, facial motion capture, high-performance face recognition, etc.). Usually, the non-rigid deformations are captured by locating and tracking the position of a set of fiducial facial landmarks (e.g., eyes, nose, mouth, etc.). Recently, we witnessed a burst of research in automatic facial landmark localisation in static imagery. This is partly attributed to the availability of large amounts of annotated data, much of which has been provided by the first facial landmark localisation challenge (also known as the 300-W challenge). Even though well-established benchmarks now exist for facial landmark localisation in static imagery, to the best of our knowledge, there is no established benchmark for assessing the performance of facial landmark tracking methodologies that contains an adequate number of annotated face videos. In conjunction with ICCV’2015 we ran the first competition/challenge on facial landmark tracking in long-term videos. In this paper, we present the first benchmark for long-term facial landmark tracking, currently containing over 110 annotated videos, and we summarise the results of the competition.

    Person Recognition in Personal Photo Collections

    Recognising persons in everyday photos presents major challenges (occluded faces, different clothing, locations, etc.) for machine vision. We propose a convnet-based person recognition system, for which we provide an in-depth analysis of the informativeness of different body cues, the impact of training data, and the common failure modes of the system. In addition, we discuss the limitations of existing benchmarks and propose more challenging ones. Our method is simple and is built on open source and open data, yet it improves the state-of-the-art results on a large dataset of social media photos (PIPA). Comment: Accepted to ICCV 2015, revise

    Multi-View Face Recognition From Single RGBD Models of the Faces

    This work takes important steps towards solving the following problem of current interest: Assuming that each individual in a population can be modeled by a single frontal RGBD face image, is it possible to carry out face recognition for such a population using multiple 2D images captured from arbitrary viewpoints? Although the general problem as stated above is extremely challenging, it encompasses subproblems that can be addressed today. The subproblems addressed in this work relate to: (1) generating a large set of viewpoint-dependent face images from a single RGBD frontal image for each individual; (2) using hierarchical approaches based on view-partitioned subspaces to represent the training data; and (3) based on these hierarchical approaches, using a weighted voting algorithm to integrate the evidence collected from multiple images of the same face as recorded from different viewpoints. We evaluate our methods on three datasets: a dataset of 10 people that we created and two publicly available datasets which include a total of 48 people. In addition to providing important insights into the nature of this problem, our results show that we are able to successfully recognize faces with accuracies of 95% or higher, outperforming existing state-of-the-art face recognition approaches based on deep convolutional neural networks.