4,770 research outputs found

    High-level feature detection from video in TRECVid: a 5-year retrospective of achievements

    Successful and effective content-based access to digital video requires fast, accurate and scalable methods to determine the video content automatically. A variety of contemporary approaches to this rely on text taken from speech within the video, or on matching one video frame against others using low-level characteristics like colour, texture, or shapes, or on determining and matching objects appearing within the video. Possibly the most important technique, however, is one which determines the presence or absence of a high-level or semantic feature within a video clip or shot. By utilizing dozens, hundreds or even thousands of such semantic features we can support many kinds of content-based video navigation. Critically, however, this depends on being able to determine whether each feature is or is not present in a video clip. The last 5 years have seen much progress in the development of techniques to determine the presence of semantic features within video. This progress can be tracked in the annual TRECVid benchmarking activity, where dozens of research groups measure the effectiveness of their techniques on common data using an open, metrics-based approach. In this chapter we summarise the work done on the TRECVid high-level feature task, showing the progress made year-on-year. This provides a fairly comprehensive statement on where the state of the art stands for this important task, not just for one research group or one approach, but across the spectrum. We then use this past and ongoing work as a basis for highlighting the trends that are emerging in this area, and the questions which remain to be addressed before we can achieve large-scale, fast and reliable high-level feature detection on video.
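    As a rough illustration of what a single high-level feature detector involves (the chapter surveys many competing systems rather than one method), the sketch below trains one binary classifier per concept over keyframe colour histograms; the histogram feature, the linear SVM, and the synthetic data are assumptions made purely for illustration.

```python
# Hypothetical sketch of a per-shot semantic feature ("concept") detector:
# one binary classifier per concept, applied to a keyframe representation.
# Feature choice, classifier, and data are illustrative assumptions only.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def keyframe_histogram(frame_rgb, bins=8):
    """Flattened per-channel colour histogram of a keyframe (H x W x 3, uint8)."""
    hists = [np.histogram(frame_rgb[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / (h.sum() + 1e-9)

# Synthetic stand-in for annotated shots: 200 keyframes with binary labels
# saying whether this particular semantic feature is present in the shot.
frames = rng.integers(0, 256, size=(200, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 2, size=200)

X = np.stack([keyframe_histogram(f) for f in frames])
detector = LinearSVC(C=1.0).fit(X, labels)      # one such detector per concept

# Ranking shots by decision score is what supports concept-based navigation.
scores = detector.decision_function(X)
print("highest-scoring shots for this concept:", np.argsort(scores)[::-1][:5])
```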

    Action Recognition in Tennis Videos using Optical Flow and Conditional Random Fields

    The aim of Action Recognition is the automated analysis and interpretation of events in video sequences. As a result of the applications that can be developed, and the widespread availability and popularization of digital video (security cameras, monitoring, social networks, among many others), this area is currently the focus of strong and wide research interest in various domains such as video security, human-computer interaction, patient monitoring and video retrieval, among others. Our long-term goal is to develop automatic action identification in video sequences using Conditional Random Fields (CRFs). In this work we focus, as a case study, on the identification of a limited set of tennis shots during tennis matches. Three challenges have been addressed: player tracking, representation of player movements, and action recognition. Video processing techniques are used to generate textual tags in specific frames, and the CRFs are then used as a classifier to recognise the actions performed in those frames. The preliminary results appear to be quite promising. Sociedad Argentina de Informática e Investigación Operativa.
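    The tag-then-label pipeline described above (per-frame textual tags followed by CRF sequence labelling) can be sketched as follows; sklearn-crfsuite as the CRF implementation, the hand-written tags, and the tiny toy sequences are assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch: per-frame textual tags (standing in for the optical-flow
# and tracking stage) are labelled with actions by a linear-chain CRF.
import sklearn_crfsuite

# Each training example is one rally: a sequence of per-frame tags paired with
# the action label of that frame.  These tiny sequences are illustrative only.
tag_sequences = [
    ["near_net,flow_up", "near_net,flow_up", "baseline,flow_none"],
    ["baseline,flow_left", "baseline,flow_left", "baseline,flow_none"],
]
action_sequences = [
    ["volley", "volley", "idle"],
    ["backhand", "backhand", "idle"],
]

def frame_features(seq, i):
    """CRF feature dict for frame i: its own tag plus the previous frame's tag."""
    return {"tag": seq[i], "prev_tag": seq[i - 1] if i > 0 else "BOS"}

X = [[frame_features(seq, i) for i in range(len(seq))] for seq in tag_sequences]
y = action_sequences

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, y)

# Label an unseen tag sequence frame by frame.
test = ["near_net,flow_up", "baseline,flow_none"]
print(crf.predict_single([frame_features(test, i) for i in range(len(test))]))
```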

    Unconstrained Face Detection and Open-Set Face Recognition Challenge

    Face detection and recognition benchmarks have shifted toward more difficult environments. The challenge presented in this paper addresses the next step in the direction of automatic detection and identification of people from outdoor surveillance cameras. While face detection has shown remarkable success on images collected from the web, surveillance cameras include more diverse occlusions, poses, weather conditions and image blur. Although face verification and closed-set face identification have surpassed human capabilities on some datasets, open-set identification is much more complex, as it needs to reject both unknown identities and false accepts from the face detector. We show that unconstrained face detection can approach high detection rates, albeit with moderate false accept rates. By contrast, open-set face recognition is currently weak and requires much more attention. Comment: This is an ERRATA version of the paper originally presented at the International Joint Conference on Biometrics. Due to a bug in our evaluation code, the results of the participants changed. The final conclusion, however, is still the same.
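    A minimal sketch of the open-set identification step discussed above: a probe face embedding is matched against a gallery of enrolled identities and rejected as unknown when its best similarity falls below a threshold. The random embeddings, the cosine-similarity matcher and the threshold value are illustrative assumptions, not the challenge's evaluation protocol.

```python
# Hypothetical open-set identification: match a probe embedding against a
# gallery and reject it when no enrolled identity is similar enough.
import numpy as np

rng = np.random.default_rng(1)

def normalize(v):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-9)

# Gallery: one 128-d embedding per enrolled identity (synthetic stand-ins;
# a real system would take these from a face-recognition network).
gallery = normalize(rng.normal(size=(5, 128)))
names = ["id_%d" % i for i in range(5)]

def identify(probe_embedding, threshold=0.6):
    """Return (identity, similarity), or (None, similarity) to reject unknowns."""
    sims = gallery @ normalize(probe_embedding)
    best = int(np.argmax(sims))
    name = names[best] if sims[best] >= threshold else None
    return name, float(sims[best])

known_probe = normalize(gallery[2] + 0.05 * rng.normal(size=128))   # enrolled person
unknown_probe = normalize(rng.normal(size=128))                     # never enrolled
print(identify(known_probe))    # expected: ("id_2", high similarity)
print(identify(unknown_probe))  # expected: (None, low similarity)
```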

    Sparsity in Dynamics of Spontaneous Subtle Emotions: Analysis \& Application

    Spontaneous subtle emotions are expressed through micro-expressions, which are tiny, sudden and short-lived dynamics of facial muscles, and thus pose a great challenge for visual recognition. The abrupt but significant dynamics for the recognition task are temporally sparse, while the rest, the irrelevant dynamics, are temporally redundant. In this work, we analyze and enforce sparsity constraints to learn significant temporal and spectral structures while eliminating irrelevant facial dynamics of micro-expressions, which eases the challenge in the visual recognition of spontaneous subtle emotions. The hypothesis is confirmed through experimental results of automatic spontaneous subtle emotion recognition with several sparsity levels on CASME II and SMIC, the only two publicly available spontaneous subtle emotion databases. The overall performance of automatic subtle emotion recognition is boosted when only significant dynamics are preserved from the original sequences. Comment: IEEE Transactions on Affective Computing (2016).
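    As a simple illustration of keeping only the temporally sparse, significant dynamics, the sketch below retains the frames with the largest frame-to-frame change and discards the temporally redundant rest; this top-k frame-difference rule and the synthetic video are stand-ins for the sparsity constraints actually used in the paper.

```python
# Hypothetical temporal-sparsity selection: keep only the few frames whose
# change from the previous frame is largest, discarding redundant dynamics.
import numpy as np

rng = np.random.default_rng(2)

def sparse_dynamics(frames, keep_ratio=0.1):
    """Return indices of the most dynamic frames (top keep_ratio by L2 frame difference)."""
    diffs = np.linalg.norm(np.diff(frames.reshape(len(frames), -1), axis=0), axis=1)
    k = max(1, int(round(keep_ratio * len(diffs))))
    kept = np.sort(np.argsort(diffs)[::-1][:k]) + 1  # diff i lies between frames i and i+1
    return kept

# Synthetic "video": mostly slow drift, with one brief, abrupt facial motion.
video = rng.normal(size=(100, 64, 64)).cumsum(axis=0) * 0.01
video[40:43] += 5.0
print(sparse_dynamics(video, keep_ratio=0.05))  # should include frames around 40-43
```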

    ViSOR: VIdeo Surveillance On-line Repository for annotation retrieval

    The aim of the ViSOR project [1] is to gather and make freely available a repository of surveillance video footage for the research community working on pattern recognition and multimedia retrieval.