1,257 research outputs found

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework that highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and thus the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit makes the framework particularly useful for selecting a method for a given application. Another advantage of the proposed organization is that it allows the newest approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area.
    Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables

    Using video objects and relevance feedback in video retrieval

    Video retrieval is mostly based on text from dialogue, and this remains the most significant component despite progress in other aspects. This becomes a problem when a searcher wants to locate video based on what appears in the video rather than what is being spoken about. Alternatives such as automatically detected features and image-based keyframe matching can be used, though these still need further improvement in quality. Another modality for video retrieval is based on segmenting objects from video and allowing end users to use these as part of querying. This uses similarity between query objects and objects from video, and in theory allows retrieval based on what is actually appearing on-screen. The main hurdles to greater use of this are the overhead of object segmentation on large amounts of video and the question of whether we can actually achieve effective object-based retrieval. We describe a system to support object-based video retrieval where a user selects example video objects as part of the query. During a search, a user builds up a set of these, which are matched against objects previously segmented from a video library. This match is based on the MPEG-7 Dominant Colour, Shape Compaction and Texture Browsing descriptors. To segment the video archive we use a user-driven, semi-automated segmentation process, which is very accurate and is faster than conventional video annotation.

    A computer vision approach to classification of birds in flight from video sequences

    Bird populations are an important bio-indicator, so collecting reliable data is useful for ecologists helping to conserve and manage fragile ecosystems. However, existing manual monitoring methods are labour-intensive, time-consuming, and error-prone. The aim of our work is to develop a reliable system capable of automatically classifying individual bird species in flight from videos. This is challenging, but appropriate for use in the field, since there is often a requirement to identify birds in flight rather than when stationary. We present our work in progress, which uses combined appearance and motion features for classification, and report experimental results across seven species using a Normal Bayes classifier with majority voting, achieving a classification rate of 86%.
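
The abstract above mentions majority voting over classifier outputs but does not say how the votes are gathered. A minimal sketch, assuming each video frame yields one predicted species label and the sequence label is simply the most frequent one; the function name `majority_vote` and the example labels are hypothetical, not taken from the paper:

```python
from collections import Counter


def majority_vote(frame_labels):
    """Return the most frequent label among per-frame predictions.

    Assumption: one label per frame, all votes weighted equally.
    """
    if not frame_labels:
        raise ValueError("need at least one per-frame prediction")
    counts = Counter(frame_labels)
    # most_common keeps first-seen order among equal counts,
    # so a tie is resolved in favour of the label seen first.
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical per-frame outputs for one flight sequence.
sequence_label = majority_vote(["gull", "crow", "gull", "gull", "crow"])
```

Voting over frames lets a few misclassified frames be outvoted by the rest of the sequence, which is one plausible reason to prefer it over trusting any single frame.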

    The aceToolbox: low-level audiovisual feature extraction for retrieval and classification

    In this paper we present an overview of a software platform developed within the aceMedia project, termed the aceToolbox, that provides global and local low-level feature extraction from audio-visual content. The toolbox is based on the MPEG-7 eXperimental Model (XM), with extensions to provide descriptor extraction from arbitrarily shaped image segments, thereby supporting local descriptors reflecting real image content. We describe the architecture of the toolbox and provide an overview of the descriptors supported to date. We also briefly describe the segmentation algorithm provided. We then demonstrate the usefulness of the toolbox in the context of two different content processing scenarios: similarity-based retrieval in large collections and scene-level classification of still images.

    Automatic object classification for surveillance videos.

    PhD thesis. The recent popularity of surveillance video systems, especially in urban settings, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems consists of automatic object classification, which remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings can routinely categorise objects according to their behaviour. The gap between the features automatically extracted by a computer, such as appearance-based features, and the concepts unconsciously perceived by human beings but unattainable for machines, such as behaviour features, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring together machine and human understanding for object classification. Thus, a Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing the physical properties inherent in their appearance (machine understanding) and the behaviour patterns which require a higher level of understanding (human understanding). A probabilistic multimodal fusion algorithm then bridges the gap, performing an automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrated that the combination of machine and human understanding substantially enhanced the object classification performance. Finally, the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems.

    Automatic classification of flying bird species using computer vision techniques [forthcoming]

    Bird populations are identified as important biodiversity indicators, so collecting reliable population data is important to ecologists and scientists. However, existing manual monitoring methods are labour-intensive, time-consuming, and potentially error-prone. The aim of our work is to develop a reliable automated system capable of classifying the species of individual birds, during flight, using video data. This is challenging, but appropriate for use in the field, since there is often a requirement to identify birds in flight rather than while stationary. We present our work, which uses a new and rich set of appearance features for classification from video. We also introduce motion features including curvature and wing beat frequency. Using a Normal Bayes classifier and a Support Vector Machine classifier, we present experimental evaluations of our appearance and motion features across a data set comprising 7 species. Using our appearance feature set alone we achieved classification rates of 92% and 89% (using the Normal Bayes and SVM classifiers respectively), which significantly outperforms a recent comparable state-of-the-art system. Using motion features alone we achieved a lower classification rate, but this motivates our ongoing work, which seeks to combine the appearance and motion features to achieve even more robust classification.
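
The abstract above names wing beat frequency as a motion feature without describing how it is measured. One plausible sketch, assuming a per-frame scalar signal is available (for example the height of the bird's silhouette, which is a hypothetical choice here) and taking the strongest non-zero frequency of a naive discrete Fourier transform; `dominant_frequency` and the synthetic signal are illustrative only:

```python
import cmath
import math


def dominant_frequency(signal, fps):
    """Estimate the strongest periodic component of `signal` in Hz.

    Assumption: `signal` holds one measurement per video frame,
    sampled uniformly at `fps` frames per second.
    """
    n = len(signal)
    if n < 4:
        raise ValueError("signal too short to estimate a frequency")
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant offset
    best_bin, best_power = 0, -1.0
    # Scan frequency bins from 1 up to the Nyquist limit (n // 2).
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * fps / n


# Synthetic example: a 5 Hz oscillation filmed at 25 frames per second.
fps = 25
samples = [math.sin(2 * math.pi * 5 * t / fps) for t in range(100)]
wing_beat_hz = dominant_frequency(samples, fps)
```

The naive transform costs O(n²), which is acceptable for short clips; a fast Fourier transform from a numerical library would be the usual choice for longer sequences.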