    Feature Similarity and Frequency-Based Weighted Visual Words Codebook Learning Scheme for Human Action Recognition

    This paper was presented at the 8th Pacific-Rim Symposium on Image and Video Technology (PSIVT 2017). Human action recognition has become a popular field for computer vision researchers over the past decade. This paper presents a human action recognition scheme based on a textual information concept inspired by document retrieval systems. Videos are represented using a commonly used local feature representation. In addition, we formulate a new weighted class-specific dictionary learning scheme to reflect the importance of visual words for a particular action class. Weighted class-specific dictionary learning enriches the scheme to learn a sparse representation for a particular action class. To evaluate our scheme on realistic and complex scenarios, we have tested it on the UCF Sports and UCF11 benchmark datasets. The reported experimental results outperform recent state-of-the-art methods on UCF Sports and UCF11, with average accuracies of 98.93% and 93.88%, respectively. To the best of our knowledge, this is the first contribution to apply a weighted class-specific dictionary learning method to realistic human action recognition datasets.

    Sergio A. Velastin acknowledges funding from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, the Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander. The authors also acknowledge support from the Directorate of ASR and TD, University of Engineering and Technology Taxila, Pakistan.
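    The abstract gives no implementation detail for the weighted codebook, but the core idea of weighting visual words by how often they occur in one action class relative to all classes can be sketched in a few lines. The sketch below is a minimal, illustrative Python version assuming precomputed bag-of-words histograms; the function names and the specific frequency ratio are assumptions, not the authors' exact formulation.

        import numpy as np

        def class_specific_word_weights(histograms, labels, n_classes):
            """Illustrative tf-idf-style weight for each visual word per action class.

            histograms: (n_videos, n_words) raw bag-of-words counts.
            labels:     (n_videos,) integer class label per video.
            Returns a (n_classes, n_words) weight matrix.
            """
            n_words = histograms.shape[1]
            weights = np.zeros((n_classes, n_words))
            overall = histograms.sum(axis=0) + 1e-8             # word frequency over all classes
            for c in range(n_classes):
                in_class = histograms[labels == c].sum(axis=0)  # word frequency inside class c
                weights[c] = in_class / overall                 # frequent in c, rare elsewhere -> high weight
            return weights

        def weighted_histogram(hist, weights, c):
            """Re-weight one video's histogram with the weights of class c and L2-normalize."""
            w = hist * weights[c]
            return w / (np.linalg.norm(w) + 1e-8)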

    Action recognition in video using a spatial-temporal graph-based feature representation

    We propose a video-graph-based human action recognition framework. Given an input video sequence, we extract spatio-temporal local features and construct a video graph that incorporates appearance and motion constraints to reflect the spatio-temporal dependencies among features. In particular, we extend the popular DBSCAN density-based clustering algorithm to form an intuitive video graph. During training, we estimate a linear SVM classifier using the standard Bag-of-Words method. During classification, we apply Graph-Cut optimization to find the most frequent action label in the constructed graph and assign this label to the test video sequence. The proposed approach achieves state-of-the-art performance on standard human action recognition benchmarks, namely the KTH and UCF Sports datasets, and competitive results on the Hollywood (HOHA) dataset.
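    The abstract does not specify how the video graph is built; the sketch below shows one plausible reading, assuming spatio-temporal interest points given as (x, y, frame) coordinates: DBSCAN groups nearby features, and a radius-neighbour graph supplies the edges. The eps, min_samples, and time_scale parameters are illustrative, not values from the paper.

        import numpy as np
        from sklearn.cluster import DBSCAN
        from sklearn.neighbors import radius_neighbors_graph

        def build_video_graph(points, eps=15.0, min_samples=5, time_scale=2.0):
            """Cluster (x, y, frame) interest points and connect near neighbours.

            points: (n, 3) array of spatio-temporal feature coordinates.
            Returns per-feature cluster labels (-1 = noise) and a sparse
            adjacency matrix usable as the video graph.
            """
            scaled = np.asarray(points, dtype=float).copy()
            scaled[:, 2] *= time_scale                          # trade temporal vs. spatial distance
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(scaled)
            adjacency = radius_neighbors_graph(scaled, radius=eps, mode="distance")
            return labels, adjacency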

    Beyond just keeping hands on the wheel: Towards visual interpretation of driver hand motion patterns

    Observing hand activity in the car provides a rich set of patterns relating to vehicle maneuvering, secondary tasks, driver distraction, and driver intent inference. This work strives to develop a vision-based framework for analyzing such patterns in real time. First, hands are detected and tracked from a monocular camera. This provides position information for the left and right hands with no intrusion over long, naturalistic drives. Second, the motion trajectories are studied in settings of activity recognition, prediction, and higher-level semantic categorization.
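    The abstract leaves the trajectory analysis unspecified; the sketch below shows one generic way to turn tracked hand positions into a fixed-length motion descriptor that could feed an activity classifier. The speed-weighted direction histogram is a common trajectory feature, not the authors' method.

        import numpy as np

        def trajectory_features(track, n_bins=8):
            """Summarize a hand track as a speed-weighted motion-direction
            histogram plus simple speed statistics.

            track: (n_frames, 2) array of (x, y) hand positions from a tracker.
            """
            deltas = np.diff(track, axis=0)                     # frame-to-frame displacement
            speeds = np.linalg.norm(deltas, axis=1)
            angles = np.arctan2(deltas[:, 1], deltas[:, 0])     # motion direction per step
            hist, _ = np.histogram(angles, bins=n_bins,
                                   range=(-np.pi, np.pi), weights=speeds)
            hist = hist / (hist.sum() + 1e-8)
            return np.concatenate([hist, [speeds.mean(), speeds.std()]])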

    TREAT: Terse Rapid Edge-Anchored Tracklets

    Fast computation, efficient memory storage, and performance on par with standard state-of-the-art descriptors make binary descriptors a convenient tool for many computer vision applications. However, their development has mostly been tailored to static images. To address this limitation, we introduce TREAT (Terse Rapid Edge-Anchored Tracklets), a new binary detector and descriptor based on tracklets. It harnesses moving edge maps to perform efficient feature detection, tracking, and description at low computational cost. Experimental results on three different public datasets demonstrate improved performance over other popular binary features. These experiments also provide a basis for benchmarking the performance of binary descriptors in video-based applications.
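    The TREAT descriptor itself is not specified in this abstract, but the efficiency claim rests on a general property of binary descriptors: matching reduces to Hamming distance, i.e. XOR plus a bit count. A minimal sketch, with max_distance as an illustrative threshold:

        import numpy as np

        def hamming_distance(a, b):
            """Hamming distance between two binary descriptors stored as uint8 arrays."""
            return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

        def match_descriptors(query, database, max_distance=40):
            """Brute-force nearest-neighbour matching of binary descriptors.

            query:    (n, d) uint8 array, d bytes per descriptor.
            database: (m, d) uint8 array.
            Returns (query_index, database_index) pairs within max_distance.
            """
            matches = []
            for i, q in enumerate(query):
                dists = np.array([hamming_distance(q, d) for d in database])
                j = int(dists.argmin())
                if dists[j] <= max_distance:
                    matches.append((i, j))
            return matches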