20 research outputs found

    Multi-Action Recognition via Stochastic Modelling of Optical Flow and Gradients

    In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. Each action is modelled as a Gaussian mixture over robust low-dimensional action features. Segmentation is achieved by performing classification on overlapping temporal windows, which are then merged to produce the final result. This approach is considerably less complicated than previous methods based on dynamic programming or computationally expensive hidden Markov models (HMMs). Initial experiments on a stitched version of the KTH dataset show that the proposed approach achieves an accuracy of 78.3%, outperforming a recent HMM-based approach which obtained 71.2%.
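The window-and-merge scheme described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: it uses synthetic 3-D features and a single diagonal Gaussian per action (a one-component stand-in for the paper's Gaussian mixtures), classifies overlapping temporal windows by log-likelihood, and merges overlaps by per-frame voting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-frame features for two "actions" with different means.
# In the paper these would be low-dimensional optical-flow/gradient features.
frames_a = rng.normal(loc=0.0, scale=1.0, size=(60, 3))
frames_b = rng.normal(loc=4.0, scale=1.0, size=(60, 3))
video = np.vstack([frames_a, frames_b])  # a stitched two-action sequence

# One diagonal Gaussian per action class, fitted on separate training features.
train = {0: rng.normal(0.0, 1.0, (200, 3)), 1: rng.normal(4.0, 1.0, (200, 3))}
params = {k: (x.mean(0), x.var(0) + 1e-6) for k, x in train.items()}

def loglik(x, mean, var):
    # Sum of per-dimension Gaussian log-densities over all frames in a window.
    return (-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)).sum()

# Classify overlapping temporal windows, then merge by per-frame voting.
win, step = 20, 5
votes = np.zeros((len(video), 2))
for start in range(0, len(video) - win + 1, step):
    window = video[start:start + win]
    label = max(params, key=lambda k: loglik(window, *params[k]))
    votes[start:start + win, label] += 1
labels = votes.argmax(1)  # early frames -> action 0, late frames -> action 1
```

The voting step is what turns overlapping window decisions into a single per-frame segmentation without dynamic programming.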

    TREAT: Terse Rapid Edge-Anchored Tracklets

    Fast computation, efficient memory storage, and performance on par with standard state-of-the-art descriptors make binary descriptors a convenient tool for many computer vision applications. However, their development is mostly tailored to static images. To address this limitation, we introduce TREAT (Terse Rapid Edge-Anchored Tracklets), a new binary detector and descriptor based on tracklets. It harnesses moving edge maps to perform efficient feature detection, tracking, and description at low computational cost. Experimental results on 3 different public datasets demonstrate improved performance over other popular binary features. These experiments also provide a basis for benchmarking the performance of binary descriptors in video-based applications.
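The "fast computation" property of binary descriptors comes from matching with the Hamming distance (XOR plus a popcount) instead of Euclidean distance on float vectors. The sketch below shows that matching step with made-up 16-bit toy descriptors, not actual TREAT output:

```python
# Binary descriptors are compared with the Hamming distance (XOR + popcount),
# which is what makes them cheap to match at scale.
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    return bin(a ^ b).count("1")

def match(query: int, candidates: list[int]) -> int:
    """Index of the candidate with the smallest Hamming distance."""
    return min(range(len(candidates)), key=lambda i: hamming(query, candidates[i]))

descriptors = [0b1010101010101010, 0b1111000011110000, 0b0000111100001111]
print(match(0b1010101010101011, descriptors))  # -> 0 (only one bit away)
```

Real binary descriptors are typically 256 or 512 bits stored as byte arrays, but the comparison is the same bitwise operation.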

    BiLSTM with CNN Features for HAR in Videos

    Action recognition in videos is currently a topic of interest in the area of computer vision due to its potential applications, such as multimedia indexing and surveillance in public spaces. In this work we propose a CNN-BiLSTM architecture. First, a pre-trained VGG16 convolutional neural network extracts features from the input video. Then, a BiLSTM classifies the video into a particular class. We evaluate the performance of our system using accuracy as the evaluation metric, obtaining 40.9% and 78.1% on the HMDB-51 and LTCF-101 datasets respectively. Sociedad Argentina de Informática e Investigación Operativa.
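The "Bi" in the CNN-BiLSTM pipeline means the per-frame CNN features are folded through the sequence in both temporal directions and the two final states are concatenated. The sketch below illustrates that idea with a minimal tanh RNN and untrained random weights standing in for a trained BiLSTM, and 4-D random vectors standing in for VGG16 features; it shows the data flow, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(1)

def rnn_pass(frames, W_x, W_h):
    # Minimal tanh RNN: fold per-frame features into a single hidden state.
    h = np.zeros(W_h.shape[0])
    for x in frames:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

# Stand-ins: 10 frames of 4-D "CNN features", an 8-unit recurrent layer.
frames = rng.normal(size=(10, 4))
W_x = rng.normal(size=(8, 4)) * 0.1
W_h = rng.normal(size=(8, 8)) * 0.1

# Run the sequence forwards and backwards, concatenate the final states,
# then a linear layer + softmax yields class scores.
h_fwd = rnn_pass(frames, W_x, W_h)
h_bwd = rnn_pass(frames[::-1], W_x, W_h)
h = np.concatenate([h_fwd, h_bwd])       # 16-D video representation
logits = rng.normal(size=(5, 16)) @ h    # 5 hypothetical action classes
probs = np.exp(logits) / np.exp(logits).sum()  # probabilities summing to 1
```

A real BiLSTM replaces the tanh update with gated LSTM cells and learns all of the weights end to end.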

    Towards Real-Time Activity Recognition


    Reconocimiento de Acciones Humanas en Videos usando una Red Neuronal CNN LSTM Robusta

    Action recognition in videos is currently a topic of interest in the area of computer vision, due to potential applications such as multimedia indexing and surveillance in public spaces. In this paper we propose: (1) The implementation of a CNN-LSTM architecture. First, a pre-trained VGG16 convolutional neural network extracts the features of the input video. Then, an LSTM classifies the video sequence into a particular class. (2) A study of how the number of LSTM units affects the performance of the system. To carry out the training and test phases, we used the KTH, UCF-11 and HMDB-51 datasets. (3) An evaluation of the performance of our system using accuracy as the evaluation metric, given the existing balance of the classes in the datasets. We obtain 93%, 91% and 47% accuracy respectively for each dataset, improving on state-of-the-art results for the former two. Beyond the results attained, the main contribution of this work lies in the evaluation of different CNN-LSTM architectures for the action recognition task.
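The hyper-parameter studied in point (2), the number of LSTM units, sets the size of the hidden and cell states and scales every weight matrix in the cell. The sketch below runs one standard LSTM cell over a sequence of stand-in CNN feature vectors to show where that unit count enters; the weights are untrained random values, the unit count of 32 is arbitrary, and only the 4096-dimensional feature size matches VGG16's fully-connected output:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step; the rows of W/U hold the input, forget,
    output and candidate-gate weights stacked together."""
    n = h.size
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2*n]), sigmoid(z[2*n:3*n])
    g = np.tanh(z[3*n:])
    c = f * c + i * g
    return o * np.tanh(c), c

units, feat_dim = 32, 4096   # "units" is the studied hyper-parameter
W = rng.normal(size=(4 * units, feat_dim)) * 0.01
U = rng.normal(size=(4 * units, units)) * 0.01
b = np.zeros(4 * units)

h = c = np.zeros(units)
for x in rng.normal(size=(16, feat_dim)):   # 16 frames of CNN features
    h, c = lstm_step(x, h, c, W, U, b)
# h is the (units,)-dimensional video representation fed to the classifier
```

Increasing `units` grows both the representation capacity and the parameter count quadratically in the recurrent matrix `U`, which is the trade-off such a study measures.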

    Going Deeper into Action Recognition: A Survey

    Understanding human actions in visual data is tied to advances in complementary research areas including object recognition, human dynamics, domain adaptation and semantic segmentation. Over the last decade, human action analysis evolved from earlier schemes, often limited to controlled environments, to today's advanced solutions that can learn from millions of videos and apply to almost all daily activities. Given the broad range of applications, from video surveillance to human-computer interaction, scientific milestones in action recognition are achieved rapidly, quickly rendering once-dominant methods obsolete. This motivated us to provide a comprehensive review of the notable steps taken towards recognizing human actions. To this end, we start our discussion with the pioneering methods that use handcrafted representations, and then navigate into the realm of deep-learning-based approaches. We aim to remain objective throughout this survey, touching upon encouraging improvements as well as inevitable setbacks, in the hope of raising fresh questions and motivating new research directions for the reader.