    CNN-LSTM Architecture for Action Recognition in Videos

    Action recognition in videos is currently a topic of interest in the area of computer vision, due to potential applications such as multimedia indexing and surveillance in public spaces, among others. In this paper we propose a CNN-LSTM architecture. First, a pre-trained VGG16 convolutional neural network extracts the features of the input video. Then, an LSTM classifies the video into a particular class. To carry out the training and test phases, we used the UCF-11 dataset. We evaluate the performance of our system using accuracy as the evaluation metric. Applying LOOCV with k = 25, we obtain ~98% and ~91% accuracy for training and test respectively. (Sociedad Argentina de Informática e Investigación Operativa)
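
    A minimal Keras sketch of such a pipeline (not the authors' code): a frozen ImageNet-pretrained VGG16 extracts one feature vector per frame, and a single LSTM layer classifies the sequence. The frame count, pooling choice and LSTM width are illustrative assumptions; only the VGG16 backbone and the 11 UCF-11 classes come from the abstract.

        from tensorflow.keras import Input, Model
        from tensorflow.keras.applications import VGG16
        from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, LSTM, TimeDistributed

        FRAMES, CLASSES = 25, 11   # frames per clip is an assumption; UCF-11 has 11 classes

        # Frozen convolutional backbone: one 512-d descriptor per frame.
        backbone = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
        backbone.trainable = False

        clips = Input(shape=(FRAMES, 224, 224, 3))                 # (time, H, W, C)
        feats = TimeDistributed(backbone)(clips)                   # per-frame conv maps
        feats = TimeDistributed(GlobalAveragePooling2D())(feats)   # (time, 512)
        state = LSTM(256)(feats)                                   # temporal aggregation
        probs = Dense(CLASSES, activation="softmax")(state)        # video-level class

        model = Model(clips, probs)
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])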

    Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition

    Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as memory units that help store information across sequential contexts. However, when dealing with high-dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs incurs high memory usage and a huge computational cost, which makes the training of RNNs difficult to scale. To address this challenge, we propose a novel compact LSTM model, named TR-LSTM, which uses the low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM supports end-to-end training and provides a fundamental building block for RNNs handling large input data. Experiments on real-world action recognition datasets demonstrate the promising performance of the proposed TR-LSTM compared with the tensor train LSTM and other state-of-the-art competitors.
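
    The core idea lends itself to a short sketch. Below is a minimal NumPy illustration (not the paper's implementation) of a tensor-ring factorized weight: each core G_k has shape (r_k, n_k, r_{k+1}) with the last rank wrapping around to the first, and the dense input-to-hidden matrix is materialized only to show the parameter saving. The mode sizes and ranks are arbitrary assumptions.

        import numpy as np

        def tr_reconstruct(cores):
            """Contract tensor-ring cores (r_k, n_k, r_{k+1}) into the full tensor."""
            out = cores[0]                                    # (r0, n0, r1)
            for G in cores[1:]:
                out = np.einsum('a...b,bnc->a...nc', out, G)  # chain the cores
            return np.einsum('a...a->...', out)               # close the ring (trace)

        rng = np.random.default_rng(0)
        modes = [16, 16, 16, 16]   # 16*16 input dims x 16*16 hidden dims -> W is 256x256
        ranks = [4, 4, 4, 4]       # ring ranks; the fourth wraps back to the first
        cores = [rng.standard_normal((ranks[k], modes[k], ranks[(k + 1) % 4]))
                 for k in range(4)]

        W = tr_reconstruct(cores).reshape(256, 256)
        print(W.shape, sum(G.size for G in cores), "params vs", 256 * 256)  # 1024 vs 65536

    In an actual TR-LSTM the contraction is applied to the (tensorized) input directly, so the dense W above never needs to be formed.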

    Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

    Recurrent Neural Networks (RNNs) are powerful sequence modeling tools. However, when dealing with high-dimensional inputs, the training of RNNs becomes computationally expensive due to the large number of model parameters, which hinders RNNs from solving many important computer vision tasks, such as action recognition in videos and image captioning. To overcome this problem, we propose a compact and flexible structure, namely the Block-Term tensor decomposition, which greatly reduces the parameters of RNNs and improves their training efficiency. Compared with alternative low-rank approximations, such as the tensor-train RNN (TT-RNN), our method, the Block-Term RNN (BT-RNN), is not only more concise (when using the same rank) but also able to attain a better approximation of the original RNNs with far fewer parameters. On three challenging tasks, including action recognition in videos, image captioning and image generation, BT-RNN outperforms TT-RNN and the standard RNN in both prediction accuracy and convergence rate. Specifically, BT-LSTM uses 17,388 times fewer parameters than the standard LSTM while achieving an accuracy improvement of over 15.6% on the action recognition task with the UCF11 dataset.
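
    For contrast with the tensor-ring sketch above, here is an equally minimal NumPy sketch of a block-term factorized weight (again not the paper's code): the weight tensor is a sum of Tucker terms, each a small core multiplied along every mode by a factor matrix. The block count, ranks and mode sizes are illustrative.

        import numpy as np

        def bt_reconstruct(blocks):
            """Sum of Tucker terms: each block is (core, [U_1, ..., U_d])."""
            full = 0
            for core, factors in blocks:
                t = core                                 # (r1, ..., rd)
                for U in factors:                        # U_k has shape (n_k, r_k)
                    t = np.tensordot(t, U, axes=(0, 1))  # mode-k product; mode moves last
                full = full + t
            return full

        rng = np.random.default_rng(0)
        modes, rank, n_blocks = [16, 16, 16, 16], 2, 4
        blocks = [(rng.standard_normal((rank,) * 4),
                   [rng.standard_normal((n, rank)) for n in modes])
                  for _ in range(n_blocks)]

        W = bt_reconstruct(blocks).reshape(256, 256)     # factorized 256x256 weight
        compact = n_blocks * (rank ** 4 + sum(n * rank for n in modes))
        print(W.shape, compact, "params vs", 256 * 256)  # 576 vs 65536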

    BiLSTM with CNN Features For HAR in Videos

    Action recognition in videos is currently a topic of interest in the area of computer vision due to its potential applications, such as multimedia indexing and surveillance in public spaces, among others. In this work we propose a CNN-BiLSTM architecture. First, a pre-trained VGG16 convolutional neural network extracts the features of the input video. Then, a BiLSTM classifies the video into a particular class. We evaluate the performance of our system using accuracy as the evaluation metric, obtaining 40.9% and 78.1% for the HMDB-51 and LTCF-101 datasets respectively. (Sociedad Argentina de Informática e Investigación Operativa)
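
    Relative to the CNN-LSTM sketch earlier in this list, the bidirectional variant only changes the recurrent layer. A minimal Keras sketch over pre-extracted per-frame features (shapes and width are assumptions; HMDB-51 has 51 classes):

        from tensorflow.keras import Input, Model
        from tensorflow.keras.layers import Bidirectional, Dense, LSTM

        FRAMES, FEAT, CLASSES = 25, 512, 51    # illustrative; HMDB-51 has 51 classes

        feats = Input(shape=(FRAMES, FEAT))    # per-frame CNN features
        state = Bidirectional(LSTM(256))(feats)  # forward + backward states, concatenated
        probs = Dense(CLASSES, activation="softmax")(state)

        model = Model(feats, probs)
        model.compile(optimizer="adam", loss="categorical_crossentropy",
                      metrics=["accuracy"])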

    Reconocimiento de Acciones Humanas en Videos usando una Red Neuronal CNN LSTM Robusta

    Action recognition in videos is currently a topic of interest in the area of computer vision, due to potential applications such as multimedia indexing and surveillance in public spaces, among others. In this paper we propose: (1) the implementation of a CNN-LSTM architecture. First, a pre-trained VGG16 convolutional neural network extracts the features of the input video; then, an LSTM classifies the video sequence into a particular class. (2) A study of how the number of LSTM units affects the performance of the system. To carry out the training and test phases, we used the KTH, UCF-11 and HMDB-51 datasets. (3) An evaluation of the performance of our system using accuracy as the evaluation metric, given the existing balance of the classes in the datasets. We obtain 93%, 91% and 47% accuracy respectively for each dataset, improving on state-of-the-art results for the first two. Beyond the results attained, the main contribution of this work lies in the evaluation of different CNN-LSTM architectures for the action recognition task.
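
    Point (2), the study of LSTM width, amounts to a one-parameter sweep. A hedged sketch of how such a study could be run (the widths, feature size and the commented training call are assumptions; KTH's 6 classes size the head):

        from tensorflow.keras import Input, Model
        from tensorflow.keras.layers import Dense, LSTM

        FRAMES, FEAT, CLASSES = 25, 512, 6   # illustrative shapes; KTH has 6 classes

        def build_model(units):
            """CNN-LSTM classification head, parametrized by LSTM width."""
            feats = Input(shape=(FRAMES, FEAT))
            state = LSTM(units)(feats)
            probs = Dense(CLASSES, activation="softmax")(state)
            model = Model(feats, probs)
            model.compile(optimizer="adam", loss="categorical_crossentropy",
                          metrics=["accuracy"])
            return model

        # Sweep the only hyper-parameter under study and compare validation accuracy.
        for units in (32, 64, 128, 256, 512):
            model = build_model(units)
            print(units, model.count_params())
            # model.fit(train_clips, train_labels, validation_data=val_data)  # hypothetical data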

    Spatial-temporal motion information integration for action detection and recognition in non-static background

    Various motion detection methods have been proposed in the past decade, but there have been few attempts to investigate the advantages and disadvantages of different detection mechanisms so that they can complement each other for better performance. Toward this demand, this paper proposes a human action detection and recognition framework that bridges the semantic gap between low-level pixel intensity changes and a high-level understanding of the meaning of an action. To achieve a robust estimate of the region of action amid the complexities of an uncontrolled background, we propose combining the optical flow field with the Harris3D corner detector to obtain a new spatial-temporal estimation in the video sequences. The action detection method, which considers the integrated motion information, works well with dynamic backgrounds and camera motion, and demonstrates the advantage of integrating multiple spatial-temporal cues. The local features (SIFT and STIP) extracted from the estimated region of action are then used to learn a Universal Background Model (UBM) for the action recognition task. Experimental results on the KTH and UCF YouTube Action (UCF11) datasets show that the proposed detection and recognition framework not only better estimates the region of action but also achieves better recognition accuracy compared with peer work.
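
    A rough OpenCV sketch of the detection idea (not the authors' implementation): dense optical flow gates a corner response so that only moving, textured pixels vote for the region of action. A per-frame Harris response stands in for the Harris3D detector here, and both thresholds are assumptions.

        import cv2
        import numpy as np

        def action_region(prev_gray, gray, flow_thresh=1.0):
            """Estimate a bounding box for the action from flow-gated corner responses."""
            # Dense optical flow between consecutive grayscale frames (Farneback).
            flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                                0.5, 3, 15, 3, 5, 1.2, 0)
            magnitude = np.linalg.norm(flow, axis=2)

            # Per-frame Harris corner response (a 2D stand-in for Harris3D).
            corners = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)

            # Keep pixels that both move and lie on corner-like structure.
            mask = (magnitude > flow_thresh) & (corners > 0.01 * corners.max())
            ys, xs = np.nonzero(mask)
            if xs.size == 0:
                return None                      # no motion evidence in this frame pair
            return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())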