Dilated Temporal Relational Adversarial Network for Generic Video Summarization

Authors: Kampffmeyer Michael; Liang Xiaodan; Tan Min; Xing Eric P.; Zhang Dingwen; Zhang Yujia
Publication date: 01/01/2019

The large number of videos appearing every day makes it increasingly critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames that still conveys the whole story of a given video, is thus of great significance for improving the efficiency of video understanding. We propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it selects the set of key frames that contain the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with a three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced to enhance the capture of temporal representations. The generator uses this unit to effectively exploit global multi-scale temporal context to select key frames and to complement the commonly used Bi-LSTM. To ensure that summaries capture enough key video representation from a global perspective rather than being a trivial, randomly shortened sequence, we present a discriminator that learns to enforce both the information completeness and compactness of summaries via a three-player loss. The loss includes the generated summary loss, the random summary loss, and the real summary (ground-truth) loss, which play important roles in better regularizing the learned model to obtain useful summaries. Comprehensive experiments on three public datasets show the effectiveness of the proposed approach.

arXiv.org e-Print Archive

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Contextual RNN-GANs for Abstract Reasoning Diagram Generation

Authors: Bansal Mohit; Ghosh Arnab; Kulharia Viveka; Mukerjee Amitabha; Namboodiri Vinay
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 06/12/2016

Understanding object motions and transformations is a core problem in computer science. Modeling sequences of evolving images may provide better representations and models of motion and may ultimately be used for forecasting or simulation. Diagrammatic Abstract Reasoning is an avenue in which diagrams evolve in complex patterns and one needs to infer the underlying pattern sequence and generate the next image in the sequence. For this, we develop a novel Contextual Generative Adversarial Network based on Recurrent Neural Networks (Context-RNN-GANs), where both the generator and the discriminator modules are based on contextual history and the adversarial discriminator guides the generator to produce realistic images for the particular time step in the image sequence. We apply the Context-RNN-GAN model (and its variants) to a novel dataset of Diagrammatic Abstract Reasoning and perform initial evaluations on a next-frame prediction task for videos. Empirically, we show that our Context-RNN-GAN model performs competitively with 10th-grade human performance, but there is still scope for interesting improvements as compared to college-grade human performance.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Sistema de Aprendizaje Profundo para reconocimiento de actividades con sensores de captura de movimientos [Deep Learning System for Activity Recognition with Motion Capture Sensors]

Author: Sáez Bombín Sergio
Publication date: 01/01/2020

In this Master's thesis, a system for human activity recognition (HAR) is developed using artificial neural networks and inertial sensors. The system can distinguish between 11 activities from data indicating the body's orientation (quaternions) coming from only 5 sensors. The tests were carried out on a public dataset known as REALDISP, widely used for HAR problems. The generation of motion data using neural networks is also addressed, as a complement to the HAR problem. In addition, the thesis reviews the current state of physical activity recognition using Artificial Intelligence and, more specifically, Deep Learning, as well as the mathematical and theoretical foundations on which the design of neural networks is based, in order to justify the design decisions made. Finally, the designed neural networks are described and the results obtained are presented.

Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática

Máster en Ingeniería de Telecomunicació

Repositorio Documental de la Universidad de Valladolid

Robust and Efficient Classification of Videos in the Wild

Author: Mahasseni Behrooz
Publication venue: Oregon State University

Recognizing human actions in videos is a long-standing problem in computer vision with a wide range of applications including video surveillance, content retrieval, and sports analysis. This thesis focuses on addressing efficiency and robustness of video classification in unconstrained real-world settings. The thesis work can be broadly divided into four major parts. First, we address view-invariant action recognition. This problem is formulated within the multi-task learning framework, where the action model of each viewpoint is specified as a separate task and all tasks are trained jointly. Second, we address large-scale action recognition in uncontrolled settings. For robustness, we augment the standard training video dataset with additional data from another modality, namely 3D skeleton sequences of human body motion. A recurrent neural network called long short-term memory (LSTM) is used to encode sequences from 3D skeleton data. For learning another LSTM for video classification, we use a modified hybrid backpropagation through time algorithm. Third, we address unsupervised video summarization. We formulate the problem as subset frame selection and specify a novel deep generative network to compute a video summary with the smallest representation error. Fourth, we introduce the new problem of budget-aware semantic segmentation of videos. In this line of work, we consider two models. The first model uses a conditional random field (CRF) model and replaces the standard inference steps for feature computation with a sequential policy which intelligently selects a subset of regions and their corresponding features. The second model is a deep recurrent policy which learns to select a subset of frames and uses a shallow convolutional neural network (CNN) to propagate the available segmentation to unlabeled frames.
This research has advanced the state of the art in computer vision because the approaches developed make it possible to meet the stringent runtime requirements arising in many applications and to work in less sanitized settings.

ScholarsArchive@OSU