8 research outputs found

    Dilated Temporal Relational Adversarial Network for Generic Video Summarization

    The large number of videos appearing every day makes it increasingly critical that key information within videos can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames that still conveys the whole story of a given video, is thus of great significance for improving the efficiency of video understanding. We propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it selects the set of key frames that contain the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with a three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced to enhance the capture of temporal representations. The generator uses this unit to effectively exploit global multi-scale temporal context to select key frames and to complement the commonly used Bi-LSTM. To ensure that summaries capture enough key video representation from a global perspective, rather than being a trivial, randomly shortened sequence, we present a discriminator that learns to enforce both the information completeness and the compactness of summaries via a three-player loss. The loss includes the generated summary loss, the random summary loss, and the real summary (ground-truth) loss, which play important roles in regularizing the learned model to obtain useful summaries. Comprehensive experiments on three public datasets show the effectiveness of the proposed approach.
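    The three-player loss named in this abstract can be illustrated with a minimal sketch. The abstract lists only the three terms (generated, random, and real summary losses), not their functional form, so the binary cross-entropy terms, the equal weighting, and the names `bce` and `three_player_discriminator_loss` below are assumptions for illustration, not the authors' formulation.

```python
import math


def bce(score: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a single discriminator score in (0, 1)."""
    s = min(max(score, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(s) + (1.0 - target) * math.log(1.0 - s))


def three_player_discriminator_loss(
    real_score: float, generated_score: float, random_score: float
) -> float:
    """Sum of three terms, one per kind of summary shown to the discriminator.

    Assumption: the discriminator should score the ground-truth summary as
    real (target 1) and both the generated and the random summary as fake
    (target 0). Equal weights are also an assumption.
    """
    real_loss = bce(real_score, 1.0)            # real (ground-truth) summary
    generated_loss = bce(generated_score, 0.0)  # generator's summary
    random_loss = bce(random_score, 0.0)        # randomly shortened sequence
    return real_loss + generated_loss + random_loss


if __name__ == "__main__":
    # A discriminator that separates the three cases well has a lower loss
    # than one that outputs 0.5 for everything.
    print(three_player_discriminator_loss(0.9, 0.1, 0.1))
    print(three_player_discriminator_loss(0.5, 0.5, 0.5))
```

    Under these assumptions, the random-summary term is what penalizes a discriminator that accepts any shortened sequence as a valid summary.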

    Contextual RNN-GANs for Abstract Reasoning Diagram Generation

    Understanding object motions and transformations is a core problem in computer science. Modeling sequences of evolving images may provide better representations and models of motion and may ultimately be used for forecasting or simulation. Diagrammatic Abstract Reasoning is an avenue in which diagrams evolve in complex patterns and one needs to infer the underlying pattern sequence and generate the next image in the sequence. For this, we develop a novel Contextual Generative Adversarial Network based on Recurrent Neural Networks (Context-RNN-GANs), in which both the generator and the discriminator modules are based on contextual history, and the adversarial discriminator guides the generator to produce realistic images for the particular time step in the image sequence. We apply the Context-RNN-GAN model (and its variants) to a novel dataset of Diagrammatic Abstract Reasoning, and we perform initial evaluations on a next-frame prediction task for videos. Empirically, we show that our Context-RNN-GAN model performs competitively with 10th-grade human performance, but there is still scope for interesting improvements relative to college-grade human performance.

    Sistema de Aprendizaje Profundo para reconocimiento de actividades con sensores de captura de movimientos (Deep Learning System for Activity Recognition with Motion Capture Sensors)

    In this Master's thesis, a system for Human Activity Recognition (HAR) is developed using artificial neural networks and inertial sensors. The system can distinguish between 11 activities from data indicating the body's orientation (quaternions) that come from only 5 sensors. The tests were carried out with a public dataset known as REALDISP, which is widely used for HAR problems. The generation of movement data with neural networks is also addressed, as a complement to the HAR problem. In addition, this thesis presents the current state of physical activity recognition using Artificial Intelligence and, more specifically, Deep Learning, as well as the mathematical and theoretical foundations on which the design of neural networks is based, in order to justify the design decisions made. Finally, the designed neural networks are described and the results obtained are presented.
    Departamento de Teoría de la Señal y Comunicaciones e Ingeniería Telemática. Máster en Ingeniería de Telecomunicació
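    As a rough illustration of the input described in this abstract, the sketch below turns per-frame orientation quaternions from 5 sensors into fixed-length windows that a neural network could classify. Only the sensor count and the use of quaternions come from the abstract; the sliding-window segmentation, the window and stride values, and the helper names `flatten_frame` and `sliding_windows` are assumptions for illustration, not the thesis's actual preprocessing.

```python
from typing import List, Sequence

NUM_SENSORS = 5      # stated in the abstract
QUAT_COMPONENTS = 4  # a quaternion has four components, e.g. (w, x, y, z)


def flatten_frame(quats: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate one quaternion per sensor into a 20-value feature vector."""
    if len(quats) != NUM_SENSORS or any(len(q) != QUAT_COMPONENTS for q in quats):
        raise ValueError("expected 5 quaternions with 4 components each")
    return [component for q in quats for component in q]


def sliding_windows(
    frames: Sequence[List[float]], window: int, stride: int
) -> List[List[List[float]]]:
    """Cut a frame sequence into overlapping windows (hypothetical segmentation)."""
    return [
        list(frames[start:start + window])
        for start in range(0, len(frames) - window + 1, stride)
    ]


if __name__ == "__main__":
    identity = [1.0, 0.0, 0.0, 0.0]  # identity rotation, used as dummy data
    frames = [flatten_frame([identity] * NUM_SENSORS) for _ in range(10)]
    windows = sliding_windows(frames, window=4, stride=2)
    # Each window has shape (window, NUM_SENSORS * QUAT_COMPONENTS).
    print(len(windows), len(windows[0]), len(windows[0][0]))
```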