50,938 research outputs found

    Convolutional Drift Networks for Video Classification

    Full text link
    Analyzing spatio-temporal data like video is a challenging task that requires processing visual and temporal information effectively. Convolutional Neural Networks have shown promise as baseline fixed feature extractors through transfer learning, a technique that helps minimize the training cost on visual information. Temporal information is often handled using hand-crafted features or Recurrent Neural Networks, but this can be overly specific or prohibitively complex. Building a fully trainable system that can efficiently analyze spatio-temporal data without hand-crafted features or complex training is an open challenge. We present a new neural network architecture to address this challenge, the Convolutional Drift Network (CDN). Our CDN architecture combines the visual feature extraction power of deep Convolutional Neural Networks with the intrinsically efficient temporal processing provided by Reservoir Computing. In this introductory paper on the CDN, we provide a very simple baseline implementation tested on two egocentric (first-person) video activity datasets.We achieve video-level activity classification results on-par with state-of-the art methods. Notably, performance on this complex spatio-temporal task was produced by only training a single feed-forward layer in the CDN.Comment: Published in IEEE Rebooting Computin

    Learning to Detect Violent Videos using Convolutional Long Short-Term Memory

    Full text link
    Developing a technique for the automatic analysis of surveillance videos in order to identify the presence of violence is of broad interest. In this work, we propose a deep neural network for the purpose of recognizing violent videos. A convolutional neural network is used to extract frame level features from a video. The frame level features are then aggregated using a variant of the long short term memory that uses convolutional gates. The convolutional neural network along with the convolutional long short term memory is capable of capturing localized spatio-temporal features which enables the analysis of local motion taking place in the video. We also propose to use adjacent frame differences as the input to the model thereby forcing it to encode the changes occurring in the video. The performance of the proposed feature extraction pipeline is evaluated on three standard benchmark datasets in terms of recognition accuracy. Comparison of the results obtained with the state of the art techniques revealed the promising capability of the proposed method in recognizing violent videos.Comment: Accepted in International Conference on Advanced Video and Signal based Surveillance(AVSS 2017

    Deep learning-based switchable network for in-loop filtering in high efficiency video coding

    Get PDF
    The video codecs are focusing on a smart transition in this era. A future area of research that has not yet been fully investigated is the effect of deep learning on video compression. The paper’s goal is to reduce the ringing and artifacts that loop filtering causes when high-efficiency video compression is used. Even though there is a lot of research being done to lessen this effect, there are still many improvements that can be made. In This paper we have focused on an intelligent solution for improvising in-loop filtering in high efficiency video coding (HEVC) using a deep convolutional neural network (CNN). The paper proposes the design and implementation of deep CNN-based loop filtering using a series of 15 CNN networks followed by a combine and squeeze network that improves feature extraction. The resultant output is free from double enhancement and the peak signal-to-noise ratio is improved by 0.5 dB compared to existing techniques. The experiments then demonstrate that improving the coding efficiency by pipelining this network to the current network and using it for higher quantization parameters (QP) is more effective than using it separately. Coding efficiency is improved by an average of 8.3% with the switching based deep CNN in-loop filtering

    Unconstrained Scene Text and Video Text Recognition for Arabic Script

    Full text link
    Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform previous state-of-the-art on two publicly available video text datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantity of annotated data. We overcome this by synthesising millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation is built on top of the model introduced here [37] which is proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence to sequence transcription approach. The network transcribes a sequence of convolutional features from the input image to a sequence of target labels. This does away with the need for segmenting input image into constituent characters/glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results.Comment: 5 page
    • …
    corecore