1,758 research outputs found

    Efficient and Effective Solutions for Video Classification

    The aim of this PhD thesis is to take a step towards teaching computers to understand videos in a way similar to how humans do. In this work we tackle the video classification and/or action recognition tasks. This thesis was completed during a period of transition, with the research community moving from traditional approaches (such as hand-crafted descriptor extraction) to deep learning, and it captures that transition. However, unlike image classification, where the state-of-the-art results are dominated by deep learning approaches, in video classification deep learning approaches are not as dominant. In fact, most of the current state-of-the-art results in video classification are based on a hybrid approach in which hand-crafted descriptors are combined with deep features to obtain the best performance. This is due to several factors: video is more complex data than an image and is therefore more difficult to model, and video datasets are not large enough to train deep models effectively. The pipeline for video classification can be broken down into three main steps: feature extraction, encoding, and classification. While the existing techniques for classification are more mature, there is still significant room for improvement in feature extraction and encoding. In addition to these main steps, the framework contains pre- and post-processing techniques, such as feature dimensionality reduction, feature decorrelation (for instance using Principal Component Analysis, PCA), and normalization, which can considerably influence the performance of the pipeline. One of the bottlenecks of the video classification pipeline is the feature extraction step, where most approaches are extremely computationally demanding, which makes them unsuitable for real-time applications.
In this thesis, we tackle this issue, propose different speed-ups to reduce the computational cost, and introduce a new descriptor that can capture motion information from a video without the need to compute optical flow (which is very expensive to compute). Another important component of video classification is the feature encoding step, which builds the final video representation that serves as input to a classifier. During the PhD, we proposed several improvements over the standard approaches for feature encoding. We also propose a new approach for encoding deep features. To summarize, the main contributions of this thesis are as follows: (1) We propose several speed-ups for descriptor extraction, providing a version of the standard video descriptors that can run in real time. We also investigate the trade-off between accuracy and computational efficiency; (2) We provide a new descriptor for extracting information from a video, which is very efficient to compute and can extract motion information without the need to extract optical flow; (3) We investigate different improvements over the standard encoding approaches for boosting the performance of the video classification pipeline; (4) We propose a new feature encoding approach specifically designed for encoding local deep features, providing a more robust video representation.

    Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

    Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding the I/O bandwidth and system-level efficiency that are crucial for deploying accelerators in ultra-low power devices. We present Hyperdrive: a BWN accelerator that dramatically reduces the I/O bandwidth by exploiting a novel binary-weight streaming approach. It can be used for convolutional neural network architectures and input resolutions of arbitrary size by exploiting the natural scalability of the compute units at both chip level and system level: Hyperdrive chips are arranged systolically in a 2D mesh and process the entire feature map together in parallel. Hyperdrive achieves 4.3 TOp/s/W system-level efficiency (i.e., including I/Os), 3.1x higher than state-of-the-art BWN accelerators, even though its core uses resource-intensive FP16 arithmetic for increased robustness.
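To make the idea of binary weights concrete, here is a minimal NumPy sketch of one common binarization scheme: each weight is reduced to its sign, and one real-valued scale per output unit (the mean absolute weight) is kept. The abstract does not say which scheme Hyperdrive uses, so the scaling rule and the function names `binarize_weights` and `binary_dense` are assumptions made for illustration only.

```python
import numpy as np

def binarize_weights(w):
    """Split real weights into +/-1 signs and one scale per output unit.

    Assumed scheme (not taken from the abstract): scale = mean |w| per row.
    """
    flat = w.reshape(w.shape[0], -1)
    alpha = np.abs(flat).mean(axis=1)               # one scale per output unit
    signs = np.where(w >= 0, 1, -1).astype(np.int8)  # 1 bit of information per weight
    return signs, alpha

def binary_dense(x, signs, alpha):
    """Dense layer with binary weights: only additions/subtractions, then a rescale."""
    return (x @ signs.T.astype(x.dtype)) * alpha

w = np.array([[0.5, -1.5], [2.0, 2.0]])
signs, alpha = binarize_weights(w)
y = binary_dense(np.array([[1.0, 2.0]]), signs, alpha)
```

Storing only the signs is what shrinks the weight memory and, by extension, the weight traffic that an accelerator has to stream.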

    Unified Framework for Identity and Imagined Action Recognition from EEG patterns

    We present a unified deep learning framework for the recognition of user identity and the recognition of imagined actions, based on electroencephalography (EEG) signals, for application as a brain-computer interface. Our solution exploits a novel shifted subsampling preprocessing step as a form of data augmentation, and a matrix representation to encode the inherent local spatial relationships of multi-electrode EEG signals. The resulting image-like data is then fed to a convolutional neural network to process the local spatial dependencies, and is finally analyzed by a bidirectional long short-term memory module to focus on temporal relationships. Our solution is compared against several state-of-the-art methods, showing comparable or superior performance on different tasks. Specifically, we achieve accuracy levels above 90% for both action and user classification tasks. In terms of user identification, we reach a 0.39% equal error rate in the case of known users and gestures, and 6.16% in the more challenging case of unknown users and gestures. Preliminary experiments are also conducted to direct future work towards everyday applications relying on a reduced set of EEG electrodes.
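The abstract does not define the shifted subsampling step. One plausible reading, sketched below as an assumption, is that a recording is downsampled by a factor k at each of the k possible starting offsets, so one trial yields k shorter, slightly different training examples. The function name `shifted_subsample` and the `(channels, time)` array layout are hypothetical.

```python
import numpy as np

def shifted_subsample(signal, factor):
    """Return `factor` downsampled copies of a (channels, time) array.

    Copy s keeps samples s, s + factor, s + 2 * factor, ...
    This is an assumed interpretation, not the authors' definition.
    """
    usable = (signal.shape[1] // factor) * factor  # trim so all copies have equal length
    return [signal[:, s:usable:factor] for s in range(factor)]

eeg = np.arange(12).reshape(2, 6)   # toy recording: 2 electrodes, 6 time steps
views = shifted_subsample(eeg, 3)   # 3 augmented views, 2 time steps each
```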

    ViBe: A universal background subtraction algorithm for video sequences

    This paper presents a technique for motion detection that incorporates several innovative mechanisms. For example, our proposed technique stores, for each pixel, a set of values taken in the past at the same location or in the neighborhood. It then compares this set to the current pixel value in order to determine whether that pixel belongs to the background, and adapts the model by randomly choosing which values in the background model to replace. This approach differs from those based on the classical belief that the oldest values should be replaced first. Finally, when the pixel is found to be part of the background, its value is propagated into the background model of a neighboring pixel. We describe our method in full detail (including pseudocode and the parameter values used) and compare it to other background subtraction techniques. Efficiency figures show that our method outperforms recent and proven state-of-the-art methods in terms of both computation speed and detection rate. We also analyze the performance of a version of our algorithm downscaled to the absolute minimum of one comparison and one byte of memory per pixel. Even this simplified version of our algorithm appears to perform better than mainstream techniques. There is a dedicated web page for ViBe at http://www.telecom.ulg.ac.be/research/vibe
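The three mechanisms described above (a per-pixel sample set, random replacement, and propagation to a neighbor) can be sketched in NumPy for grayscale frames. This is an illustrative sketch, not the authors' pseudocode: the function names and the default values for `n_samples`, `radius`, `min_matches`, and `subsample` are assumptions, since the abstract does not state the parameters.

```python
import numpy as np

def vibe_init(frame, n_samples=20, rng=None):
    """Fill each pixel's sample set with values drawn from its 3x3 neighborhood."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = frame.shape
    rows, cols = np.arange(h)[:, None], np.arange(w)[None, :]
    model = np.empty((n_samples, h, w), dtype=frame.dtype)
    for k in range(n_samples):
        ys = np.clip(rows + rng.integers(-1, 2, size=(h, w)), 0, h - 1)
        xs = np.clip(cols + rng.integers(-1, 2, size=(h, w)), 0, w - 1)
        model[k] = frame[ys, xs]
    return model

def vibe_step(model, frame, radius=20, min_matches=2, subsample=16, rng=None):
    """Classify one frame, update the model in place, return the foreground mask."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = model.shape
    # A pixel is background if enough stored samples lie close to its current value.
    dist = np.abs(model.astype(np.int32) - frame.astype(np.int32))
    background = (dist < radius).sum(axis=0) >= min_matches
    # Random replacement: overwrite a randomly chosen sample, not the oldest one.
    ys, xs = np.nonzero(background & (rng.integers(0, subsample, size=(h, w)) == 0))
    model[rng.integers(0, n, size=ys.size), ys, xs] = frame[ys, xs]
    # Propagation: write the background value into a random neighbor's sample set.
    ys, xs = np.nonzero(background & (rng.integers(0, subsample, size=(h, w)) == 0))
    ny = np.clip(ys + rng.integers(-1, 2, size=ys.size), 0, h - 1)
    nx = np.clip(xs + rng.integers(-1, 2, size=xs.size), 0, w - 1)
    model[rng.integers(0, n, size=ys.size), ny, nx] = frame[ys, xs]
    return ~background

rng = np.random.default_rng(0)
empty_scene = np.full((8, 8), 100, dtype=np.uint8)
model = vibe_init(empty_scene, rng=rng)
frame = empty_scene.copy()
frame[2:4, 2:4] = 250                      # a bright object enters the scene
mask = vibe_step(model, frame, rng=rng)    # True where the object is
```

Because only background pixels feed the update, a foreground object is not absorbed into the model immediately; the neighbor propagation is what lets the model eventually recover regions that a stopped or departed object had covered.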