14,726 research outputs found

    Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor

    We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also provide for selective, motion-adaptive, MB pixel decoding (a.k.a. MB texture decoding). This in turn allows for the derivation of spatio-temporal video activity regions at extremely high speed in comparison to conventional full-frame decoding followed by optical flow estimation. In order to evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences and then fuse their scores to derive the final classification of each test video. Evaluation on two standard datasets shows that the proposed approach is competitive with the best two-stream video classification approaches found in the literature. At the same time: (i) a CPU-based realization of our MV extraction is over 977 times faster than GPU-based optical flow methods; (ii) selective decoding is up to 12 times faster than full-frame decoding; (iii) our proposed spatial and temporal CNNs perform inference at 5 to 49 times lower cloud computing cost than the fastest methods from the literature. Comment: Accepted in IEEE Transactions on Circuits and Systems for Video Technology. Extension of ICIP 2017 conference paper.
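
    A minimal sketch of the late score-fusion step described in the abstract, assuming the spatial (MB texture) and temporal (MV) CNNs each produce per-class logits for a test video. The function name fuse_two_stream_scores, the mixing weight w_temporal, and the example logit values are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def softmax(logits):
        # Numerically stable softmax over class logits.
        z = logits - logits.max()
        e = np.exp(z)
        return e / e.sum()

    def fuse_two_stream_scores(spatial_logits, temporal_logits, w_temporal=0.5):
        """Weighted late fusion of spatial (texture) and temporal (MV) stream scores.

        w_temporal is a hypothetical mixing weight; the paper fuses the two
        streams' scores, but the exact weighting used there is not shown here.
        """
        p_spatial = softmax(spatial_logits)
        p_temporal = softmax(temporal_logits)
        fused = (1.0 - w_temporal) * p_spatial + w_temporal * p_temporal
        return int(np.argmax(fused)), fused

    # Example: logits for a 5-class problem from each stream of one test video.
    spatial = np.array([1.2, 0.3, -0.5, 2.1, 0.0])
    temporal = np.array([0.4, 0.1, 0.2, 3.0, -1.0])
    label, scores = fuse_two_stream_scores(spatial, temporal)
    print(label, scores.round(3))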

    Improved reception of in-body signals by means of a wearable multi-antenna system

    High data-rate wireless communication for in-body human implants is mainly performed in the 402-405 MHz Medical Implant Communication System band and the 2.45 GHz Industrial, Scientific and Medical band. The latter band offers larger bandwidth, enabling high-resolution live video transmission. Although in-body signal attenuation is larger, at least 29 dB more power may be transmitted in this band, and the antenna efficiency for compact antennas at 2.45 GHz is also up to 10 times higher. Moreover, at the receive side, one can exploit the large surface provided by a garment by deploying multiple compact, highly efficient wearable antennas, capturing the signals transmitted by the implant directly at the body surface, yielding stronger signals and reducing interference. In this paper, we implement a reliable 3.5 Mbps wearable textile multi-antenna system suitable for integration into a jacket worn by a patient, and evaluate its potential to improve the In-to-Out Body wireless link reliability by means of spatial receive diversity in a standardized measurement setup. We derive the optimal distribution and the minimum number of on-body antennas required to ensure signal levels that are large enough for real-time wireless endoscopy-capsule applications, at varying positions and orientations of the implant in the human body.
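
    A small sketch of the receive-diversity idea behind this abstract: with several on-body antennas picking up the same implant signal, ideal maximal-ratio combining sums the branch SNRs (in linear scale), so a minimum antenna count for a target link margin can be estimated from per-branch measurements. Maximal-ratio combining is one possible diversity scheme chosen here for illustration; the function names, the per-branch SNR values, and the 7 dB target are assumptions, not measurement results from the paper.

    import numpy as np

    def mrc_combined_snr_db(branch_snr_db):
        """Maximal-ratio combining: combined SNR is the sum of branch SNRs on a linear scale."""
        snr_lin = 10.0 ** (np.asarray(branch_snr_db) / 10.0)
        return 10.0 * np.log10(snr_lin.sum())

    def min_antennas_for_threshold(branch_snr_db, threshold_db):
        """Smallest number of antennas (best branches first) whose combined SNR meets the threshold."""
        ordered = sorted(branch_snr_db, reverse=True)
        for n in range(1, len(ordered) + 1):
            if mrc_combined_snr_db(ordered[:n]) >= threshold_db:
                return n
        return None  # Even all antennas together do not reach the threshold.

    # Illustrative per-antenna SNRs (dB) at different on-body positions.
    branches = [4.0, 2.5, 1.0, -1.0, -3.0]
    print(mrc_combined_snr_db(branches))               # combined SNR with all antennas
    print(min_antennas_for_threshold(branches, 7.0))   # antennas needed for a 7 dB target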

    Efficient hash-driven Wyner-Ziv video coding for visual sensors


    Asynchronous spiking neurons, the natural key to exploit temporal sparsity

    Inference of Deep Neural Networks for stream signal (video/audio) processing on edge devices is still challenging. Unlike most state-of-the-art inference engines, which are efficient for static signals, our brain is optimized for real-time dynamic signal processing. We believe one important feature of the brain, asynchronous stateful processing, is the key to its excellence in this domain. In this work, we show how asynchronous processing with stateful neurons allows exploitation of the sparsity present in natural signals. This paper explains three different types of sparsity and proposes an inference algorithm that exploits all of them in the execution of already trained networks. Our experiments in three different applications (handwritten digit recognition, autonomous steering, and hand-gesture recognition) show that this model of inference reduces the number of required operations for sparse input data by one to two orders of magnitude. Additionally, thanks to its fully asynchronous processing, this type of inference can run on fully distributed and scalable neuromorphic hardware platforms.
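
    A toy sketch of the temporal-sparsity idea from this abstract: instead of recomputing a dense layer for every frame, a stateful layer keeps its previous pre-activations and only applies the weight columns for inputs that actually changed, so the work scales with the number of changed inputs rather than the layer size. This is a generic delta-inference illustration under the abstract's description; the class DeltaDenseLayer and its details are assumptions, not the authors' spiking-neuron algorithm.

    import numpy as np

    class DeltaDenseLayer:
        """Stateful dense layer that updates its output only for changed inputs."""

        def __init__(self, weights, bias):
            self.W = weights              # shape (out_dim, in_dim)
            self.b = bias                 # shape (out_dim,)
            self.prev_x = np.zeros(weights.shape[1])
            self.pre_act = bias.copy()    # running pre-activation state
            self.ops = 0                  # counted multiply-accumulates

        def forward(self, x):
            delta = x - self.prev_x
            changed = np.flatnonzero(delta)           # indices of inputs that changed
            # Only the weight columns for changed inputs contribute to the update.
            self.pre_act += self.W[:, changed] @ delta[changed]
            self.ops += self.W.shape[0] * changed.size
            self.prev_x = x.copy()
            return np.maximum(self.pre_act, 0.0)      # ReLU on the maintained state

    rng = np.random.default_rng(0)
    layer = DeltaDenseLayer(rng.standard_normal((64, 128)), np.zeros(64))
    frame = rng.standard_normal(128)
    out1 = layer.forward(frame)
    frame[:4] += 0.1                                   # only a few inputs change between frames
    out2 = layer.forward(frame)
    dense_ops = 2 * 64 * 128                           # two fully dense frames would cost this many MACs
    print(layer.ops, dense_ops)                        # delta inference performs far fewer operations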