Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks
Advanced video classification systems decode video frames to derive the
necessary texture and motion representations for ingestion and analysis by
spatio-temporal deep convolutional neural networks (CNNs). However, when
considering visual Internet-of-Things applications, surveillance systems and
semantic crawlers of large video repositories, the video capture and the
CNN-based semantic analysis parts do not tend to be co-located. This
necessitates the transport of compressed video over networks and incurs
significant overhead in bandwidth and energy consumption, thereby
undermining the deployment potential of such systems. In this paper, we
investigate the trade-off between the encoding bitrate and the achievable
accuracy of CNN-based video classification models that directly ingest
AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video
bitstreams and applying complex optical flow calculations prior to CNN
processing, we only retain motion vector and select texture information at
significantly reduced bitrates and apply no additional processing prior to CNN
ingestion. Based on three CNN architectures and two action recognition
datasets, we achieve bitrate savings of 11% to 94% with marginal effect on
classification accuracy. A model-based selection between multiple CNNs
increases these savings further, to the point where, if up to 7% loss of
accuracy can be tolerated, video classification can take place with as little
as 3 kbps for the transport of the required compressed video information to the
system implementing the CNN models.
Region-adaptive probability model selection for the arithmetic coding of video texture
In video coding systems using adaptive arithmetic coding to compress texture information, the employed symbol probability models need to be retrained every time the coding process moves into an area with different texture. To avoid this inefficiency, we propose to replace the probability models used in the original coder with multiple switchable sets of probability models. We determine the model set to use in each spatial region in an optimal manner, taking into account the additional signaling overhead. Experimental results show that this approach, when applied to H.264/AVC's context-based adaptive binary arithmetic coder (CABAC), yields significant bit-rate savings, which are comparable to or higher than those obtained using alternative improvements to CABAC previously proposed in the literature.
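The idea of choosing a model set per region while accounting for signaling overhead can be illustrated with a minimal sketch. It is not the method from the paper. It assumes binary symbols, replaces CABAC's adaptive context models with fixed probabilities of a 1 symbol, and uses made-up signaling costs (`stay_bits`, `switch_bits`). All function and parameter names are hypothetical. Under those assumptions, a dynamic program over the sequence of regions finds the assignment with the lowest total of ideal code length plus signaling bits.

```python
import math


def code_length_bits(bits, p_one):
    """Ideal arithmetic-coding cost, in bits, of a binary sequence under a fixed P(1)."""
    return sum(-math.log2(p_one if b else 1.0 - p_one) for b in bits)


def select_model_sets(regions, candidate_p_one, stay_bits=1.0, switch_bits=3.0):
    """Pick one candidate model per region, minimizing coding plus signaling bits.

    regions          list of binary symbol lists, one per spatial region
    candidate_p_one  hypothetical stand-ins for model sets: one fixed P(1) each
    stay_bits        assumed cost of signaling "same model set as previous region"
    switch_bits      assumed cost of signaling a change (also charged for region 0)
    """
    n_models = len(candidate_p_one)
    # best[k] = minimum total bits for the regions seen so far, ending in model k
    best = [code_length_bits(regions[0], p) + switch_bits for p in candidate_p_one]
    back = []  # back[r][k] = best predecessor model for region r + 1 using model k
    for bits in regions[1:]:
        new_best, pointers = [], []
        for k, p in enumerate(candidate_p_one):
            prev = min(
                range(n_models),
                key=lambda j: best[j] + (stay_bits if j == k else switch_bits),
            )
            signal = stay_bits if prev == k else switch_bits
            new_best.append(best[prev] + signal + code_length_bits(bits, p))
            pointers.append(prev)
        best = new_best
        back.append(pointers)
    # Trace the cheapest path backwards to recover the per-region choices.
    k = min(range(n_models), key=best.__getitem__)
    total = best[k]
    choices = [k]
    for pointers in reversed(back):
        k = pointers[k]
        choices.append(k)
    choices.reverse()
    return choices, total


if __name__ == "__main__":
    candidates = [0.1, 0.9]  # two illustrative "model sets"
    print(select_model_sets([[0] * 20, [1] * 20], candidates))
    print(select_model_sets([[0] * 20, [1], [0] * 20], candidates))
```

In the first call the two long regions have opposite statistics, so paying the switch cost is worthwhile and the selection changes model between them. In the second call the middle region holds a single symbol, and the signaling overhead of switching away and back outweighs the coding gain, so the same model is kept throughout.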