2,556 research outputs found
A comprehensive video codec comparison
In this paper, we compare the video codecs AV1 (version 1.0.0-2242 from August 2019), HEVC (HM and x265), AVC (x264), the exploration software JEM which is based on HEVC, and the VVC (successor of HEVC) test model VTM (version 4.0 from February 2019) under two fair and balanced configurations: All Intra for the assessment of intra coding and Maximum Coding Efficiency with all codecs being tuned for their best coding efficiency settings. VTM achieves the highest coding efficiency in both configurations, followed by JEM and AV1. The worst coding efficiency is achieved by x264 and x265, even in the placebo preset for highest coding efficiency. AV1 gained a lot in terms of coding efficiency compared to previous versions and now outperforms HM by 24% BD-Rate gains. VTM gains 5% over AV1 in terms of BD-Rates. By reporting separate numbers for JVET and AOM test sequences, it is ensured that no bias in the test sequences exists. When comparing only intra coding tools, it is observed that the complexity increases exponentially for linearly increasing coding efficiency
Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor
We investigate video classification via a two-stream convolutional neural
network (CNN) design that directly ingests information extracted from
compressed video bitstreams. Our approach begins with the observation that all
modern video codecs divide the input frames into macroblocks (MBs). We
demonstrate that selective access to MB motion vector (MV) information within
compressed video bitstreams can also provide for selective, motion-adaptive, MB
pixel decoding (a.k.a., MB texture decoding). This in turn allows for the
derivation of spatio-temporal video activity regions at extremely high speed in
comparison to conventional full-frame decoding followed by optical flow
estimation. In order to evaluate the accuracy of a video classification
framework based on such activity data, we independently train two CNN
architectures on MB texture and MV correspondences and then fuse their scores
to derive the final classification of each test video. Evaluation on two
standard datasets shows that the proposed approach is competitive to the best
two-stream video classification approaches found in the literature. At the same
time: (i) a CPU-based realization of our MV extraction is over 977 times faster
than GPU-based optical flow methods; (ii) selective decoding is up to 12 times
faster than full-frame decoding; (iii) our proposed spatial and temporal CNNs
perform inference at 5 to 49 times lower cloud computing cost than the fastest
methods from the literature.Comment: Accepted in IEEE Transactions on Circuits and Systems for Video
Technology. Extension of ICIP 2017 conference pape
- âŠ