68 research outputs found

    Parallel scalability and efficiency of HEVC parallelization approaches

    Get PDF
    Unlike H.264/advanced video coding, where parallelism was an afterthought, High Efficiency Video Coding currently contains several proposals aimed at making it more parallel-friendly. A performance comparison of the different proposals, however, has not yet been performed. In this paper, we will fill this gap by presenting efficient implementations of the most promising parallelization proposals, namely tiles and wavefront parallel processing (WPP). In addition, we present a novel approach called overlapped wavefront (OWF), which achieves higher performance and efficiency than tiles and WPP. Experiments conducted on a 12-core system running at 3.33 GHz show that our implementations achieve average speedups, for 4k sequences, of 8.7, 9.3, and 10.7 for WPP, tiles, and OWF, respectively

    Improving the parallelization efficiency of HEVC decoding

    Get PDF
    In this paper we present a new parallelization approach for HEVC decoding called Overlapped Wavefront (OWF). It is based on wavefront processing and improves its parallelization efficiency by allowing overlapped execution of consecutive pictures. Furthermore, in this strategy of the decoding steps are performed on a CTB basis rather than on a picture basis, which improves data locality. Our implementation achieves between 29.6%, 42.4%, and 66.6% higher frame rates compared to previous results and 11.3%, 21.0%, and 38.0% higher frame rates compared to Tiles, for 2160p, 1600p, and 1080p, respectively.EC/FP7/248647/EU/ENabling technologies for a programmable many-CORE/ENCOR

    Rinnakkainen toteutus H.265 videokoodaus standardille

    Get PDF
    The objective of this study was to research the scalability of the parallel features in the new H.265 video compression standard, also know as High Efficiency Video Coding (HEVC). Compared to its predecessor, the H.264 standard, H.265 typically achieves around 50% bitrate reduction for the same subjective video quality. Especially videos with higher resolution (Full HD and beyond) achieve better compression ratios. Also a better utilization of parallel computing resources is provided. H.265 introduces two novel parallelization features: Tiles and Wavefront Parallel Processing (WPP). In Tiles, each video frame is divided into areas that can be decoded without referencing to other areas in the same frame. In WPP, the relations between code blocks in a frame are encoded so that the decoding process can progress through the frame as a front using multiple threads. In this study, the reference implementation for the H.265 decoder was augmented to support both of these parallelization features. The performance of the parallel implementations was measured using three different setups. From the measurement results it could be seen that the introduction of more CPU cores reduced the total decode time of the video frames to a certain point. When using the Tiles feature, it was observed that the encoding geometry, i.e. how each frame was divided into individually decodable areas, had a noticeable effect on the decode times with certain thread counts. When using WPP, it was observed that what was mostly synchronization overhead, sometimes had a negative effect on the decode times when using larger (4-12) amounts of threads.TÀmÀn tutkimuksen aiheena oli tutkia uuden H.265 videonpakkausstandardin (tunnetaan myös nimellÀ HEVC (engl. High Efficiency Video Coding)) rinnakkaisuusominaisuuksien skaalautuvuutta. Verrattuna edeltÀjÀÀnsÀ, H.264 videonpakkaustandardiin, H.265 tyypillisesti saavuttaa samalla kuvanlaadulla noin 50% pienemmÀn pakkauskoon. Erityisesti suuren resoluution videoilla (Full HD ja suuremmat) pakkaustehokkuuden paremmuus korostuu. Huomiota on kiinnitetty myös moniydinprosessoreiden hyödyntÀmiseen videokoodauksessa. H.265 tarjoaa kaksi uutta rinnakkaisuusominaisuutta: niin kutsutut Tiles- ja WPP-menetelmÀt (engl. \emph{Wavefront Parallel Processing}). Tiles-menetelmÀssÀ jokainen videon kuva jaetaan alueisiin, jotka voidaan purkaa viittaamatta saman kuvan muihin alueisiin. WPP-menetelmÀssÀ suhteet kuvan lohkoihin pakataan siten ettÀ purkamisprosessi pystyy etenemÀÀn kuvan lÀpi rintamana hyödyntÀen useampia sÀikeitÀ. TÀssÀ tutkimuksessa H.265 videodekooderin referenssitoteutusta laajennettiin tukemaan molempia nÀistÀ rinnakkaisuusominaisuuksista. SuorituskykyÀ mitattiin kÀyttÀen kolmea eri mittausasetelmaa. Mittaustuloksista ilmeni, ettÀ prosessoriydinten lukumÀÀrÀn kasvattaminen nopeutti videoiden purkamista tiettyyn pisteeseen asti. Tiles-menetelmÀÀ mitatessa havaittiin, ettÀ alueiden geometrialla, eli kuinka kuva jaettiin riippumattomiin alueisiin, on huomattava vaikutus purkamisnopeuteen tietyillÀ sÀiemÀÀrillÀ. WPP-menetelmÀÀ mitattaessa havaittiin ettÀ korkeampiin sÀiemÀÀriin (4-12) siirryttÀessÀ purkamisnopeus alkoi hidastua. TÀmÀ johtui pÀÀasiassa sÀikeiden keskinÀiseen synkronointiin kuluvasta ajasta

    Parallel HEVC Decoding on Multi- and Many-core Architectures : A Power and Performance Analysis

    Get PDF
    The Joint Collaborative Team on Video Decoding is developing a new standard named High Efficiency Video Coding (HEVC) that aims at reducing the bitrate of H.264/AVC by another 50 %. In order to fulfill the computational demands of the new standard, in particular for high resolutions and at low power budgets, exploiting parallelism is no longer an option but a requirement. Therefore, HEVC includes several coding tools that allows to divide each picture into several partitions that can be processed in parallel, without degrading the quality nor the bitrate. In this paper we adapt one of these approaches, the Wavefront Parallel Processing (WPP) coding, and show how it can be implemented on multi- and many-core processors. Our approach, named Overlapped Wavefront (OWF), processes several partitions as well as several pictures in parallel. This has the advantage that the amount of (thread-level) parallelism stays constant during execution. In addition, performance and power results are provided for three platforms: a server Intel CPU with 8 cores, a laptop Intel CPU with 4 cores, and a TILE-Gx36 with 36 cores from Tilera. The results show that our parallel HEVC decoder is capable of achieving an average frame rate of 116 fps for 4k resolution on a standard multicore CPU. The results also demonstrate that exploiting more parallelism by increasing the number of cores can improve the energy efficiency measured in terms of Joules per frame substantially

    High Performance Multiview Video Coding

    Get PDF
    Following the standardization of the latest video coding standard High Efficiency Video Coding in 2013, in 2014, multiview extension of HEVC (MV-HEVC) was published and brought significantly better compression performance of around 50% for multiview and 3D videos compared to multiple independent single-view HEVC coding. However, the extremely high computational complexity of MV-HEVC demands significant optimization of the encoder. To tackle this problem, this work investigates the possibilities of using modern parallel computing platforms and tools such as single-instruction-multiple-data (SIMD) instructions, multi-core CPU, massively parallel GPU, and computer cluster to significantly enhance the MVC encoder performance. The aforementioned computing tools have very different computing characteristics and misuse of the tools may result in poor performance improvement and sometimes even reduction. To achieve the best possible encoding performance from modern computing tools, different levels of parallelism inside a typical MVC encoder are identified and analyzed. Novel optimization techniques at various levels of abstraction are proposed, non-aggregation massively parallel motion estimation (ME) and disparity estimation (DE) in prediction unit (PU), fractional and bi-directional ME/DE acceleration through SIMD, quantization parameter (QP)-based early termination for coding tree unit (CTU), optimized resource-scheduled wave-front parallel processing for CTU, and workload balanced, cluster-based multiple-view parallel are proposed. The result shows proposed parallel optimization techniques, with insignificant loss to coding efficiency, significantly improves the execution time performance. This , in turn, proves modern parallel computing platforms, with appropriate platform-specific algorithm design, are valuable tools for improving the performance of computationally intensive applications

    Parallel video decoding in the emerging HEVC standard

    Get PDF
    In this paper we propose and evaluate a parallelization strategy for the emerging HEVC video coding standard. The proposed strategy is based on entropy slices which allows exploiting parallelism in the entropy decoding stage while maintaining high coding efficiency. Our approach requires to encode videos with one entropy slice per LCU row in order to decode multiple LCU rows in a wavefront parallel manner. Evaluations performed on a PC with 12 Intel Xeon cores running at 3.3 GHz show that it is possible to achieve real-time performance for 1920×1080p50 (53.1 fps) and 2560×1600 (29.5fps) video resolutions with speedups of 5.2× and 6.3× compared to sequential execution, respectively
    • 

    corecore