
    Complexity Analysis Of Next-Generation VVC Encoding and Decoding

    While the next-generation video compression standard, Versatile Video Coding (VVC), provides superior compression efficiency, its computational complexity increases dramatically. This paper thoroughly analyzes this complexity for both the encoder and decoder of VVC Test Model 6, quantifying the complexity breakdown of each coding tool and measuring the complexity and memory requirements of VVC encoding/decoding. These extensive analyses are performed on six video sequences at 720p, 1080p, and 2160p under Low-Delay (LD), Random-Access (RA), and All-Intra (AI) conditions (a total of 320 encodings/decodings). Results indicate that the VVC encoder and decoder are 5x and 1.5x more complex than HEVC in LD, and 31x and 1.8x in AI, respectively. Detailed analysis of the coding tools reveals that, on average in LD, motion estimation tools (53%), transformation and quantization (22%), and entropy coding (7%) dominate the encoding complexity. In decoding, loop filters (30%), motion compensation (20%), and entropy decoding (16%) are the most complex modules. Moreover, the memory bandwidth required for VVC encoding/decoding is measured through memory profiling and amounts to 30x and 3x that of HEVC, respectively. The reported results and insights are a guide for future research and implementations of energy-efficient VVC encoders and decoders. Comment: IEEE ICIP 202
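    The per-module percentages above are the kind of roll-up a profiler produces. As a minimal illustration (the module names and cycle counts below are made-up placeholders, not measurements from the paper), per-module profiling samples can be aggregated into such a breakdown like this:

```python
# Illustrative only: aggregate per-module profiling samples into a percentage
# breakdown of the kind reported in the abstract above. All numbers are fake.
from collections import Counter

def complexity_breakdown(samples):
    """samples: iterable of (module_name, cycles) pairs emitted by a profiler."""
    totals = Counter()
    for module, cycles in samples:
        totals[module] += cycles
    grand_total = sum(totals.values())
    return {m: 100.0 * c / grand_total for m, c in totals.items()}

if __name__ == "__main__":
    fake_profile = [("motion_estimation", 5300), ("transform_quant", 2200),
                    ("entropy_coding", 700), ("other", 1800)]
    for module, share in sorted(complexity_breakdown(fake_profile).items(),
                                key=lambda kv: -kv[1]):
        print(f"{module:>20s}: {share:5.1f}%")
```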

    An 'on-demand' Data Communication Architecture for Supplying Multiple Applications from a Single Data Source: An Industrial Application Case Study

    A key aspect of automation is the manipulation of feedback sensor data for the automated control of particular process actuators. In practice, this data can often be reused for other applications, such as the live update of a graphical user interface, a fault detection application, or a real-time business intelligence process performance engine. For this data to be reused effectively, an appropriate data communication architecture must be utilised to provide such functionality. This architecture must accommodate the dependencies of the system and sustain the required data transmission speed to ensure stability and data integrity. Such an architecture is presented in this paper, which shows how the data needs of multiple applications are satisfied from a single source of data. It also shows how the flexibility of this architecture enables the integration of additional data sources as the data dependencies grow. This research is based on the development of a fully integrated automation system for the testing of fuel controls used on civil transport aircraft engines.
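    As a rough sketch of the "single data source, multiple applications" idea (not the paper's implementation; all class and application names below are illustrative), one sensor feed can be fanned out to independent per-application queues so that, for example, a GUI updater and a fault-detection task each consume at their own pace without disturbing the control loop:

```python
# Minimal fan-out sketch: one data source, several independent consumer queues.
import queue
import threading
import time

class SensorFanOut:
    def __init__(self):
        self._subscribers = []          # one queue per consuming application
        self._lock = threading.Lock()

    def subscribe(self, maxsize=100):
        q = queue.Queue(maxsize=maxsize)
        with self._lock:
            self._subscribers.append(q)
        return q

    def publish(self, sample):
        with self._lock:
            for q in self._subscribers:
                try:
                    q.put_nowait(sample)    # never block the control loop
                except queue.Full:
                    pass                    # a slow consumer only loses its own data

def consumer(name, q, count):
    for _ in range(count):
        sample = q.get()
        print(f"{name} received {sample}")

if __name__ == "__main__":
    bus = SensorFanOut()
    gui_q, fault_q = bus.subscribe(), bus.subscribe()
    threading.Thread(target=consumer, args=("gui", gui_q, 3), daemon=True).start()
    threading.Thread(target=consumer, args=("fault-detect", fault_q, 3), daemon=True).start()
    for i in range(3):
        bus.publish({"t": time.time(), "pressure": 1.0 + 0.01 * i})
    time.sleep(0.2)                         # let the daemon consumers drain
```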

    A high performance hardware architecture for one bit transform based motion estimation

    Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. One-bit transform (1BT) based ME algorithms have low computational complexity. Therefore, in this paper, we propose a high performance systolic hardware architecture for 1BT based ME. The proposed hardware performs full search ME for 4 macroblocks in parallel and is the fastest 1BT based ME hardware reported in the literature. In addition, it uses less on-chip memory than previous 1BT based ME hardware by employing a novel data reuse scheme and memory organization. The proposed hardware is implemented in Verilog HDL. It consumes 34% of the slices in a Xilinx XC2VP30-7 FPGA, works at 115 MHz in the same FPGA, and is capable of processing 50 full High Definition (1920x1080) frames per second. Therefore, it can be used in consumer electronics products that require real-time video processing or compression.
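    For readers unfamiliar with 1BT matching, the sketch below illustrates the general idea behind such algorithms (a software approximation, not the proposed systolic hardware): each pixel is reduced to a single bit by comparison against a local filter output, and the block-matching cost becomes the number of non-matching bits. The original 1BT uses a multi-band-pass kernel; a local mean is used here as a simplified stand-in, and the window size, block size, and test frames are arbitrary.

```python
# Rough software sketch of 1BT-based block matching (XOR + popcount style cost).

def one_bit_transform(frame, k=4):
    """frame: 2-D list of ints. Returns a 2-D list of 0/1 values.
    Each pixel is compared against its local mean (stand-in for the 1BT filter)."""
    h, w = len(frame), len(frame[0])
    bits = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - k), min(h, y + k + 1)
            x0, x1 = max(0, x - k), min(w, x + k + 1)
            window = [frame[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            bits[y][x] = 1 if frame[y][x] >= sum(window) / len(window) else 0
    return bits

def nnmb(cur_bits, ref_bits, bx, by, dx, dy, n=16):
    """Number of non-matching bits for an n x n block at (bx, by) displaced by (dx, dy).
    The caller must keep the displaced block inside the reference frame."""
    return sum(cur_bits[by + j][bx + i] ^ ref_bits[by + dy + j][bx + dx + i]
               for j in range(n) for i in range(n))

if __name__ == "__main__":
    import random
    random.seed(0)
    cur = [[random.randint(0, 255) for _ in range(48)] for _ in range(48)]
    ref = [[cur[y][x] for x in range(48)] for y in range(48)]   # identical frame
    cb, rb = one_bit_transform(cur), one_bit_transform(ref)
    print(nnmb(cb, rb, 16, 16, 0, 0))   # 0 non-matching bits for zero motion
```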

    High-Efficient Video Transmission for HDTV Broadcasting

    Before broadcasting a video signal, redundant data should be removed from it. This redundancy removal can be performed using video coding standards such as H.264/Advanced Video Coding (AVC) and H.265/High Efficiency Video Coding (HEVC). Although both standards deliver excellent video quality, a great deal of data is still considered redundant. The most exhaustive stage of the video encoding process is Motion Estimation (ME). The higher the resolution of the transmitted video signal, the more video data must be fetched from main memory, which increases the memory access time required for the Motion Estimation process. In this chapter, a smart ME coprocessor architecture that greatly reduces the memory access time is presented. A data reuse algorithm is used to minimize the memory access time. The discussed coprocessor effectively reuses the data of the search area to minimize the overall memory access time (I/O memory bandwidth) while fully using all resources and hardware, thereby speeding up the video broadcasting process. For a search range of 32 × 32 and a block size of 16 × 16, the architecture can perform Motion Estimation for HDTV video at 30 fps and easily outperforms many fast full-search architectures.
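    The benefit of reusing the search area can be estimated with simple arithmetic. The sketch below assumes a 16x16 block, a +/-16 search range, and row-wise reuse between horizontally adjacent blocks (a common "Level-C" style scheme, used here for illustration rather than as the chapter's exact method), and compares the pixels fetched per block with and without reuse:

```python
# Back-of-the-envelope estimate of search-area data reuse savings.
BLOCK = 16          # block size in pixels (assumption)
RANGE = 16          # search range of +/-16 pixels (assumption)

search_area = (BLOCK + 2 * RANGE) ** 2          # full window covering all candidates
new_stripe  = BLOCK * (BLOCK + 2 * RANGE)       # only the non-overlapping stripe per block

print(f"fetch per block, no reuse : {search_area} pixels")
print(f"fetch per block, reuse    : {new_stripe} pixels")
print(f"I/O bandwidth reduction   : {search_area / new_stripe:.1f}x")
```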

    Low Power Architectures for MPEG-4 AVC/H.264 Video Compression


    Hardware acceleration of the trace transform for vision applications

    Computer vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images, in their pixels and bits, serve no purpose of their own. It is only by interpreting the data and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform to real-time performance, while also allowing an element of flexibility that suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a fully flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is presented, as required by a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.
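    To fix ideas about the computation being accelerated, the following software-only sketch evaluates a Trace transform with a single simple functional (a plain sum along each traced line, i.e. a Radon-like projection); the hardware discussed in the thesis evaluates this kind of per-line functional in parallel and with configurable functionals. The angle count, offset count, and sampling step below are arbitrary choices for illustration.

```python
# Software-only Trace transform sketch: result[angle][offset] = T(samples along line).
import math

def trace_transform(img, n_angles=45, n_offsets=32, functional=sum):
    h, w = len(img), len(img[0])
    cx, cy = w / 2.0, h / 2.0
    half_diag = math.hypot(cx, cy)
    result = []
    for a in range(n_angles):
        theta = math.pi * a / n_angles
        c, s = math.cos(theta), math.sin(theta)
        row = []
        for o in range(n_offsets):
            # signed distance of the line from the image centre
            rho = (o / (n_offsets - 1) - 0.5) * 2 * half_diag
            samples = []
            t = -half_diag
            while t <= half_diag:           # walk along the line in 1-pixel steps
                x = int(round(cx + rho * c - t * s))
                y = int(round(cy + rho * s + t * c))
                if 0 <= x < w and 0 <= y < h:
                    samples.append(img[y][x])
                t += 1.0
            row.append(functional(samples) if samples else 0)
        result.append(row)
    return result

if __name__ == "__main__":
    img = [[(x > 8) * 255 for x in range(16)] for y in range(16)]   # toy test image
    t_matrix = trace_transform(img, n_angles=8, n_offsets=8)
    print(len(t_matrix), "angles x", len(t_matrix[0]), "offsets")
```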

    Exploring Processor and Memory Architectures for Multimedia

    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges, ranging from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. Projecting where we are heading, these issues become ever more challenging with increased mobility as well as advancements in multimedia content, such as the introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come partly from an ongoing step-up in resolution, from QVGA (320x240) to Full HD (1920x1080), a 27x increase in less than half a decade. On top of this, codec evolution (MPEG-2 to H.264/AVC) adds to the computational load. To meet these performance challenges, there have been processing and memory architecture advances (SIMD, out-of-order superscalar execution, multicore processing, and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200 MHz to 2 GHz) and on-chip memory sizes (128 KB to 2-3 MB). At the same time, requirements for mobility keep increasing, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000 mAh). This leaves a negative net result in terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations of mobile systems, an overall approach to how these systems are best utilized is needed. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal of this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements of a mobile system. The second is to propose an automated methodology for optimally mapping multimedia data and instructions onto a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming to solve a multidimensional optimization problem. Results from this work indicate that using today's most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis, lower total power consumption can be achieved while performance for multimedia applications is improved, through enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. The automated method shows high accuracy, up to 90%, in predicting multimedia memory accesses for a given architecture.
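    The memory-mapping step can be pictured with a toy version of the optimisation: assign each data object to a memory level so that capacity constraints hold and the total access cost is minimised. The thesis uses constraint programming for the full multidimensional problem; the brute-force sketch below, with made-up object sizes, access counts, and memory parameters, only shows the shape of that problem:

```python
# Toy data-to-memory mapping: minimise access energy subject to capacity limits.
from itertools import product

# (name, size in KB, accesses per frame) -- hypothetical values
objects = [("luma_refs", 96, 40000), ("mv_field", 8, 9000),
           ("coeff_buf", 32, 15000), ("bitstream", 64, 2000)]
# (name, capacity in KB, energy per access in nJ) -- hypothetical values
memories = [("L1_scratchpad", 64, 0.2), ("L2_sram", 256, 0.8), ("ext_dram", 10**6, 6.0)]

best_cost, best_map = float("inf"), None
for assign in product(range(len(memories)), repeat=len(objects)):
    used = [0] * len(memories)
    cost = 0.0
    for (name, size, accesses), m in zip(objects, assign):
        used[m] += size
        cost += accesses * memories[m][2]
    if all(used[i] <= memories[i][1] for i in range(len(memories))) and cost < best_cost:
        best_cost, best_map = cost, assign

for (name, _, _), m in zip(objects, best_map):
    print(f"{name} -> {memories[m][0]}")
print(f"total access energy: {best_cost:.0f} nJ per frame")
```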

    Hardware based High Accuracy Integer Motion Estimation and Merge Mode Estimation

    Thesis (Ph.D.) -- Department of Electrical and Computer Engineering, College of Engineering, Graduate School, Seoul National University, August 2017. Hyuk-Jae Lee.
    HEVC achieves twice the compression efficiency of H.264/AVC, but the many coding tools it employs greatly increase the computational complexity on the encoder side. Many studies have tried to reduce this high complexity, but most of them merely extend complexity-reduction methods developed for H.264/AVC; they either deliver unsatisfactory complexity reduction or incur excessive compression-efficiency loss, and thus fail to draw out the full compression performance of HEVC. In particular, previously studied hardware-based encoders prioritize real-time operation and sacrifice a great deal of compression efficiency. This work therefore proposes hardware architectures that accelerate hardware-based inter prediction, minimize the loss of HEVC's compression performance, and enable real-time coding.
    The proposed bottom-up MV prediction method predicts the MV not from spatially and temporally neighboring PUs, as conventional methods do, but from hierarchically neighboring PUs in HEVC's block structure, which greatly improves MV prediction accuracy. As a result, the computational complexity of integer motion estimation (IME) is reduced by 67% with no change in compression efficiency. In addition, this work proposes a hardware-based IME capable of real-time operation that applies the proposed bottom-up IME algorithm. Previous hardware-based IME designs either avoid fast IME algorithms or modify them to fit the hardware, because of the idle cycles caused by the stage-by-stage dependencies of fast IME algorithms and because of reference-data access problems; their compression-efficiency loss is therefore very large, at several percent or more. In contrast, this work adopts the fast Test Zone Search (TZS) algorithm and proposes a hardware-based IME that preserves the complexity-reduction performance of TZS. To use the fast IME algorithm in hardware, three techniques are proposed and applied. First, the chronic idle-cycle problem of fast IME algorithms is resolved by context switching between IME operations for different reference pictures and different depths. Second, for fast and flexible access to reference data, a multi-bank SRAM structure exploiting the locality of the reference data is proposed. Third, to avoid the large number of switching multiplexers that completely unrestricted reference-data access would require, reference-data access with limited freedom around the search center is proposed. The resulting IME hardware supports all HEVC block sizes and processes 4K UHD video at 60 fps with four reference pictures, with an almost negligible compression-efficiency loss of 0.11%, at a cost of 1.27M gates.
    Merge mode estimation (MME), newly adopted in HEVC, is highly effective at improving compression efficiency, but its computational load varies widely from PU to PU, which wastes hardware resources when it is implemented in hardware. This work therefore also proposes an efficient hardware-oriented MME method together with its hardware architecture. In the conventional MME approach, whether the interpolation filter is applied is determined by neighboring PUs, so the interpolation filter is used less than 50% of the time; nevertheless, hardware has been designed for the case in which the filter is used, resulting in low utilization of hardware resources. This work proposes an MME hardware architecture with two data paths in which the vertical interpolation filter, the most resource-hungry component, is reduced to half size, together with a merge-candidate allocation algorithm that minimizes compression-efficiency loss while maintaining high hardware utilization. The resulting hardware-based MME uses 24% fewer hardware resources than previous hardware-based MME designs while achieving a 7.4% shorter execution time. It uses 460.8K gates and processes 4K UHD video at 30 fps.
    Contents:
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Content
      1.3 Common Experimental Environment
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 The HEVC Standard
        2.1.1 Quad-tree Based Hierarchical Block Structure
        2.1.2 Inter Prediction in HEVC
      2.2 Previous Work on Accelerating Inter Prediction
        2.2.1 Fast Integer Motion Estimation Algorithms
        2.2.2 Fast Merge Mode Estimation Algorithms
      2.3 Previous Work on Inter Prediction Hardware Architectures
        2.3.1 Hardware-based Integer Motion Estimation
        2.3.2 Hardware-based Merge Mode Estimation
    Chapter 3 Bottom-up Integer Motion Estimation
      3.1 Observation of Motion Vector Relationships between Layers
        3.1.1 Analysis of Motion Vector Relationships between Layers
        3.1.2 Analysis of Motion Vector Relationships in the Top-down and Bottom-up Directions
      3.2 Bottom-up Motion Vector Prediction
      3.3 Bottom-up Integer Motion Estimation
        3.3.1 Bottom-up Integer Motion Estimation - Single MVP
        3.3.2 Bottom-up Integer Motion Estimation - Multiple MVP
      3.4 Experimental Results
    Chapter 4 Hardware-based Integer Motion Estimation
      4.1 Hardware Application of Bottom-up Integer Motion Estimation
      4.2 Modified Test Zone Search for Hardware
        4.2.1 Parallel Processing of PUs within a CU Using a SAD Tree
        4.2.2 Grid-based Sampled Raster Search
        4.2.3 Elimination of Redundant Computation between Different PUs
      4.3 5-stage Pipeline Schedule with Reduced Idle Cycles
        4.3.1 Operation of Each Pipeline Stage
        4.3.2 Idle Cycles Introduced by Test Zone Search Dependencies
        4.3.3 Idle Cycle Reduction through Context Switching
      4.4 Reference Data Supply for High-speed Operation
        4.4.1 Reference Data Access Patterns and Problems Caused by Access Latency
        4.4.2 Reference Data Access Exploiting the Locality of Search Points
        4.4.3 Multi-bank Memory Structure for Single-cycle Reference Data Access
        4.4.4 Reducing Switching Complexity by Limiting the Freedom of Reference Data Access
      4.5 Hardware Architecture
        4.5.1 Overall Hardware Architecture
        4.5.2 Detailed Hardware Schedule
      4.6 Hardware Implementation and Experimental Results
        4.6.1 Hardware Implementation Results
        4.6.2 Execution Time and Compression Efficiency
        4.6.3 Performance Changes for Each Applied Technique
        4.6.4 Comparison with Previous Work
    Chapter 5 Hardware-based Merge Mode Estimation
      5.1 A Hardware Perspective on Conventional Merge Mode Estimation
        5.1.1 Conventional Merge Mode Estimation
        5.1.2 Conventional Merge Mode Estimation Hardware Architectures and Analysis
        5.1.3 Low Hardware Utilization in Conventional Merge Mode Estimation
      5.2 New Merge Mode Estimation with Reduced Variation in Computational Load
      5.3 Hardware Implementation of the New Merge Mode Estimation
        5.3.1 Hardware Architecture with an Independent Path per Candidate Type
        5.3.2 Adaptive Candidate Allocation for Higher Hardware Utilization
        5.3.3 Hardware Schedule with Adaptive Candidate Allocation
      5.4 Experimental and Hardware Implementation Results
        5.4.1 Changes in Execution Time and Compression Efficiency
        5.4.2 Hardware Implementation Results
    Chapter 6 Overall Inter Prediction
      6.1 CTU-level 3-stage Pipelined Inter Prediction
      6.2 Two-way Encoding Order
        6.2.1 Top-down and Bottom-up Encoding Orders
        6.2.2 Two-way Encoding Order Compatible with Existing Fast Algorithms
        6.2.3 Results of Combining and Comparing with Existing Fast Algorithms
    Chapter 7 Extension to Next-Generation Video Coding
      7.1 Extension of Bottom-up Motion Vector Prediction
      7.2 Extension of Bottom-up Integer Motion Estimation
    Chapter 8 Conclusion
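    The bottom-up MV prediction idea described in the abstract above can be pictured roughly as follows (a loose illustration, not the thesis's exact algorithm): the MVs already estimated for the sub-blocks one depth below are combined into a search-center prediction for the parent block, which then needs only a small refinement search. The median combination, refinement radius, and toy cost function below are all illustrative choices.

```python
# Loose sketch of bottom-up MV prediction plus a tiny refinement search.

def median(values):
    v = sorted(values)
    return v[len(v) // 2]

def bottom_up_mvp(child_mvs):
    """child_mvs: list of (mvx, mvy) already estimated for the sub-blocks one depth below."""
    return (median([mv[0] for mv in child_mvs]),
            median([mv[1] for mv in child_mvs]))

def refine(sad, center, radius=2):
    """Small full search of +/-radius around the predicted center.
    sad(mv) is assumed to return the matching cost of candidate mv."""
    best_mv, best_cost = center, sad(center)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = (center[0] + dx, center[1] + dy)
            cost = sad(cand)
            if cost < best_cost:
                best_mv, best_cost = cand, cost
    return best_mv

if __name__ == "__main__":
    children = [(5, -2), (6, -2), (5, -1), (7, -3)]
    center = bottom_up_mvp(children)
    mv = refine(lambda mv: abs(mv[0] - 6) + abs(mv[1] + 2), center)  # toy cost model
    print(center, mv)
```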