22 research outputs found

    High throughput image compression and decompression on GPUs

    Get PDF
    This work investigates possibilities to create a high-throughput, GPU-friendly, intra-only, wavelet-based video compression algorithm optimized for visually lossless applications. Addressing the key observation that JPEG 2000’s entropy coder is a bottleneck and may be overly complex for high-bit-rate scenarios, various algorithmic alterations are proposed. First, JPEG 2000’s Selective Arithmetic Coding mode is realized on the GPU, but the resulting throughput gains are shown to be limited. Instead, two independent alterations not compliant with the standard are proposed: (1) each bit plane is processed in a single pass (single-pass mode), giving up the concept of intra-bit-plane truncation points, and (2) a true raw-coding mode is introduced that is parallelizable per sample and does not require any context modeling. Next, an alternative block coder from the literature, the Bitplane Coder with Parallel Coefficient Processing (BPC-PaCo), is evaluated. Since it trades signal adaptiveness for increased parallelism, it is shown how a stationary probability model averaged over a set of test sequences yields competitive compression efficiency. A combination of BPC-PaCo with the single-pass mode is proposed and shown to increase the speedup with respect to the original JPEG 2000 entropy coder from 2.15x (BPC-PaCo with two passes) to 2.6x (BPC-PaCo with single-pass mode), at the marginal cost of increasing the PSNR penalty by 0.3 dB to at most 1 dB.
Furthermore, a parallel algorithm is presented that determines the optimal code block bit stream truncation points (given an available bit rate budget) and builds the entire code stream on the GPU, reducing the amount of data that has to be transferred back to host memory to a minimum. A theoretical runtime model is formulated that, based on benchmarking results on one GPU, predicts the runtime of a kernel on another GPU. Lastly, the first JPEG XS GPU decoder realization is presented. JPEG XS was designed as a low-complexity codec and, for the first time, explicitly demanded GPU friendliness in its call for proposals. At bit rates above 1 bpp, the decoder is around 2x faster than the original JPEG 2000 and 1.5x faster than JPEG 2000 with the fastest evaluated entropy coder (BPC-PaCo with single-pass mode). With a GeForce GTX 1080, a decoding throughput of around 200 fps is achieved for a UHD 4:4:4 sequence.
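To illustrate why the raw-coding mode parallelizes so well, here is a toy sketch (a simplification for illustration, not the dissertation's GPU implementation): emitting the magnitude bitplanes of a coefficient block sample-wise requires no context modeling, so every output bit depends on exactly one input sample.

```python
import numpy as np

def raw_code_bitplanes(block, num_planes):
    """Emit the magnitude bitplanes of a coefficient block MSB-first.

    Every output bit depends only on one input sample, so on a GPU each
    sample (or each bitplane) could be handled by an independent thread.
    """
    mags = np.abs(block).astype(np.uint32)
    planes = []
    for p in range(num_planes - 1, -1, -1):       # MSB first
        planes.append(((mags >> p) & 1).ravel())  # one bit per sample
    return np.concatenate(planes)

block = np.array([[3, -1], [0, 2]])
bits = raw_code_bitplanes(block, num_planes=2)
# plane 1 (MSB): 1 0 0 1 ; plane 0 (LSB): 1 1 0 0
```

Sign handling and the actual bitstream packing are omitted; the point is only that, unlike context-modeled arithmetic coding, no bit depends on previously coded neighbors.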

    Compression of DNA sequencing data

    Get PDF
    With the release of the latest generations of sequencing machines, the cost of sequencing a whole human genome has dropped to less than US$1,000. The potential applications in several fields lead to the forecast that the amount of DNA sequencing data will soon surpass the volume of other types of data, such as video data. In this dissertation, we present novel data compression technologies with the aim of enhancing storage, transmission, and processing of DNA sequencing data. The first contribution in this dissertation is a method for the compression of aligned reads, i.e., read-out sequence fragments that have been aligned to a reference sequence. The method improves compression by implicitly assembling local parts of the underlying sequences. Compared to the state of the art, our method achieves the best trade-off between memory usage and compressed size. Our second contribution is a method for the quantization and compression of quality scores, i.e., values that quantify the error probability of each read-out base. Specifically, we propose two Bayesian models that are used to precisely control the quantization. With our method it is possible to compress the data down to 0.15 bit per quality score. Notably, we can recommend a particular parametrization for one of our models which—by removing noise from the data as a side effect—does not lead to any degradation in the distortion metric. This parametrization achieves an average rate of 0.45 bit per quality score. The third contribution is the first implementation of an entropy codec compliant to MPEG-G. We show that, compared to the state of the art, our method achieves the best compression ranks on average, and that adding our method to CRAM would be beneficial both in terms of achievable compression and speed. 
Finally, we provide an overview of the standardization landscape, and in particular of MPEG-G, in which our contributions have been integrated.
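The effect of quality-score quantization on compressibility can be sketched in a few lines (a toy illustration with hypothetical bin boundaries, not the Bayesian models proposed in the dissertation): mapping Phred scores into a few bins shrinks the alphabet and thus the empirical entropy, i.e. the bits needed per quality score.

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def quantize(q, boundaries, levels):
    """Map a Phred score to the representative level of its bin."""
    for b, lvl in zip(boundaries, levels):
        if q < b:
            return lvl
    return levels[-1]

quals = [2, 11, 25, 37, 40, 38, 12, 30, 35, 36, 37, 39]
boundaries, levels = [10, 20, 30], [2, 15, 25, 37]   # 4 bins (hypothetical)
binned = [quantize(q, boundaries, levels) for q in quals]
# fewer distinct symbols -> lower entropy -> fewer bits per quality score
assert entropy_bits(binned) <= entropy_bits(quals)
```

The dissertation's contribution is in choosing the quantization so that the rate drops (down to 0.15-0.45 bit per score) without degrading downstream distortion; this sketch only shows the rate side of that trade-off.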

    Signal processing for improved MPEG-based communication systems

    Get PDF

    Architectures for ubiquitous 3D on heterogeneous computing platforms

    Get PDF
    Today, a wide scope for 3D graphics applications exists, including domains such as scientific visualization, 3D-enabled web pages, and entertainment. At the same time, the devices and platforms that run and display the applications are more heterogeneous than ever. Display environments range from mobile devices to desktop systems and ultimately to distributed displays that facilitate collaborative interaction. While the capability of the client devices may vary considerably, the visualization experiences running on them should be consistent. The field of application should dictate how and on what devices users access the application, not the technical requirements to realize the 3D output. The goal of this thesis is to examine the diverse challenges involved in providing consistent and scalable visualization experiences to heterogeneous computing platforms and display setups. While we could not address the myriad of possible use cases, we developed a comprehensive set of rendering architectures in the major domains of scientific and medical visualization, web-based 3D applications, and movie virtual production. To provide the required service quality, performance, and scalability for different client devices and displays, our architectures focus on the efficient utilization and combination of the available client, server, and network resources. We present innovative solutions that incorporate methods for hybrid and distributed rendering as well as means to manage data sets and stream rendering results. We establish the browser as a promising platform for accessible and portable visualization services. We collaborated with experts from the medical field and the movie industry to evaluate the usability of our technology in real-world scenarios. The presented architectures achieve a wide coverage of display and rendering setups and at the same time share major components and concepts. 
Thus, they build a strong foundation for a unified system that supports a variety of use cases.

    Design of Fixed-Ratio Compression Hardware for Display Devices

    Get PDF
    Doctoral dissertation, Department of Electrical and Computer Engineering, Graduate School, Seoul National University, February 2016. Advisor: Hyuk-Jae Lee. Compression for display devices differs from general video compression standards in several respects. First, it targets specialized applications. Second, for the sake of compression gain, power consumption, and real-time processing, the hardware must be small and the target compression ratio is low. Third, it must fit raster scan order. Fourth, to bound the frame memory size or to permit random access, the target compression ratio must be met exactly, in real time, for each compression unit. This dissertation proposes three compression algorithms, with hardware architectures, that satisfy these requirements. For LCD overdrive, a compression method based on BTC (block truncation coding) is proposed. To increase the compression gain, a target compression ratio of 12 is addressed, and two main techniques improve the compression efficiency: first, bits are saved by exploiting the spatial correlation with neighboring blocks; second, 2×16 coding blocks are used for simple regions and 2×8 coding blocks for complex regions. When 2×8 coding blocks are used, the bits saved by the first technique are spent to meet the target compression ratio. For low-cost near-lossless frame memory compression, a method based on 1D SPIHT (set partitioning in hierarchical trees) is proposed. SPIHT is very effective at meeting a fixed target compression ratio, but despite being well suited to raster scan order, its 1D form has received little attention in the literature. This dissertation proposes a hardware architecture that solves the biggest problem of 1D SPIHT, its speed, by modifying the algorithm to expose parallelism: for the encoder, the dependencies that prevent parallel processing are resolved and pipelined scheduling becomes possible; for the decoder, the algorithm is modified so that each pass running in parallel can predict in advance the length of the bitstream it will decode. For high-fidelity RGBW color image compression, a prediction-based method is proposed. The proposed prediction consists of two differencing stages: the first exploits spatial correlation, and the second exploits inter-color correlation. For coding, VLC (variable length coding) is used for its high compression efficiency; since conventional VLC has difficulty meeting the target compression ratio exactly, a fixed-length compression method based on Golomb-Rice coding is proposed. The proposed encoder consists of a pre-coder and a post-coder: the pre-coder performs the actual encoding for one specific case and computes predicted encoding information for all other cases, which it passes to the post-coder; the post-coder then uses this information to generate the actual bitstream.
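The BTC idea used for LCD overdrive can be illustrated with the textbook scheme (a minimal sketch of basic block truncation coding, not the dissertation's rate-controlled variant with bit saving and adaptive 2×16 / 2×8 block sizes): each block is reduced to a one-bit-per-pixel bitmap plus two reconstruction levels.

```python
import numpy as np

def btc_encode(block):
    """Basic BTC: keep a bitmap plus two reconstruction levels.

    Textbook scheme for illustration; lo/hi are the means of the
    below-mean and above-mean pixels, so the block mean is preserved.
    """
    mean = block.mean()
    bitmap = block >= mean
    lo = block[~bitmap].mean() if (~bitmap).any() else mean
    hi = block[bitmap].mean() if bitmap.any() else mean
    return bitmap, lo, hi

def btc_decode(bitmap, lo, hi):
    # each pixel is reconstructed as one of the two levels
    return np.where(bitmap, hi, lo)

block = np.array([[120, 130], [200, 210]], dtype=float)
bitmap, lo, hi = btc_encode(block)
rec = btc_decode(bitmap, lo, hi)
# rec == [[125, 125], [205, 205]]: the block mean (165) is preserved
```

With 8-bit levels, a 2×16 block costs 32 bitmap bits plus two levels, which is how a fixed compression ratio per block becomes achievable.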

    Header Compression and Signal Processing for Wideband Communication Systems

    Get PDF
    This thesis is dedicated to the investigation, development and practical verification of header compression and signal processing techniques over TErrestrial Trunked RAdio (TETRA), TETRA Enhanced Data Services (TEDS) and Power Line Communication (PLC). TETRA Release 1 is a narrowband private mobile radio technology used by safety and security organizations, while TEDS is a wideband system. With the introduction of IP support, TEDS enables multimedia-based applications and services to communicate across communication systems. However, the IP extension for TEDS comes at the cost of significant header overhead relative to the payload. With small application payloads and high-rate traffic profiles, the headers can make up considerably more of the packet than the actual application payload, consuming considerable slot capacity at the physical layer of TEDS and PLC. Advanced header compression techniques such as Robust Header Compression (RoHC) compress these large headers and offer significant compression gain without compromising quality of service (QoS); systems can use the reclaimed bandwidth to transmit more information payload rather than control information. In this study, the objective is to investigate the integration of RoHC into TEDS and to design a novel IPv6-enabled protocol stack for PLC with integrated RoHC. The study also investigates throughput optimization through RoHC over TEDS and PLC by simulating different traffic profile classes, illustrating the benefit of using RoHC over both systems. The thesis further aims to design and simulate the TEDS physical layer in order to investigate the performance of higher-order modulation schemes. The current TEDS standard is based on transmission frequencies above the 400 MHz range; however, with delays in the standardization of broadband TETRA, it is important to explore all possible avenues to extend the capacity of the system.
The research concludes with findings on the application of RoHC for TEDS and PLC against different traffic classes and propagation channels. The benefit of using RoHC in terms of saved bandwidth, slot capacity and other QoS parameters is presented, along with integration aspects for the TEDS and PLC communication stacks. The study also presents TEDS physical layer simulation results for modulation schemes and transmission frequencies other than those specified in the standard. The research results presented in this thesis have been published in international symposiums and professional journals. The benefits of using RoHC for TEDS have been proposed to ETSI TETRA for contribution to the TETRA standard under STF 378. Simulation results for the investigation of the characteristics of π/4 DQPSK performance below 200 MHz have also been presented to ETSI TETRA as a contribution to the existing TEDS standard. The results for the design of the IPv6-enabled stack with integrated RoHC have been submitted as a deliverable under the FP7 project DLC+VIT4IP. All the results, simulations and investigations presented in this thesis were carried out on the platform provided by HW Communication Ltd.
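The scale of the header overhead problem can be checked with simple arithmetic (a sketch using standard header sizes, IPv6 40 B + UDP 8 B + RTP 12 B, and an illustrative 20-byte payload; the thesis's exact traffic profiles differ):

```python
def overhead_fraction(header_bytes, payload_bytes):
    """Fraction of each packet spent on headers rather than payload."""
    return header_bytes / (header_bytes + payload_bytes)

IPV6, UDP, RTP = 40, 8, 12          # uncompressed header sizes in bytes
payload = 20                        # a small VoIP-style application frame

uncompressed = overhead_fraction(IPV6 + UDP + RTP, payload)   # 60/80 = 0.75
rohc_typical = overhead_fraction(3, payload)                  # ~3-byte RoHC header
# headers drop from 75% of the packet to roughly 13%
```

RoHC can shrink the IPv6/UDP/RTP header chain to a few bytes once a compression context is established, which is why the reclaimed slot capacity on narrow TEDS and PLC links is so significant.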

    Architecture design of video processing systems on a chip

    Get PDF

    Virtualisation and Thin Client : A Survey of Virtual Desktop environments

    Get PDF
    This survey examines some of the leading commercial virtualisation and thin client technologies. Reference is made to a number of academic research sources and to prominent industry specialists and commentators. A basic virtualisation laboratory model is assembled to demonstrate fundamental thin client operations and to clarify potential problem areas.

    Artificial Intelligence Technology

    Get PDF
    This open access book aims to give our readers a basic outline of today’s research and technology developments in artificial intelligence (AI), help them gain a general understanding of this trend, and familiarize them with the current research hotspots, as well as some of the fundamental and common theories and methodologies that are widely accepted in AI research and application. The book is written in comprehensible and plain language, featuring clearly explained theories and concepts and extensive analysis and examples. Some traditional findings are skipped so that the introduction to the evolution of artificial intelligence technology remains relatively comprehensive yet concise. The book provides a detailed elaboration of the basic concepts of AI and machine learning, as well as other relevant topics, including deep learning, deep learning frameworks, the Huawei MindSpore AI development framework, the Huawei Atlas computing platform, the Huawei AI open platform for smart terminals, and the Huawei CLOUD Enterprise Intelligence application platform. As the world’s leading provider of ICT (information and communication technology) infrastructure and smart terminals, Huawei offers products ranging from digital data communication, cyber security, wireless technology, data storage, cloud computing, and smart computing to artificial intelligence.

    Implementation of a VLC HDTV Distribution System for Consumer Premises

    Get PDF
    A unidirectional, visible light communication (VLC) system intended for the distribution of Digital Video Broadcasting (DVB) high-definition television (HDTV) content to DVB-compatible TVs within consumer premises is presented. The system receives off-air HDTV content through a consumer-grade DVB-T/T2 terrestrial set-top-box (STB) and re-encodes its Moving Picture Experts Group (MPEG) transport stream (TS) using a pulse position modulation (PPM) scheme called inversion offset PPM (IOPPM). The re-encoded TS is used to intensity-modulate (IM) a blue light-emitting diode (LED) operating at a wavelength of 470 nm. Directed line-of-sight (DLOS) transmission is used over a free-space optical (FSO) channel exhibiting a Gaussian impulse response. A direct-detection (DD) receiver detects the transmitted IOPPM stream, which is then decoded to recover the original MPEG TS. An STB supporting a high-definition multimedia interface (HDMI) is used to decode the MPEG TS and enable connectivity to an HD monitor. The system is presented as a complementary or alternative distribution system to existing Wi-Fi and power-line technologies; VLC connectivity is promoted as a safer, more secure, unlicensed and unregulated approach. The system is intended to enable TV manufacturers to reduce costs by, firstly, relocating the TV’s region-specific radio frequency (RF) tuner and demodulator blocks to an external STB capable of supporting DVB reception standards and, secondly, eliminating all input and output connector interfaces from the TV. Given the current trend for consumers to wall-mount TVs, the elimination of all connector interfaces except the power cable makes mounting simpler and easier. The operation of the final system was verified using real-world, off-air broadcast DVB-T/T2 channels carrying HDTV content. Serial optical transmission at a frequency of 66 MHz was achieved.
The system also achieved 60 Mbit/s, error-free transmission over a distance of 1.2 m without using error correction techniques. The methodology used to realise the system was a top-down, modular approach. Results were obtained from electrical modelling, simulation and experimental techniques, using both time-domain and FFT-based measurements and analysis. The modular approach was adopted to enable the design, development and testing of the subsystems independently of the overall system.
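The pulse-position idea behind the modulation can be sketched as plain L-PPM (an illustration only; the IOPPM variant used in the system adds an inversion/offset step not reproduced here): each group of bits selects which of L slots in a symbol period carries the single light pulse.

```python
def ppm_encode(bits, order=2):
    """Map each group of `order` bits to one pulse position per symbol slot.

    Plain L-PPM for illustration; order=2 gives L = 4 slots per symbol,
    and the group's binary value picks the slot that carries the pulse.
    """
    L = 2 ** order                       # slots per symbol period
    symbols = []
    for i in range(0, len(bits), order):
        group = bits[i:i + order]
        pos = int("".join(map(str, group)), 2)
        slot = [0] * L
        slot[pos] = 1                    # exactly one pulse per symbol
        symbols.append(slot)
    return symbols

frames = ppm_encode([1, 0, 0, 1])        # symbols 0b10 = 2 and 0b01 = 1
# -> [[0, 0, 1, 0], [0, 1, 0, 0]]
```

Because information sits in pulse timing rather than amplitude, PPM-style schemes keep the LED's average optical power constant, which suits intensity-modulated VLC links.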