
    Image and Video Coding Techniques for Ultra-low Latency

    The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches like still-image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding to be limitations of current full-system and software-programmable implementations.
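    A rough latency-budget calculation helps illustrate why sub-frame coding matters. The sketch below is a back-of-the-envelope model only; the frame rate, bit budget, link speed, and per-frame encode/decode times are illustrative assumptions, not figures from the survey.

```python
# Back-of-the-envelope latency model: capture, encode, transmit and decode are
# pipelined at slice granularity; slices_per_frame == 1 degenerates to
# whole-frame processing. All numbers below are illustrative assumptions.

def pipeline_latency_ms(frame_rate_hz: float, slices_per_frame: int,
                        encode_ms: float, decode_ms: float,
                        bits_per_frame: float, link_mbps: float) -> float:
    capture_ms = 1000.0 / frame_rate_hz / slices_per_frame         # wait for one slice of pixels
    tx_ms = bits_per_frame / slices_per_frame / (link_mbps * 1e3)  # Mbit/s -> bits per ms
    return capture_ms + encode_ms / slices_per_frame + tx_ms + decode_ms / slices_per_frame

# 60 fps video, 10 Mbit per coded frame over a 1 Gbit/s link, 8 ms encode / 6 ms decode per frame
whole_frame = pipeline_latency_ms(60, 1, 8.0, 6.0, 10e6, 1000.0)
sub_frame = pipeline_latency_ms(60, 16, 8.0, 6.0, 10e6, 1000.0)
print(f"frame-level: {whole_frame:.1f} ms, 16-slice sub-frame: {sub_frame:.1f} ms")
```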

    High throughput image compression and decompression on GPUs

    This work investigates possibilities to create a high-throughput, GPU-friendly, intra-only, wavelet-based video compression algorithm optimized for visually lossless applications. Addressing the key observation that JPEG 2000's entropy coder is a bottleneck and might be overly complex for high-bit-rate scenarios, various algorithmic alterations are proposed. First, JPEG 2000's Selective Arithmetic Coding mode is realized on the GPU, but the gains in throughput are shown to be limited. Instead, two independent, non-standard-compliant alterations are proposed that (1) process each bit plane in a single pass (single-pass mode), giving up intra-bit-plane truncation points, and (2) introduce a true raw-coding mode that is parallelizable per sample and does not require any context modeling. Next, an alternative block coder from the literature, the Bitplane Coder with Parallel Coefficient Processing (BPC-PaCo), is evaluated. Since it trades signal adaptiveness for increased parallelism, it is shown here how a stationary probability model averaged from a set of test sequences yields competitive compression efficiency. A combination of BPC-PaCo with the single-pass mode is proposed and shown to increase the speedup with respect to the original JPEG 2000 entropy coder from 2.15x (BPC-PaCo with two passes) to 2.6x (proposed BPC-PaCo with single-pass mode) at the marginal cost of increasing the PSNR penalty by 0.3 dB to at most 1 dB. Furthermore, a parallel algorithm for post-compression rate control is presented that determines the optimal code-block bit-stream truncation points for a given bit-rate budget and builds the entire code stream on the GPU, reducing the amount of data that has to be transferred back into host memory to a minimum. A theoretical runtime model is formulated that, based on benchmarking results on one GPU, predicts the runtime of a kernel on another GPU. Lastly, the first JPEG XS GPU decoder is presented and evaluated. JPEG XS was designed as a low-complexity codec and, for the first time, explicitly demanded GPU-friendliness already in the call for proposals. At bit rates above 1 bpp, the decoder is around 2x faster than the original JPEG 2000 and 1.5x faster than JPEG 2000 with the fastest evaluated entropy coder (BPC-PaCo with single-pass mode). With a GeForce GTX 1080, a decoding throughput of around 200 fps is achieved for a UHD 4:4:4 sequence.
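    The runtime-prediction idea can be illustrated with a simple roofline-style model. The sketch below is an assumption of how such a model could look; the dissertation's actual model is not reproduced, and the GPU figures and kernel numbers are rough, illustrative values.

```python
# Roofline-style sketch: benchmark a kernel on GPU A, decide whether it is
# memory- or compute-bound there, then scale its runtime to GPU B by the ratio
# of the limiting resource. GPU figures below are rough, illustrative values.

from dataclasses import dataclass

@dataclass
class Gpu:
    name: str
    mem_bw_gbs: float     # peak memory bandwidth, GB/s
    flops_gflops: float   # peak arithmetic throughput, GFLOP/s

def predict_runtime_ms(t_measured_ms: float, bytes_moved: float, ops: float,
                       gpu_a: Gpu, gpu_b: Gpu) -> float:
    intensity = ops / bytes_moved                       # operations per byte
    balance_a = gpu_a.flops_gflops / gpu_a.mem_bw_gbs   # machine balance of GPU A (ops/byte)
    if intensity < balance_a:                           # memory-bound kernel
        return t_measured_ms * gpu_a.mem_bw_gbs / gpu_b.mem_bw_gbs
    return t_measured_ms * gpu_a.flops_gflops / gpu_b.flops_gflops

gtx1080 = Gpu("GeForce GTX 1080", mem_bw_gbs=320.0, flops_gflops=8900.0)
other = Gpu("hypothetical faster GPU", mem_bw_gbs=900.0, flops_gflops=29000.0)
print(predict_runtime_ms(4.0, bytes_moved=1.2e9, ops=0.5e9, gpu_a=gtx1080, gpu_b=other))
```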

    Design of fixed-ratio compression hardware for display devices

    Ph.D. dissertation, Department of Electrical and Computer Engineering, Seoul National University Graduate School, February 2016 (advisor: 이혁재). Compression for display devices differs from general video compression standards in several respects. First, it targets specific applications. Second, to achieve compression gain, low power consumption, and real-time processing, the hardware must be small and the target compression ratio is low. Third, it must fit the raster scan order. Fourth, to bound the frame memory size or to allow random access, the target compression ratio per compression unit must be met exactly in real time. This dissertation proposes three compression algorithms and hardware architectures that satisfy these requirements. For LCD overdrive, a compression scheme based on BTC (block truncation coding) is proposed. To increase the compression gain, a scheme targeting a compression ratio of 12 is proposed, using two main methods to improve compression efficiency: the first saves bits by exploiting the spatial correlation with neighboring blocks, and the second uses 2×16 coding blocks for simple regions and 2×8 coding blocks for complex regions, spending the bits saved by the first method to meet the target compression ratio when 2×8 blocks are used. For low-cost near-lossless frame memory compression, a scheme based on 1D SPIHT (set partitioning in hierarchical trees) is proposed. SPIHT is very effective at meeting a fixed target compression ratio, but although its one-dimensional form, 1D SPIHT, suits the raster scan order, little related work exists. This dissertation proposes a hardware architecture that solves the main problem of 1D SPIHT, its speed: the algorithm is modified to expose parallelism, the dependencies that prevent parallel processing in the encoder are resolved so that pipeline scheduling becomes possible, and the decoder is modified so that each pass running in parallel can predict in advance the length of the bitstream it will decode. For high-fidelity RGBW color image compression, a prediction-based scheme is proposed. The prediction consists of two differencing steps, the first exploiting spatial correlation and the second exploiting inter-color correlation. For entropy coding, VLC (variable length coding) offers high compression efficiency but has difficulty meeting the target compression ratio exactly, so a fixed-rate scheme based on Golomb-Rice coding is proposed. The proposed encoder consists of a pre-coder and a post-coder: the pre-coder performs the actual encoding for one specific case and computes predicted encoding information for all other cases, which the post-coder then uses to generate the actual bitstream.
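    As a point of reference for the first contribution, the sketch below shows classic, textbook BTC on a single block; the thesis's modified scheme (target ratio 12, neighbor-based bit saving, adaptive 2×16/2×8 blocks) is not reproduced here.

```python
# Classic BTC on a single block: 1 bit per pixel plus two reconstruction
# levels, chosen to preserve the block mean and standard deviation.

import numpy as np

def btc_encode(block: np.ndarray):
    pixels = block.astype(np.float64)
    mean, std = pixels.mean(), pixels.std()
    bitmap = pixels >= mean                     # 1 bit per pixel
    q, m = int(bitmap.sum()), pixels.size
    if q in (0, m):                             # flat block: one level is enough
        level = int(np.clip(np.rint(mean), 0, 255))
        return level, level, bitmap
    a = mean - std * np.sqrt(q / (m - q))       # level reconstructed for the 0s
    b = mean + std * np.sqrt((m - q) / q)       # level reconstructed for the 1s
    return int(np.clip(np.rint(a), 0, 255)), int(np.clip(np.rint(b), 0, 255)), bitmap

def btc_decode(a: int, b: int, bitmap: np.ndarray) -> np.ndarray:
    return np.where(bitmap, b, a).astype(np.uint8)

block = np.random.randint(0, 256, size=(2, 16), dtype=np.uint8)   # a 2x16 coding block
a, b, bitmap = btc_encode(block)
print(btc_decode(a, b, bitmap))
```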

    Compressão de dados sensoriais em sistemas robóticos

    One of the main problems in the development and debugging of robotic systems is the amount of data stored in files containing sensor data (e.g., ROS proprietary log files, known as bags). If we consider a robot with several cameras and other sensors that collect information from the environment several times per second, we quickly obtain very large files. Besides the concerns regarding storage and, in some cases, transmission, it becomes extremely hard to find important information in these files. In this thesis, we tried to solve both problems by studying and implementing data compression solutions to reduce the size of these files. The main focus was image and video compression, by far the most storage-consuming data. Moreover, we conducted a detailed study of the effect of lossy compression methods on the performance of some state-of-the-art image analysis algorithms. Another contribution was the development of an intelligent video player to help roboticists evaluate recorded data after experiments: parts of the video that do not contain relevant information are skipped during playback. Based on the results, we concluded that ROS native compression is not sufficient. Furthermore, solutions based on ROS, or virtually any robotic system that has to deal with image/video data, would benefit from the use of an H.265 codec, as it provides the smallest number of bits per pixel without a significant penalty on the performance of image analysis algorithms.
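    The bits-per-pixel figure used to compare codecs can be computed directly from the encoded file size; in the sketch below the file names and sequence dimensions are hypothetical placeholders, not the thesis's datasets.

```python
# Average bits per pixel of an encoded sequence, computed from the file size.
# File names and sequence dimensions are hypothetical placeholders.

import os

def bits_per_pixel(path: str, width: int, height: int, frames: int) -> float:
    return os.path.getsize(path) * 8 / (width * height * frames)

# e.g. the same 640x480, 900-frame camera log encoded with different codecs
for name in ("camera_h264.mp4", "camera_h265.mp4", "camera_mjpeg.avi"):
    if os.path.exists(name):
        print(f"{name}: {bits_per_pixel(name, 640, 480, 900):.3f} bpp")
```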

    Contributions in image and video coding

    Advisor: Max Henrique Machado Costa. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. The image and video coding community has been working on advances that go beyond traditional image and video coding architectures. This work is a set of contributions to various topics that have received increasing attention from researchers in the community, namely scalable coding, low-complexity coding for portable devices, multiview video coding, and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low-complexity video codec. The codec has received the name Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit plane by bit plane by entropy coders, producing a fully embedded output bitstream. All implementation is performed using 16-bit integer arithmetic; only additions and bit shifts are necessary, thus lowering computational complexity. Even with these constraints, reasonable rate versus distortion performance can be achieved, and the encoding time is significantly smaller (around 160 times) when compared to the H.264/AVC standard. The second contribution is the optimization of a recently proposed approach for multiview video coding in videoconferencing and other similar unicast-like applications. The target scenario in this approach is providing realistic 3-D video with free viewpoint at good compression rates. To achieve this objective, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source, and an optimum mapping is derived for a typical video source. The third contribution exploits several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders require no changes in either the other coding and decoding stages or the bitstream definition. The fourth and last contribution proposes a hierarchical, signal-dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that can also exploit these cross-resolution-level structural similarities. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed in order to measure the performance of the new transform, and the obtained results are discussed in this work.
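    To make the "additions and bit shifts only" constraint concrete, the sketch below shows a lifting-based integer Haar transform pair. It is an illustrative example of this style of integer arithmetic, not the FEVC's actual fast 3-D transforms.

```python
# Lifting-based integer Haar transform: the forward and inverse passes use only
# additions, subtractions and arithmetic right shifts, and reconstruct exactly.

import numpy as np

def haar_lift_forward(x: np.ndarray):
    even, odd = x[0::2].astype(np.int32), x[1::2].astype(np.int32)
    d = odd - even          # predict step (subtraction only)
    s = even + (d >> 1)     # update step (addition + right shift)
    return s, d

def haar_lift_inverse(s: np.ndarray, d: np.ndarray) -> np.ndarray:
    even = s - (d >> 1)
    odd = d + even
    x = np.empty(even.size + odd.size, dtype=np.int32)
    x[0::2], x[1::2] = even, odd
    return x

x = np.array([10, 12, 7, 3, 20, 21, 5, 9], dtype=np.int32)
s, d = haar_lift_forward(x)
assert np.array_equal(haar_lift_inverse(s, d), x)   # perfect reconstruction
```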

    A new algorithm for the compression of seismic data with high amplitude resolution

    Renewable sources cannot meet the energy demand of a growing global market. Therefore, it is expected that oil & gas will remain substantial sources of energy in the coming years. To find new oil & gas deposits that would satisfy the growing global energy demand, significant effort is constantly invested in increasing the efficiency of seismic surveys. It is commonly considered that, in the initial phase of exploration and production of new fields, high-resolution and high-quality images of the subsurface are of great importance. As one part of the seismic data processing chain, efficient management and delivery of the large data sets produced by the industry during seismic surveys becomes extremely important in order to facilitate further seismic data processing and interpretation. In this respect, efficiency relies to a large extent on the efficiency of the compression scheme, which is often required to enable faster transfer of and access to data, as well as efficient data storage. Motivated by the superior performance of High Efficiency Video Coding (HEVC), and driven by the rapid growth in data volume produced by seismic surveys, this work explores a 32 bits per pixel (b/p) extension of the HEVC codec for compression of seismic data. It is proposed to reassemble seismic slices in a format that corresponds to a video signal and to benefit from the coding gain achieved by the HEVC inter mode, besides the possible advantages of the (still image) HEVC intra mode. To this end, this work modifies almost all components of the original HEVC codec to cater for high bit-depth coding of seismic data: the Lagrange multiplier used in the optimization of the coding parameters has been adapted to the new data statistics, the core transform and quantization have been reimplemented to handle the increased bit-depth range, and a modified adaptive binary arithmetic coder has been employed for efficient entropy coding. In addition, optimized block selection, reduced intra prediction modes, and flexible motion estimation are tested to adapt to the structure of seismic data. Even though, after the proposed modifications, the new codec goes beyond standardized HEVC, it still maintains a generic HEVC structure and is developed within the general HEVC framework. There is no similar work in the field of seismic data compression that uses HEVC as the base codec. Thus, a specific codec design has been tailored which, compared to JPEG XR and a commercial wavelet-based codec, significantly improves the peak signal-to-noise ratio (PSNR) vs. compression ratio performance for 32 b/p seismic data. Depending on the proposed configuration, the PSNR gain ranges from 3.39 dB up to 9.48 dB. Also, relying on the specific characteristics of seismic data, an optimized encoder is proposed in this work; it reduces encoding time by 67.17% for the All-I configuration on the trace image dataset, and by 67.39% for the All-I, 97.96% for the P2, and 98.64% for the B configuration on the 3D wavefield dataset, with negligible coding performance losses. As a side contribution of this work, HEVC is analyzed across all of its functional units, so that the presented work itself can serve as a specific overview of the methods incorporated into the standard.
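    The PSNR vs. compression ratio figures reported above follow the standard definitions, extended to the 32 b/p sample range. The helpers below are a generic sketch of those two quantities, not code from the thesis.

```python
# PSNR for high bit-depth samples: the only change versus the 8-bit case is
# that the peak value becomes 2**bit_depth - 1.

import numpy as np

def psnr_db(original: np.ndarray, reconstructed: np.ndarray, bit_depth: int = 32) -> float:
    err = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = float(np.mean(err ** 2))
    if mse == 0.0:
        return float("inf")
    peak = 2.0 ** bit_depth - 1.0
    return 10.0 * np.log10(peak ** 2 / mse)

def compression_ratio(raw_bytes: int, coded_bytes: int) -> float:
    return raw_bytes / coded_bytes

x = np.random.randint(0, 2**32, size=(64, 64), dtype=np.uint64)   # toy 32 b/p slice
y = x.copy(); y[0, 0] += 1000                                     # tiny distortion
print(f"{psnr_db(x, y):.2f} dB")
```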

    A practical comparison between two powerful PCC codecs

    Recent advances in the consumption of 3D content create the need for efficient ways to visualize and transmit it. As a result, methods to obtain that content have been evolving, leading to the development of new representation methods, namely point clouds and light fields. A point cloud represents a set of points, each with associated Cartesian coordinates (x, y, z) and possibly further attributes (color, material, texture, etc.). This kind of representation changes the way 3D content is consumed and has a wide range of applications, from video games to medical ones. However, since this type of data carries so much information, it is data-heavy, making the storage and transmission of content a daunting task. To address this issue, MPEG created a point cloud coding standardization project, giving birth to V-PCC (Video-based Point Cloud Coding) and G-PCC (Geometry-based Point Cloud Coding) for static content. Firstly, a general analysis of point clouds is made, spanning from their possible uses to their acquisition. Secondly, point cloud codecs are studied, namely MPEG's V-PCC and G-PCC. Then, the state of the art in quality evaluation, both subjective and objective, is reviewed. Finally, the JPEG Pleno Point Cloud activity, in which an active collaboration took place, is reported, together with the comparative results of the two codecs and the metrics used.
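    One of the common objective measures for comparing point cloud codecs is the point-to-point (D1) geometry PSNR. The sketch below illustrates it under the assumption of a symmetric nearest-neighbour formulation; the exact metric configuration used in the JPEG Pleno/MPEG evaluations may differ.

```python
# Symmetric point-to-point (D1) geometry PSNR between two (N, 3) point sets.
# "peak" is the signal peak, e.g. the voxel-grid span or bounding-box diagonal.

import numpy as np
from scipy.spatial import cKDTree

def d1_psnr(reference: np.ndarray, degraded: np.ndarray, peak: float) -> float:
    def one_way_mse(a, b):
        dists, _ = cKDTree(b).query(a)   # nearest neighbour in b for each point of a
        return float(np.mean(dists ** 2))
    mse = max(one_way_mse(reference, degraded), one_way_mse(degraded, reference))
    return 10.0 * np.log10(peak ** 2 / mse) if mse > 0 else float("inf")

ref = np.random.rand(1000, 3) * 1023                       # hypothetical 10-bit voxelized cloud
deg = ref + np.random.normal(scale=0.5, size=ref.shape)    # hypothetical coding noise
print(f"D1 PSNR: {d1_psnr(ref, deg, peak=1023.0):.2f} dB")
```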

    Energy-precision tradeoffs in the graphics pipeline

    The energy consumption of a graphics processing unit (GPU) is an important factor in its design, whether for a server, desktop, or mobile device. Mobile products, such as smart phones, tablets, and laptop computers, rely on batteries to function; the lower the power demand on these batteries, the longer they will last before needing to be recharged. GPUs used in servers and desktops, while not dependent on a battery for operation, are still limited by the efficiency of power supplies and heat dissipation techniques. In this dissertation, I propose to lower the energy consumption of GPUs by reducing the precision of floating-point arithmetic in the graphics pipeline and of the data sent and stored on- and off-chip. The key idea behind this work is twofold: energy can be saved through a systematic and targeted reduction in the number of bits 1) computed and 2) communicated. Reducing the number of bits computed will necessarily reduce either the precision or the range of a floating-point number. I focus on saving energy by way of reducing precision, which can exploit the over-provisioning of bits in many stages of the graphics pipeline. Reducing the number of bits communicated takes several forms. First, I propose enhancements to existing compression schemes for off-chip buffers to save bandwidth. I also suggest a simple extension that exploits unused bits in reduced-precision data undergoing compression. Finally, I present techniques for saving energy in on-chip communication of reduced-precision data. By designing and simulating variable-precision arithmetic circuits with promising energy versus precision characteristics and tradeoffs, I have developed an energy model for GPUs. Using this model and my techniques, I have shown that significant savings (up to 70% in computation in the vertex and pixel shader stages) are possible by reducing the precision of the arithmetic. Further, my compression approaches have enabled improvements of 1.26x over past work, and a general-purpose compressor design has achieved bandwidth savings of 34%, 87%, and 65% for color, depth, and geometry data, respectively, which is competitive with past work. Lastly, an initial exploration in signal gating unused lines in on-chip buses has suggested savings of 13-48% for the tested applications' traffic from a multiprocessor's register file to its L1 cache.
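    The precision-reduction idea can be emulated in software by zeroing the low mantissa bits of IEEE-754 float32 values. The sketch below is such an emulation, useful for measuring quality impact; it does not model the dissertation's variable-precision hardware circuits.

```python
# Emulating reduced floating-point precision by zeroing the low bits of the
# 23-bit float32 mantissa (truncation, not rounding).

import numpy as np

def truncate_mantissa(x: np.ndarray, kept_mantissa_bits: int) -> np.ndarray:
    assert 0 <= kept_mantissa_bits <= 23
    bits = x.astype(np.float32).view(np.uint32)
    mask = np.uint32((0xFFFFFFFF << (23 - kept_mantissa_bits)) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

colors = np.random.rand(4).astype(np.float32)   # e.g. shader color outputs in [0, 1)
print(colors)
print(truncate_mantissa(colors, 10))            # keep only 10 mantissa bits
```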

    Remote Sensing Data Compression

    A huge amount of data is acquired nowadays by different remote sensing systems installed on satellites, aircraft, and UAVs. The acquired data then have to be transferred to image processing centres, stored, and/or delivered to customers. In restricted scenarios, data compression is strongly desired or necessary. A wide diversity of coding methods can be used, depending on the requirements and their priority. In addition, the types and properties of images differ a lot; thus, practical implementation aspects have to be taken into account. The Special Issue paper collection taken as the basis of this book touches on all of the aforementioned items to some degree, giving the reader an opportunity to learn about recent developments and research directions in the field of image compression. In particular, lossless and near-lossless compression of multi- and hyperspectral images remains a topical problem, since such images constitute data arrays of extremely large size with rich information that can be retrieved from them for various applications. Another important aspect is the impact of lossless compression on image classification and segmentation, where a reasonable compromise between the characteristics of compression and the final tasks of data processing has to be achieved. The problems of data transfer from UAV-based acquisition platforms, as well as the use of FPGAs and neural networks, have become very important. Finally, attempts to apply compressive sensing approaches in remote sensing image processing with positive outcomes are observed. We hope that readers will find our book useful and interesting.

    Gbit/second lossless data compression hardware

    This thesis investigates how to improve the performance of lossless data compression hardware as a tool to reduce the cost per bit stored in a computer system or transmitted over a communication network. Lossless data compression allows the exact reconstruction of the original data after decompression. Its deployment in some high-bandwidth applications has been hampered by performance limitations of the compression hardware, which needs to match the performance of the original system to avoid becoming a bottleneck. Advancing the area of lossless data compression hardware hence offers a valid motivation, with the potential of doubling the performance of the system that incorporates it with minimum investment. This work starts by presenting an analysis of current compression methods with the objective of identifying the factors that limit performance and also the factors that increase it. [Continues.]