31 research outputs found

    Rapidgzip: Parallel Decompression and Seeking in Gzip Files Using Cache Prefetching

    Gzip is a ubiquitously used file compression format. Although a multitude of gzip implementations exist, only pugz can fully utilize current multi-core processor architectures for decompression. Yet, pugz cannot decompress arbitrary gzip files: it requires the decompressed stream to contain only byte values 9-126. In this work, we present a generalization of the parallelization scheme used by pugz that can be reliably applied to arbitrary gzip-compressed data without compromising performance. We show that the requirements pugz places on the file contents can be dropped by implementing an architecture based on a cache and a parallelized prefetcher. This architecture can safely handle the faulty decompression results that can appear when threads start decompressing in the middle of a gzip file by trial and error. Using 128 cores, our implementation reaches 8.7 GB/s decompression bandwidth for gzip-compressed base64-encoded data, a speedup of 55 over the single-threaded GNU gzip, and 5.6 GB/s for the Silesia corpus, a speedup of 33 over GNU gzip. Comment: HPDC'2
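The cache-plus-prefetcher structure described in this abstract can be mocked up at a high level. The Python sketch below is only an illustration of that pattern, not rapidgzip's implementation: `decompress_chunk_at` is a hypothetical placeholder for the trial-and-error deflate decoding from an arbitrary offset, and the chunk size, worker count, and prefetch depth are assumptions.

```python
# Sketch of a cache plus parallel prefetcher for chunked decompression.
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # compressed bytes per work unit (assumed)

def decompress_chunk_at(data: bytes, offset: int) -> bytes:
    """Placeholder: locate a deflate block boundary at or after `offset` and
    decode until the next chunk boundary, retrying on failure (hypothetical)."""
    raise NotImplementedError

class PrefetchingGzipReader:
    def __init__(self, data: bytes, workers: int = 8, prefetch: int = 4):
        self.data = data
        self.cache = {}        # chunk index -> decompressed bytes
        self.pending = {}      # chunk index -> Future still being computed
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.prefetch = prefetch

    def _schedule(self, index: int):
        # Start speculative decompression of a chunk unless already done/queued.
        if index not in self.cache and index not in self.pending:
            offset = index * CHUNK_SIZE
            if offset < len(self.data):
                self.pending[index] = self.pool.submit(
                    decompress_chunk_at, self.data, offset)

    def chunk(self, index: int) -> bytes:
        # Kick off the next few chunks in parallel, then block only on the
        # chunk actually requested; results stay in the cache for seeking.
        for ahead in range(index, index + self.prefetch + 1):
            self._schedule(ahead)
        if index not in self.cache:
            self.cache[index] = self.pending.pop(index).result()
        return self.cache[index]
```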

    Towards Compression at All Levels of the Memory Hierarchy

    Hardware compression techniques are typically simplifications of software compression methods. They must, however, comply with area, power, and latency constraints. This study unveils the challenges of adopting compression in memory design. The goal of this analysis is not to summarize proposals, but to highlight the solutions they employ to handle those challenges. An in-depth description of the main characteristics of multiple methods is provided, as well as criteria that can be used as a basis for the assessment of such schemes. Typically, these schemes are not very efficient, and those that do compress well decompress slowly. This work explores their granularity to improve their efficiency, through a concept called Region-Chunk compression. Its goal is to achieve a low (good) compression ratio and a short decompression latency. The key observation is that by further sub-dividing the chunks of data being compressed, one can reduce data duplication. This concept can be applied to several previously proposed compressors, resulting in a reduction of their average compressed size. In particular, a single-cycle-decompression compressor is boosted to reach a compressibility level competitive with state-of-the-art proposals. Finally, to increase the probability of successfully co-allocating compressed lines, Pairwise Space Sharing (PSS) is proposed. PSS can be applied orthogonally to compaction methods at no extra latency penalty, and with a cost-effective metadata overhead. The proposed system (Region-Chunk+PSS) further enhances the normalized average cache capacity by 2.7% (geometric mean), while featuring short decompression latency.
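The key observation above (sub-dividing compressed chunks to expose duplication) can be illustrated with a tiny Python sketch. The chunk size and index encoding below are illustrative assumptions, not the thesis's actual Region-Chunk format.

```python
# Minimal sketch: split a cache line into small sub-chunks, keep only the
# distinct ones, and encode the line as indices into that small dictionary.
def compress_line(line: bytes, chunk_size: int = 4):
    chunks = [line[i:i + chunk_size] for i in range(0, len(line), chunk_size)]
    dictionary = []   # distinct sub-chunks, in order of first appearance
    indices = []      # one small index per original sub-chunk position
    for chunk in chunks:
        if chunk not in dictionary:
            dictionary.append(chunk)
        indices.append(dictionary.index(chunk))
    return dictionary, indices

def decompress_line(dictionary, indices) -> bytes:
    return b"".join(dictionary[i] for i in indices)

# A 16-byte line with repeated 4-byte words stores only two distinct words
# plus four small indices, shrinking whenever duplication is high.
line = bytes([0, 0, 0, 0, 1, 2, 3, 4, 0, 0, 0, 0, 1, 2, 3, 4])
dictionary, indices = compress_line(line)
assert decompress_line(dictionary, indices) == line
```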

    Hyperspectral Image Compression Techniques on Reconfigurable Hardware

    Thesis, Universidad Complutense de Madrid, Facultad de Informática, defended on 18-12-2020. Sensors are nowadays present in all aspects of human life. When possible, sensors are used remotely: this is less intrusive, avoids interference in the measuring process, and is more convenient for the scientist. One of the most recurrent concerns of the last decades has been the sustainability of the planet and how the changes it is facing can be monitored. Remote sensing of the Earth has seen an explosion in activity, with satellites now being launched on a weekly basis to perform remote analysis of the Earth, and planes surveying vast areas for closer analysis...

    Digital Archive Management Using a Modified LZSS Algorithm for File Compression

    Documents that pile up in hardcopy form can cause problems in organization, document security, and storage space. In addition, extra care is needed to prevent damage to important documents caused by insects and the like. With today's technology, documents can be stored as files, file sizes can be reduced to minimize storage requirements, and documents can be organized so that retrieving them becomes easier. A file is digital data that contains information. A file's size is reduced by compacting it, commonly called compression, so that it becomes smaller than its original size. In this study, the method used to compress files is a modified LZSS (Lempel-Ziv-Storer-Szymanski) algorithm designed to produce better compression. The modified LZSS compression algorithm is developed from the concept of the LZSS compression algorithm ((offset, len) coding) with the addition of (offset, -len) coding. For the 9 sample files selected as simulation data, the average compression ratio of the LZSS algorithm is 75.40% and the average compression ratio of the modified LZSS algorithm is 67.86%. This shows that the modified LZSS algorithm achieves a better compression level.
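For reference, the baseline (offset, len) coding that the modified algorithm extends looks roughly like the Python sketch below. The added (offset, -len) codes are not specified in the abstract, so they are not reproduced here; window and match-length limits are illustrative choices.

```python
# Minimal LZSS-style sketch: emit literals or back-references (offset, length).
WINDOW = 4096
MIN_MATCH = 3
MAX_MATCH = 18

def lzss_compress(data: bytes):
    out = []   # tokens: ('lit', byte) or ('match', offset, length)
    pos = 0
    while pos < len(data):
        best_len, best_off = 0, 0
        for cand in range(max(0, pos - WINDOW), pos):
            length = 0
            while (length < MAX_MATCH and pos + length < len(data)
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= MIN_MATCH:
            out.append(('match', best_off, best_len))
            pos += best_len
        else:
            out.append(('lit', data[pos]))
            pos += 1
    return out

def lzss_decompress(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == 'lit':
            out.append(tok[1])
        else:
            _, offset, length = tok
            for _ in range(length):   # byte-wise copy handles overlapping matches
                out.append(out[-offset])
    return bytes(out)

sample = b"abcabcabcabcXYZ"
assert lzss_decompress(lzss_compress(sample)) == sample
```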

    Database Streaming Compression on Memory-Limited Machines

    Dynamic Huffman compression algorithms operate on data streams with a bounded symbol list. With these algorithms, the complete list of symbols must be contained in main memory or secondary storage. A horizontal-format transaction database that is streaming can have a very large item list. A large number of tree nodes taxes both the primary memory of the processing hardware and the processing time needed to dynamically maintain the tree. This research investigated Huffman compression of a transaction-streaming database with a very large symbol list, where each item in the transaction database schema's item list is a symbol to compress. The constraint of a large symbol list is, in this research, equivalent to the constraint of a memory-limited machine. A large symbol set will result if each item in a large database item list is a symbol to compress in a database stream. In addition, database streams may have a temporal component spanning months or years. Finally, the horizontal format is the format most suited to a streaming transaction database because the transaction IDs are not known beforehand. This research prototypes an algorithm that compresses a transaction database stream. There are several advantages to the memory-limited dynamic Huffman algorithm. Dynamic Huffman algorithms are single-pass algorithms; in many instances a second pass over the data is not possible, such as with streaming databases. Previous dynamic Huffman algorithms are not memory-limited: their memory use is asymptotically O(n), where n is the number of distinct item IDs, so memory must grow to fit the n items. The improvement of the new memory-limited dynamic Huffman algorithm is its O(k) asymptotic memory requirement, where k is the maximum number of nodes in the Huffman tree, k < n, and k is a user-chosen constant. The new memory-limited dynamic Huffman algorithm compresses horizontally encoded transaction databases that do not contain long runs of 0s or 1s.
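The O(k) memory bound can be illustrated with a deliberately simplified Python sketch: track frequency counts for at most k symbols, evict the rarest symbol when a new one arrives and the table is full, and rebuild a Huffman code from those bounded counts. This shows the bounded-memory constraint only; it is not the dissertation's actual dynamic tree-update algorithm, and the eviction policy and example items are assumptions.

```python
# Bounded-memory symbol model plus a small Huffman code builder.
import heapq

def build_code(freqs):
    """Build a Huffman code {symbol: bitstring} from a dict of counts."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freqs.items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in freqs}
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1:
            codes[sym] = "0" + codes[sym]
        for sym in s2:
            codes[sym] = "1" + codes[sym]
        heapq.heappush(heap, (c1 + c2, id(s1), s1 + s2))
    return codes

class BoundedModel:
    def __init__(self, k: int):
        self.k = k          # user-chosen cap on tracked symbols -> O(k) memory
        self.freqs = {}

    def update(self, sym):
        if sym not in self.freqs and len(self.freqs) >= self.k:
            evict = min(self.freqs, key=self.freqs.get)   # drop the rarest symbol
            del self.freqs[evict]
        self.freqs[sym] = self.freqs.get(sym, 0) + 1

model = BoundedModel(k=4)
for item in ["milk", "bread", "milk", "eggs", "milk", "tea", "jam"]:
    model.update(item)
codes = build_code(model.freqs)   # at most k code words ever exist
```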

    A Study on Lossless Data Compression Methods Suitable for Decompression on GPUs

    Doctoral thesis (Doctor of Engineering), Hiroshima University.

    Extending the PCIe Interface with Parallel Compression/Decompression Hardware for Energy and Performance Optimization

    PCIe is a high-performing interface used to move data from a central host PC to an accelerator such as a Field-Programmable Gate Array (FPGA). This interface allows a system to perform fast data transfers in High-Performance Computing (HPC) and provides a performance boost. However, HPC systems normally require large datasets, and in these situations PCIe can become a bottleneck. To address this issue, we propose an open-source hardware compression/decompression system that handles continuously streamed data with low latency and high throughput. We implement compressor and decompressor engines on an FPGA, scale up with multiple engines working in parallel, and evaluate the energy reduction and performance with different numbers of engines. To alleviate the performance bottleneck in the processor acting as a controller, we propose a hardware scheduler to fairly distribute the datasets among the engines. Our design reduces the transmission time over PCIe, and the results show an energy reduction of up to 48% in the PCIe transfers, thanks to the decrease in the number of bits that have to be transmitted. The latency overhead is kept to a minimum and is user-selectable depending on the tolerances of the intended application.
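The fair-distribution idea behind the scheduler can be modeled in software. The Python sketch below dispatches fixed-size blocks to whichever engine has the least outstanding work; the engine count, block size, and the use of zlib as a stand-in for the FPGA compressor cores are illustrative assumptions, not the proposed hardware design.

```python
# Software model of least-loaded dispatch across parallel compression engines.
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_ENGINES = 4
BLOCK_SIZE = 64 * 1024

class Engine:
    def __init__(self, name):
        self.name = name
        self.pending = 0      # bytes currently queued on this engine

def schedule(blocks, engines):
    """Assign each block to the engine with the least queued work."""
    plan = []
    for block in blocks:
        engine = min(engines, key=lambda e: e.pending)
        engine.pending += len(block)
        plan.append((engine, block))
    return plan

def run(data: bytes):
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    engines = [Engine(f"eng{i}") for i in range(NUM_ENGINES)]
    plan = schedule(blocks, engines)
    with ThreadPoolExecutor(max_workers=NUM_ENGINES) as pool:
        # zlib stands in for the per-engine compression cores.
        compressed = list(pool.map(lambda job: zlib.compress(job[1]), plan))
    return compressed
```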

    Low-complexity, low-area computer architectures for cryptographic application in resource constrained environments

    Resource-constrained environments (RCEs) are known for their stringent hardware design requirements. With the rise of the Internet of Things (IoT), low-complexity, low-area designs are becoming prominent in the face of complex security threats. Two low-complexity, low-area cryptographic processors based on the ultimate reduced instruction set computer (URISC) are created to provide security features for wireless visual sensor networks (WVSNs) using field-programmable gate array (FPGA) based visual processors typically used in RCEs. The first processor is the Two Instruction Set Computer (TISC) running the Skipjack cipher. To improve security, a Compact Instruction Set Architecture (CISA) processor running the full AES with a modified S-Box was created. The modified S-Box achieved a gate count reduction of 23% with no functional compromise compared to Boyar's. On the Spartan-3L XC3S1500L-4-FG320 FPGA, the TISC implementation occupies 71 slices and 1 block RAM and achieves a throughput of 46.38 kbps at a stable 24 MHz clock. The CISA, which occupies 157 slices and 1 block RAM, achieves a throughput of 119.3 kbps at a stable 24 MHz clock. The CISA processor is demonstrated in two main applications. The first is a multi-level, multi-cipher architecture (MMA) with two modes of operation: (1) selecting cipher programs (primitives) and sharing crypto-blocks, and (2) using simple authentication and key-renewal schemes, showing perceptual improvements over direct AES on images. The second application demonstrates the use of the CISA processor as part of a selective encryption architecture (SEA) in combination with the millions of instructions per second set partitioning in hierarchical trees (MIPS SPIHT) visual processor. The SEA is implemented on a Celoxica RC203 Virtex XC2V3000 FPGA occupying 6251 slices, and a visual sensor is used to capture real-world images. Four image frames were captured from a camera sensor, compressed, selectively encrypted, and sent to a PC environment for decryption. The final design emulates a working visual sensor, from on-node processing and encryption to back-end data processing on a server computer.
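For context, the URISC concept that both processors build on reduces the instruction set to a single subtract-and-branch operation. The Python sketch below shows one common subleq-style formulation of that idea only; the thesis's TISC and CISA define their own instruction sets, and the memory layout and example program here are assumptions for illustration.

```python
# Minimal subleq-style one-instruction machine, illustrating the URISC idea.
def run_subleq(mem, pc=0, max_steps=10_000):
    """Each instruction is three memory words (a, b, c):
    mem[b] -= mem[a]; if the result is <= 0, jump to c, else fall through.
    A negative jump target halts the machine."""
    steps = 0
    while 0 <= pc < len(mem) and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem

# Example: a one-instruction program that clears cell 3 (42 -> 0) and then
# halts by jumping to the negative address -1.
memory = run_subleq([3, 3, -1, 42])
assert memory[3] == 0
```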