Vector quantization
During the past ten years, Vector Quantization (VQ) has developed from a theoretical possibility promised by Shannon's source coding theorems into a powerful and competitive technique for speech and image coding and compression at medium to low bit rates. In this survey, the basic ideas behind the design of vector quantizers are sketched, and some comments are made on the state of the art and current research efforts.
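As a concrete illustration of the basic encode/decode idea, here is a minimal sketch in Python/NumPy (not taken from the survey itself): the encoder replaces each input vector by the index of its nearest codeword, and the decoder reconstructs the vector from the codebook. The random codebook is a placeholder; in practice it would be trained, e.g. with the LBG (k-means) algorithm.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Map each input vector to the index of its nearest codeword."""
    # Squared Euclidean distance between every vector and every codeword.
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def vq_decode(indices, codebook):
    """Reconstruct each vector as its codeword."""
    return codebook[indices]

# Example: quantize 2-D points with a 4-entry codebook.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))
codebook = rng.normal(size=(4, 2))  # placeholder; normally trained (LBG/k-means)
idx = vq_encode(data, codebook)     # only these indices need to be transmitted
recon = vq_decode(idx, codebook)
mse = ((data - recon) ** 2).mean()  # distortion introduced by quantization
```

The compression comes from transmitting only `log2(len(codebook))` bits per vector instead of the full floating-point components; the codebook size trades rate against distortion.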
Hybrid Technique for Arabic Text Compression
Arabic content on the Internet and other digital media is increasing exponentially, and the number of Arab users of these media has grown more than twentyfold over the past five years. There is a real need to reduce the space allocated to this content as well as to allow more efficient searching and retrieval operations on it. Techniques borrowed from other languages, and general-purpose data compression techniques that ignore the distinctive features of Arabic, have had limited success in terms of compression ratio. In this paper, we present a hybrid technique that uses the linguistic features of the Arabic language to improve the compression ratio of Arabic texts. This technique works in phases. In the first phase, the text file is split into four different files using a multilayer model-based approach. In the second phase, each of these four files is compressed using the Burrows-Wheeler compression algorithm.
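The paper's multilayer, model-based split is not reproduced here, but the two-phase shape of the pipeline can be sketched in Python. The `split_layers` function below is a purely hypothetical placeholder for the linguistic split; the second phase uses Python's `bz2` module, which implements a Burrows-Wheeler-based compressor.

```python
import bz2

def split_layers(text):
    """Hypothetical stand-in for the paper's multilayer, model-based split.

    The actual technique partitions Arabic text into four files using
    linguistic features; here a trivial interleaved split merely
    illustrates the two-phase structure of the pipeline.
    """
    return [text[i::4] for i in range(4)]

def compress_hybrid(text):
    """Phase 1: split into four streams; phase 2: compress each stream
    with a Burrows-Wheeler-based compressor (bz2)."""
    streams = split_layers(text)
    return [bz2.compress(s.encode("utf-8")) for s in streams]

sample = "نص عربي تجريبي " * 100
compressed = compress_hybrid(sample)
ratio = len(sample.encode("utf-8")) / sum(len(c) for c in compressed)
```

The intuition is that each stream, grouping linguistically similar material, is more self-similar than the original text, which the Burrows-Wheeler stage can exploit.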
Towards green scientific data compression through high-level I/O interfaces
Every HPC system today has to cope with a deluge of data generated by scientific applications, simulations, or large-scale experiments. The upscaling of supercomputer systems and infrastructures generally results in a dramatic increase in their energy consumption. In this paper, we argue that techniques like data compression can lead to significant gains in power efficiency by reducing both network and storage requirements. To that end, we propose a novel methodology for on-the-fly, intelligent determination of an energy-efficient data reduction for a given dataset by leveraging state-of-the-art compression algorithms and metadata at application-level I/O. We motivate our work by analyzing the energy and storage saving needs of real-life scientific HPC applications, and we review the various compression techniques that can be applied. We find that the resulting data reduction can decrease the data volume transferred and stored by as much as 80% in some cases, leading to significant savings in storage and networking costs.
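To illustrate the kind of trade-off such a methodology navigates, here is a hedged Python sketch (not the paper's actual method): it times candidate compressors on a small data sample and picks the one with the lowest estimated energy cost. The energy coefficients are made up for illustration; real values would come from platform and application-level I/O metadata.

```python
import time, zlib, bz2, lzma

CANDIDATES = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}

# Hypothetical cost coefficients: joules per CPU-second of compression
# and joules per byte stored/transferred. Purely illustrative values.
J_PER_CPU_SECOND = 10.0
J_PER_BYTE = 1e-7

def pick_compressor(sample: bytes) -> str:
    """Return the candidate with the lowest estimated energy cost,
    trading CPU energy for compression against the energy of moving
    and storing the compressed bytes."""
    best_name, best_cost = None, float("inf")
    for name, fn in CANDIDATES.items():
        t0 = time.perf_counter()
        out = fn(sample)
        cpu_seconds = time.perf_counter() - t0
        cost = cpu_seconds * J_PER_CPU_SECOND + len(out) * J_PER_BYTE
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name

choice = pick_compressor(b"some representative sample of the dataset " * 1000)
```

Depending on the coefficients, a slower but stronger compressor (lzma) can win when network and storage dominate, while a fast one (zlib) wins when CPU energy dominates.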
Towards decoupling the selection of compression algorithms from quality constraints – an investigation of lossy compression efficiency
Data-intensive scientific domains use data compression to reduce the storage space needed. Lossless data compression preserves information exactly, but lossy data compression can achieve much higher compression rates depending on the tolerable error margins. There are many ways of defining precision and of exploiting this knowledge; the field of lossy compression is therefore subject to active research. From the perspective of a scientist, only the qualitative definition of the tolerable loss of data precision should matter.
With the Scientific Compression Library (SCIL), we are developing a meta-compressor that allows users to define various quantities for acceptable error and expected performance behavior. The library then picks a suitable chain of algorithms that satisfies the user's requirements; the ongoing work is a preliminary stage for the design of an adaptive selector. This approach is a crucial step towards a scientifically safe use of much-needed lossy data compression, because it disentangles the task of determining the scientific characteristics of tolerable noise from the task of determining an optimal compression strategy. Future algorithms can be used without changing application code.
In this paper, we evaluate various lossy compression algorithms for compressing different scientific datasets (Isabel, ECHAM6), and we focus on the analysis of synthetically created data that serves as a blueprint for many observed datasets. We also briefly describe the available quantities of SCIL for defining data precision and introduce two efficient compression algorithms for individual data points. This shows that the best algorithm depends on user settings and data properties.
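SCIL's actual interface is not reproduced here, but one of the quantities it supports, an absolute error tolerance, can be illustrated with a generic sketch: quantize values onto a uniform grid whose step guarantees the bound, then compress the integer indices losslessly. All names below are illustrative, not SCIL's API.

```python
import zlib
import numpy as np

def compress_abstol(data: np.ndarray, abs_tol: float) -> bytes:
    """Error-bounded lossy compression sketch: snap each value to a grid
    of step 2*abs_tol, so reconstruction error stays within abs_tol,
    then compress the grid indices losslessly."""
    q = np.round(data / (2 * abs_tol)).astype(np.int64)
    return zlib.compress(q.tobytes())

def decompress_abstol(blob: bytes, abs_tol: float, shape) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int64).reshape(shape)
    return q * (2 * abs_tol)

# Smooth synthetic field, loosely in the spirit of the paper's synthetic data.
data = np.sin(np.linspace(0, 10, 10_000)).reshape(100, 100)
blob = compress_abstol(data, abs_tol=1e-3)
recon = decompress_abstol(blob, 1e-3, data.shape)
assert np.abs(recon - data).max() <= 1e-3  # the user-defined bound holds
```

A meta-compressor in this spirit would evaluate several such chains (different quantizers and lossless back-ends) against the user's tolerances and performance targets and pick the best, which is precisely why the winner depends on user settings and data properties.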
AIMES: advanced computation and I/O methods for earth-system simulations
Dealing with extreme-scale Earth-system models is challenging from the computer science perspective, as the required computing power and storage capacity are steadily increasing. Scientists perform runs with growing resolution or aggregate results from many similar smaller-scale runs with slightly different initial conditions (so-called ensemble runs). In the fifth Coupled Model Intercomparison Project (CMIP5), the produced datasets require more than three petabytes of storage, and the compute and storage requirements are increasing significantly for CMIP6. Climate scientists across the globe are developing next-generation models based on improved numerical formulations, leading to grids that are discretized in alternative forms such as an icosahedral (geodesic) grid.
The developers of these models face similar problems in scaling, maintaining, and optimizing code. Performance portability and the maintainability of code are key concerns of scientists because, compared to industry projects, model code is continuously revised and extended to incorporate further levels of detail. This leads to a rapidly growing code base that is rarely refactored. However, code modernization is important to maintain the productivity of the scientists working with the code and to utilize the performance provided by modern and future architectures. The need for performance optimization is motivated by the evolution of the parallel architecture landscape from homogeneous flat machines to heterogeneous combinations of processors with deep memory hierarchies. Notably, the rise of many-core, throughput-oriented accelerators such as GPUs requires non-trivial code changes at a minimum and, even worse, may necessitate a substantial rewrite of the existing codebase. At the same time, the code complexity increases the difficulty for computer scientists and vendors to understand and optimize the code for a given system.
Storing the products of climate predictions requires a large and expensive storage and archival system. Often, scientists restrict the number of scientific variables and the write interval to keep the costs balanced. Compression algorithms can reduce the costs significantly and can also increase the scientific yield of simulation runs.
In the AIMES project, we addressed the key issues of programmability, computational efficiency, and I/O limitations that are common in next-generation icosahedral earth-system models. The project focused on the separation of concerns between domain scientists, computational scientists, and computer scientists.