Two high-performance alternatives to ZLIB scientific-data compression
ZLIB is used in diverse frameworks by the scientific community, both to reduce disk storage and to alleviate pressure on I/O. As it becomes a bottleneck on multi-core systems, higher-throughput alternatives must be considered, exploring parallelism and/or more effective compression schemes. This work provides a comparative study of the ZLIB, LZ4 and FPC compressors (serial and parallel implementations), focusing on compression ratio (CR), bandwidth and speedup. LZ4 provides very high throughput (decompressing at over 1 GB/s versus 120 MB/s for ZLIB) at the cost of a 5-10% degradation in CR. FPC also provides higher throughput than ZLIB, but its CR varies considerably with the data. ZLIB and LZ4 can achieve almost linear speedups for some datasets, while the current implementation of parallel FPC provides little, if any, performance gain. For the ROOT dataset, LZ4 was found to provide a higher CR, better scalability and lower memory consumption than FPC, thus emerging as a better alternative to ZLIB.
This work is funded by National Funds through FCT, Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology), within project PEst-OE/EEI/UI0752/2014, the UT Austin | Portugal FCT grant SFRH/BD/47840/2008, and resources from the SeARCH project funded under contract CONC-REEQ/443/2005. We would also like to thank Nuno Castro and Rafael Silva for their contributions.
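For a flavor of such a comparison, a minimal benchmarking sketch in Python follows; it assumes the third-party lz4 package (zlib ships with the standard library), and sample.dat is a placeholder for any representative dataset:

# Minimal throughput/ratio comparison in the spirit of the study above.
# Assumes the third-party "lz4" package (pip install lz4); zlib is stdlib.
import time
import zlib
import lz4.frame

def benchmark(name, compress, decompress, data):
    t0 = time.perf_counter()
    blob = compress(data)
    t1 = time.perf_counter()
    out = decompress(blob)
    t2 = time.perf_counter()
    assert out == data
    cr = len(data) / len(blob)  # compression ratio
    print(f"{name}: CR={cr:.2f}  comp={len(data)/(t1-t0)/1e6:.0f} MB/s  "
          f"decomp={len(data)/(t2-t1)/1e6:.0f} MB/s")

data = open("sample.dat", "rb").read()  # placeholder dataset
benchmark("zlib", lambda d: zlib.compress(d, 6), zlib.decompress, data)
benchmark("lz4 ", lz4.frame.compress, lz4.frame.decompress, data)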
Monte Carlo Particle Lists: MCPL
A binary format with lists of particle state information, for interchanging
particles between various Monte Carlo simulation applications, is presented.
Portable C code for file manipulation is made available to the scientific
community, along with converters and plugins for several popular simulation
packages.
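As an illustration of file manipulation, the MCPL project also provides a Python module alongside the portable C code; a minimal read loop, assuming the attribute names documented for the mcpl module (nparticles, pdgcode, ekin, ...), could look like this:

# Sketch: iterating over particle states in an MCPL file with the "mcpl"
# Python module that accompanies the C library (pip install mcpl).
# The file name is a placeholder; attribute names are per the MCPL docs.
import mcpl

f = mcpl.MCPLFile("particles.mcpl")
print("particles in file:", f.nparticles)
for p in f.particles:
    # PDG code, position, kinetic energy and statistical weight
    print(p.pdgcode, p.x, p.y, p.z, p.ekin, p.weight)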
Bicriteria data compression
The advent of massive datasets (and the consequent design of high-performing
distributed storage systems) has reignited the interest of the scientific and
engineering community in the design of lossless data compressors that achieve
an effective compression ratio and very efficient decompression speed.
Lempel-Ziv's LZ77 algorithm is the de facto choice in this scenario because of
its decompression speed and its flexibility in trading decompression speed
against compressed-space efficiency. Each of the existing implementations
offers a different trade-off between space occupancy and decompression speed,
so software engineers have to content themselves with picking the one that
comes closest to the requirements of the application at hand. Starting from
these premises, and for the first time in the literature, we address in this
paper the problem of trading the consumption of these two resources optimally,
and in a principled way, by introducing the Bicriteria LZ77-Parsing problem,
which formalizes what data compressors have traditionally approached by means
of heuristics. The goal is to determine an LZ77 parsing which minimizes the
space occupancy in bits of the compressed file, provided that the
decompression time is bounded by a fixed amount (or vice versa). This way, the
software engineer can set their space (or time) requirements and then derive
the LZ77 parsing which optimizes the decompression speed (or the space
occupancy, respectively). We solve this problem efficiently, in O(n log^2 n)
time and optimal linear space, to within a small additive approximation, by
proving and deploying some specific structural properties of the weighted
graph derived from the possible LZ77 parsings of the input file. A preliminary
set of experiments shows that our novel proposal dominates all the highly
engineered competitors, hence offering a win-win situation in theory and
practice.
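To make the bicriteria objective concrete, the sketch below is a deliberately naive, pseudo-polynomial dynamic program over a toy parsing DAG, with invented edge costs; it only illustrates the problem statement, not the paper's O(n log^2 n) algorithm:

# Nodes are text positions 0..n; each edge (i -> j) is a candidate LZ77
# phrase carrying a space cost in bits and a decompression-time cost
# (assumed >= 1). We minimize total bits subject to a time budget.
import math

def bicriteria_parse(n, edges, time_budget):
    """edges: dict i -> list of (j, space_bits, time_cost). Returns the
    fewest bits to parse positions 0..n within the budget, or None."""
    INF = math.inf
    # best[t][i] = fewest bits to reach position i using at most t time
    best = [[INF] * (n + 1) for _ in range(time_budget + 1)]
    for t in range(time_budget + 1):
        best[t][0] = 0
    for t in range(time_budget + 1):
        for i in range(n + 1):
            if best[t][i] == INF:
                continue
            for (j, s, tc) in edges.get(i, []):
                if t + tc <= time_budget and best[t][i] + s < best[t + tc][j]:
                    best[t + tc][j] = best[t][i] + s
    result = min(best[t][n] for t in range(time_budget + 1))
    return None if math.isinf(result) else result

# Toy DAG for a 4-symbol text: literals cost 9 bits / 1 time unit; one long
# copy costs 16 bits / 3 time units (e.g. a far, cache-missing copy).
edges = {0: [(1, 9, 1)], 1: [(2, 9, 1), (4, 16, 3)],
         2: [(3, 9, 1)], 3: [(4, 9, 1)]}
print(bicriteria_parse(4, edges, time_budget=4))  # -> 25 (takes the copy)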
Even bigger data: preparing for the LHC/ATLAS upgrade
The data volume of the Large Hadron Collider (LHC) experiments is expected to grow by one order of magnitude following the upgrade of the machine's operating conditions in 2013-2014. The challenge to the scientific results of our team is: i) how to deal with a 10-fold increase in the data volume that must be processed for each analysis, while ii) supporting the increased complexity of the analysis applications, iii) reducing the turnaround time of the results, and iv) addressing these issues with limited additional resources, given Europe's present political and economic panorama. In this paper we take a position on this challenge and on the research directions to be explored. A systematic study of the analysis applications is presented to identify optimization opportunities in the applications and in the underlying running system. Then a new local system architecture is proposed to increase resource usage efficiency and to provide a gradual upgrade route from current systems.
This work was supported by FCT grants SFRH/BPD/63495/2009 and SFRH/BPD/47928/2008, by the UT Austin | Portugal FCT grant SFRH/BD/47840/2008, and by FCT project PEst-OE/EEI/UI0752/2011.
Software Challenges For HL-LHC Data Analysis
The high energy physics community is discussing where investment is needed to
prepare software for the HL-LHC and its unprecedented challenges. The ROOT
project has been one of the central software players in high energy physics
for decades. From its experience and expectations, the ROOT team has distilled
a comprehensive set of areas that should see research and development in the
context of data analysis software, in order to make the best use of the
HL-LHC's physics potential. This work shows what these areas could be, why the
ROOT team believes investing in them is needed, what gains are expected, and
where related work is ongoing. It can serve as an indication for future
research proposals and collaborations.
RNTuple performance: Status and Outlook
Upcoming HEP experiments, e.g. at the HL-LHC, are expected to increase the
volume of generated data by at least one order of magnitude. In order to retain
the ability to analyze the influx of data, full exploitation of modern storage
hardware and systems, such as low-latency high-bandwidth NVMe devices and
distributed object stores, becomes critical. To this end, the ROOT RNTuple I/O
subsystem has been designed to address performance bottlenecks and shortcomings
of ROOT's current state-of-the-art TTree I/O subsystem. RNTuple provides a
backwards-incompatible redesign of the TTree binary format and access API that
evolves the ROOT event data I/O for the challenges of the upcoming decades. It
focuses on a compact data format, on performance engineering for modern storage
hardware, for instance through making parallel and asynchronous I/O calls by
default, and on robust interfaces that are easy to use correctly. In this
contribution, we evaluate the RNTuple performance for typical HEP analysis
tasks. We compare the throughput delivered by RNTuple to popular I/O libraries
outside HEP, such as HDF5 and Apache Parquet. We demonstrate the advantages of
RNTuple for HEP analysis workflows and provide an outlook on the road to its
use in production.
Comment: 5 pages, 5 figures; submitted to the proceedings of the 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2021).
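For a flavor of the non-HEP baselines named above (this is not the paper's benchmark code), the same columnar event data can be written with h5py and pyarrow roughly as follows; the schema and sizes are invented:

# Illustrative only: writing one columnar event table with HDF5 (h5py)
# and Apache Parquet (pyarrow), the two baselines mentioned above.
import numpy as np
import h5py
import pyarrow as pa
import pyarrow.parquet as pq

n = 1_000_000
pt = np.random.exponential(30.0, n).astype(np.float32)    # fake momentum
eta = np.random.uniform(-2.5, 2.5, n).astype(np.float32)  # fake rapidity

with h5py.File("events.h5", "w") as f:
    f.create_dataset("pt", data=pt, compression="gzip")
    f.create_dataset("eta", data=eta, compression="gzip")

pq.write_table(pa.table({"pt": pt, "eta": eta}), "events.parquet",
               compression="zstd")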
Dialable Cryptography for Wireless Networks
The objective of this research is to develop an adaptive cryptographic protocol which allows users to select an optimal cryptographic strength and algorithm based upon the hardware and bandwidth available, and to reason about the level of security versus system throughput. In a society of constantly improving technology, the ability to communicate via wireless technology provides an avenue for delivering information at any time, nearly anywhere. Sensitive or classified information can be transferred wirelessly across unsecured channels by using cryptographic algorithms. The research presented focuses on dynamically selecting optimal cryptographic algorithms and cryptographic strengths based upon the hardware and bandwidth available. It explores the performance of transferring information using various cryptographic algorithms and strengths, on different CPUs and bandwidths, for packets and files of various sizes. This research provides a foundation for dynamically selecting cryptographic algorithms and key sizes. The conclusion of the research provides a selection process that lets users determine the best cryptographic algorithms and strengths for sending information without waiting for information security personnel to determine the required transfer method. This capability will be an important stepping stone towards the military's vision of future Net-Centric Warfare capabilities.
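A hedged sketch of the dialing idea, using the widely available cryptography package: benchmark each candidate cipher on the local hardware and pick the strongest one whose measured throughput still meets the required rate. The candidate list and threshold are illustrative, not the thesis's actual protocol:

# Measure each cipher's throughput here, then "dial" the strongest one
# that still meets the required data rate (pip install cryptography).
import os
import time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

# Strongest first: (label, key bytes); all AES-CTR for this sketch.
CANDIDATES = [("AES-256", 32), ("AES-192", 24), ("AES-128", 16)]

def throughput_mbps(key_len, payload=b"\0" * (8 * 1024 * 1024)):
    enc = Cipher(algorithms.AES(os.urandom(key_len)),
                 modes.CTR(os.urandom(16))).encryptor()
    t0 = time.perf_counter()
    enc.update(payload)
    enc.finalize()
    return len(payload) * 8 / (time.perf_counter() - t0) / 1e6

def dial(required_mbps):
    """Strongest candidate meeting the rate, else the weakest/fastest."""
    for label, klen in CANDIDATES:
        if throughput_mbps(klen) >= required_mbps:
            return label
    return CANDIDATES[-1][0]

print(dial(required_mbps=500.0))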
Improving performance using computational compression through memoization: A case study using a railway power consumption simulator
The objective of data compression is to avoid redundancy in order to reduce the size of the data to be stored or transmitted. In some scenarios, data compression may help increase global performance by reducing the amount of data at a competitive cost in terms of global time and energy consumption. We have introduced computational compression as a technique for reducing redundant computation; in other words, for avoiding carrying out the same computation with the same input to obtain the same output. In some scenarios, such as simulations and graphics processing, part of the computation is repeated with the same input in order to obtain the same output, and this computation can have a significant cost in terms of global time and energy consumption. We propose applying computational compression by using memoization to store results for future reuse and, in this way, minimize repetitions of the same costly computation. Although memoization was proposed for sequential applications in the 1980s, and some projects have applied it in very specific domains, we propose a novel, domain-independent way of using it in high-performance applications as a means of avoiding redundant computation.
This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under project TIN2013-41350-P (Scalable Data Management Techniques for High-End Computing Systems).
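A minimal sketch of the mechanism using the standard library's functools.lru_cache (the paper proposes a domain-independent framework; this only illustrates the idea), with a made-up stand-in for a costly simulation step:

# Computational compression via memoization: a deterministic, costly
# function is computed once per distinct input, then served from a cache.
from functools import lru_cache

@lru_cache(maxsize=None)
def segment_power(speed_kmh: float, gradient: float) -> float:
    # Invented stand-in for an expensive, repeated simulation step.
    return sum((speed_kmh * (1 + gradient)) ** 1.5 / (k + 1)
               for k in range(1_000_000))

# In a simulation the same (speed, gradient) pairs recur constantly;
# every call after the first per pair becomes a cheap lookup.
for _ in range(10):
    segment_power(80.0, 0.01)
print(segment_power.cache_info())  # -> hits=9, misses=1, ...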
A holistic scalability strategy for time series databases following cascading polyglot persistence
Time series databases aim to handle large amounts of data quickly, both when new data is introduced to the system and when it is retrieved later on. However, depending on the scenario in which these databases operate, reducing the number of requested resources becomes a further requirement. Following this goal, NagareDB and its Cascading Polyglot Persistence approach were born, intended not just to provide a fast time series solution but also to strike a good cost-efficiency balance. However, although they provided outstanding results, they lacked a natural way of scaling out in a cluster fashion. Consequently, monolithic deployments could extract the maximum value from the solution, but distributed ones had to rely on general scalability approaches. In this research, we propose a holistic approach specially tailored for databases following Cascading Polyglot Persistence, in order to further maximize their inherent resource-saving goals. The proposed approach reduced the cluster size by 33% in a setup with just three ingestion nodes, and by up to 50% in a setup with 10 ingestion nodes. Moreover, the evaluation shows that our scaling method is able to provide efficient cluster growth, offering scalability speedups greater than 85% of a theoretical 100% perfect scaling, while also ensuring data safety via data replication.
This research was partly supported by Grant Agreement No. 857191, by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB) and by the Generalitat de Catalunya (contract 2017-SGR-1414).