
    Two high-performance alternatives to ZLIB scientific-data compression

    ZLIB is used in diverse frameworks by the scientific community, both to reduce disk storage and to alleviate pressure on I/O. As it becomes a bottleneck on multi-core systems, higher-throughput alternatives must be considered, exploiting parallelism and/or more effective compression schemes. This work provides a comparative study of the ZLIB, LZ4 and FPC compressors (serial and parallel implementations), focusing on compression ratio (CR), bandwidth and speedup. LZ4 provides very high throughput (decompressing over 1 GB/s versus 120 MB/s for ZLIB), but its CR suffers a degradation of 5-10%. FPC also provides higher throughput than ZLIB, but its CR varies widely with the data. ZLIB and LZ4 can achieve almost linear speedups for some datasets, while the current implementation of parallel FPC provides little if any performance gain. For the ROOT dataset, LZ4 was found to provide higher CR, better scalability and lower memory consumption than FPC, thus emerging as a better alternative to ZLIB. This work is funded by National Funds through the FCT Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project PEst-OE/EEI/UI0752/2014, the UT Austin - Portugal FCT grant SFRH/BD/47840/2008, and the resources from the project SeARCH funded under contract CONC-REEQ/443/2005. We would also like to thank Nuno Castro and Rafael Silva for their contributions.
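The metrics the study compares (compression ratio and (de)compression bandwidth) can be measured with a few lines of code. Below is a minimal sketch using only Python's standard-library `zlib`; LZ4 and FPC bindings are not assumed to be available, and the synthetic payload is only a stand-in for real scientific data.

```python
import time
import zlib

def measure(data: bytes, level: int):
    """Measure compression ratio and (de)compression throughput for zlib."""
    t0 = time.perf_counter()
    compressed = zlib.compress(data, level)
    t_compress = time.perf_counter() - t0
    t0 = time.perf_counter()
    restored = zlib.decompress(compressed)
    t_decompress = time.perf_counter() - t0
    assert restored == data                    # lossless round-trip
    cr = len(data) / len(compressed)           # compression ratio (CR)
    mb = len(data) / 1e6
    return cr, mb / t_compress, mb / t_decompress

# Synthetic, mildly redundant payload standing in for scientific data.
payload = bytes(range(256)) * 4096
for level in (1, 6, 9):
    cr, c_bw, d_bw = measure(payload, level)
    print(f"level={level}: CR={cr:.1f}, compress={c_bw:.0f} MB/s, decompress={d_bw:.0f} MB/s")
```

The same harness, pointed at real datasets and at LZ4/FPC implementations, reproduces the kind of CR-versus-bandwidth trade-off the abstract describes.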

    Monte Carlo Particle Lists: MCPL

    A binary format with lists of particle state information, for interchanging particles between various Monte Carlo simulation applications, is presented. Portable C code for file manipulation is made available to the scientific community, along with converters and plugins for several popular simulation packages.
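A particle-list interchange format of this kind boils down to packing and unpacking fixed-width binary records. The sketch below illustrates the idea with a hypothetical record layout (PDG code, position, direction, kinetic energy, weight); it is not the actual MCPL on-disk format, whose field layout is defined by the MCPL specification.

```python
import struct

# Hypothetical fixed-width particle record (NOT the actual MCPL layout):
# PDG code (int32), position x,y,z (3 doubles), direction ux,uy,uz (3 doubles),
# kinetic energy (double), statistical weight (double). Little-endian, packed.
RECORD = struct.Struct("<i8d")

def pack_particle(pdg, pos, direction, ekin, weight):
    """Serialize one particle state into a fixed-width binary record."""
    return RECORD.pack(pdg, *pos, *direction, ekin, weight)

def unpack_particle(buf):
    """Deserialize one binary record back into particle-state fields."""
    pdg, x, y, z, ux, uy, uz, ekin, w = RECORD.unpack(buf)
    return pdg, (x, y, z), (ux, uy, uz), ekin, w

# A neutron (PDG 2112) travelling along +z.
rec = pack_particle(2112, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 2.5e-8, 1.0)
assert unpack_particle(rec)[0] == 2112
```

A converter for a given simulation package is then just a loop translating its native particle representation to and from such records.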

    Bicriteria data compression

    The advent of massive datasets (and the consequent design of high-performing distributed storage systems) has reignited the interest of the scientific and engineering community in the design of lossless data compressors which achieve an effective compression ratio and very efficient decompression speed. Lempel-Ziv's LZ77 algorithm is the de facto choice in this scenario because of its decompression speed and its flexibility in trading decompression speed against compressed-space efficiency. Each of the existing implementations offers a trade-off between space occupancy and decompression speed, so software engineers have to content themselves with picking the one which comes closest to the requirements of the application at hand. Starting from these premises, and for the first time in the literature, we address in this paper the problem of trading the consumption of these two resources optimally, and in a principled way, by introducing the Bicriteria LZ77-Parsing problem, which formalizes what data compressors have traditionally approached by means of heuristics. The goal is to determine an LZ77 parsing which minimizes the space occupancy in bits of the compressed file, provided that the decompression time is bounded by a fixed amount (or vice versa). This way, the software engineer can set a space (or time) requirement and then derive the LZ77 parsing which optimizes the decompression speed (or the space occupancy, respectively). We solve this problem efficiently in O(n log^2 n) time and optimal linear space within a small, additive approximation, by proving and deploying some specific structural properties of the weighted graph derived from the possible LZ77 parsings of the input file. A preliminary set of experiments shows that our novel proposal dominates all the highly engineered competitors, hence offering a win-win situation in theory and practice.
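To make the notion of an "LZ77 parsing" concrete, here is a minimal greedy parser and decoder. It always takes the longest match, so it is one particular point in the space of parsings; the paper's contribution is choosing among such parsings optimally under a space or time budget, which this sketch does not attempt.

```python
def lz77_parse(data: bytes, window: int = 4096):
    """Greedy LZ77 parse: emit (distance, length) copy phrases or literal bytes."""
    i, phrases = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        # Naive longest-match search in the sliding window (O(n^2) worst case;
        # real compressors use hash chains or suffix structures).
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= 3:                      # copy phrase: cheap in space
            phrases.append((best_dist, best_len))
            i += best_len
        else:                                  # literal phrase
            phrases.append(data[i])
            i += 1
    return phrases

def lz77_decode(phrases):
    """Decode a parse; copies may overlap their own output, as in LZ77."""
    out = bytearray()
    for p in phrases:
        if isinstance(p, tuple):
            dist, length = p
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(p)
    return bytes(out)
```

Different phrase choices yield different compressed sizes and different decode costs (e.g. many short copies touch the window more often than a few long ones), which is exactly the bicriteria trade-off the paper formalizes.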

    Even bigger data: preparing for the LHC/ATLAS upgrade

    The Large Hadron Collider's (LHC) experiments' data volume is expected to grow by one order of magnitude following the upgrade of the machine's operating conditions in 2013-2014. The challenge for our team's scientific results is: i) how to deal with a 10-fold increase in the data volume that must be processed for each analysis, while ii) supporting the increase in the complexity of the analysis applications, iii) reducing the turnover time of the results, and iv) addressing these issues with limited additional resources, given Europe's present political and economic panorama. In this paper we take a position on this challenge and on the research directions to be explored. A systematic analysis of the analysis applications is presented to study optimization opportunities in the applications and in the underlying running system. Then a new local system architecture is proposed to increase resource usage efficiency and to provide a gradual upgrade route from current systems. This work was supported by FCT grants SFRH/BPD/63495/2009 and SFRH/BPD/47928/2008, by the UT Austin | Portugal FCT grant SFRH/BD/47840/2008, and by FCT project PEst-OE/EEI/UI0752/2011.

    Software Challenges For HL-LHC Data Analysis

    The high energy physics community is discussing where investment is needed to prepare software for the HL-LHC and its unprecedented challenges. The ROOT project has been one of the central software players in high energy physics for decades. From its experience and expectations, the ROOT team has distilled a comprehensive set of areas that should see research and development in the context of data analysis software, in order to make the best use of the HL-LHC's physics potential. This work shows what these areas could be, why the ROOT team believes investing in them is needed, which gains are expected, and where related work is ongoing. It can serve as an indication for future research proposals and collaborations.

    RNTuple performance: Status and Outlook

    Upcoming HEP experiments, e.g. at the HL-LHC, are expected to increase the volume of generated data by at least one order of magnitude. In order to retain the ability to analyze the influx of data, full exploitation of modern storage hardware and systems, such as low-latency high-bandwidth NVMe devices and distributed object stores, becomes critical. To this end, the ROOT RNTuple I/O subsystem has been designed to address performance bottlenecks and shortcomings of ROOT's current state-of-the-art TTree I/O subsystem. RNTuple provides a backwards-incompatible redesign of the TTree binary format and access API that evolves ROOT's event data I/O for the challenges of the upcoming decades. It focuses on a compact data format, on performance engineering for modern storage hardware, for instance through making parallel and asynchronous I/O calls by default, and on robust interfaces that are easy to use correctly. In this contribution, we evaluate the RNTuple performance for typical HEP analysis tasks. We compare the throughput delivered by RNTuple to popular I/O libraries outside HEP, such as HDF5 and Apache Parquet. We demonstrate the advantages of RNTuple for HEP analysis workflows and provide an outlook on the road to its use in production. Comment: 5 pages, 5 figures; submitted to the proceedings of the 20th International Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT 2021).
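A core reason columnar formats like RNTuple and Parquet outperform row-wise layouts for analysis is that reading one field does not require touching every record. The toy sketch below contrasts the two layouts using only the standard library; the field names (`pt`, `eta`) are illustrative, not taken from any particular RNTuple schema.

```python
import struct

# Toy illustration of why a columnar layout helps analyses that read few
# fields: extracting one column avoids striding over whole records.
N = 10_000
pt  = [0.1 * i for i in range(N)]     # transverse momentum (illustrative)
eta = [0.001 * i for i in range(N)]   # pseudorapidity (illustrative)

# Row-wise layout: records of (pt, eta) pairs, 16 bytes each.
row_wise = b"".join(struct.pack("<2d", p, e) for p, e in zip(pt, eta))
# Columnar layout: one contiguous buffer per field.
col_pt = struct.pack(f"<{N}d", *pt)

# Row-wise read of pt: one strided unpack per record.
pt_from_rows = [struct.unpack_from("<d", row_wise, 16 * i)[0] for i in range(N)]
# Columnar read of pt: a single bulk unpack of exactly the bytes needed.
pt_from_col = list(struct.unpack(f"<{N}d", col_pt))

assert pt_from_rows == pt_from_col
assert len(col_pt) == len(row_wise) // 2   # half the bytes touched
```

On real storage the effect compounds: the columnar read is one large sequential transfer that compresses better, while the row-wise read drags unused fields through the I/O path.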

    Dialable Cryptography for Wireless Networks

    The objective of this research is to develop an adaptive cryptographic protocol which allows users to select an optimal cryptographic strength and algorithm based upon the hardware and bandwidth available, and to reason about the level of security versus the system throughput. In today's constantly improving technical landscape, wireless technology provides an avenue for delivering information at any time, nearly anywhere. Sensitive or classified information can be transferred wirelessly across unsecured channels by using cryptographic algorithms. The research presented focuses on dynamically selecting optimal cryptographic algorithms and cryptographic strengths based upon the hardware and bandwidth available. It explores the performance of transferring information using various cryptographic algorithms and strengths, with different CPUs and bandwidths, on packets and files of various sizes. This research provides a foundation for dynamically selecting cryptographic algorithms and key sizes, and concludes with a selection process that lets users determine the best cryptographic algorithms and strengths for sending information without waiting for information security personnel to determine the required transfer method. This capability will be an important stepping stone towards the military's vision of future Net-Centric Warfare capabilities.
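The selection process described above can be sketched as an optimization over a benchmark table: among the ciphers fast enough for the available throughput budget, pick the strongest. The table entries below (algorithms, key sizes, throughputs, security ranks) are illustrative placeholders, not measurements from this thesis.

```python
# Hypothetical benchmark table: (algorithm, key bits, measured MB/s on the
# target hardware, relative security rank). Illustrative numbers only.
PROFILES = [
    ("AES",  128,  95.0, 3),
    ("AES",  256,  70.0, 4),
    ("3DES", 168,  20.0, 2),
    ("RC4",  128, 120.0, 1),
]

def select_cipher(min_throughput_mbs: float):
    """Pick the strongest cipher whose measured throughput meets the budget,
    or None when no profile is fast enough."""
    feasible = [p for p in PROFILES if p[2] >= min_throughput_mbs]
    if not feasible:
        return None
    return max(feasible, key=lambda p: p[3])   # highest security rank wins

# With a 60 MB/s budget, AES-256 is the strongest feasible choice here.
assert select_cipher(60.0)[:2] == ("AES", 256)
```

In the adaptive protocol the table would be populated by measuring the actual device and link, so the same query adapts the choice to each deployment.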

    Improving performance using computational compression through memoization: A case study using a railway power consumption simulator

    The objective of data compression is to avoid redundancy in order to reduce the size of the data to be stored or transmitted. In some scenarios, data compression may help to increase global performance by reducing the amount of data at a competitive cost in terms of overall time and energy consumption. We have introduced computational compression as a technique for reducing redundant computation; in other words, for avoiding carrying out the same computation with the same input to obtain the same output. In some scenarios, such as simulations and graphics processing, part of the computation is repeated with the same input to obtain the same output, and this computation can have a significant cost in terms of overall time and energy consumption. We propose applying computational compression by using memoization to store results for future reuse and, in this way, minimize repetitions of the same costly computation. Although memoization was proposed for sequential applications in the 1980s, and some projects have applied it in very specific domains, we propose a novel, domain-independent way of using it in high-performance applications as a means of avoiding redundant computation. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under the project TIN2013-41350-P (Scalable Data Management Techniques for High-End Computing Systems).
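The core mechanism (caching the output of a pure, costly function keyed by its inputs) is available off the shelf in Python as `functools.lru_cache`. The sketch below uses a hypothetical stand-in kernel; the paper's railway power simulator and its domain-independent application of the idea are of course more involved.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def segment_energy(profile_id: int, speed: int) -> float:
    """Hypothetical stand-in for a costly, repeatedly invoked simulation
    kernel: same (profile_id, speed) always yields the same result."""
    return sum((speed * k) ** 0.5 for k in range(1, 10_000))

segment_energy(1, 80)   # first call: actually computed
segment_energy(1, 80)   # repeated call: served from the memo table
print(segment_energy.cache_info())
```

"Computational compression" generalizes this: just as a data compressor replaces repeated byte sequences with references, memoization replaces repeated computations with table lookups, trading memory for time and energy.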

    A holistic scalability strategy for time series databases following cascading polyglot persistence

    Time series databases aim to handle large amounts of data quickly, both when new data are introduced into the system and when they are retrieved later on. However, depending on the scenario in which these databases participate, reducing the number of requested resources becomes a further requirement. Following this goal, NagareDB and its Cascading Polyglot Persistence approach were born. They were intended not just to provide a fast time series solution, but also to strike a good cost-efficiency balance. However, although they provided outstanding results, they lacked a natural way of scaling out in a cluster fashion. Consequently, monolithic deployments could extract the maximum value from the solution, but distributed ones had to rely on general scalability approaches. In this research, we propose a holistic approach specially tailored for databases following Cascading Polyglot Persistence, to further maximize their inherent resource-saving goals. The proposed approach reduced the cluster size by 33% in a setup with just three ingestion nodes, and by up to 50% in a setup with 10 ingestion nodes. Moreover, the evaluation shows that our scaling method provides efficient cluster growth, offering scalability speedups greater than 85% of a theoretically perfect (100%) scaling, while also ensuring data safety via data replication. This research was partly supported by Grant Agreement No. 857191, by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB) and by the Generalitat de Catalunya (contract 2017-SGR-1414).