476 research outputs found
On optimally partitioning a text to improve its compression
In this paper we investigate the problem of partitioning an input string T in
such a way that compressing its parts individually via a base compressor C
yields a compressed output shorter than applying C over the entire T at once.
This problem was introduced in the context of table compression, and then
further elaborated and extended to strings and trees. Unfortunately, the
literature offers poor solutions: namely, we know either a cubic-time algorithm
for computing the optimal partition based on dynamic programming, or a few
heuristics that do not guarantee any bound on the efficacy of their computed
partition, or algorithms that are efficient but work only in specific scenarios
(such as the Burrows-Wheeler Transform) and achieve compression performance
that might be worse than the optimal partitioning by a multiplicative
factor. Therefore, efficiently computing the optimal solution is still open. In
this paper we provide the first algorithm which is guaranteed to compute, in
O(n log_{1+ε} n) time, a partition of T whose compressed output is
guaranteed to be no more than (1+ε)-worse than the optimal one, where ε
may be any positive constant
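
For concreteness, here is a minimal Python sketch of the cubic-time
dynamic-programming baseline mentioned above, with zlib standing in for the
base compressor C; the function name and the choice of zlib are our
illustration, not the paper's.

    import zlib

    def optimal_partition(T: bytes, compress=zlib.compress):
        # dp[j] = minimum total compressed size over all partitions of T[:j].
        # Trying every start i of the last block makes this the cubic
        # baseline, not the paper's O(n log_{1+eps} n) algorithm.
        n = len(T)
        dp = [0] + [float("inf")] * n
        back = [0] * (n + 1)
        for j in range(1, n + 1):
            for i in range(j):
                cost = dp[i] + len(compress(T[i:j]))
                if cost < dp[j]:
                    dp[j], back[j] = cost, i
        # walk the back-pointers to recover the optimal block boundaries
        blocks, j = [], n
        while j > 0:
            blocks.append((back[j], j))
            j = back[j]
        return dp[n], blocks[::-1]

Each candidate block T[i:j] is compressed independently, so this baseline
needs O(n^2) compressor invocations; the paper's algorithm avoids that
blow-up while provably losing at most a (1+ε) factor.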
Voluntary Jurisdiction in International Civil Procedural Law
Dissemination of the TABLES OF CONTENTS of works recently added to the collection of the Biblioteca Ministro Oscar Saraiva of the STJ. In compliance with copyright law, we do not make the full work available. STJ0008192
Bicriteria data compression
The advent of massive datasets (and the consequent design of high-performing
distributed storage systems) has reignited the interest of the scientific and
engineering community in the design of lossless data compressors that
achieve an effective compression ratio and very efficient decompression speed.
Lempel-Ziv's LZ77 algorithm is the de facto choice in this scenario because of
its decompression speed and its flexibility in trading decompression speed
versus compressed-space efficiency. Each of the existing implementations offers
a trade-off between space occupancy and decompression speed, so software
engineers have to content themselves with picking the one that comes closest to
the requirements of the application at hand. Starting from these
premises, and for the first time in the literature, we address in this paper
the problem of optimally trading the consumption of these two resources by
introducing the Bicriteria LZ77-Parsing problem, which formalizes in a
principled way what data compressors have traditionally approached by means
of heuristics. The goal is to determine an LZ77 parsing
which minimizes the space occupancy in bits of the compressed file, provided
that the decompression time is bounded by a fixed amount (or vice-versa). This
way, the software engineer can set their space (or time) requirements and then
derive the LZ77 parsing which optimizes the decompression speed (or the space
occupancy, respectively). We solve this problem efficiently in O(n log^2 n)
time and optimal linear space within a small, additive approximation, by
proving and deploying some specific structural properties of the weighted graph
derived from the possible LZ77-parsings of the input file. The preliminary set
of experiments shows that our novel proposal dominates all the highly
engineered competitors, hence offering a win-win situation in theory and practice
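
To make the underlying graph formulation concrete, here is a minimal Python
sketch, ours and not the paper's algorithm, of the bicriteria idea: each
admissible LZ77 phrase is an edge of a DAG over text positions carrying a
space cost and a time cost, and a binary search over a Lagrangian multiplier
trades one resource for the other using plain shortest paths.

    def bicriteria_parse(edges, n, time_budget, iters=60):
        # edges: (i, j, space_bits, decode_time) for a phrase covering T[i:j];
        # every path from node 0 to node n is a complete LZ77 parsing.
        adj = [[] for _ in range(n + 1)]
        for i, j, space, time in edges:
            adj[i].append((j, space, time))

        def shortest(lam):
            # DP shortest path under the scalar cost space + lam * time
            INF = float("inf")
            dist = [INF] * (n + 1)
            dist[0] = 0.0
            prof = [(0, 0)] * (n + 1)  # (space, time) along the best path
            for i in range(n):
                if dist[i] == INF:
                    continue
                for j, s, t in adj[i]:
                    c = dist[i] + s + lam * t
                    if c < dist[j]:
                        dist[j] = c
                        prof[j] = (prof[i][0] + s, prof[i][1] + t)
            return prof[n]

        lo, hi = 0.0, 1e9  # binary search on the Lagrangian multiplier
        for _ in range(iters):
            lam = (lo + hi) / 2
            space, time = shortest(lam)
            if time > time_budget:
                lo = lam   # over budget: penalize decode time more
            else:
                hi = lam
        return shortest(hi)  # (space, time), feasible if the budget is attainable

This recovers points on the trade-off curve only up to a duality gap; the
paper's O(n log^2 n) algorithm instead exploits structural properties of the
parsing graph to land within a small additive term of the true optimum.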
Compressed Text Indexes: From Theory to Practice!
A compressed full-text self-index represents a text in a compressed form and
still answers queries efficiently. This technology represents a breakthrough
over the text indexing techniques of the previous decade, whose indexes
required several times the size of the text. Although it is relatively new,
this technology has matured to the point where theoretical research is giving
way to practical developments. Nonetheless, this requires significant
programming skills, a deep engineering effort, and a strong algorithmic
background to dig into the research results. To date, only isolated
implementations and focused comparisons of compressed indexes have been
reported, and they lacked a common API, which prevented their reuse and
deployment within other applications.
The goal of this paper is to fill this gap. First, we present the existing
implementations of compressed indexes from a practitioner's point of view.
Second, we introduce the Pizza&Chili site, which offers tuned implementations
and a standardized API for the most successful compressed full-text
self-indexes, together with effective testbeds and scripts for their automatic
validation and test. Third, we show the results of our extensive experiments on
these codes with the aim of demonstrating the practical relevance of this novel
and exciting technology
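
As a taste of what these indexes actually do, here is a minimal Python sketch
of counting occurrences via backward search on the Burrows-Wheeler transform,
the core query of FM-index-style self-indexes. A real compressed index answers
the rank queries below in near-constant time over a compressed BWT instead of
rescanning it; the naive layout and names are ours.

    def bwt(text: str) -> str:
        # Burrows-Wheeler transform via sorted rotations ('$' terminator)
        text += "$"
        rots = sorted(text[i:] + text[:i] for i in range(len(text)))
        return "".join(r[-1] for r in rots)

    def count(text: str, pattern: str) -> int:
        # Backward search: [lo, hi) is the range of sorted suffixes that
        # start with the pattern suffix processed so far.
        L = bwt(text)
        F = sorted(L)
        C = {c: F.index(c) for c in set(L)}  # number of text chars smaller than c
        lo, hi = 0, len(L)
        for c in reversed(pattern):
            if c not in C:
                return 0
            lo = C[c] + L[:lo].count(c)  # rank of c in L[:lo]
            hi = C[c] + L[:hi].count(c)  # rank of c in L[:hi]
            if lo >= hi:
                return 0
        return hi - lo  # e.g. count("mississippi", "ssi") == 2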
Cache-Oblivious Peeling of Random Hypergraphs
The computation of a peeling order in a randomly generated hypergraph is the
most time-consuming step in a number of constructions, such as perfect hashing
schemes, random k-SAT solvers, error-correcting codes, and approximate set
encodings. While there exists a straightforward linear-time algorithm, its poor
I/O performance makes it impractical for hypergraphs whose size exceeds the
available internal memory.
We show how to reduce the computation of a peeling order to a small number of
sequential scans and sorts, and analyze its I/O complexity in the
cache-oblivious model. The resulting algorithm requires O(sort(n))
I/Os and O(n log n) time to peel a random hypergraph with n edges.
We experimentally evaluate the performance of our implementation of this
algorithm in a real-world scenario by using the construction of minimal perfect
hash functions (MPHF) as our test case: our algorithm builds an MPHF over
billions of keys in a matter of hours on a single machine. The resulting data
structure is both more space-efficient and faster than that obtained with the
current state-of-the-art MPHF construction for large-scale key sets
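
For reference, a minimal in-memory Python sketch of the straightforward
linear-time peeling mentioned above: repeatedly detach a vertex of degree one
together with its unique live incident hyperedge. Its scattered random
accesses are exactly what destroys I/O locality once the hypergraph outgrows
internal memory; the sketch and its names are ours.

    def peel(num_vertices, edges):
        # edges: tuples of vertex ids (e.g. 3-uniform hyperedges).
        # Returns a peeling order of (edge_index, vertex) pairs, or None
        # if a non-empty 2-core remains and full peeling is impossible.
        incident = [[] for _ in range(num_vertices)]
        for idx, e in enumerate(edges):
            for v in e:
                incident[v].append(idx)
        degree = [len(l) for l in incident]
        alive = [True] * len(edges)
        stack = [v for v in range(num_vertices) if degree[v] == 1]
        order = []
        while stack:
            v = stack.pop()
            if degree[v] != 1:
                continue  # stale entry
            idx = next(i for i in incident[v] if alive[i])
            alive[idx] = False  # peel the edge hanging off v
            order.append((idx, v))
            for u in edges[idx]:
                degree[u] -= 1  # random access: the I/O bottleneck
                if degree[u] == 1:
                    stack.append(u)
        return order if not any(alive) else None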
Exhaust Energy Recovery with Variable Geometry Turbine to Reduce Fuel Consumption for Microcars
The objective proposed by the EU of reducing the CO2 emissions of internal combustion engines by about 4% per year up to 2030 requires increasing engine efficiency and, accordingly, improving the technology.
In this framework, hybrid powertrains have the potential for deep market penetration, since they can recover energy during braking and allow the engine to operate at better efficiency and with fewer transients. Moreover, they can recover a large amount of the energy lost through the exhaust and use it to reduce fuel consumption.
This paper concerns the modification of a conventional two-cylinder in-line Diesel engine (440 cm³) by adding a variable geometry turbine (VGT) coupled with a generator. The turbine is used to recover exhaust-gas energy that would otherwise be lost.
The generator, connected to the turbo shaft, converts mechanical energy into electrical energy that is used to charge the vehicle battery or to power the auxiliaries. The aim of this work is to reduce fuel consumption by replacing the alternator with a kind of electric turbo-compounding system that drives the vehicle auxiliaries. If the selected turbine recovers enough energy to power the auxiliaries, the alternator, which usually has low efficiency, can be removed, and fuel consumption savings can be achieved. A microcar was then tested on the WLTC (Class 1) driving cycle. The results show a fuel consumption reduction of 6 to 9%, depending on VGT size. Four different VGT sizes were analyzed to choose the optimal configuration, a compromise between energy recovery and fuel consumption reduction
Unsteady CFD analysis of erosion mechanisms in the coolant channels of a rotating gas turbine blade
The two-phase flow in a rotating wedge mimicking the final portion of a turbine blade internal cooling channel is presented and discussed here, focusing on the unsteady motion and the erosion mechanisms. The rotation axis is placed so as to properly reproduce a configuration with a very strong deviation (90°).
The flow field was modelled using the well-known k-ε-ζ-f unsteady-RANS model based on the elliptic-relaxation concept. The model was modified by some of the authors to take into account the influence of turbulence anisotropy as well as rotation, and was implemented in the well-established and fully validated T-FlowS code.
A systematic comparison of the rotating and non-rotating cases was carried out to show the influence of the Coriolis force on the flow and erosion mechanisms.
The rotational effects strongly changed the flow behaviour within the channel, affecting both the unsteady flow and the particle trajectories. In the rotating case, there is no recirculation in the tip region; moreover, the positions of the small recirculation regions above each pedestal change. These, and other minor effects, alter the particle motion, thus resulting in a different erosion pattern
Distribution-Aware Compressed Full-Text Indexes
BAC: A bagged associative classifier for big data frameworks
Big Data frameworks allow powerful distributed computations, extending the results achievable on a single machine. In this work, we present a novel distributed associative classifier, named BAC, based on ensemble techniques. Ensembles are a popular approach that builds several models on different subsets of the original dataset, which eventually vote to provide a unique classification outcome. Experiments on Apache Spark and preliminary results show that the proposed ensemble classifier obtains a quality comparable with single-machine associative classifiers on popular real-world datasets, while overcoming their scalability limits on large synthetic datasets
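
To illustrate the ensemble scheme, though not BAC's associative rule mining
itself, here is a minimal sequential Python sketch of bagging with majority
vote over any scikit-learn-style base classifier operating on numpy arrays;
all names are our illustration, and BAC runs the analogous fits distributed
on Spark.

    import numpy as np
    from collections import Counter

    def bagged_predict(base_factory, X, y, X_test, n_models=10, seed=0):
        # Train n_models copies of the base classifier on bootstrap
        # samples of (X, y), then majority-vote their predictions.
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), size=len(X))  # with replacement
            m = base_factory()
            m.fit(X[idx], y[idx])
            models.append(m)
        votes = np.array([m.predict(X_test) for m in models])
        # column-wise majority vote across the ensemble
        return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

Since the n_models fits are independent, they parallelize trivially on a
framework like Spark, which is what lets the ensemble scale past
single-machine limits.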