Bicriteria data compression
The advent of massive datasets (and the consequent design of high-performing
distributed storage systems) has reignited the interest of the scientific and
engineering community in the design of lossless data compressors that
achieve effective compression ratios and very efficient decompression speed.
Lempel-Ziv's LZ77 algorithm is the de facto choice in this scenario because of
its decompression speed and its flexibility in trading decompression speed
against compressed-space efficiency. Each existing implementation offers
a specific trade-off between space occupancy and decompression speed, so software
engineers have to content themselves with picking the one that comes closest to
the requirements of the application at hand. Starting from these
premises, and for the first time in the literature, we address in this paper
the problem of optimally trading off the consumption of these two resources
by introducing the Bicriteria LZ77-Parsing problem, which
formalizes in a principled way what data compressors have traditionally
approached by means of heuristics. The goal is to determine an LZ77 parsing
that minimizes the space occupancy in bits of the compressed file, provided
that the decompression time is bounded by a fixed amount (or vice versa). This
way, software engineers can set their space (or time) requirements and then
derive the LZ77 parsing that optimizes the decompression speed (or the space
occupancy, respectively). We solve this problem efficiently in O(n log^2 n)
time and optimal linear space within a small, additive approximation, by
proving and deploying some specific structural properties of the weighted graph
derived from the possible LZ77 parsings of the input file. A preliminary set
of experiments shows that our proposal dominates all the highly
engineered competitors, hence offering a win-win situation in theory and practice.
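A toy sketch of the constrained objective described above, assuming each candidate phrase is an edge of a DAG over text positions labelled with a bit cost and an integer decompression-time cost (the costs, the edge list and the name min_bits_within_time are hypothetical). It solves the time-bounded minimum-space variant exactly by dynamic programming over the time budget, which is pseudo-polynomial; it is not the O(n log^2 n) approximation algorithm of the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical edge (start, end, bits, time): encode text[start:end] as one
# phrase, costing `bits` of space and `time` integer units to decompress.
Edge = Tuple[int, int, int, int]


def min_bits_within_time(n: int, edges: List[Edge], time_budget: int) -> Optional[int]:
    """Minimum total bits of a parsing of positions 0..n whose total
    decompression time is at most time_budget, or None if infeasible.

    best[i][t] = fewest bits needed to reach position i using exactly t time
    units. Exact but pseudo-polynomial: O(len(edges) * time_budget).
    """
    INF = float("inf")
    best = [[INF] * (time_budget + 1) for _ in range(n + 1)]
    best[0][0] = 0
    # Edges always point forward (start < end), so scanning positions left to
    # right is a valid topological order of the parsing DAG.
    out: Dict[int, List[Edge]] = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)
    for i in range(n):
        for t in range(time_budget + 1):
            if best[i][t] == INF:
                continue
            for _, j, bits, time in out.get(i, []):
                nt = t + time
                if nt <= time_budget and best[i][t] + bits < best[j][nt]:
                    best[j][nt] = best[i][t] + bits
    answer = min(best[n])
    return None if answer == INF else int(answer)
```

In a toy instance with four literal edges (9 bits, 1 time unit each), two half-length copies (10 bits, 2 units each) and one full-length copy (12 bits, 6 units), a budget of 6 picks the single slow copy (12 bits), a budget of 4 forces the two-phrase parsing (20 bits), and a budget of 3 is infeasible.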
Preemptive scheduling on uniform parallel machines with controllable job processing times
In this paper, we provide a unified approach to solving preemptive scheduling problems with uniform parallel machines and controllable processing times. We demonstrate that the single-criterion problem of minimizing total compression cost, subject to the constraint that all due dates be met, can be formulated in terms of maximizing a linear function over a generalized polymatroid. This justifies the applicability of the greedy approach and allows us to develop fast algorithms for solving the problem with arbitrary release and due dates, as well as its special case with zero release dates and a common due date. For the bicriteria counterpart of the latter problem, we develop an efficient algorithm that constructs the trade-off curve for minimizing the compression cost and the makespan.
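The greedy approach that a polymatroid formulation justifies can be illustrated on a plain polymatroid (not the generalized polymatroid of the paper): visit elements by decreasing weight and give each one its marginal rank increment. The rank function in the usage example, a capacity-capped sum, is an assumption chosen only to keep the example small, and the function name is hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List

RankFn = Callable[[FrozenSet[Hashable]], float]


def greedy_polymatroid_max(
    ground: List[Hashable], weights: Dict[Hashable, float], f: RankFn
) -> Dict[Hashable, float]:
    """Maximize sum(weights[e] * x[e]) over the polymatroid
    P(f) = {x >= 0 : x(S) <= f(S) for every subset S},
    for f normalized (f(empty set) = 0), nondecreasing and submodular.

    Edmonds' greedy: visit elements by decreasing weight and give each
    positive-weight element its marginal rank increment.
    """
    x: Dict[Hashable, float] = {e: 0 for e in ground}
    prefix: FrozenSet[Hashable] = frozenset()
    for e in sorted(ground, key=lambda e: weights[e], reverse=True):
        if weights[e] <= 0:
            break  # remaining weights are non-positive: leave x[e] = 0
        extended = prefix | {e}
        x[e] = f(extended) - f(prefix)  # marginal rank increment
        prefix = extended
    return x
```

For example, with per-element capacities a = b = c = 3, a shared cap f(S) = min(sum of capacities in S, 5) and weights a = 5, b = 4, c = 1, the greedy assigns x = {a: 3, b: 2, c: 0}, which is optimal for that linear program.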
A Bicriteria Approximation for the Reordering Buffer Problem
In the reordering buffer problem (RBP), a server is asked to process a
sequence of requests lying in a metric space. To process a request the server
must move to the corresponding point in the metric. The requests can be
processed slightly out of order; in particular, the server has a buffer of
capacity k which can store up to k requests as it reads in the sequence. The
goal is to reorder the requests in such a manner that the buffer constraint is
satisfied and the total travel cost of the server is minimized. The RBP arises
in many applications that require scheduling with a limited buffer capacity,
such as scheduling a disk arm in storage systems, switching colors in paint
shops of a car manufacturing plant, and rendering 3D images in computer
graphics.
We study the offline version of RBP and develop bicriteria approximations.
When the underlying metric is a tree, we obtain a solution of cost no more than
9·OPT using a buffer of capacity 4k + 1, where OPT is the cost of an optimal
solution with buffer capacity k. Constant factor approximations were known
previously only for the uniform metric (Avigdor-Elgrabli et al., 2012). Via
randomized tree embeddings, this implies an O(log n) approximation to cost and
O(1) approximation to buffer size for general metrics. Previously, the best
known algorithm for arbitrary metrics, by Englert et al. (2007), provided an
O(log^2 k log n) approximation without violating the buffer constraint.
Comment: 13 pages
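To make the problem definition concrete, the following sketch simulates a buffer of capacity k on a line metric (a special case of a tree) with a simple nearest-first rule: keep the buffer as full as the input allows, then serve the buffered request closest to the server. This heuristic and the name simulate_nearest_first are assumptions for illustration only; it is not the bicriteria algorithm described above, and no approximation guarantee is claimed for it.

```python
from typing import List, Tuple


def simulate_nearest_first(
    requests: List[float], k: int, start: float = 0.0
) -> Tuple[float, List[float]]:
    """Serve `requests` (points on a line) with a buffer of capacity k.

    Heuristic policy (an assumption, not the paper's algorithm): read
    requests in sequence order until the buffer is full, then serve the
    buffered request nearest to the server. Returns (total travel cost,
    service order).
    """
    if k < 1:
        raise ValueError("buffer capacity must be at least 1")
    pos = start
    cost = 0.0
    buffer: List[float] = []
    order: List[float] = []
    idx = 0
    while idx < len(requests) or buffer:
        # Read requests in sequence order until the buffer is full.
        while idx < len(requests) and len(buffer) < k:
            buffer.append(requests[idx])
            idx += 1
        # Serve the buffered request closest to the current position.
        nearest = min(buffer, key=lambda p: abs(p - pos))
        buffer.remove(nearest)
        cost += abs(nearest - pos)
        pos = nearest
        order.append(nearest)
    return cost, order
```

With requests 10, 0, 10, 0 and the server starting at 0, capacity 1 forces the original order (travel cost 40), while capacity 2 lets the server group nearby requests (travel cost 20), which shows why trading extra buffer space for travel cost can pay off.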
An Efficient Streaming Algorithm for the Submodular Cover Problem
We initiate the study of the classical Submodular Cover (SC) problem in the
data streaming model, which we refer to as Streaming Submodular Cover (SSC).
We show that any single-pass streaming algorithm using memory sublinear in the
size of the stream fails to provide any non-trivial approximation
guarantee for SSC. Hence, we consider a relaxed version of SSC, where we only
seek to find a partial cover.
We design the first Efficient bicriteria Submodular Cover Streaming
(ESC-Streaming) algorithm for this problem, and provide theoretical guarantees
for its performance supported by numerical evidence. Our algorithm finds
solutions that are competitive with the near-optimal offline greedy algorithm
despite requiring only a single pass over the data stream. In our numerical
experiments, we evaluate the performance of ESC-Streaming on active set
selection and large-scale graph cover problems.
Comment: To appear in NIPS'1
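The relaxed partial-cover setting can be illustrated with a generic single-pass threshold rule for set coverage, where f(S) is the number of universe elements covered: keep an incoming set only if its marginal gain is at least a threshold tau, and stop once a target fraction of the universe is covered. This is a hypothetical sketch, not the ESC-Streaming algorithm; the names threshold_stream_cover, tau and target_fraction are assumptions.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def threshold_stream_cover(
    stream: Iterable[Set[Hashable]],
    universe_size: int,
    target_fraction: float,
    tau: int,
) -> Tuple[List[int], int]:
    """One pass over `stream`; returns (indices of kept sets, number covered).

    Only the covered elements and the kept indices are stored, never the
    whole stream. If the stream ends early, the target may not be reached.
    """
    covered: Set[Hashable] = set()
    chosen: List[int] = []
    goal = target_fraction * universe_size
    for idx, s in enumerate(stream):
        if len(covered) >= goal:
            break  # partial-cover target reached: ignore the rest of the stream
        gain = len(set(s) - covered)  # marginal gain f(S + s) - f(S)
        if gain >= tau:
            chosen.append(idx)
            covered |= set(s)
    return chosen, len(covered)
```

On a universe of 10 elements with stream {0,1}, {0,1,2,3,4}, {2,3}, {5,6,7}, {8}, {9}, target fraction 0.8 and tau = 2, the rule keeps sets 0, 1 and 3, covers 8 elements, and stops before reading the last two sets.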
RLZAP: Relative Lempel-Ziv with Adaptive Pointers
Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of
genomes from individuals of the same species when fast random access is
desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a
reference genome is selected and then the other genomes are greedily parsed
into phrases exactly matching substrings of the reference. Deorowicz and
Grabowski (Bioinformatics, 2011) pointed out that letting each phrase end with
a mismatch character usually gives better compression because many of the
differences between individuals' genomes are single-nucleotide substitutions.
Ferrada et al. (SPIRE 2014) then pointed out that additionally using relative pointers
and run-length compressing them usually gives even better compression. In this
paper we generalize Ferrada et al.'s idea to also handle short insertions,
deletions and multi-character substitutions well. We show experimentally that our
generalization achieves better compression than Ferrada et al.'s implementation
with comparable random-access times.
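A minimal sketch of greedy RLZ parsing in which each phrase ends with an explicit mismatch character, as in the variant attributed above to Deorowicz and Grabowski, together with a decoder. The relative-pointer step assumes that each phrase's reference position is stored minus the phrase's starting position in the target, so phrases aligned with the reference yield equal values that run-length compress well; this reading, the naive quadratic match search and all function names are assumptions, and RLZAP's adaptive pointers are not implemented here.

```python
from typing import List, Optional, Tuple

Phrase = Tuple[int, int, Optional[str]]  # (reference position, copy length, mismatch char)


def rlz_parse_with_mismatch(reference: str, target: str) -> List[Phrase]:
    """Greedy RLZ: copy the longest prefix of the remaining target that occurs
    in `reference`, then store the next target character explicitly.
    Naive O(len(reference) * len(target)) match search, for illustration only."""
    phrases: List[Phrase] = []
    i = 0
    while i < len(target):
        best_pos, best_len = 0, 0
        for p in range(len(reference)):
            length = 0
            while (p + length < len(reference) and i + length < len(target)
                   and reference[p + length] == target[i + length]):
                length += 1
            if length > best_len:  # keep the leftmost longest match
                best_pos, best_len = p, length
        i += best_len
        mismatch = target[i] if i < len(target) else None
        phrases.append((best_pos, best_len, mismatch))
        if mismatch is not None:
            i += 1
    return phrases


def rlz_decode(reference: str, phrases: List[Phrase]) -> str:
    """Rebuild the target: each phrase is a reference copy plus its mismatch."""
    out = []
    for pos, length, mismatch in phrases:
        out.append(reference[pos:pos + length])
        if mismatch is not None:
            out.append(mismatch)
    return "".join(out)


def relative_pointers(phrases: List[Phrase]) -> List[int]:
    """Assumed encoding: reference position minus the phrase's start in the target."""
    rel: List[int] = []
    start = 0
    for pos, length, mismatch in phrases:
        rel.append(pos - start)
        start += length + (0 if mismatch is None else 1)
    return rel


def run_length(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal values into (value, count) pairs."""
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs
```

For reference ACGTTGCAAC and a target that differs by one substitution (ACGTAGCAAC), the parse has two phrases, (0, 4, 'A') and (5, 5, None), and both relative pointers equal 0, so they collapse into a single run.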
Multiobjective genetic algorithm strategies for electricity production from generation IV nuclear technology
Developing a techno-economic optimization strategy for electricity/hydrogen cogeneration systems consists of finding an optimal efficiency of the generating cycle and heat delivery system, maximizing energy production and minimizing production costs. The first part of the paper describes the development of a multiobjective optimization library (MULTIGEN) to tackle all types of problems arising from cogeneration. After a literature review identifying the most efficient methods, the MULTIGEN library is described and its innovative points are listed. A new stopping criterion, based on the stagnation of the Pareto front, may lead to a significant decrease in computational times, particularly for problems involving only integer variables. Two practical examples are presented in the last section. The first is devoted to a bicriteria optimization of both the exergy destruction and the total cost of the plant, for a generating cycle coupled with a Very High Temperature Reactor (VHTR). The second consists in designing the heat exchanger of the generating turbomachine, where three criteria are optimized: the exchange surface, the exergy destruction and the number of exchange modules.
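Since the stopping criterion above is based on stagnation of the Pareto front, here is a small sketch of the two ingredients, assuming both objectives (for example exergy destruction and total cost) are minimized: a non-dominated filter and a stopper that fires once the front has been unchanged for a given number of generations. The patience parameter, the exact-equality test and the class name are hypothetical; the abstract does not specify how MULTIGEN measures stagnation.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both minimized


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Non-dominated points, deduplicated and sorted (simple O(n^2) filter)."""
    return sorted({p for p in points if not any(dominates(q, p) for q in points)})


class StagnationStopper:
    """Hypothetical rule: stop once the front is unchanged for `patience`
    consecutive generations."""

    def __init__(self, patience: int) -> None:
        self.patience = patience
        self.last: Optional[List[Point]] = None
        self.unchanged = 0

    def should_stop(self, front: List[Point]) -> bool:
        if front == self.last:
            self.unchanged += 1
        else:
            # The front moved: reset the counter and remember the new front.
            self.unchanged = 0
            self.last = list(front)
        return self.unchanged >= self.patience
```

In use, a genetic algorithm would call pareto_front on each generation's objective values and pass the result to should_stop; an exact-equality test is the simplest choice, and a tolerance-based comparison would likely be needed for continuous objectives.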
Handling Scheduling Problems with Controllable Parameters by Methods of Submodular Optimization
In this paper, we demonstrate how scheduling problems with controllable processing times can be reformulated as linear programming maximization problems over a submodular polyhedron intersected with a box. We explain a decomposition algorithm for solving the latter problem and discuss its implications for the relevant problems of preemptive scheduling on a single machine and on parallel machines.
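One classical way to handle a submodular polyhedron intersected with a box, assuming only upper bounds u and a normalized nondecreasing submodular f, is to note that the intersection is again a polymatroid with rank function f_u(X) = min over Y ⊆ X of f(Y) + u(X \ Y), so the greedy algorithm still applies. The sketch below evaluates f_u by brute force over subsets (exponential, toy sizes only); it is not the decomposition algorithm of the paper, and the function names and the toy f, u and weights are assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, List

SetFn = Callable[[FrozenSet[Hashable]], float]


def truncate_by_box(f: SetFn, u: Dict[Hashable, float]) -> SetFn:
    """Rank function of P(f) intersected with {x <= u}:
    f_u(X) = min over subsets Y of X of f(Y) + u(X minus Y).
    Brute force over all subsets Y: exponential, toy sizes only."""
    def f_u(X: FrozenSet[Hashable]) -> float:
        items = list(X)
        best = None
        for r in range(len(items) + 1):
            for Y in combinations(items, r):
                chosen = frozenset(Y)
                # Elements outside Y are bounded by the box, not by f.
                val = f(chosen) + sum(u[e] for e in items if e not in chosen)
                if best is None or val < best:
                    best = val
        return best
    return f_u


def greedy_max(ground: List[Hashable], weights: Dict[Hashable, float], g: SetFn) -> Dict[Hashable, float]:
    """Greedy over the polymatroid of g: decreasing weight, marginal increments."""
    x: Dict[Hashable, float] = {e: 0 for e in ground}
    prefix: FrozenSet[Hashable] = frozenset()
    for e in sorted(ground, key=lambda e: weights[e], reverse=True):
        if weights[e] <= 0:
            break  # non-positive weights: leave x[e] = 0
        x[e] = g(prefix | {e}) - g(prefix)
        prefix = prefix | {e}
    return x
```

With f(S) = min(4·|S|, 6), upper bounds a = 1, b = 5, c = 5 and weights a = 3, b = 2, c = 1, the box caps a at 1, f caps b at 4, and the shared bound of 6 leaves 1 for c, giving x = {a: 1, b: 4, c: 1}.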