12 research outputs found
On-Line Paging against Adversarially Biased Random Inputs
In evaluating an algorithm, worst-case analysis can be overly pessimistic.
Average-case analysis can be overly optimistic. An intermediate approach is to
show that an algorithm does well on a broad class of input distributions.
Koutsoupias and Papadimitriou recently analyzed the least-recently-used (LRU)
paging strategy in this manner, analyzing its performance on an input sequence
generated by a so-called diffuse adversary -- one that must choose each request
probabilistically so that no page is chosen with probability more than some
fixed epsilon>0. They showed that LRU achieves the optimal competitive ratio
(for deterministic on-line algorithms), but they didn't determine the actual
ratio.
In this paper we estimate the optimal ratios to within roughly a factor of two
for both deterministic strategies (e.g. least-recently-used and
first-in-first-out) and randomized strategies. Around the threshold epsilon ~
1/k (where k is the cache size), the optimal ratios are both Theta(ln k). Below
the threshold the ratios tend rapidly to O(1). Above the threshold the ratio is
unchanged for randomized strategies but tends rapidly to Theta(k) for
deterministic ones.
We also give an alternate proof of the optimality of LRU.

Comment: Conference version appeared in SODA '98 as "Bounding the Diffuse Adversary".
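As a rough illustration of the model, the sketch below simulates LRU paging under one hypothetical epsilon-diffuse request source: each request is drawn uniformly from a fresh random subset of about 1/epsilon pages, so no single page has probability above epsilon at any step. This is only an illustrative source, not the worst-case diffuse adversary analyzed in the paper, and all names are our own.

```python
import random
from collections import OrderedDict

def lru_misses(requests, k):
    """Count misses of an LRU cache of size k on a request sequence."""
    cache = OrderedDict()  # oldest (least recently used) entry is first
    misses = 0
    for p in requests:
        if p in cache:
            cache.move_to_end(p)        # refresh recency on a hit
        else:
            misses += 1
            if len(cache) >= k:
                cache.popitem(last=False)  # evict least recently used
        cache[p] = True
    return misses

def diffuse_requests(n_pages, n_steps, eps, seed=0):
    """One hypothetical eps-diffuse source: each step, pick uniformly
    from a random subset of ceil(1/eps) pages, so every page has
    per-step probability at most eps."""
    rng = random.Random(seed)
    m = max(1, int(round(1 / eps)))
    return [rng.choice(rng.sample(range(n_pages), min(m, n_pages)))
            for _ in range(n_steps)]
```

Comparing `lru_misses` on such traces for epsilon below and above 1/k gives an empirical feel for the Theta(ln k) threshold behavior described above.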
Probabilistic alternatives for competitive analysis
In the last 20 years, competitive analysis has become the main tool for analyzing the quality of online algorithms. Despite this, competitive analysis has also been criticized: it sometimes cannot discriminate between algorithms that exhibit significantly different empirical behavior, or it even favors an algorithm that is worse from an empirical point of view. There have therefore been several approaches to circumvent these drawbacks. In this survey, we discuss probabilistic alternatives for competitive analysis.
First-Come-First-Served for Online Slot Allocation and Huffman Coding
Can one choose a good Huffman code on the fly, without knowing the underlying
distribution? Online Slot Allocation (OSA) models this and similar problems:
There are n slots, each with a known cost. There are n items. Requests for
items are drawn i.i.d. from a fixed but hidden probability distribution p.
After each request, if the item, i, was not previously requested, then the
algorithm (knowing the slot costs and the requests so far, but not p) must
place the item in some vacant slot j(i). The goal is to minimize the sum, over
the items, of the probability of the item times the cost of its assigned slot.
The optimal offline algorithm is trivial: put the most probable item in the
cheapest slot, the second most probable item in the second cheapest slot, etc.
The optimal online algorithm is First Come First Served (FCFS): put the first
requested item in the cheapest slot, the second (distinct) requested item in
the second cheapest slot, etc. The optimal competitive ratios for any online
algorithm are 1+H(n-1) ~ ln n for general costs and 2 for concave costs. For
logarithmic costs, the ratio is, asymptotically, 1: FCFS gives cost opt + O(log
opt).
For Huffman coding, FCFS yields an online algorithm (one that allocates
codewords on demand, without knowing the underlying probability distribution)
that guarantees asymptotically optimal cost: at most opt + 2 log(1+opt) + 2.

Comment: ACM-SIAM Symposium on Discrete Algorithms (SODA) 201
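The FCFS strategy described above is simple enough to state directly in code. The sketch below (function names are our own) assigns the i-th distinct requested item to the i-th cheapest slot and evaluates the objective, the sum over items of probability times assigned-slot cost:

```python
def fcfs_assign(requests, slot_costs):
    """First Come First Served: the i-th distinct requested item goes
    into the i-th cheapest slot. Returns {item: slot_index}."""
    order = sorted(range(len(slot_costs)), key=lambda j: slot_costs[j])
    assignment = {}
    for item in requests:
        if item not in assignment:          # first request for this item
            assignment[item] = order[len(assignment)]
    return assignment

def expected_cost(assignment, probs, slot_costs):
    """Objective: sum over items of p(item) * cost(assigned slot)."""
    return sum(probs[i] * slot_costs[assignment[i]] for i in assignment)
```

Note that `fcfs_assign` never looks at the hidden distribution p, only at the arrival order of distinct items, exactly as the online model requires; `expected_cost` is used only to evaluate the result after the fact.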
Accelerating Non-volatile/Hybrid Processor Cache Design Space Exploration for Application Specific Embedded Systems
In this article, we propose a technique to accelerate nonvolatile or hybrid
of volatile and nonvolatile processor cache design space exploration for
application specific embedded systems. Utilizing a novel cache behavior
modeling equation and a new accurate cache miss prediction mechanism, our
proposed technique can accelerate NVM or hybrid FIFO processor cache design
space exploration for SPEC CPU 2000 applications by up to 249 times compared to
the conventional approach.
Optimal Eviction Policies for Stochastic Address Traces
The eviction problem for memory hierarchies is studied for the Hidden Markov
Reference Model (HMRM) of the memory trace, showing how miss minimization can
be naturally formulated in the optimal control setting. In addition to the
traditional version assuming a buffer of fixed capacity, a relaxed version is
also considered, in which buffer occupancy can vary and its average is
constrained. Resorting to multiobjective optimization, viewing occupancy as a
cost rather than as a constraint, the optimal eviction policy is obtained by
composing solutions for the individual addressable items.
This approach is then specialized to the Least Recently Used Stack Model
(LRUSM), a type of HMRM often considered for traces, which includes V-1
parameters, where V is the size of the virtual space. A gain optimal policy for
any target average occupancy is obtained which (i) is computable in time O(V)
from the model parameters, (ii) is optimal also for the fixed capacity case,
and (iii) is characterized in terms of priorities, under the name Least
Profit Rate (LPR) policy. An O(log C) upper bound (where C is the buffer capacity)
is derived for the ratio between the expected miss rate of LPR and that of OPT,
the optimal off-line policy; the upper bound is tightened to O(1), under
reasonable constraints on the LRUSM parameters. Using the stack-distance
framework, an algorithm is developed to compute the number of misses incurred
by LPR on a given input trace, simultaneously for all buffer capacities, in
time O(log V) per access.
Finally, some results are provided for miss minimization over a finite
horizon and over an infinite horizon under bias optimality, a criterion more
stringent than gain optimality.

Comment: 37 pages, 3 figures
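To give a concrete feel for the stack-distance framework mentioned above, the sketch below computes miss counts for every buffer capacity at once for LRU (a classic stack policy), in the style of Mattson's algorithm. This is only an illustration of the one-pass, all-capacities idea: it costs O(V) per access with a plain list, whereas the paper's algorithm for LPR achieves O(log V) per access.

```python
def misses_all_capacities(trace, max_c):
    """One pass over the trace yields, for an inclusive stack policy
    such as LRU, the miss count for every capacity 1..max_c.
    (Simple O(V)-per-access list version, for illustration only.)"""
    stack = []       # most recently used item at the front
    dist_hist = {}   # stack distance -> number of accesses at that distance
    cold = 0         # first-reference (compulsory) misses
    for x in trace:
        if x in stack:
            d = stack.index(x) + 1          # 1-based stack distance
            dist_hist[d] = dist_hist.get(d, 0) + 1
            stack.remove(x)
        else:
            cold += 1
        stack.insert(0, x)                  # x becomes most recent
    # A capacity-C buffer misses exactly the cold references plus
    # every access whose stack distance exceeds C.
    return [cold + sum(c for d, c in dist_hist.items() if d > C)
            for C in range(1, max_c + 1)]
```

For example, on the trace a, b, a, b a buffer of capacity 1 misses all four accesses while capacity 2 misses only the two cold references, and both numbers fall out of the single histogram.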
On the design of efficient caching systems
Content distribution is currently the prevalent Internet use case, accounting for the majority of global Internet traffic and growing exponentially. There is general consensus that the most effective way to deal with the large amount of content demand is to deploy massively distributed caching infrastructures that localise content delivery traffic. Solutions based on caching have already been widely deployed through Content Delivery Networks. Ubiquitous caching is also a fundamental aspect of the emerging Information-Centric Networking paradigm, which aims to rethink the current Internet architecture for long-term evolution. Distributed content caching systems are expected to grow substantially in the future, in terms of both footprint and traffic carried, and, as such, will become substantially more complex and costly.

This thesis addresses the problem of designing scalable and cost-effective distributed caching systems that can efficiently support the expected massive growth of content traffic, and makes three distinct contributions. First, it produces an extensive theoretical characterisation of sharding, a widely used technique for allocating the data items of a distributed system to its resources according to a hash function. Based on the findings of this analysis, two systems are designed towards the above objective. The first is a framework and related algorithms for efficient load-balanced content caching. This solution provides qualitative advantages over previously proposed solutions, such as ease of modelling and availability of knobs to fine-tune performance, as well as quantitative advantages, such as a 2x increase in cache hit ratio and a 19-33% reduction in load imbalance while maintaining latency comparable to other approaches. The second is the design and implementation of a caching node that achieves 20 Gbps speeds on inexpensive commodity hardware.

We believe these contributions significantly advance the state of the art in distributed caching systems.
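The sharding technique characterised in the thesis can be sketched minimally as follows. The hash choice (SHA-256) and function names here are our own illustrative assumptions, not the thesis's implementation; the point is only the allocation rule (item goes to `hash(key) mod n_shards`) and the load-imbalance metric such an analysis studies.

```python
import hashlib

def shard_of(key, n_shards):
    """Hash-based sharding: deterministically map an item key to one
    of n_shards caches (illustrative SHA-256 variant)."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return h % n_shards

def load_imbalance(keys, n_shards):
    """Ratio of the most loaded shard to the mean load.
    1.0 means perfectly balanced; larger values mean hotter shards."""
    loads = [0] * n_shards
    for k in keys:
        loads[shard_of(k, n_shards)] += 1
    mean = len(keys) / n_shards
    return max(loads) / mean
```

Because the mapping is deterministic, any node can locate an item's cache without coordination; the trade-off the thesis quantifies is how far `load_imbalance` drifts above 1.0 under realistic key popularity distributions.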