Generic design of Chinese remaindering schemes
We propose a generic design for Chinese remainder algorithms. A Chinese
remainder computation consists in reconstructing an integer value from its
residues modulo pairwise coprime integers. We also propose an efficient linear data
structure, a radix ladder, for the intermediate storage and computations. Our
design is structured into three main modules: a black box residue computation
in charge of computing each residue; a Chinese remaindering controller in
charge of launching the computation and of the termination decision; an integer
builder in charge of the reconstruction computation. We then show that this
design enables many different forms of Chinese remaindering (e.g.
deterministic, early terminated, distributed, etc.), easy comparisons between
these forms, and, for instance, user-transparent parallelism at different parallel grains.
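The three-module split described in this abstract can be illustrated with a minimal sketch. It assumes pairwise coprime moduli and a simple probabilistic early-termination rule; the names `IntegerBuilder`, `chinese_remainder`, and `stable_rounds` are hypothetical, and the radix-ladder data structure is not modeled here.

```python
from math import gcd


class IntegerBuilder:
    """Integer builder: keeps the current reconstruction (hypothetical name)."""

    def __init__(self):
        self.value = 0      # current reconstruction, in [0, self.modulus)
        self.modulus = 1    # product of the moduli absorbed so far

    def add_residue(self, residue, m):
        if gcd(self.modulus, m) != 1:
            raise ValueError("moduli must be pairwise coprime")
        # Solve x = value (mod modulus) and x = residue (mod m) incrementally.
        inv = pow(self.modulus, -1, m)            # modular inverse, Python 3.8+
        step = ((residue - self.value) * inv) % m
        self.value += self.modulus * step
        self.modulus *= m


def chinese_remainder(residue_of, moduli, stable_rounds=2):
    """Controller: launches residue computations and decides when to stop."""
    builder = IntegerBuilder()
    unchanged = 0
    for m in moduli:
        before = builder.value
        builder.add_residue(residue_of(m), m)     # black-box residue computation
        unchanged = unchanged + 1 if builder.value == before else 0
        if unchanged >= stable_rounds:
            break                                 # early termination (probabilistic)
    return builder.value, builder.modulus


# The black box here simply reduces a known integer; in practice it would
# compute the residue without knowing the integer itself.
value, modulus = chinese_remainder(lambda m: 1000 % m, [7, 11, 13, 17, 19, 23, 29])
print(value, modulus)
```

In this run the reconstruction reaches 1000 after the moduli 7, 11, and 13, stays unchanged for 17 and 19, and the controller stops without using 23 or 29. A deterministic variant would replace the stability test with a bound on the size of the result.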
Parallel Deferred Update Replication
Deferred update replication (DUR) is an established approach to implementing
highly efficient and available storage. While the throughput of read-only
transactions scales linearly with the number of deployed replicas in DUR, the
throughput of update transactions experiences limited improvements as replicas
are added. This paper presents Parallel Deferred Update Replication (P-DUR), a
variation of classical DUR that scales both read-only and update transactions
with the number of cores available in a replica. In addition to introducing the
new approach, we describe its full implementation and compare its performance
to classical DUR and to Berkeley DB, a well-known standalone database.
Using consistent subcuts for detecting stable properties
We present a general protocol for detecting whether a property holds in a distributed system, where the property is a member of a subclass of stable properties we call the locally stable properties. Our protocol is based on a decentralized method for constructing a maximal subset of the local states that are mutually consistent, which in turn is based on a weakened version of vectored time stamps. The structure of our protocol lends itself to refinement, and we demonstrate its utility by deriving some specialized property-detection protocols, including two previously known protocols that are known to be effective.
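The notion of mutually consistent local states can be illustrated with ordinary vector timestamps. The sketch below is an assumption-laden simplification: it uses full vector clocks rather than the weakened version the abstract mentions, and a centralized greedy pass rather than a decentralized protocol. The function names and the sample data are hypothetical.

```python
def consistent(states, i, j):
    """States of processes i and j are consistent if neither has observed
    more of the other's history than the other has recorded itself."""
    vi, vj = states[i], states[j]
    return vj[i] <= vi[i] and vi[j] <= vj[j]


def maximal_consistent_subset(states):
    """Greedily grow a set of process ids whose local states are pairwise consistent."""
    chosen = []
    for p in sorted(states):
        if all(consistent(states, p, q) for q in chosen):
            chosen.append(p)
    return chosen


# One recorded local state per process, as a vector timestamp (hypothetical data).
states = {
    0: [2, 0, 0],
    1: [1, 3, 0],
    2: [3, 1, 2],   # process 2 has seen event 3 of process 0; state 0 stops at event 2
}
subcut = maximal_consistent_subset(states)
print(subcut)
```

Here the states of processes 0 and 1 are kept, while the state of process 2 is rejected because it depends on an event of process 0 that the recorded state of process 0 does not yet include.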
A survey of parallel execution strategies for transitive closure and logic programs
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. 
Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of the methods.
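As a rough illustration of an iterative method combined with hash-based fragmentation, the sketch below computes a transitive closure by semi-naive iteration over an edge relation that is hash-partitioned on its source attribute. The fragments are processed sequentially in one process; the function names and the `n_workers` parameter are assumptions made for the example, not taken from the survey.

```python
from collections import defaultdict


def fragment(edges, n_workers):
    """Hash-partition the edge relation on its source attribute."""
    parts = [defaultdict(set) for _ in range(n_workers)]
    for src, dst in edges:
        parts[hash(src) % n_workers][src].add(dst)
    return parts


def transitive_closure(edges, n_workers=2):
    """Semi-naive iteration: only tuples derived in the previous round are joined again."""
    parts = fragment(edges, n_workers)
    closure = set(edges)
    delta = set(edges)
    while delta:
        derived = set()
        for a, b in delta:
            # Route the tuple to the fragment that holds the edges leaving b;
            # in a parallel setting each fragment would live on its own processor.
            owner = parts[hash(b) % n_workers]
            for c in owner.get(b, ()):
                derived.add((a, c))
        delta = derived - closure
        closure |= delta
    return closure


tc = transitive_closure({(1, 2), (2, 3), (3, 4)})
print(sorted(tc))
```

Because each new tuple (a, b) only needs the edges whose source is b, the join can be evaluated entirely inside one fragment, which is what makes this partitioning attractive for parallel execution.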
Parallelizing RRT on large-scale distributed-memory architectures
This paper addresses the problem of parallelizing the Rapidly-exploring Random Tree (RRT) algorithm on large-scale distributed-memory architectures, using the Message Passing Interface. We compare three parallel versions of RRT based on classical parallelization schemes. We evaluate them on different motion planning problems and analyze the various factors influencing their performance.
A decomposition procedure based on approximate newton directions
The efficient solution of large-scale linear and nonlinear optimization problems may require exploiting any special structure in them in an efficient manner. We describe and analyze some cases in which this special structure can be used with very little cost to obtain search directions from decomposed subproblems. We also study how to correct these directions using (decomposable) preconditioned conjugate gradient methods to ensure local convergence in all cases. The choice of appropriate preconditioners results in a natural manner from the structure in the problem. Finally, we conduct computational experiments to compare the resulting procedures with direct methods, as well as to study the impact of different preconditioner choices.
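For readers unfamiliar with preconditioned conjugate gradients, a minimal sketch follows. It assumes a symmetric positive definite system and uses a diagonal (Jacobi) preconditioner, which is the simplest decomposable choice; it is not the preconditioner studied in the paper, and the function name `pcg` and its parameters are hypothetical.

```python
def pcg(A, b, precond_diag, tol=1e-10, max_iter=100):
    """Solve A x = b for symmetric positive definite A, with M = diag(precond_diag)."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * n
    r = list(b)                                        # residual b - A x for x = 0
    z = [ri / di for ri, di in zip(r, precond_diag)]   # apply the inverse preconditioner
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:                     # stop once the residual is small
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, precond_diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
x = pcg(A, [1.0, 2.0], precond_diag=[4.0, 3.0])
print(x)   # close to [1/11, 7/11]
```

Applying a diagonal preconditioner is a componentwise division, so it splits across subproblems with no communication; a block-diagonal preconditioner built from the problem structure would decompose in the same way.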
A System for Distributed Mechanisms: Design, Implementation and Applications
We describe here a structured system for distributed mechanism design
appropriate for both Intranet and Internet applications. In our approach the
players dynamically form a network in which they know neither their neighbours
nor the size of the network and interact to jointly take decisions. The only
assumption concerning the underlying communication layer is that for each pair
of processes there is a path of neighbours connecting them. This allows us to
deal with arbitrary network topologies.
We also discuss the implementation of this system which consists of a
sequence of layers. The lower layers deal with the operations that implement
the basic primitives of distributed computing, namely low level communication
and distributed termination, while the upper layers use these primitives to
implement high level communication among players, including broadcasting and
multicasting, and distributed decision making.
This yields a highly flexible distributed system whose specific applications
are realized as instances of its top layer. This design is implemented in Java.
The system supports fault tolerance at various levels and includes a
provision for distributed policing, the purpose of which is to exclude
'dishonest' players. It can also be used for repeated creation of dynamically
formed networks of players interested in joint decision making implemented by
means of a tax-based mechanism. We illustrate its flexibility by discussing a
number of implemented examples.
Comment: 36 pages; revised and expanded version
A Reliable and Cost-Efficient Auto-Scaling System for Web Applications Using Heterogeneous Spot Instances
Cloud providers sell their idle capacity on markets through an auction-like
mechanism to increase their return on investment. The instances sold in this
way are called spot instances. Although spot instances are usually 90%
cheaper than on-demand instances, they can be terminated by the provider when their
bidding prices are lower than market prices. Thus, they are largely used to
provision fault-tolerant applications only. In this paper, we explore how to
utilize spot instances to provision web applications, which are usually
considered availability-critical. The idea is to take advantage of differences
in price among various types of spot instances to reach both high availability
and significant cost saving. We first propose a fault-tolerant model for web
applications provisioned by spot instances. Based on that, we devise novel
auto-scaling policies for hourly billed cloud markets. We implemented the
proposed model and policies on both a simulation testbed, for repeatable
validation, and Amazon EC2. The experiments on the simulation testbed and the
real platform against the benchmarks show that the proposed approach can
greatly reduce resource cost and still achieve satisfactory Quality of Service
(QoS) in terms of response time and availability.
Randomized protocols for asynchronous consensus
The famous Fischer, Lynch, and Paterson impossibility proof shows that it is
impossible to solve the consensus problem in a natural model of an asynchronous
distributed system if even a single process can fail. Since its publication,
two decades of work on fault-tolerant asynchronous consensus algorithms have
evaded this impossibility result by using extended models that provide (a)
randomization, (b) additional timing assumptions, (c) failure detectors, or (d)
stronger synchronization mechanisms than are available in the basic model.
Concentrating on the first of these approaches, we illustrate the history and
structure of randomized asynchronous consensus protocols by giving detailed
descriptions of several such protocols.
Comment: 29 pages; survey paper written for PODC 20th anniversary issue of
Distributed Computing
Shortest, Fastest, and Foremost Broadcast in Dynamic Networks
Highly dynamic networks rarely offer end-to-end connectivity at a given time.
Yet, connectivity in these networks can be established over time and space,
based on temporal analogues of multi-hop paths (also called {\em journeys}).
Attempting to optimize the selection of the journeys in these networks
naturally leads to the study of three cases: shortest (minimum hop), fastest
(minimum duration), and foremost (earliest arrival) journeys. Efficient
centralized algorithms exist to compute all cases, when the full knowledge of
the network evolution is given.
In this paper, we study the {\em distributed} counterparts of these problems,
i.e. shortest, fastest, and foremost broadcast with termination detection
(TDB), with minimal knowledge on the topology.
We show that the feasibility of each of these problems requires distinct
features on the evolution, through identifying three classes of dynamic graphs
wherein the problems become gradually feasible: graphs in which the
re-appearance of edges is {\em recurrent} (class R), {\em bounded-recurrent}
(B), or {\em periodic} (P), together with specific knowledge, respectively
$n$ (the number of nodes), $\Delta$ (a bound on the recurrence
time), and $p$ (the period). In these classes it is not required that all pairs
of nodes get in contact -- only that the overall {\em footprint} of the graph
is connected over time.
Our results, together with the strict inclusion between $P$, $B$, and $R$,
imply a feasibility order among the three variants of the problem, i.e.
TDB[foremost] requires weaker assumptions on the topology dynamics than
TDB[shortest], which itself requires less than TDB[fastest]. Conversely, these
differences in feasibility imply that the computational powers of $R_n$,
$B_\Delta$, and $P_p$ also form a strict hierarchy.
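To make the notion of a foremost (earliest-arrival) journey concrete, here is a minimal centralized sketch. It assumes full knowledge of the contact schedule, undirected contacts given as `(u, v, t)` triples, and a fixed positive crossing latency; it is not the distributed broadcast with termination detection studied in the paper, and all names are hypothetical.

```python
import math


def foremost_arrival(contacts, source, start=0, latency=1):
    """Earliest arrival time at every node reachable by a journey from `source`.

    A single pass over the contacts in time order suffices because the
    latency is positive: an arrival can only enable strictly later contacts.
    """
    arrival = {source: start}
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        for a, b in ((u, v), (v, u)):                 # contacts are undirected
            if arrival.get(a, math.inf) <= t and t + latency < arrival.get(b, math.inf):
                arrival[b] = t + latency
    return arrival


contacts = [("a", "b", 1), ("b", "c", 0), ("b", "c", 4), ("c", "d", 2), ("c", "d", 6)]
arrival = foremost_arrival(contacts, "a")
print(arrival)
```

In this example the contacts between b and c at time 0 and between c and d at time 2 occur before the message can reach them, so the journey has to wait for their re-appearance at times 4 and 6. This is why assumptions about the recurrence of edges matter for feasibility.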