141 research outputs found
Multifaceted Faculty Network Design and Management: Practice and Experience Report
We report on our experience on multidimensional aspects of our faculty's
network design and management, including some unique aspects such as
campus-wide VLANs and ghosting, security and monitoring, switching and routing,
and others. We outline a historical perspective on certain research, design,
and development decisions and discuss the network topology, its scalability,
and management in detail; the services our network provides, and its evolution.
We overview the security aspects of the management as well as data management
and automation and the use of the data by other members of the IT group in the
faculty.Comment: 19 pages, 11 figures, TOC and index; a short version presented at
C3S2E'11; v6: more proofreading, index, TOC, reference
Supporting distributed computation over wide area gigabit networks
The advent of high bandwidth fibre optic links that may be used over very large distances
has lead to much research and development in the field of wide area gigabit networking. One
problem that needs to be addressed is how loosely coupled distributed systems may be built over
these links, allowing many computers worldwide to take part in complex calculations in order
to solve "Grand Challenge" problems. The research conducted as part of this PhD has looked
at the practicality of implementing a communication mechanism proposed by Craig Partridge
called Late-binding Remote Procedure Calls (LbRPC).
LbRPC is intended to export both code and data over the network to remote machines for
evaluation, as opposed to traditional RPC mechanisms that only send parameters to pre-existing
remote procedures. The ability to send code as well as data means that LbRPC requests can
overcome one of the biggest problems in Wide Area Distributed Computer Systems (WADCS):
the fixed latency due to the speed of light. As machines get faster, the fixed multi-millisecond
round trip delay equates to ever increasing numbers of CPU cycles. For a WADCS to be
efficient, programs should minimise the number of network transits they incur. By allowing the
application programmer to export arbitrary code to the remote machine, this may be achieved.
This research has looked at the feasibility of supporting secure exportation of arbitrary
code and data in heterogeneous, loosely coupled, distributed computing environments. It has
investigated techniques for making placement decisions for the code in cases where there are a
large number of widely dispersed remote servers that could be used. The latter has resulted in
the development of a novel prototype LbRPC using multicast IP for implicit placement and a
sequenced, multi-packet saturation multicast transport protocol. These prototypes show that
it is possible to export code and data to multiple remote hosts, thereby removing the need to
perform complex and error prone explicit process placement decisions
High Performance Computing using Infiniband-based clusters
L'abstract è presente nell'allegato / the abstract is in the attachmen
Tyr: Blob Storage Meets Built-In Transactions
International audienceConcurrent Big Data applications often require high-performance storage, as well as ACID (Atomicity, Consistency , Isolation, Durability) transaction support. Although blobs (binary large objects) are an increasingly popular model for addressing the storage needs of such applications, state-of-the-art blob storage systems typically offer no transaction semantics. This demands users to coordinate access to data carefully in order to avoid race conditions, inconsistent writes, overwrites and other problems that cause erratic behavior. We argue there is a gap between existing storage solutions and application requirements, which limits the design of transaction-oriented applications. We introduce Tyr , the first blob storage system to provide built-in, multiblob transactions, while retaining sequential consistency and high throughput under heavy access concurrency. Tyr offers fine-grained random write access to data and in-place atomic operations. Large-scale experiments on Microsoft Azure with a production application from CERN LHC show Tyr throughput outperforming state-of-the-art solutions by more than 75%
Atomic Transfer for Distributed Systems
Building applications and information systems increasingly means dealing with concurrency and faults stemming from distribution of system components. Atomic transactions are a well-known method for transferring the responsibility for handling concurrency and faults from developers to the software\u27s execution environment, but incur considerable execution overhead. This dissertation investigates methods that shift some of the burden of concurrency control into the network layer, to reduce response times and increase throughput. It anticipates future programmable network devices, enabling customized high-performance network protocols.
We propose Atomic Transfer (AT), a distributed algorithm to prevent race conditions due to messages crossing on a path of network switches. Switches check request messages for conflicts with response messages traveling in the opposite direction. Conflicting requests are dropped, obviating the request\u27s receiving host from detecting and handling the conflict. AT is designed to perform well under high data contention, as concurrency control effort is balanced across a network instead of being handled by the contended endpoint hosts themselves.
We use AT as the basis for a new optimistic transactional cache consistency algorithm, supporting execution of atomic applications caching shared data. We then present a scalable refinement, allowing hierarchical consistent caches with predictable performance despite high data update rates.
We give detailed I/O Automata models of our algorithms along with correctness proofs. We begin with a simplified model, assuming static network paths and no message loss, and then refine it to support dynamic network paths and safe handling of message loss.
We present a trie-based data structure for accelerating conflict-checking on switches, with benchmarks suggesting the feasibility of our approach from a performance stand-point
- …