Fast Lean Erasure-Coded Atomic Memory Object
In this work, we propose FLECKS, an algorithm that implements atomic memory objects in a multi-writer multi-reader (MWMR) setting over asynchronous networks with server failures. FLECKS substantially reduces storage and communication costs over its replication-based counterparts by employing erasure codes. FLECKS outperforms previously proposed algorithms on the metrics that matter for good performance, such as storage cost per object, communication cost, fault-tolerance of clients and servers, guaranteed liveness of operations, and the number of communication rounds per operation. We provide proofs of the liveness and atomicity properties of FLECKS and derive worst-case latency bounds for its operations. We implemented and deployed FLECKS in cloud-based clusters and demonstrate that it has substantially lower storage and bandwidth costs, and significantly lower operation latency, than replication-based mechanisms.
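The abstract does not reproduce the FLECKS protocol itself; as a minimal sketch of why erasure codes reduce storage cost relative to replication, the Python snippet below compares the two approaches under an assumed MDS code with k data fragments tolerating f server crashes (all names and numbers are illustrative assumptions, not FLECKS's parameters):

    # Minimal sketch, not FLECKS itself: compare the storage footprint of
    # f-crash-tolerant replication with an MDS (k, n) erasure code.
    # Quorum-based atomicity protocols may need a larger n; this only
    # illustrates the raw storage arithmetic.

    def replication_cost(object_size: float, f: int) -> float:
        # Storing f + 1 full copies survives f server crashes.
        return (f + 1) * object_size

    def erasure_code_cost(object_size: float, k: int, f: int) -> float:
        # An MDS code splits the object into k fragments of size
        # object_size / k and stores n = k + f coded fragments,
        # any k of which suffice to decode.
        n = k + f
        return n * (object_size / k)

    size_mb, f, k = 1.0, 2, 4
    print(replication_cost(size_mb, f))      # 3.0 MB: three full copies
    print(erasure_code_cost(size_mb, k, f))  # 1.5 MB: six 0.25 MB fragments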
Distributed Training Large-Scale Deep Architectures
Scale of data and scale of computation infrastructures together enable the
current deep learning renaissance. However, training large-scale deep
architectures demands both algorithmic improvement and careful system
configuration. In this paper, we take a systems approach to
speed up large-scale training. Via lessons learned from our routine
benchmarking effort, we first identify bottlenecks and overheads that hinder
data parallelism. We then devise guidelines that help practitioners to
configure an effective system and fine-tune parameters to achieve desired
speedup. Specifically, we develop a procedure for setting minibatch size and
choosing computation algorithms. We also derive lemmas for determining the
quantity of key components such as the number of GPUs and parameter servers.
Experiments and examples show that these guidelines help effectively speed up
large-scale deep learning training.
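The paper's sizing lemmas are not reproduced in the abstract; as a hedged illustration of the kind of calculation involved, the sketch below estimates how many parameter servers are needed so that aggregate server bandwidth keeps up with the workers' gradient traffic (the traffic model, names, and numbers are assumptions, not the paper's formulas):

    import math

    def min_parameter_servers(num_workers: int, model_size_gb: float,
                              updates_per_sec: float,
                              server_bw_gb_per_s: float) -> int:
        # Assume each worker pushes gradients and pulls fresh weights once
        # per update, so traffic into the server tier is roughly
        # 2 * num_workers * model_size_gb * updates_per_sec (GB/s).
        demand = 2 * num_workers * model_size_gb * updates_per_sec
        return math.ceil(demand / server_bw_gb_per_s)

    # e.g. 16 workers, a 1 GB model, 2 updates/s, 10 GB/s per server:
    print(min_parameter_servers(16, 1.0, 2.0, 10.0))  # -> 7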
The K-Server Dual and Loose Competitiveness for Paging
This paper has two results. The first is based on the surprising observation
that the well-known ``least-recently-used'' paging algorithm and the
``balance'' algorithm for weighted caching are linear-programming primal-dual
algorithms. This observation leads to a strategy (called ``Greedy-Dual'') that
generalizes them both and has an optimal performance guarantee for weighted
caching.
For the second result, the paper presents empirical studies of paging
algorithms, documenting that in practice, on ``typical'' cache sizes and
sequences, the performance of paging strategies is much better than their
worst-case analyses in the standard model suggest. The paper then presents
theoretical results that support and explain this. For example: on any input
sequence, for almost all cache sizes, either the performance guarantee of
least-recently-used is O(log k) or the fault rate (in an absolute sense) is
insignificant.
Both of these results are strengthened and generalized in ``On-line File
Caching'' (1998). (Conference version: ``On-Line Caching as Cache Size
Varies'', SODA 1991.)
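As a self-contained illustration of the paging model being analyzed (not the paper's primal-dual machinery), the following sketch simulates least-recently-used at cache size k and counts faults:

    from collections import OrderedDict

    def lru_faults(requests, k):
        # Simulate LRU paging: on a hit, move the page to the
        # most-recently-used end; on a fault with a full cache,
        # evict the least-recently-used page.
        cache = OrderedDict()
        faults = 0
        for page in requests:
            if page in cache:
                cache.move_to_end(page)
            else:
                faults += 1
                if len(cache) >= k:
                    cache.popitem(last=False)
                cache[page] = True
        return faults

    print(lru_faults([1, 2, 3, 1, 4, 2], k=3))  # -> 5 faults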
Dynamic Parameter Allocation in Parameter Servers
To keep up with increasing dataset sizes and model complexity, distributed
training has become a necessity for large machine learning tasks. Parameter
servers ease the implementation of distributed parameter management, a key
concern in distributed training, but they can induce severe communication
overhead. To reduce communication overhead, distributed machine learning
algorithms use techniques to increase parameter access locality (PAL),
achieving up to linear speed-ups. We found, however, that existing parameter
servers provide only limited support for PAL techniques and therefore prevent
efficient training. In this paper, we explore whether and to what extent PAL
techniques can be supported, and whether such support is beneficial. We propose
to integrate dynamic parameter allocation into parameter servers, describe an
efficient implementation of such a parameter server called Lapse, and
experimentally compare its performance to existing parameter servers across a
number of machine learning tasks. We found that Lapse provides near-linear
scaling and can be orders of magnitude faster than existing parameter servers.
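Lapse's actual interface is not shown in the abstract; as a toy sketch of the dynamic-allocation idea (all class and method names here are hypothetical), a parameter server can keep a movable ownership table so that a hot parameter can be relocated to the node that accesses it most, turning remote accesses into local ones:

    class DynamicAllocPS:
        """Toy parameter server with movable parameter ownership
        (illustrative only; not Lapse's API)."""

        def __init__(self, num_nodes):
            self.owner = {}                            # key -> node id
            self.store = [dict() for _ in range(num_nodes)]

        def insert(self, key, value, node):
            self.owner[key] = node
            self.store[node][key] = value

        def pull(self, key):
            # Reads go to the current owner; a PAL-aware algorithm
            # relocates hot keys first so this read becomes local.
            return self.store[self.owner[key]][key]

        def relocate(self, key, new_node):
            # Dynamic parameter allocation: move the parameter so
            # future accesses from new_node are node-local.
            old = self.owner[key]
            self.store[new_node][key] = self.store[old].pop(key)
            self.owner[key] = new_node

    ps = DynamicAllocPS(num_nodes=2)
    ps.insert("w_42", 0.5, node=0)
    ps.relocate("w_42", new_node=1)   # now local to node 1
    print(ps.pull("w_42"))            # -> 0.5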
The End of Slow Networks: It's Time for a Redesign
Next generation high-performance RDMA-capable networks will require a
fundamental rethinking of the design and architecture of modern distributed
DBMSs. These systems are commonly designed and optimized under the assumption
that the network is the bottleneck: the network is slow and "thin", and thus
needs to be avoided as much as possible. Yet this assumption no longer holds
true. With InfiniBand FDR 4x, the bandwidth available to transfer data across
the network is in the same ballpark as the bandwidth of one memory channel, and
it increases even further with the more recent EDR standard. Moreover, with
recent advances in RDMA, latency is improving similarly fast. In this
paper, we first argue that the "old" distributed database design is not capable
of taking full advantage of the network. Second, we propose architectural
redesigns for OLTP, OLAP and advanced analytical frameworks to take better
advantage of the improved bandwidth, latency and RDMA capabilities. Finally,
for each of the workload categories, we show that remarkable performance
improvements can be achieved.
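A back-of-the-envelope calculation makes the "same ballpark" claim concrete (nominal data rates only; real throughput varies with encoding overhead and memory configuration):

    # Nominal figures: InfiniBand FDR runs 4 lanes at 14 Gbit/s each,
    # EDR 4 lanes at 25 Gbit/s each; one DDR3-1600 memory channel
    # delivers about 12.8 GB/s.
    fdr_4x_gbit = 4 * 14
    edr_4x_gbit = 4 * 25
    ddr3_channel_gb = 12.8

    print(f"FDR 4x:  {fdr_4x_gbit / 8:.1f} GB/s")   # ~7.0 GB/s
    print(f"EDR 4x:  {edr_4x_gbit / 8:.1f} GB/s")   # ~12.5 GB/s
    print(f"DDR3-1600 channel: {ddr3_channel_gb} GB/s")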