Search CORE

3,341 research outputs found

Checkpointing algorithms and fault prediction

Author: Aupy Guillaume
Robert Yves
Vivien Frédéric
Zaidouni Dounia
Publication venue: 'Elsevier BV'
Publication date: 14/02/2013
Field of study

This paper deals with the impact of fault prediction techniques on checkpointing strategies. We extend the classical first-order analysis of Young and Daly in the presence of a fault prediction system, characterized by its recall and its precision. In this framework, we provide an optimal algorithm to decide when to take predictions into account, and we derive the optimal value of the checkpointing period. These results allow to analytically assess the key parameters that impact the performance of fault predictors at very large scale.Comment: Supported in part by ANR Rescue. Published in Journal of Parallel and Distributed Computing. arXiv admin note: text overlap with arXiv:1207.693

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Robust Algorithms for the Secretary Problem

Author: Bradac Domagoj
Gupta Anupam
Singla Sahil
Zuzic Goran
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 27/11/2019
Field of study

In classical secretary problems, a sequence of n elements arrive in a uniformly random order, and we want to choose a single item, or a set of size K. The random order model allows us to escape from the strong lower bounds for the adversarial order setting, and excellent algorithms are known in this setting. However, one worrying aspect of these results is that the algorithms overfit to the model: they are not very robust. Indeed, if a few "outlier" arrivals are adversarially placed in the arrival sequence, the algorithms perform poorly. E.g., Dynkin’s popular 1/e-secretary algorithm is sensitive to even a single adversarial arrival: if the adversary gives one large bid at the beginning of the stream, the algorithm does not select any element at all. We investigate a robust version of the secretary problem. In the Byzantine Secretary model, we have two kinds of elements: green (good) and red (rogue). The values of all elements are chosen by the adversary. The green elements arrive at times uniformly randomly drawn from [0,1]. The red elements, however, arrive at adversarially chosen times. Naturally, the algorithm does not see these colors: how well can it solve secretary problems? We show that selecting the highest value red set, or the single largest green element is not possible with even a small fraction of red items. However, on the positive side, we show that these are the only bad cases, by giving algorithms which get value comparable to the value of the optimal green set minus the largest green item. (This benchmark reminds us of regret minimization and digital auctions, where we subtract an additive term depending on the "scale" of the problem.) Specifically, we give an algorithm to pick K elements, which gets within (1-ε) factor of the above benchmark, as long as K ≥ poly(ε^{-1} log n). We extend this to the knapsack secretary problem, for large knapsack size K. For the single-item case, an analogous benchmark is the value of the second-largest green item. For value-maximization, we give a poly log^* n-competitive algorithm, using a multi-layered bucketing scheme that adaptively refines our estimates of second-max over time. For probability-maximization, we show the existence of a good randomized algorithm, using the minimax principle. We hope that this work will spur further research on robust algorithms for the secretary problem, and for other problems in sequential decision-making, where the existing algorithms are not robust and often tend to overfit to the model.ISSN:1868-896

arXiv.org e-Print Archive

Repository for Publications and Research Data

Dagstuhl Research Online Publication Server

Improving Performance of Iterative Methods by Lossy Checkponting

Author: Acosta J. Mora
Agullo E.
Balay S.
Barrett R.
Barrett R.
Bode B.
Calhoun J.
Heath M. T.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 28/05/2018
Field of study

Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fundamental operations for many modern scientific simulations. When the large-scale iterative methods are running with a large number of ranks in parallel, they have to checkpoint the dynamic variables periodically in case of unavoidable fail-stop errors, requiring fast I/O systems and large storage space. To this end, significantly reducing the checkpointing overhead is critical to improving the overall performance of iterative methods. Our contribution is fourfold. (1) We propose a novel lossy checkpointing scheme that can significantly improve the checkpointing performance of iterative methods by leveraging lossy compressors. (2) We formulate a lossy checkpointing performance model and derive theoretically an upper bound for the extra number of iterations caused by the distortion of data in lossy checkpoints, in order to guarantee the performance improvement under the lossy checkpointing scheme. (3) We analyze the impact of lossy checkpointing (i.e., extra number of iterations caused by lossy checkpointing files) for multiple types of iterative methods. (4)We evaluate the lossy checkpointing scheme with optimal checkpointing intervals on a high-performance computing environment with 2,048 cores, using a well-known scientific computation package PETSc and a state-of-the-art checkpoint/restart toolkit. Experiments show that our optimized lossy checkpointing scheme can significantly reduce the fault tolerance overhead for iterative methods by 23%~70% compared with traditional checkpointing and 20%~58% compared with lossless-compressed checkpointing, in the presence of system failures.Comment: 14 pages, 10 figures, HPDC'1

arXiv.org e-Print Archive

Crossref

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Optimal Checkpointing for Secure Intermittently-Powered IoT Devices

Author: Garg Siddharth
Ghodsi Zahra
Karri Ramesh
Publication venue
Publication date: 04/11/2017
Field of study

Energy harvesting is a promising solution to power Internet of Things (IoT) devices. Due to the intermittent nature of these energy sources, one cannot guarantee forward progress of program execution. Prior work has advocated for checkpointing the intermediate state to off-chip non-volatile memory (NVM). Encrypting checkpoints addresses the security concern, but significantly increases the checkpointing overheads. In this paper, we propose a new online checkpointing policy that judiciously determines when to checkpoint so as to minimize application time to completion while guaranteeing security. Compared to state-of-the-art checkpointing schemes that do not account for the overheads of encrypted checkpoints we improve execution time up to 1.4x.Comment: ICCAD 201

arXiv.org e-Print Archive

Crossref

Palindrome Recognition In The Streaming Model

Author: Azer Erfan Sadeqi
Berenbrink Petra
Ergün Funda
Mallmann-Trenn Frederik
Publication venue
Publication date: 28/01/2016
Field of study

In the Palindrome Problem one tries to find all palindromes (palindromic substrings) in a given string. A palindrome is defined as a string which reads forwards the same as backwards, e.g., the string "racecar". A related problem is the Longest Palindromic Substring Problem in which finding an arbitrary one of the longest palindromes in the given string suffices. We regard the streaming version of both problems. In the streaming model the input arrives over time and at every point in time we are only allowed to use sublinear space. The main algorithms in this paper are the following: The first one is a one-pass randomized algorithm that solves the Palindrome Problem. It has an additive error and uses

O(\sqrt n

) space. The second algorithm is a two-pass algorithm which determines the exact locations of all longest palindromes. It uses the first algorithm as the first pass. The third algorithm is again a one-pass randomized algorithm, which solves the Longest Palindromic Substring Problem. It has a multiplicative error using only

O(\log(n))

space. We also give two variants of the first algorithm which solve other related practical problems

arXiv.org e-Print Archive

CiteSeerX

Checkpointing of parallel applications in a Grid environment

Author: Sajadah K.
Sajadah K.
Publication venue
Publication date: 01/01/2011
Field of study

The Grid environment is generic, heterogeneous, and dynamic with lots of unreliable resources making it very exposed to failures. The environment is unreliable because it is geographically dispersed involving multiple autonomous administrative domains and it is composed of a large number of components. Examples of failures in the Grid environment can be: application crash, Grid node crash, network failures, and Grid system component failures. These types of failures can affect the execution of parallel/distributed application in the Grid environment and so, protections against these faults are crucial. Therefore, it is essential to develop efficient fault tolerant mechanisms to allow users to successfully execute Grid applications. One of the research challenges in Grid computing is to be able to develop a fault tolerant solution that will ensure Grid applications are executed reliably with minimum overhead incurred. While checkpointing is the most common method to achieve fault tolerance, there is still a lot of work to be done to improve the efficiency of the mechanism. This thesis provides an in-depth description of a novel solution for checkpointing parallel applications executed on a Grid. The checkpointing mechanism implemented allows to checkpoint an application at regions where there is no interprocess communication involved and therefore reducing the checkpointing overhead and checkpoint size

WestminsterResearch