File Fragmentation over an Unreliable Channel
It has been recently discovered that heavy-tailed file completion time can result from protocol interaction even when file sizes are light-tailed. A key to this phenomenon is the RESTART feature, where if a file transfer is interrupted before it is completed, the transfer needs to restart from the beginning. In this paper, we show that independent or bounded fragmentation guarantees light-tailed file completion time as long as the file size is light-tailed; i.e., in this case, heavy-tailed file completion time can only originate from heavy-tailed file sizes. If the file size is heavy-tailed, then the file completion time is necessarily heavy-tailed. For this case, we show that when the file size distribution is regularly varying, then under independent or bounded fragmentation, the completion time tail distribution function is asymptotically upper bounded by that of the original file size stretched by a constant factor. We then prove that if the failure distribution has a non-decreasing failure rate, the expected completion time is minimized by dividing the file into equal-sized fragments; this optimal fragment size is unique but depends on the file size. We also present a simple blind fragmentation policy where the fragment sizes are constant and independent of the file size and prove that it is asymptotically optimal. Finally, we bound the error in expected completion time due to error in modeling of the failure process.
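The RESTART effect described above can be seen in a small Monte Carlo sketch. Everything below is illustrative, not taken from the paper: unit transfer rate, exponentially distributed (light-tailed) file sizes, exponential times between failures, and the fragment bound of 0.5 are all assumptions.

```python
import random

def completion_time(size, fail_rate, frag=None):
    """Simulate RESTART: a failure wipes progress on the current fragment only."""
    frag = frag if frag is not None else size  # no fragmentation: whole file restarts
    total, remaining = 0.0, size
    while remaining > 0:
        piece = min(frag, remaining)
        while True:
            t_fail = random.expovariate(fail_rate)  # time until next failure
            if t_fail >= piece:   # fragment completes before the failure
                total += piece
                break
            total += t_fail       # partial work on this fragment is lost
        remaining -= piece
    return total

random.seed(0)
sizes = [random.expovariate(1.0) for _ in range(20000)]  # light-tailed file sizes
no_frag = sorted(completion_time(s, 1.0) for s in sizes)
bounded = sorted(completion_time(s, 1.0, frag=0.5) for s in sizes)
# Without fragmentation the upper quantiles blow up (heavy tail);
# bounded fragments keep them moderate (light tail).
print("99.9th percentile, no fragmentation:", no_frag[19980])
print("99.9th percentile, bounded fragments:", bounded[19980])
```

In this toy setup the no-fragmentation tail quantiles are orders of magnitude larger than the bounded-fragmentation ones, which is the qualitative phenomenon the abstract addresses.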
On Channel Failures, File Fragmentation Policies, and Heavy-Tailed Completion Times
It has been recently discovered that heavy-tailed completion times can result from protocol interaction even when file sizes are light-tailed. A key to this phenomenon is the use of a restart policy where if the file is interrupted before it is completed, it needs to restart from the beginning. In this paper, we show that fragmenting a file into pieces whose sizes are either bounded or independently chosen after each interruption guarantees light-tailed completion time as long as the file size is light-tailed; i.e., in this case, heavy-tailed completion time can only originate from heavy-tailed file sizes. If the file size is heavy-tailed, then the completion time is necessarily heavy-tailed. For this case, we show that when the file size distribution is regularly varying, then under independent or bounded fragmentation, the completion time tail distribution function is asymptotically bounded above by that of the original file size stretched by a constant factor. We then prove that if the distribution of times between interruptions has nondecreasing failure rate, the expected completion time is minimized by dividing the file into equal-sized fragments; this optimal fragment size is unique but depends on the file size. We also present a simple blind fragmentation policy where the fragment sizes are constant and independent of the file size and prove that it is asymptotically optimal. Both these policies are also shown to have desirable completion time tail behavior. Finally, we bound the error in expected completion time due to error in modeling of the failure process.
Heavy Tails and Instabilities in Large-Scale Systems with Failures
Modern engineering systems, e.g., wireless communication networks, distributed computing systems, etc., are characterized by high variability and susceptibility to failures. Failure recovery is required to guarantee the successful operation of these systems. One straightforward and widely used mechanism is to restart the interrupted jobs from the beginning after a failure occurs. In network design, retransmissions are the primary building blocks of the network architecture that guarantee data delivery in the presence of channel failures. Retransmissions have recently been identified as a new origin of power laws in modern information networks. In particular, it was discovered that retransmissions give rise to long tails (delays) and possibly zero throughput. To this end, we investigate the impact of the ‘retransmission phenomenon’ on the performance of failure prone systems and propose adaptive solutions to address emerging instabilities.
The preceding finding of power law phenomena due to retransmissions holds under the assumption that data sizes have infinite support. In practice, however, data sizes are upper bounded 0 ≤ L ≤ b, e.g., WaveLAN’s maximum transfer unit is 1500 bytes, YouTube videos are of limited duration, e-mail attachments cannot exceed 10MB, etc. To this end, we first provide a uniform characterization of the entire body of the distribution of the number of retransmissions, which can be represented as a product of a power law and the Gamma distribution. This rigorous approximation clearly demonstrates the transition from power law distributions in the main body to exponential tails. Furthermore, the results highlight the importance of wisely determining the size of data fragments in order to accommodate the performance needs in these systems as well as provide the appropriate tools for this fragmentation.
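The bounded-support effect can be sketched with a standard retransmission model (all parameters below, including the bound b and the exponential rates, are assumptions for illustration, not the dissertation's actual setup): the number of retransmissions N counts attempts until one channel availability period exceeds the data size, so truncating sizes at b turns the power-law body into a geometric (exponential) tail.

```python
import random

def retransmissions(size, ch_rate=1.0):
    """Attempts until one channel availability period exceeds the data size."""
    n = 1
    while random.expovariate(ch_rate) < size:
        n += 1  # transfer interrupted: retransmit from scratch
    return n

random.seed(1)
b = 5.0  # assumed upper bound on data sizes
samples = sorted(retransmissions(min(random.expovariate(1.0), b))
                 for _ in range(50000))
# Body: with unbounded exp(1) sizes and an exp(1) channel, P(N > n) ~ 1/n (power law).
# Tail: once sizes are pinned at b, P(N > n) decays like (1 - exp(-b))**n (geometric).
median = samples[len(samples) // 2]
p999 = samples[int(0.999 * len(samples))]
print("median attempts:", median, " 99.9th percentile:", p999)
```

The transition happens around e^b attempts: below it the empirical tail looks like a power law, beyond it the truncation forces geometric decay, matching the power-law-times-Gamma characterization described above.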
Second, we extend the analysis to the practically important case of correlated channels using modulated processes, e.g., Markov modulated, to capture the underlying dependencies. Our study shows that the tails of the retransmission and delay distributions are asymptotically insensitive to the channel correlations and are determined by the state that generates the lightest tail in the independent channel case. This insight is beneficial both for capacity planning and channel modeling since the independent model is sufficient and the correlation details do not matter. However, the preceding finding may be overly optimistic when the best state is atypical, since the effects of ‘bad’ states may still downgrade the performance.
Third, we examine the effects of scheduling policies in queueing systems with failures and restarts. Fair sharing, e.g., processor sharing (PS), is a widely accepted approach to resource allocation among multiple users. We revisit the well-studied M/G/1 PS queue with a new focus on server failures and restarts. Interestingly, we discover a new phenomenon showing that PS-based scheduling induces complete instability in the presence of retransmissions, regardless of how low the traffic load may be. This novel phenomenon occurs even when the job sizes are bounded/fragmented, e.g., deterministic. This work demonstrates that scheduling one job at a time, such as first-come-first-serve, achieves a larger stability region and should be preferred in these systems.
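The PS instability mechanism can be seen in a back-of-the-envelope calculation (a toy model, not the dissertation's analysis): with n deterministic jobs of size s sharing a unit-rate PS server under exponential(lam) failures that restart all in-service work, a tagged job completes only if the server stays up for n*s time units, which happens with probability e^(-lam*n*s). This vanishes geometrically in n, so any sustained backlog makes completions ever rarer, whereas a one-at-a-time discipline such as FCFS always needs only an up-interval of length s.

```python
import math

lam, s = 0.5, 1.0  # assumed failure rate and (deterministic) job size

# Probability a tagged job finishes before the next failure:
# PS with n jobs in service needs an (n * s)-long failure-free window;
# FCFS serves one job at a time and needs only an s-long window.
for n in (1, 2, 5, 10, 20):
    ps = math.exp(-lam * n * s)
    fcfs = math.exp(-lam * s)
    print(f"n={n:2d}  PS: {ps:.5f}   FCFS: {fcfs:.5f}")
```

Even in this crude picture, the PS completion probability collapses as the backlog grows while the FCFS probability stays constant, which is the intuition behind the larger stability region of one-at-a-time scheduling.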
Last, we delve into the area of distributed computing and study the effects of commonly used mechanisms, i.e., restarts, fragmentation, replication, especially in cloud computing services. We evaluate the efficiency of these techniques under different assumptions on the data streams and discuss the corresponding optimization problem. These findings are useful for optimal resource allocation and fault tolerance in rapidly developing computing networks. In addition to networking and distributed computing systems, the aforementioned results improve our understanding of failure recovery management in large manufacturing and service systems, e.g., call centers. Scalable solutions to this problem increase in significance as these systems continuously grow in scale and complexity. The new phenomena and the techniques developed herein provide new insights in the areas of parallel computing, probability and statistics, as well as financial engineering
Optimal job fragmentation
It has been recently discovered that on an unreliable server, the job completion time distribution function (df) can be heavy-tailed (HT) even when job size df is light-tailed (LT) [1, 5]. A key to this phenomenon is the RESTART feature where if a job is interrupted in the middle of its processing, the entire job needs to restart from the beginning, i.e., the work that is partially completed is lost.
A standard mechanism for reducing the job completion time in an unreliable service environment is checkpointing [3, 4, 6]. We view checkpointing as a job fragmentation operation, where the server processes one fragment of the job at a time. If the server becomes unavailable, say due to failure, then only the work corresponding to the fragment being processed at the time of failure is lost. In this paper, we are motivated by the question: can fragmentation ‘lighten’ the tail df of the completion time? In Section 3, we provide sufficient conditions on the fragmentation policy that give rise to a LT completion time so long as the job size df is LT. We then characterize the optimal fragmentation policy seeking to minimize the expected job completion time. This policy requires a priori knowledge of the job size. We then describe a sub-optimal fragmentation policy that is blind to the job size and is provably very close to optimal. We also describe the asymptotic tail behavior of the job completion time df under both policies. Assuming the server unavailability periods are LT, both policies produce LT completion times when the job size df is LT. For the case of a regularly varying job size df, the job completion time under both policies is regularly varying with the same degree; this is the lightest possible asymptotic tail behavior (in the degree sense).
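For intuition on the equal-fragment result, here is a toy calculation under assumptions that go beyond the abstract: exponential(lam) failures (the memoryless special case of a non-decreasing failure rate) and a fixed checkpoint overhead c added to each fragment, without which ever-finer fragmentation would be free. Under RESTART with exponential(lam) failures, the expected time to complete x units of work is (exp(lam*x) - 1)/lam, so splitting a job of size L into n equal fragments costs n*(exp(lam*(L/n + c)) - 1)/lam in expectation.

```python
import math

def expected_completion(L, n, lam=1.0, c=0.05):
    """Expected RESTART completion time: n equal fragments, each carrying
    L/n units of work plus a checkpoint overhead c, exponential(lam) failures."""
    x = L / n + c
    return n * math.expm1(lam * x) / lam

L = 10.0
best_n = min(range(1, 1000), key=lambda n: expected_completion(L, n))
print("optimal number of fragments:", best_n)
print("expected completion time:", round(expected_completion(L, best_n), 3))
print("no fragmentation (n=1):   ", round(expected_completion(L, 1), 3))
```

In the continuous relaxation (ignoring that n must be an integer), the optimal per-fragment work x* solves exp(lam*(x* + c))*(1 - lam*x*) = 1 and does not depend on L; the dependence on job size in the exact result comes from requiring the fragments to divide the job evenly. This L-insensitivity is also the intuition for why a blind, constant-fragment-size policy can be asymptotically near-optimal.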