1,080 research outputs found

    A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

    Full text link
    Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques.Comment: 11 page

    Resource and Application Models for Advanced Grid Schedulers

    Get PDF
    As Grid computing is becoming an inevitable future, managing, scheduling and monitoring dynamic, heterogeneous resources will present new challenges. Solutions will have to be agile and adaptive, support self-organization and autonomous management, while maintaining optimal resource utilisation. Presented in this paper are basic principles and architectural concepts for efficient resource allocation in heterogeneous Grid environment

    Multi-round Master-Worker Computing: a Repeated Game Approach

    Full text link
    We consider a computing system where a master processor assigns tasks for execution to worker processors through the Internet. We model the workers decision of whether to comply (compute the task) or not (return a bogus result to save the computation cost) as a mixed extension of a strategic game among workers. That is, we assume that workers are rational in a game-theoretic sense, and that they randomize their strategic choice. Workers are assigned multiple tasks in subsequent rounds. We model the system as an infinitely repeated game of the mixed extension of the strategic game. In each round, the master decides stochastically whether to accept the answer of the majority or verify the answers received, at some cost. Incentives and/or penalties are applied to workers accordingly. Under the above framework, we study the conditions in which the master can reliably obtain tasks results, exploiting that the repeated games model captures the effect of long-term interaction. That is, workers take into account that their behavior in one computation will have an effect on the behavior of other workers in the future. Indeed, should a worker be found to deviate from some agreed strategic choice, the remaining workers would change their own strategy to penalize the deviator. Hence, being rational, workers do not deviate. We identify analytically the parameter conditions to induce a desired worker behavior, and we evaluate experi- mentally the mechanisms derived from such conditions. We also compare the performance of our mechanisms with a previously known multi-round mechanism based on reinforcement learning.Comment: 21 pages, 3 figure

    Report of the 2014 NSF Cybersecurity Summit for Large Facilities and Cyberinfrastructure

    Get PDF
    This event was supported in part by the National Science Foundation under Grant Number 1234408. Any opinions, findings, and conclusions or recommendations expressed at the event or in this report are those of the authors and do not necessarily reflect the views of the National Science Foundation

    Ransomware: Current Trend, Challenges, and Research Directions

    Get PDF
    Ransomware attacks have become a global incidence, with the primary aim of making monetary gains through illicit means. The attack started through e-mails and has expanded through spamming and phishing. Ransomware encrypts targets’ files and display notifications, requesting for payment before the data can be unlocked. Ransom demand is usually in form of virtual currency, bitcoin, because it is difficult to track. In this paper, we give a brief overview of the current trend, challenges, and research progress in the bid to finding lasting solutions to the menace of ransomware that currently challenge computer and network security, and data privacy
    • …
    corecore