9 research outputs found

    Checkpoint-based Fault-tolerant Infrastructure for Virtualized Service Providers

    Crash and omission failures are common in service providers: a disk can break down or a link can fail at any time. In addition, the probability of a node failure increases with the number of nodes. Apart from reducing the provider’s computation power and jeopardizing the fulfillment of its contracts, such failures waste computation time when a crash occurs before a task finishes executing. To avoid this problem, efficient checkpoint infrastructures are required, especially in virtualized environments, where they must deal with huge virtual machine images. This paper proposes a smart checkpoint infrastructure for virtualized service providers. It uses Another Union File System (AUFS) to differentiate read-only from read-write parts in the virtual machine image. In this way, read-only parts need to be checkpointed only once, while subsequent checkpoints save only the modifications to the read-write parts, thus reducing the time needed to make a checkpoint. The checkpoints are stored in a Hadoop Distributed File System (HDFS). This allows a task to resume faster after a node crash and increases the fault tolerance of the system, since checkpoints are distributed and replicated across all the nodes of the provider. This paper presents a running implementation of this infrastructure and its evaluation, demonstrating that it is an effective way to make faster checkpoints with low interference on task execution and efficient task recovery after a node failure. Peer Reviewed. Postprint (published version).
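The layered checkpointing idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: `CheckpointStore` is a hypothetical in-memory stand-in for the replicated store, and plain dictionaries stand in for the read-only and read-write layers of a virtual machine image.

```python
import hashlib


class CheckpointStore:
    """Hypothetical in-memory stand-in for a replicated checkpoint store."""

    def __init__(self):
        self.base_layers = {}  # digest -> read-only layer contents
        self.deltas = []       # (base digest, read-write layer snapshot)

    def checkpoint(self, read_only, read_write):
        """Save the read-only layer once; always save the read-write layer.

        Returns True if the read-only layer had to be stored this time.
        """
        digest = hashlib.sha256(
            repr(sorted(read_only.items())).encode()
        ).hexdigest()
        base_saved = digest not in self.base_layers
        if base_saved:
            self.base_layers[digest] = dict(read_only)
        self.deltas.append((digest, dict(read_write)))
        return base_saved

    def restore(self, index=-1):
        """Rebuild an image: the read-write layer shadows the read-only one."""
        digest, delta = self.deltas[index]
        image = dict(self.base_layers[digest])
        image.update(delta)
        return image


store = CheckpointStore()
base = {"/bin/sh": "binary", "/etc/conf": "default"}  # read-only layer
rw = {"/tmp/out": "partial"}                          # read-write layer

first = store.checkpoint(base, rw)    # stores base and delta
rw["/tmp/out"] = "done"
rw["/etc/conf"] = "tuned"             # modification shadows the base file
second = store.checkpoint(base, rw)   # stores the delta only
restored = store.restore()
```

In this sketch the second checkpoint writes only the read-write layer, which is the source of the time saving the abstract describes.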

    Joint dimensioning of server and network infrastructure for resilient optical grids/clouds

    We address the dimensioning of infrastructure, comprising both network and server resources, for large-scale decentralized distributed systems such as grids or clouds. We design the resulting grid/cloud to be resilient against network link or server failures. To this end, we exploit relocation: Under failure conditions, a grid job or cloud virtual machine may be served at an alternate destination (i.e., different from the one under failure-free conditions). We thus consider grid/cloud requests to have a known origin, but assume a degree of freedom as to where they end up being served, which is the case for grid applications of the bag-of-tasks (BoT) type or hosted virtual machines in the cloud case. We present a generic methodology based on integer linear programming (ILP) that: 1) chooses a given number of sites in a given network topology where to install server infrastructure; and 2) determines the amount of both network and server capacity to cater for both the failure-free scenario and failures of links or nodes. For the latter, we consider either failure-independent (FID) or failure-dependent (FD) recovery. Case studies on European-scale networks show that relocation allows considerable reduction of the total amount of network and server resources, especially in sparse topologies and for higher numbers of server sites. Adopting a failure-dependent backup routing strategy does lead to lower resource dimensions, but only when we adopt relocation (especially for a high number of server sites): Without exploiting relocation, potential savings of FD versus FID are not meaningful

    Trust Index Based Fault Tolerant Multiple Event Localization Algorithm for WSNs

    This paper investigates the use of wireless sensor networks for multiple event source localization using binary information from the sensor nodes. The events may continually emit signals whose strength attenuates in inverse proportion to the distance from the source. In this context, faults occur for various reasons and are manifested when a node reports a wrong decision. To reduce the impact of node faults on the accuracy of multiple event localization, we introduce a trust index model that evaluates the fidelity of the information the nodes report and use in the event detection process. We then propose the Trust Index based Subtract on Negative Add on Positive (TISNAP) localization algorithm, which reduces the impact of faulty nodes on event localization by decreasing their trust index. The algorithm includes three phases. First, the sink identifies the cluster nodes to determine the number of events that occurred in the entire region by analyzing the binary data reported by all nodes. Second, it constructs the likelihood matrix related to the cluster nodes and estimates the location of all events according to the alarmed status and trust index of the nodes around the cluster nodes. Finally, the sink updates the trust index of all nodes according to the fidelity of their information in the previous reporting cycle. The algorithm improves both the accuracy of localization and the fault tolerance of multiple event source localization. The experimental results show that when the probability of node fault is close to 50%, the algorithm can still accurately determine the number of events and achieves better localization accuracy than other algorithms.
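The likelihood-matrix and trust-update phases can be sketched as follows. This is a simplified, single-event illustration and not the TISNAP algorithm itself. The square coverage region, the grid size, the `step` value and the function names `localize_event` and `update_trust` are all assumptions made for this example.

```python
def localize_event(grid_size, sensors, radius):
    """Trust-weighted "subtract on negative, add on positive" voting.

    sensors: list of (x, y, alarmed, trust) tuples.
    Each sensor adds +trust (alarmed) or -trust (not alarmed) to every
    grid cell inside its assumed square coverage region.
    Returns the cell with the highest likelihood and the full grid.
    """
    grid = [[0.0] * grid_size for _ in range(grid_size)]
    for sx, sy, alarmed, trust in sensors:
        vote = trust if alarmed else -trust
        for x in range(grid_size):
            for y in range(grid_size):
                if abs(x - sx) <= radius and abs(y - sy) <= radius:
                    grid[x][y] += vote
    cells = ((x, y) for x in range(grid_size) for y in range(grid_size))
    best = max(cells, key=lambda c: grid[c[0]][c[1]])
    return best, grid


def update_trust(trust, reported, expected, step=0.1):
    """Raise trust after a consistent report, lower it otherwise (clamped to [0, 1])."""
    new = trust + step if reported == expected else trust - step
    return min(1.0, max(0.0, new))


sensors = [
    (1, 2, True, 1.0), (3, 2, True, 1.0),   # alarmed nodes around (2, 2)
    (2, 1, True, 1.0), (2, 3, True, 1.0),
    (0, 0, False, 1.0),                      # silent node far from the event
    (4, 4, True, 0.2),                       # faulty node, already distrusted
]
best, grid = localize_event(5, sensors, radius=1)
```

Because the faulty node at (4, 4) carries a low trust value, its false alarm adds little to the likelihood grid and the estimate stays at the cell the trusted nodes agree on.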

    How Practical Are Intrusion-Tolerant Distributed Systems?

    Building secure, inviolable systems using traditional mechanisms is increasingly becoming an unattainable goal. Recognition of this fact has fostered interest in alternative approaches to security such as intrusion tolerance, which applies fault tolerance concepts and techniques to security problems. Although this area is quite promising, intrusion-tolerant distributed systems typically rely on the assumption that the system components fail or are compromised independently. This is a strong assumption that has been repeatedly questioned. In this paper we discuss how this assumption can be implemented in practice using diversity of system components. We present a taxonomy of axes of diversity and discuss how they provide failure independence. Furthermore, we provide a practical example of an intrusion-tolerant system built using diversity.

    Dependability through Assured Reconfiguration in Embedded System Software


    Checking for Application Vulnerabilities Using Fault Injection

    This thesis introduces a fault injector, called "Pulad", specifically developed for finding application vulnerabilities. Most previous approaches to finding application vulnerabilities involved static verification methods, in which the source code is not executed. Since vulnerabilities can only be revealed when they are exploited, a dynamic verification method that executes the code seems to be needed. The two main dynamic verification areas are software testing and fault injection. This thesis focuses on fault injection. Pulad, the fault injector described in this thesis, consists of two main parts called the "collector" and the "fault injector". The goal of the collector is to record all the environment-application interactions while the application is running. These interactions, which focus on the environment files, are then analyzed, and the following fields are uploaded into a database: the file name, file extension, file size, file directory, number of times the file was used, file permission (including symbolic link and ownership) and number of times an error occurred. The fault injector allows faults to be injected either through a graphical user interface (GUI) or directly through a text file. The faults in the files include the file name, the directory name, the execution path, the library path, the file existence, the file ownership, the file permission, etc. For each fault, the specific type of fault needs to be indicated. Moreover, the interaction points where the faults should be injected are also provided by the user.

    On the Use of Fault Injection to Discover Security Vulnerabilities in Applications

    The advent of the Internet has enabled developers to write and share software components with each other more easily. Developers have become increasingly reliant on code other than their own for application development; code that is often not well tested, and lacking any kind of security review, thus exposing its consumers to security vulnerabilities. The goal of this thesis is to adapt existing techniques, and discover new approaches that can be used to discover security vulnerabilities in applications. We use fault injection in each of our techniques and define a set of criteria to evaluate these approaches. The hierarchy of approaches, starting from a black box and ending in a full white box approach, allows a security reviewer to choose a technique depending on the amount of information available about the application under review, time constraints, and extent of security analysis and confidence desired in the program

    Flerkonfigurasjonsruting (MRC) som utgangspunkt for resursallokering i datanettverk : Multiple Routing Configurations (MRC) As A Basis For Resource Allocation In Computer Networks

    In this thesis we present our findings on the use of Multiple Routing Configurations (MRC) in computer networks that rely on a central network administrator, such as a Bandwidth Broker, to manage network resources and services (IntServ, Integrated Services). For this purpose we developed the simulator SAK (Simulering av kapasitet, "simulation of capacity"). To better convey how MRC works and what happens during the allocation of network resources, we have emphasized visualization. SAK therefore shows both the isolation process and the allocation process in real time; if the user wishes, these processes can be slowed down further (or accelerated), making it easier (or harder) to follow what is happening. We have also examined the extent to which allocation capacity is affected by the number of configurations produced by MRC, and whether many or few configurations give the best allocation capacity. Our experiments give a strong indication that MRC performs better than routing with a single routing table along the topology's shortest path from ingress node to egress node. We also found a clear indication that increasing the number of configurations increases the allocation capacity of the network.

    Incident Prioritisation for Intrusion Response Systems

    The landscape of security threats continues to evolve, with attacks becoming more serious and the number of vulnerabilities rising. To manage these threats, many security studies have been undertaken in recent years, mainly focusing on improving detection, prevention and response efficiency. Although security tools such as antivirus software and firewalls are available to counter them, Intrusion Detection Systems and similar tools such as Intrusion Prevention Systems are still among the most popular approaches. There are hundreds of published works related to intrusion detection that aim to increase the efficiency and reliability of detection, prevention and response systems. Whilst intrusion detection system technologies have advanced, there are still areas to explore, particularly with respect to the process of selecting appropriate responses. Supporting a variety of response options, such as proactive, reactive and passive responses, enables security analysts to select the most appropriate response in different contexts. In view of that, a methodical approach that identifies important incidents as opposed to trivial ones is first needed. However, with thousands of incidents identified every day, relying upon manual processes to identify their importance and urgency is complicated, difficult, error-prone and time-consuming, and so prioritising them automatically would help security analysts to focus only on the most critical ones. The existing approaches to incident prioritisation provide various ways to prioritise incidents, but less attention has been given to adopting them into an automated response system. Although some studies have realised the advantages of prioritisation, they released no further studies showing they had continued to investigate the effectiveness of the process.
    This study concerns enhancing the incident prioritisation scheme to identify critical incidents based upon their criticality and urgency, in order to facilitate an autonomous mode for the response selection process in Intrusion Response Systems. To achieve this aim, this study proposed a novel framework which combines models and strategies identified from the comprehensive literature review. A model to estimate the level of risk of incidents is established, named the Risk Index Model (RIM). With different levels of risk, the Response Strategy Model (RSM) dynamically maps incidents into different types of response, with serious incidents being mapped to active responses in order to minimise their impact, while incidents with less impact have passive responses. The combination of these models provides a seamless way to map incidents automatically; however, it needs to be evaluated in terms of its effectiveness and performance. To demonstrate the results, an evaluation study with four stages was undertaken; these stages were a feasibility study of the RIM, comparison studies with industrial standards such as the Common Vulnerability Scoring System (CVSS) and Snort, an examination of the effect of different strategies in the rating and ranking process, and a test of the effectiveness and performance of the Response Strategy Model (RSM). With promising results being gathered, a proof-of-concept study was conducted to demonstrate the framework using a live traffic network simulation with online assessment mode via the Security Incident Prioritisation Module (SIPM); this study was used to investigate its effectiveness and practicality. Through the results gathered, this study has demonstrated that the prioritisation process can feasibly be used to facilitate the response selection process in Intrusion Response Systems.
    The main contribution of this study is to have proposed, designed, evaluated and simulated a framework to support the incident prioritisation process for Intrusion Response Systems. Ministry of Higher Education in Malaysia and University of Malay
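The mapping from risk level to response type described in this abstract can be sketched in a few lines. The abstract does not give the Risk Index Model's formula, so the weighted sum, the 0.7 threshold and the names `Incident`, `risk_index`, `select_response` and `prioritise` below are hypothetical placeholders rather than the thesis's actual models.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    criticality: float  # assumed scale: 0.0 (trivial) to 1.0 (critical)
    urgency: float      # assumed scale: 0.0 (can wait) to 1.0 (immediate)


def risk_index(incident, w_criticality=0.5, w_urgency=0.5):
    """Hypothetical risk score: a weighted sum of criticality and urgency."""
    return w_criticality * incident.criticality + w_urgency * incident.urgency


def select_response(incident, active_threshold=0.7):
    """Map high-risk incidents to active responses, the rest to passive ones."""
    return "active" if risk_index(incident) >= active_threshold else "passive"


def prioritise(incidents):
    """Rank incidents from highest to lowest risk index."""
    return sorted(incidents, key=risk_index, reverse=True)


incidents = [
    Incident("port scan", criticality=0.2, urgency=0.3),
    Incident("worm outbreak", criticality=0.9, urgency=0.8),
]
ranked = prioritise(incidents)
responses = {i.name: select_response(i) for i in ranked}
```

Keeping the scoring function separate from the response selection mirrors the split between the two models in the abstract, so either part can be replaced independently.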