72 research outputs found

    A Latency-driven Availability Assessment for Multi-Tenant Service Chains

    Nowadays, most telecommunication services adhere to the Service Function Chain (SFC) paradigm, where network functions are implemented in software. In particular, container virtualization is becoming a popular approach to deploying network functions and enabling resource slicing among several tenants. The resulting infrastructure is a complex system composed of a huge number of containers implementing different SFC functionalities, along with different tenants sharing the same chain. The complexity of such a scenario leads us to evaluate two critical metrics: the steady-state availability (the probability that a system is functioning in the long run) and the latency (the time between a service request and the corresponding response). Consequently, we propose a latency-driven availability assessment for multi-tenant service chains implemented via Containerized Network Functions (CNFs). We adopt a multi-state system to model single CNFs and the queueing formalism to characterize the service latency. To efficiently compute the availability, we develop a modified version of the Multidimensional Universal Generating Function (MUGF) technique. Finally, we solve an optimization problem to minimize the SFC cost under an availability constraint. As a relevant example of an SFC, we consider a containerized version of the IP Multimedia Subsystem, whose parameters have been estimated through fault injection techniques and load tests.
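    To make the generating-function idea concrete, the following is a minimal sketch of plain universal generating function (UGF) composition for a two-stage chain, with hypothetical state probabilities and service rates; the paper's multidimensional variant and its CNF parameters are not reproduced here.

    from itertools import product

    # Each multi-state component is a list of (probability, performance)
    # terms, i.e. the coefficients of its UGF polynomial. Values are
    # hypothetical, not estimated from fault injection as in the paper.
    cnf_a = [(0.95, 100), (0.04, 50), (0.01, 0)]  # requests/s per state
    cnf_b = [(0.97, 80), (0.03, 0)]

    def compose_series(u, v):
        """Series composition: chain throughput is the minimum of the two
        stages; probabilities multiply, assuming independent failures."""
        terms = {}
        for (p1, g1), (p2, g2) in product(u, v):
            g = min(g1, g2)
            terms[g] = terms.get(g, 0.0) + p1 * p2
        return [(p, g) for g, p in terms.items()]

    def availability(u, demand):
        """Steady-state availability: probability that the chain can
        serve at least the demanded rate."""
        return sum(p for p, g in u if g >= demand)

    chain = compose_series(cnf_a, cnf_b)
    print(f"A(demand=60 req/s) = {availability(chain, 60):.4f}")  # 0.9215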

    End-to-End Resilience Mechanisms for Network Transport Protocols

    The universal reliance on, and hence the need for resilience in, network communications has been well established. Current transport protocols are designed to provide fixed mechanisms for error remediation (if any), using techniques such as ARQ, and offer little or no adaptability to underlying network conditions or to different sets of application requirements. The ubiquitous TCP transport protocol makes too many assumptions about underlying layers to provide resilient end-to-end service in all network scenarios, especially those which include significant heterogeneity. Additionally, the properties of reliability, performability, availability, dependability, and survivability are not explicitly addressed in its design, so there is no support for resilience. This dissertation presents considerations that must be taken into account when designing new resilience mechanisms for future transport protocols to meet service requirements in the face of various attacks and challenges. The primary mechanisms addressed include diverse end-to-end paths and multi-mode operation for changing network conditions.
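    As a point of reference for the fixed mechanisms mentioned above, here is a minimal sketch of stop-and-wait ARQ over a lossy channel; the channel model, loss rate and retry limit are hypothetical, and real transports pipeline many frames rather than waiting on each one.

    import random

    def lossy_channel(frame, loss_rate=0.3):
        """Deliver a frame with probability 1 - loss_rate, else drop it."""
        return frame if random.random() > loss_rate else None

    def send_reliable(frames, max_retries=10):
        """Stop-and-wait ARQ: retransmit each frame until acknowledged.
        ACKs are assumed lossless here to keep the sketch short."""
        delivered = []
        for seq, payload in enumerate(frames):
            for _ in range(max_retries):
                if lossy_channel((seq, payload)) is not None:
                    delivered.append(payload)  # receiver got it; ACK implied
                    break
            else:
                raise TimeoutError(f"frame {seq} unacknowledged after {max_retries} tries")
        return delivered

    print(send_reliable(["a", "b", "c"]))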

    A holistic approach for measuring the survivability of SCADA systems

    Supervisory Control and Data Acquisition (SCADA) systems are responsible for controlling and monitoring Industrial Control Systems (ICS) and Critical Infrastructure Systems (CIS), among others. Such systems provide services our society relies on, such as gas, electricity, and water distribution; they process our waste and manage our railways and traffic. Needless to say, they are vital for our society, and any disruption of such systems may produce anything from financial disaster to, ultimately, loss of life. SCADA systems have evolved over the years from standalone, proprietary solutions and closed networks into large-scale, highly distributed software systems operating over open networks such as the internet. In addition, the hardware and software utilised by SCADA systems are now, in most cases, based on COTS (Commercial Off-The-Shelf) solutions. As they evolved, they became vulnerable to malicious attacks. Over the last few years there has been a push from the computer security industry to adapt its tools and techniques to address the security issues of SCADA systems. Such a move is welcome but not sufficient; were it sufficient, successful malicious attacks on computer systems would be non-existent. We strongly believe that rather than trying to stop and detect every attack on SCADA systems, it is imperative to focus on providing critical services in the presence of malicious attacks. This motivation is shared with the discipline of survivability, which integrates areas of computer science such as performance, security, fault tolerance and reliability. In this thesis we present a new concept of survivability: holistic survivability, an analysis framework suitable for a new era of data-driven networked systems. It extends the current view of survivability by incorporating service interdependencies as a key property, together with aspects of machine learning. The framework uses the formalism of probabilistic graphical models to quantify survivability and introduces new metrics and heuristics to learn and identify essential services automatically. Current definitions of survivability are often limited, since they either apply performance as the measurement metric or use security metrics without any survivability context. Holistic survivability addresses these issues by providing a flexible framework where performance and security metrics can be tailored to the context of survivability. In other words, by applying performance and security metrics, our work aims to support key survivability properties such as recognition and resistance. The models and metrics introduced here are applied to SCADA systems, as the insecurity of such systems is one of the motivations of this work. We believe that the proposed work goes beyond the current status of survivability models: holistic survivability is flexible enough to support the addition of other metrics and can easily be used with different models. Because it is based on a well-known formalism, its definition and implementation are easy to grasp and apply. Perhaps most importantly, this work is aimed at a new era in which data is produced and consumed on a large scale. Holistic survivability aims to be the catalyst for new data-based models that will provide better and more accurate insights into the survivability of systems.
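    In the spirit of the probabilistic graphical models the framework is built on, the sketch below estimates by Monte Carlo the probability that an HMI service stays operational given its dependency graph; the topology, service names and survival probabilities are hypothetical illustrations, not the thesis's models.

    import random

    # Hypothetical SCADA service-dependency graph: a service is
    # operational only if it survives the attack itself AND all of its
    # dependencies are operational.
    base_survival = {"sensor_net": 0.90, "historian": 0.95,
                     "scada_master": 0.97, "hmi": 0.92}
    depends_on = {"sensor_net": [], "historian": ["sensor_net"],
                  "scada_master": ["sensor_net", "historian"],
                  "hmi": ["scada_master"]}

    def trial(service):
        """One sampled attack: does `service` end up operational?"""
        alive = {s: random.random() < p for s, p in base_survival.items()}
        def operational(s):
            return alive[s] and all(operational(d) for d in depends_on[s])
        return operational(service)

    n = 100_000
    estimate = sum(trial("hmi") for _ in range(n)) / n
    print(f"P(hmi operational) ~ {estimate:.3f}")  # ~ .90*.95*.97*.92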

    RAIDX: RAID EXTENDED FOR HETEROGENEOUS ARRAYS

    The computer hard drive market has diversified with the establishment of solid state disks (SSDs) as an alternative to magnetic hard disks (HDDs). Each technology has its advantages: SSDs are faster than HDDs, but HDDs are cheaper. Our goal is to construct a parallel storage system from HDDs and SSDs such that the parallel system is as fast as the SSDs. Achieving this goal is challenging, since the slow HDDs store more data and become bottlenecks while the SSDs remain idle. RAIDX is a parallel storage system designed for disks of different speeds, capacities and technologies. The RAIDX hardware consists of an array of disks; the RAIDX software consists of data structures and algorithms that allow the disks to be viewed as a single storage unit whose capacity equals the sum of the capacities of its disks, whose failure rate is lower than that of its individual disks, and whose speed is close to that of its faster disks. RAIDX achieves its performance goals with the aid of a novel parallel data organization technique that allows storage data to be moved on the fly without impacting the upper-level file system. We show that storage data accesses satisfy the locality-of-reference principle, whereby only a small fraction of storage data is accessed frequently. RAIDX has a monitoring program that identifies frequently accessed blocks and a migration program that moves them to faster disks. The faster disks act as caches that store the sole copy of frequently accessed data. Experimental evaluation has shown that an HDD+SSD RAIDX array is as fast as an all-SSD array when the workload exhibits locality of reference.
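    A minimal sketch of the monitor-and-migrate idea follows: count block accesses, then pin the hottest blocks to the fast tier. The class, capacity and tier names are hypothetical stand-ins, not RAIDX's actual data structures.

    from collections import Counter

    class TieredStore:
        """Toy two-tier store: reads are tracked, and rebalance() moves
        the most frequently accessed blocks to the SSD tier."""
        def __init__(self, ssd_capacity_blocks):
            self.ssd_capacity = ssd_capacity_blocks
            self.on_ssd = set()
            self.access_counts = Counter()

        def read(self, block):
            self.access_counts[block] += 1
            return "ssd" if block in self.on_ssd else "hdd"  # serving tier

        def rebalance(self):
            hottest = self.access_counts.most_common(self.ssd_capacity)
            self.on_ssd = {block for block, _ in hottest}

    store = TieredStore(ssd_capacity_blocks=2)
    for block in [1, 1, 1, 2, 2, 3]:
        store.read(block)
    store.rebalance()
    print(store.read(1), store.read(3))  # ssd hdd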

    ICASE

    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in the areas of (1) applied and numerical mathematics, including numerical analysis and algorithm development; (2) theoretical and computational research in fluid mechanics in selected areas of interest, including acoustics and combustion; (3) experimental research in transition and turbulence and aerodynamics involving Langley facilities and scientists; and (4) computer science.

    Developing a distributed electronic health-record store for India

    The DIGHT project is addressing the problem of building a scalable and highly available information store for the Electronic Health Records (EHRs) of the over one billion citizens of India.

    Resilience of an embedded architecture using hardware redundancy

    In the last decade, the dominance of general-purpose computing systems in the market has given way to embedded systems, with billions of units manufactured every year. Embedded systems appear in contexts where continuous operation is of utmost importance and where failure can be profound. Nowadays, radiation poses a serious threat to the reliable operation of safety-critical systems. Fault avoidance techniques, such as radiation hardening, have commonly been used in space applications. However, radiation-hardened components are expensive, lag behind commercial components in performance, and do not provide 100% fault elimination. Without fault-tolerance mechanisms, many of these faults can become errors at the application or system level, which, in turn, can result in catastrophic failures. In this work we study the concepts of fault tolerance and dependability and extend them by providing our own definition of resilience. We analyse the physics of radiation-induced faults, the damage mechanisms of particles, and the process that leads to computing failures. We provide extensive taxonomies of (1) existing fault-tolerance techniques and (2) the effects of radiation on state-of-the-art electronics, analysing and comparing their characteristics. We propose a detailed fault model and provide a classification of the different types of faults at various levels. We introduce a fault-tolerance algorithm and define the system states and actions necessary to implement it. We introduce novel hardware and system software techniques that provide a more efficient combination of reliability, performance and power consumption than existing techniques. We propose a new system element called the syndrome, which is the core of a resilient architecture whose software and hardware can adapt to reliable and unreliable environments. We implement a software simulator and disassembler and introduce a testing framework in combination with ERA's assembler and commercial hardware simulators.
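    As a generic baseline for the hardware-redundancy techniques surveyed here, the sketch below shows a triple modular redundancy (TMR) voter masking a single radiation-induced fault; this is textbook TMR, not the thesis's syndrome mechanism.

    from collections import Counter

    def tmr_vote(replica_outputs):
        """Majority-vote the outputs of three redundant modules: a single
        faulty module is masked; two or more faults are detected."""
        value, votes = Counter(replica_outputs).most_common(1)[0]
        if votes < 2:
            raise RuntimeError("no majority: more than one module is faulty")
        return value

    # One replica suffers a bit flip; the voter masks it.
    print(bin(tmr_vote([0b1011, 0b1011, 0b1111])))  # -> 0b1011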

    The 2nd Conference of PhD Students in Computer Science


    Multi criteria risk analysis of a subsea BOP system

    The subsea blowout preventer (BOP), which is latched to a subsea wellhead, is one of several barriers in the well against kicks and blowouts, and it is the most critical piece of equipment, as it is the last line of protection against a blowout. The BOP system used in subsea drilling operations is considered a safety-critical system, with high-severity consequences following its failure. Following past offshore blowout incidents, such as the recent Macondo incident in the Gulf of Mexico, there have been investigations, research, and improvements sought for a better understanding of the BOP system and its operation. This motivates a systematic re-evaluation of the subsea BOP system to understand its associated risk and reliability and to identify critical areas, aspects and components. Different risk analysis techniques were surveyed, and Failure Modes, Effects and Criticality Analysis (FMECA) was selected to drive the study in this thesis, because it is a simple, proven, cost-effective process that can add value to the understanding of the behaviour and properties of a system, component, software item or function. The output of the FMECA can be used to inform or support other key engineering tasks, such as redesign, enhanced qualification and testing activity, or maintenance, for greater inherent reliability and reduced risk. This thesis applies the FMECA technique to assess the risk associated with the subsea BOP system. System functional diagrams were developed with boundaries defined, an FMECA was carried out, and an initial list of critical component failure modes was identified. The limitations surrounding the confidence of the FMECA failure-mode ranking based on the Risk Priority Number (RPN) are presented, and potential variations in risk interpretation are discussed. The main contribution of this thesis is an innovative framework in which multi-criteria decision analysis (MCDA) techniques, with consideration of fuzzy and interval data, are applied to the subsea BOP system's critical failure modes from the FMECA. It uses nine criticality assessment criteria, deduced from expert consultation, to obtain a more reliable ranking of failure modes. The MCDA techniques applied include the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), fuzzy TOPSIS, TOPSIS with interval data, and the Preference Ranking Organization Method for Enrichment of Evaluations (PROMETHEE). The outcome of the multi-criteria analysis of the BOP system clearly shows failures of the wellhead connector, the LMRP hydraulic connector, and the control system as the top three most critical failures with respect to well control. The critical failure modes and components identified by the analysis are validated using failure data from an industry database, and a sensitivity analysis is carried out. The sensitivity analysis establishes the importance of maintenance, testing and redundancy to the criticality of the BOP system, and the potential of MCDA for more specific analysis of criteria for a technology is demonstrated. Proper maintenance, inspection, and testing (functional and pressure) are critical to the BOP system's performance and the sustainment of a high level of reliability. Material selection and the performance of components (seals, flanges, packers, bolts, mechanical body housings) relative to the use environment and operational conditions are fundamental to avoiding the occurrence of failure mechanisms.

    Also worthy of note is the contribution of personnel and organisations to failures, by way of procedural robustness and verification structures that ensure expected practices and rules are followed, as seen in the root-cause discussion. OEMs, operators and drilling contractors should periodically review operational scenarios relative to BOP system product design through the use of a failure reporting, analysis and corrective action system. This can improve the design of monitoring systems and inform requirements for re-qualification of the technology and/or next-generation designs. Operations personnel should correctly log failures in these systems, and the responsible authority should ensure that root cause analysis is performed to uncover the underlying issues initiating and driving failures.
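    To illustrate the ranking machinery, the sketch below applies crisp TOPSIS to a hypothetical three-mode, three-criterion decision matrix and prints the conventional RPN for comparison; the scores, weights and criteria are placeholders, not the thesis's nine-criterion expert data, and the fuzzy and interval variants are omitted.

    import numpy as np

    def topsis(matrix, weights):
        """Closeness of each alternative to the ideal solution; all
        criteria are treated as benefit-type (higher = more critical)."""
        norm = matrix / np.linalg.norm(matrix, axis=0)  # normalise columns
        v = norm * weights                              # weighted matrix
        ideal, anti = v.max(axis=0), v.min(axis=0)
        d_pos = np.linalg.norm(v - ideal, axis=1)       # distance to ideal
        d_neg = np.linalg.norm(v - anti, axis=1)        # distance to anti-ideal
        return d_neg / (d_pos + d_neg)                  # rank descending

    # Hypothetical 1-10 scores on (severity, occurrence, detectability).
    modes = ["wellhead connector", "LMRP hydraulic connector", "control system"]
    scores = np.array([[9.0, 4.0, 7.0],
                       [8.0, 5.0, 6.0],
                       [7.0, 6.0, 8.0]])
    weights = np.array([0.5, 0.3, 0.2])  # expert-derived in the thesis; made up here

    print("RPN (S*O*D):", scores.prod(axis=1))  # conventional FMECA ranking
    for mode, c in zip(modes, topsis(scores, weights)):
        print(f"{mode}: closeness = {c:.3f}")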

    Research in progress in applied mathematics, numerical analysis, fluid mechanics, and computer science

    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period October 1, 1993 through March 31, 1994. The major categories of the current ICASE research program are: (1) applied and numerical mathematics, including numerical analysis and algorithm development; (2) theoretical and computational research in fluid mechanics in selected areas of interest to LaRC, including acoustics and combustion; (3) experimental research in transition and turbulence and aerodynamics involving LaRC facilities and scientists; and (4) computer science.