143 research outputs found

    Academic Panel: Can Self-Managed Systems be trusted?

    Get PDF
    Trust can be defined as to have confidence or faith in; a form of reliance or certainty based on past experience; to allow without fear; believe; hope: expect and wish; and extend credit to. The issue of trust in computing has always been a hot topic, especially notable with the proliferation of services over the Internet, which has brought the issue of trust and security right into the ordinary home. Autonomic computing brings its own complexity to this. With systems that self-manage, the internal decision making process is less transparent and the ‘intelligence’ possibly evolving and becoming less tractable. Such systems may be used from anything from environment monitoring to looking after Granny in the home and thus the issue of trust is imperative. To this end, we have organised this panel to examine some of the key aspects of trust. The first section discusses the issues of self-management when applied across organizational boundaries. The second section explores predictability in self-managed systems. The third part examines how trust is manifest in electronic service communities. The final discussion demonstrates how trust can be integrated into an autonomic system as the core intelligence with which to base adaptivity choices upon

    Fault-Tolerant Adaptive Parallel and Distributed Simulation

    Full text link
    Discrete Event Simulation is a widely used technique that is used to model and analyze complex systems in many fields of science and engineering. The increasingly large size of simulation models poses a serious computational challenge, since the time needed to run a simulation can be prohibitively large. For this reason, Parallel and Distributes Simulation techniques have been proposed to take advantage of multiple execution units which are found in multicore processors, cluster of workstations or HPC systems. The current generation of HPC systems includes hundreds of thousands of computing nodes and a vast amount of ancillary components. Despite improvements in manufacturing processes, failures of some components are frequent, and the situation will get worse as larger systems are built. In this paper we describe FT-GAIA, a software-based fault-tolerant extension of the GAIA/ART\`IS parallel simulation middleware. FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some protection against byzantine failures since synchronization messages are replicated as well, so that the receiving entity can identify and discard corrupted messages. We provide an experimental evaluation of FT-GAIA on a running prototype. Results show that a high degree of fault tolerance can be achieved, at the cost of a moderate increase in the computational load of the execution units.Comment: Proceedings of the IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2016

    Formal and Fault Tolerant Design

    Get PDF
    Software quality and reliability were verified for a long time at the post-implementation level (test, fault sce-nario ...). The design of embedded systems and digital circuits is more and more complex because of inte-gration density, heterogeneity. Now almost ¾ of the digital circuits contain at least one processor, that is, can execute software code. In other words, co-design is the most usual case and traditional verification by simu-lation is no more practical. Moreover, the increase in integration density comes with a decrease in the reliabil-ity of the components. So fault detection, diagnostics techniques, introspection are essential for defect toler-ance, fault tolerance and self repair of safety-critical systems. The use of a formal specification language is considered as the foundation of a real validation. What we would like to emphasize is that refinement (from an abstract model to the point where the system will be implemented) could be and should be formal too in order to ensure the traceability of requirements, to man-age such development projects and so to design fault-tolerant systems correct by proven construction. Such a thorough approach can be achieved by the automation or semi-automation of the refinement process. We have studied how to ensure the traceability of these requirements in a component-based approach. Re-liability, fault tolerance can be seen here as particular refinement steps. For instance, a given formal specifi-cation of a system/component may be refined by adding redundancy (data, computation, component) and be verified to be fault-tolerant w.r.t. some given fault scenarios. A self-repair component can be defined as the refinement of its original form enhanced with error detection. We describe in this paper the PCSI project (Zero Defect Systems) based on B Method, VHDL and PSL. The three modeling approaches can collaborate together and guarantee the codesign of embedded systems for which the requirements and the fault-tolerant aspects are taken into account for the beginning and formally verified all along the implementation process

    Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication

    Full text link
    This paper presents FT-GAIA, a software-based fault-tolerant parallel and distributed simulation middleware. FT-GAIA has being designed to reliably handle Parallel And Distributed Simulation (PADS) models, which are needed to properly simulate and analyze complex systems arising in any kind of scientific or engineering field. PADS takes advantage of multiple execution units run in multicore processors, cluster of workstations or HPC systems. However, large computing systems, such as HPC systems that include hundreds of thousands of computing nodes, have to handle frequent failures of some components. To cope with this issue, FT-GAIA transparently replicates simulation entities and distributes them on multiple execution nodes. This allows the simulation to tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some protection against Byzantine failures, since interaction messages among the simulated entities are replicated as well, so that the receiving entity can identify and discard corrupted messages. Results from an analytical model and from an experimental evaluation show that FT-GAIA provides a high degree of fault tolerance, at the cost of a moderate increase in the computational load of the execution units.Comment: arXiv admin note: substantial text overlap with arXiv:1606.0731

    CLOUD COMPUTING APPLICATIONS AND SECURITY ISSUES: AN ANALYTICAL SURVEY

    Get PDF
    Cloud computing, a rapidly developing information technology has becoming a well-known the whole world. Cloud computing is Internet-based computing, whereby shared resources, software and information, are provided to computers and devices on-demand, like the electricity grid. Cloud computing is the product of the fusion of traditional computing technology and network technology like grid computing, distributed computing parallel computing and so on. Companies, such as Amazon, Google, Microsoft and so on, accelerate their paces in developing Cloud Computing systems and enhancing their services to provide for a larger amount of users. However, security and privacy issues present a strong barrier for users to adapt into Cloud Computing systems. It is a new technology that satisfies a user’s requirement for computing resources like networks, storage, servers, services and applications, without physically acquiring them. It reduces the overhead of the organization of marinating the large system but it has associated risks and threats also which include – security, data leakage, insecure interface and sharing of resources and inside attacks

    New Approach for a Fault Detect Model-Based Controller: New Approach for a Fault Detect Model-Based Controller

    Get PDF
    In this paper, we present a design of a new fault detect model-based (FDMB) controller system. The system is aimed to detect faults quickly and reconfigure the controller accordingly. Thus, such system can perform its function correctly even in the presence of internal faults. An FDMB controller consists of two main parts, the first is fault detection and diagnosis (FDD); and the second is controller reconfiguration (CR). Systems subject to such faults are modelled as stochastic hybrid dynamic model. Each fault is deterministically represented by a mode in a discrete set of models. The FDD is used with interacting multiple-model (IMM) estimator and the CR is used with generalized predictive control (GPC) algorithm. Simulations for the proposed controller are illustrated and analysed

    RoBuSt: A Crash-Failure-Resistant Distributed Storage System

    Full text link
    In this work we present the first distributed storage system that is provably robust against crash failures issued by an adaptive adversary, i.e., for each batch of requests the adversary can decide based on the entire system state which servers will be unavailable for that batch of requests. Despite up to γn1/loglogn\gamma n^{1/\log\log n} crashed servers, with γ>0\gamma>0 constant and nn denoting the number of servers, our system can correctly process any batch of lookup and write requests (with at most a polylogarithmic number of requests issued at each non-crashed server) in at most a polylogarithmic number of communication rounds, with at most polylogarithmic time and work at each server and only a logarithmic storage overhead. Our system is based on previous work by Eikel and Scheideler (SPAA 2013), who presented IRIS, a distributed information system that is provably robust against the same kind of crash failures. However, IRIS is only able to serve lookup requests. Handling both lookup and write requests has turned out to require major changes in the design of IRIS.Comment: Revised full versio

    Extended Model driven Architecture to B Method

    Get PDF
    International audienceModel Driven Architecture (MDA) design approach proposes to separate design into two stages: implementation independent stage then an implementation-dependent one. This improves the reusability, the reusability, the standability, the maintainability, etc. Here we show how MDA can be augmented using a formal refinement approach: B method. Doing so enables to gradually refine the development from the abstract specification to the executing implementation; furthermore it permits to prove the coherence between components in low levels even if they are implemented in different technologies
    corecore