Academic Panel: Can Self-Managed Systems be trusted?
Trust can be defined as having confidence or faith in something; a form of reliance or certainty based on past experience; allowing without fear; believing, hoping, expecting and wishing; and extending credit. Trust in computing has always been a hot topic, especially with the proliferation of services over the Internet, which has brought the issues of trust and security right into the ordinary home. Autonomic computing brings its own complexity to this: in systems that self-manage, the internal decision-making process is less transparent, and the 'intelligence' may evolve and become less tractable. Such systems may be used for anything from environment monitoring to looking after Granny in the home, so the issue of trust is imperative. To this end, we have organised this panel to examine some of the key aspects of trust. The first section discusses the issues of self-management when applied across organizational boundaries. The second explores predictability in self-managed systems. The third examines how trust is manifested in electronic service communities. The final discussion demonstrates how trust can be integrated into an autonomic system as the core intelligence upon which to base adaptivity choices.
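The final point, trust as the core intelligence driving adaptivity, can be made concrete with a small sketch. The panel abstract gives no algorithm, so everything below is an illustrative assumption: a trust score per service, updated from past interaction outcomes, that the system consults when choosing what to rely on.

```python
# Hypothetical sketch: a trust score built from past experience drives an
# autonomic system's adaptation choices. The class name, the exponential
# moving-average update rule and the neutral 0.5 prior are all assumptions,
# not taken from the panel.

class TrustModel:
    def __init__(self, alpha=0.3):
        self.alpha = alpha   # weight given to the most recent outcome
        self.scores = {}     # service -> trust score in [0, 1]

    def update(self, service, success):
        prev = self.scores.get(service, 0.5)          # neutral prior
        outcome = 1.0 if success else 0.0
        # exponential moving average: trust shifts gradually with experience
        self.scores[service] = (1 - self.alpha) * prev + self.alpha * outcome

    def most_trusted(self, candidates):
        # the adaptivity choice: prefer the service with the best track record
        return max(candidates, key=lambda s: self.scores.get(s, 0.5))

model = TrustModel()
for ok in (True, True, False, True):
    model.update("sensor-A", ok)
model.update("sensor-B", False)
print(model.most_trusted(["sensor-A", "sensor-B"]))  # sensor-A
```

The design choice worth noting is that trust here is earned incrementally rather than granted, which matches the abstract's framing of trust as "reliance or certainty based on past experience".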
Fault-Tolerant Adaptive Parallel and Distributed Simulation
Discrete Event Simulation is a technique widely used to model and analyze
complex systems in many fields of science and engineering. The
increasingly large size of simulation models poses a serious computational
challenge, since the time needed to run a simulation can be prohibitively
large. For this reason, Parallel and Distributed Simulation techniques have
been proposed to take advantage of the multiple execution units found in
multicore processors, clusters of workstations or HPC systems. The current
generation of HPC systems includes hundreds of thousands of computing nodes and
a vast amount of ancillary components. Despite improvements in manufacturing
processes, failures of some components are frequent, and the situation will get
worse as larger systems are built. In this paper we describe FT-GAIA, a
software-based fault-tolerant extension of the GAIA/ARTÌS parallel simulation
middleware. FT-GAIA transparently replicates simulation entities and
distributes them on multiple execution nodes. This allows the simulation to
tolerate crash-failures of computing nodes; furthermore, FT-GAIA offers some
protection against Byzantine failures, since synchronization messages are
replicated as well, so that the receiving entity can identify and discard
corrupted messages. We provide an experimental evaluation of FT-GAIA on a
running prototype. Results show that a high degree of fault tolerance can be
achieved, at the cost of a moderate increase in the computational load of the
execution units. Comment: Proceedings of the IEEE/ACM International Symposium
on Distributed Simulation and Real Time Applications (DS-RT 2016)
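FT-GAIA's Byzantine protection rests on the receiver comparing the replicated copies of a message and discarding the corrupted ones. The paper does not give code, so the sketch below is a minimal illustration of that idea as majority filtering; the function name and message format are assumptions.

```python
# Illustrative sketch (not FT-GAIA's actual implementation): a receiving
# entity gets one copy of the same logical message from each replica of the
# sender, keeps the majority value, and discards corrupted copies.
from collections import Counter

def filter_replicated(copies):
    """Return the majority value among replicated message copies."""
    value, votes = Counter(copies).most_common(1)[0]
    if votes <= len(copies) // 2:
        # fewer than half the replicas agree: corruption cannot be masked
        raise ValueError("no majority among replicated messages")
    return value

# one replica delivered a corrupted timestamp; the majority outvotes it
print(filter_replicated(["ts=42", "ts=42", "ts=41"]))  # ts=42
```

With r replicas this masks up to floor((r-1)/2) corrupted copies per message, which is why the abstract describes the protection against Byzantine failures as partial.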
Formal and Fault Tolerant Design
Software quality and reliability were for a long time verified only at the post-implementation level (testing, fault scenarios, ...). The design of embedded systems and digital circuits is more and more complex because of integration density and heterogeneity. Nowadays almost three quarters of digital circuits contain at least one processor, that is, can execute software code. In other words, co-design is the usual case, and traditional verification by simulation is no longer practical. Moreover, the increase in integration density comes with a decrease in the reliability of the components, so fault detection, diagnostic techniques and introspection are essential for defect tolerance, fault tolerance and self-repair of safety-critical systems. The use of a formal specification language is considered the foundation of a real validation. What we would like to emphasize is that refinement (from an abstract model down to the point where the system will be implemented) could and should be formal too, in order to ensure the traceability of requirements, to manage such development projects and thus to design fault-tolerant systems that are correct by proven construction. Such a thorough approach can be achieved by the automation or semi-automation of the refinement process. We have studied how to ensure the traceability of these requirements in a component-based approach. Reliability and fault tolerance can be seen here as particular refinement steps. For instance, a given formal specification of a system/component may be refined by adding redundancy (data, computation, component) and be verified to be fault-tolerant w.r.t. some given fault scenarios. A self-repair component can be defined as the refinement of its original form enhanced with error detection. We describe in this paper the PCSI project (Zero Defect Systems), based on the B Method, VHDL and PSL.
The three modeling approaches can collaborate and together guarantee the co-design of embedded systems in which the requirements and the fault-tolerance aspects are taken into account from the beginning and formally verified all along the implementation process.
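The abstract's central example is refining a specification by adding redundancy and verifying the result fault-tolerant. The PCSI project does this statically with B, VHDL and PSL; as a loose executable analogue only, the sketch below shows the classic redundancy refinement, triple modular redundancy with a majority voter, masking a single transient fault. All names are hypothetical.

```python
# Illustrative analogue of "refinement by adding redundancy" (the paper
# works in B/VHDL/PSL, not Python): triplicate a component and vote.

def tmr(component):
    """Refine a component by running it three times and majority-voting."""
    def redundant(x):
        results = [component(x) for _ in range(3)]
        # a majority vote masks one faulty execution out of three
        for r in results:
            if results.count(r) >= 2:
                return r
        raise RuntimeError("no majority: more than one replica failed")
    return redundant

# a component with one injected transient fault, for demonstration
calls = {"n": 0}
def flaky_square(x):
    calls["n"] += 1
    return x * x + (1 if calls["n"] == 2 else 0)  # second call is corrupted

safe_square = tmr(flaky_square)
print(safe_square(4))  # 16 -- the single corrupted result is outvoted
```

In the paper's terms, `safe_square` refines `flaky_square`'s specification: it satisfies the same contract while additionally tolerating the given single-fault scenario.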
Fault Tolerant Adaptive Parallel and Distributed Simulation through Functional Replication
This paper presents FT-GAIA, a software-based fault-tolerant parallel and
distributed simulation middleware. FT-GAIA has been designed to reliably
handle Parallel and Distributed Simulation (PADS) models, which are needed to
properly simulate and analyze complex systems arising in virtually any
scientific or engineering field. PADS takes advantage of the multiple
execution units found in multicore processors, clusters of workstations or
HPC systems. However, large
computing systems, such as HPC systems that include hundreds of thousands of
computing nodes, have to handle frequent failures of some components. To cope
with this issue, FT-GAIA transparently replicates simulation entities and
distributes them on multiple execution nodes. This allows the simulation to
tolerate crash-failures of computing nodes. Moreover, FT-GAIA offers some
protection against Byzantine failures, since interaction messages among the
simulated entities are replicated as well, so that the receiving entity can
identify and discard corrupted messages. Results from an analytical model and
from an experimental evaluation show that FT-GAIA provides a high degree of
fault tolerance, at the cost of a moderate increase in the computational load
of the execution units. Comment: arXiv admin note: substantial text overlap
with arXiv:1606.0731
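The abstract mentions an analytical model of the fault tolerance achieved; that model is not reproduced here, but the basic trade-off it quantifies can be sketched. Assuming (an assumption, not the paper's model) independent node crashes with probability f, an entity replicated on r nodes is lost only if all r replicas crash:

```python
# Back-of-the-envelope reliability of entity replication, assuming
# independent node crashes. This is an illustrative calculation, not the
# analytical model from the FT-GAIA paper.

def survival_probability(f, r):
    """Probability that at least one of r replicas survives,
    given per-node crash probability f."""
    return 1 - f ** r

# each extra replica multiplies the loss probability by f...
for r in (1, 2, 3):
    print(r, survival_probability(0.1, r))

# ...which is the source of the computational-load increase: every replica
# must also execute, so reliability is bought with roughly r-fold work.
```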
CLOUD COMPUTING APPLICATIONS AND SECURITY ISSUES: AN ANALYTICAL SURVEY
Cloud computing, a rapidly developing information technology, has become well known throughout the world. Cloud computing is Internet-based computing in which shared resources, software and information are provided to computers and devices on demand, like the electricity grid. Cloud computing is the product of the fusion of traditional computing technology and network technologies such as grid computing, distributed computing, parallel computing and so on. Companies such as Amazon, Google and Microsoft are accelerating their pace in developing cloud computing systems and enhancing their services to cater for a larger number of users. However, security and privacy issues present a strong barrier to users adopting cloud computing systems. Cloud computing is a new technology that satisfies a user's requirement for computing resources such as networks, storage, servers, services and applications, without physically acquiring them. It reduces the organization's overhead of maintaining a large system, but it also carries associated risks and threats, including security, data leakage, insecure interfaces, sharing of resources, and inside attacks.
New Approach for a Fault Detect Model-Based Controller
In this paper, we present the design of a new fault detect model-based (FDMB) controller system. The system aims to detect faults quickly and reconfigure the controller accordingly, so that it can perform its function correctly even in the presence of internal faults. An FDMB controller consists of two main parts: the first is fault detection and diagnosis (FDD), and the second is controller reconfiguration (CR). Systems subject to such faults are modelled as a stochastic hybrid dynamic model, in which each fault is deterministically represented by a mode in a discrete set of models. The FDD part uses an interacting multiple-model (IMM) estimator, and the CR part uses a generalized predictive control (GPC) algorithm. Simulations of the proposed controller are illustrated and analysed.
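The heart of the IMM-based FDD is maintaining a probability for each fault mode and updating it from measurements. The paper gives no code, so the sketch below shows only the generic Bayesian mode-probability update that IMM estimators build on; the mode names and numbers are illustrative assumptions.

```python
# Illustrative sketch of one step of IMM-style fault detection: combine
# prior mode probabilities with the measurement likelihood under each
# fault-mode model. (A full IMM also mixes estimates across modes via a
# transition matrix, which is omitted here.)

def update_mode_probabilities(priors, likelihoods):
    """Bayesian update of discrete fault-mode probabilities."""
    posterior = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(posterior)
    if total == 0:
        raise ValueError("measurement impossible under every mode")
    return [p / total for p in posterior]

# hypothetical modes: [nominal, actuator fault, sensor fault]
priors = [0.8, 0.1, 0.1]
likelihoods = [0.05, 0.6, 0.1]  # the measurement fits the actuator-fault model
post = update_mode_probabilities(priors, likelihoods)
print(max(range(3), key=lambda i: post[i]))  # mode 1: actuator fault detected
```

Once one mode's probability dominates, the CR part would switch the GPC controller to the model associated with that mode, which is the "reconfiguration" step the abstract describes.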
RoBuSt: A Crash-Failure-Resistant Distributed Storage System
In this work we present the first distributed storage system that is provably
robust against crash failures issued by an adaptive adversary, i.e., for each
batch of requests the adversary can decide based on the entire system state
which servers will be unavailable for that batch of requests. Despite up to
$\gamma\, n^{1/\log\log n}$ crashed servers, with $\gamma$ constant and $n$
denoting the number of servers, our system can correctly process any batch of
lookup and write requests (with at most a polylogarithmic number of requests
issued at each non-crashed server) in at most a polylogarithmic number of
communication rounds, with at most polylogarithmic time and work at each server
and only a logarithmic storage overhead.
Our system is based on previous work by Eikel and Scheideler (SPAA 2013), who
presented IRIS, a distributed information system that is provably robust
against the same kind of crash failures. However, IRIS is only able to serve
lookup requests. Handling both lookup and write requests has turned out to
require major changes in the design of IRIS. Comment: Revised full version
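RoBuSt's actual scheme (inherited in part from IRIS) is far more involved, but the basic idea of serving lookups despite adversarially crashed servers can be sketched: place each item on several servers and have the lookup fall back across the live ones. The placement rule and all names below are illustrative assumptions, not the paper's construction.

```python
# Illustrative sketch only: redundant placement plus lookup fallback.
# RoBuSt uses a much more sophisticated coding and routing scheme.
import hashlib

def replica_servers(key, n_servers, k=3):
    """k distinct candidate servers for a key (hash-based placement
    is an assumption, not RoBuSt's actual scheme)."""
    base = int(hashlib.sha256(key.encode()).hexdigest(), 16) % n_servers
    return [(base + i) % n_servers for i in range(k)]

def write(key, value, storage, n_servers):
    for s in replica_servers(key, n_servers):
        storage.setdefault(s, {})[key] = value

def lookup(key, storage, crashed, n_servers):
    # try each replica in turn, skipping servers the adversary crashed
    for s in replica_servers(key, n_servers):
        if s not in crashed and key in storage.get(s, {}):
            return storage[s][key]
    raise KeyError(key)

n = 8
storage = {}
write("x", 42, storage, n)
crashed = {replica_servers("x", n)[0]}      # crash the primary replica
print(lookup("x", storage, crashed, n))     # 42
```

Note the gap this sketch leaves open, and which the paper addresses: an *adaptive* adversary sees the placement and can crash exactly the replicas of a chosen key, so simple replication alone does not give RoBuSt's provable robustness.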
Extended Model driven Architecture to B Method
The Model Driven Architecture (MDA) design approach proposes to separate design into two stages: an implementation-independent stage and then an implementation-dependent one. This improves the reusability, the understandability, the maintainability, etc. Here we show how MDA can be augmented with a formal refinement approach: the B method. Doing so enables the development to be gradually refined from the abstract specification down to the executing implementation; furthermore, it makes it possible to prove the coherence between components at the lower levels even if they are implemented in different technologies.
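The B method discharges refinement obligations statically with proof; as a loose runtime analogue only, the sketch below shows the shape of such an obligation: a concrete implementation checked against the postcondition of the abstract specification it refines. The sorting example and all names are hypothetical.

```python
# Illustrative analogue of a B refinement obligation (B proves this
# statically; here we merely check it at runtime on sample data).

def abstract_sort_spec(xs, result):
    """Postcondition of a hypothetical abstract SORT operation:
    the result is the sorted permutation of the input."""
    return result == sorted(xs) and len(result) == len(xs)

def refined_sort(xs):
    """A concrete refinement (insertion sort) of the abstract operation."""
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] <= x:
            i += 1
        out.insert(i, x)
    return out

data = [3, 1, 2]
# the refinement obligation: the implementation satisfies the abstract spec
assert abstract_sort_spec(data, refined_sort(data))
print(refined_sort(data))  # [1, 2, 3]
```

In the MDA-plus-B setting the abstract contrast is sharper: the specification lives at the implementation-independent stage, while several refinements (possibly in different technologies) can each be proven coherent with it.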