3,797 research outputs found
Recommended from our members
Fault diversity among off-the-shelf SQL database servers
Fault tolerance is often the only viable way of obtaining the required system dependability from systems built out of "off-the-shelf" (OTS) products. We have studied a sample of bug reports from four off-the-shelf SQL servers so as to estimate the possible advantages of software fault tolerance - in the form of modular redundancy with diversity - in complex off-the-shelf software. We checked whether these bugs would cause coincident failures in more than one of the servers. We found that very few bugs affected two of the four servers, and none caused failures in more than two. We also found that only four of these bugs would cause identical, undetectable failures in two servers. Therefore, a fault-tolerant server, built with diverse off-the-shelf servers, seems to have a good chance of delivering improvements in availability and failure rates compared with the individual off-the-shelf servers or their replicated, nondiverse configurations
Recommended from our members
Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers
If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present
A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems
Supercomputing systems today often come in the form of large numbers of
commodity systems linked together into a computing cluster. These systems, like
any distributed system, can have large numbers of independent hardware
components cooperating or collaborating on a computation. Unfortunately, any of
this vast number of components can fail at any time, resulting in potentially
erroneous output. In order to improve the robustness of supercomputing
applications in the presence of failures, many techniques have been developed
to provide resilience to these kinds of system faults. This survey provides an
overview of these various fault-tolerance techniques.Comment: 11 page
Middleware Technologies for Cloud of Things - a survey
The next wave of communication and applications rely on the new services
provided by Internet of Things which is becoming an important aspect in human
and machines future. The IoT services are a key solution for providing smart
environments in homes, buildings and cities. In the era of a massive number of
connected things and objects with a high grow rate, several challenges have
been raised such as management, aggregation and storage for big produced data.
In order to tackle some of these issues, cloud computing emerged to IoT as
Cloud of Things (CoT) which provides virtually unlimited cloud services to
enhance the large scale IoT platforms. There are several factors to be
considered in design and implementation of a CoT platform. One of the most
important and challenging problems is the heterogeneity of different objects.
This problem can be addressed by deploying suitable "Middleware". Middleware
sits between things and applications that make a reliable platform for
communication among things with different interfaces, operating systems, and
architectures. The main aim of this paper is to study the middleware
technologies for CoT. Toward this end, we first present the main features and
characteristics of middlewares. Next we study different architecture styles and
service domains. Then we presents several middlewares that are suitable for CoT
based platforms and lastly a list of current challenges and issues in design of
CoT based middlewares is discussed.Comment: http://www.sciencedirect.com/science/article/pii/S2352864817301268,
Digital Communications and Networks, Elsevier (2017
Middleware Technologies for Cloud of Things - a survey
The next wave of communication and applications rely on the new services
provided by Internet of Things which is becoming an important aspect in human
and machines future. The IoT services are a key solution for providing smart
environments in homes, buildings and cities. In the era of a massive number of
connected things and objects with a high grow rate, several challenges have
been raised such as management, aggregation and storage for big produced data.
In order to tackle some of these issues, cloud computing emerged to IoT as
Cloud of Things (CoT) which provides virtually unlimited cloud services to
enhance the large scale IoT platforms. There are several factors to be
considered in design and implementation of a CoT platform. One of the most
important and challenging problems is the heterogeneity of different objects.
This problem can be addressed by deploying suitable "Middleware". Middleware
sits between things and applications that make a reliable platform for
communication among things with different interfaces, operating systems, and
architectures. The main aim of this paper is to study the middleware
technologies for CoT. Toward this end, we first present the main features and
characteristics of middlewares. Next we study different architecture styles and
service domains. Then we presents several middlewares that are suitable for CoT
based platforms and lastly a list of current challenges and issues in design of
CoT based middlewares is discussed.Comment: http://www.sciencedirect.com/science/article/pii/S2352864817301268,
Digital Communications and Networks, Elsevier (2017
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
Software engineering and middleware: a roadmap (Invited talk)
The construction of a large class of distributed systems can be simplified by leveraging middleware, which is layered between network operating systems and application components. Middleware resolves heterogeneity and facilitates communication and coordination of distributed components. Existing middleware products enable software engineers to build systems that are distributed across a local-area network. State-of-the-art middleware research aims to push this boundary towards Internet-scale distribution, adaptive and reconfigurable middleware and middleware for dependable and wireless systems. The challenge for software engineering research is to devise notations, techniques, methods and tools for distributed system construction that systematically build and exploit the capabilities that middleware deliver
- ā¦