3,797 research outputs found

    A Survey of Fault-Tolerance and Fault-Recovery Techniques in Parallel Systems

    Full text link
    Supercomputing systems today often come in the form of large numbers of commodity systems linked together into a computing cluster. These systems, like any distributed system, can have large numbers of independent hardware components cooperating or collaborating on a computation. Unfortunately, any of this vast number of components can fail at any time, resulting in potentially erroneous output. In order to improve the robustness of supercomputing applications in the presence of failures, many techniques have been developed to provide resilience to these kinds of system faults. This survey provides an overview of these various fault-tolerance techniques.Comment: 11 page

    Middleware Technologies for Cloud of Things - a survey

    Get PDF
    The next wave of communication and applications rely on the new services provided by Internet of Things which is becoming an important aspect in human and machines future. The IoT services are a key solution for providing smart environments in homes, buildings and cities. In the era of a massive number of connected things and objects with a high grow rate, several challenges have been raised such as management, aggregation and storage for big produced data. In order to tackle some of these issues, cloud computing emerged to IoT as Cloud of Things (CoT) which provides virtually unlimited cloud services to enhance the large scale IoT platforms. There are several factors to be considered in design and implementation of a CoT platform. One of the most important and challenging problems is the heterogeneity of different objects. This problem can be addressed by deploying suitable "Middleware". Middleware sits between things and applications that make a reliable platform for communication among things with different interfaces, operating systems, and architectures. The main aim of this paper is to study the middleware technologies for CoT. Toward this end, we first present the main features and characteristics of middlewares. Next we study different architecture styles and service domains. Then we presents several middlewares that are suitable for CoT based platforms and lastly a list of current challenges and issues in design of CoT based middlewares is discussed.Comment: http://www.sciencedirect.com/science/article/pii/S2352864817301268, Digital Communications and Networks, Elsevier (2017

    Middleware Technologies for Cloud of Things - a survey

    Full text link
    The next wave of communication and applications rely on the new services provided by Internet of Things which is becoming an important aspect in human and machines future. The IoT services are a key solution for providing smart environments in homes, buildings and cities. In the era of a massive number of connected things and objects with a high grow rate, several challenges have been raised such as management, aggregation and storage for big produced data. In order to tackle some of these issues, cloud computing emerged to IoT as Cloud of Things (CoT) which provides virtually unlimited cloud services to enhance the large scale IoT platforms. There are several factors to be considered in design and implementation of a CoT platform. One of the most important and challenging problems is the heterogeneity of different objects. This problem can be addressed by deploying suitable "Middleware". Middleware sits between things and applications that make a reliable platform for communication among things with different interfaces, operating systems, and architectures. The main aim of this paper is to study the middleware technologies for CoT. Toward this end, we first present the main features and characteristics of middlewares. Next we study different architecture styles and service domains. Then we presents several middlewares that are suitable for CoT based platforms and lastly a list of current challenges and issues in design of CoT based middlewares is discussed.Comment: http://www.sciencedirect.com/science/article/pii/S2352864817301268, Digital Communications and Networks, Elsevier (2017

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    Software engineering and middleware: a roadmap (Invited talk)

    Get PDF
    The construction of a large class of distributed systems can be simplified by leveraging middleware, which is layered between network operating systems and application components. Middleware resolves heterogeneity and facilitates communication and coordination of distributed components. Existing middleware products enable software engineers to build systems that are distributed across a local-area network. State-of-the-art middleware research aims to push this boundary towards Internet-scale distribution, adaptive and reconfigurable middleware and middleware for dependable and wireless systems. The challenge for software engineering research is to devise notations, techniques, methods and tools for distributed system construction that systematically build and exploit the capabilities that middleware deliver
    • ā€¦
    corecore