3,718 research outputs found

    Multi-core devices for safety-critical systems: a survey

    Get PDF
    Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple system-level potential benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview at different device abstraction levels (nanoscale, component, and device) of selected key research contributions that support the compliance with these fundamental safety requirements.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, Basque Government under grant KK-2019-00035 and the HiPEAC Network of Excellence. The Spanish Ministry of Economy and Competitiveness has also partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717).Peer ReviewedPostprint (author's final draft

    The Atomic Manifesto: a Story in Four Quarks

    Get PDF
    This report summarizes the viewpoints and insights gathered in the Dagstuhl Seminar on Atomicity in System Design and Execution, which was attended by 32 people from four different scientific communities: database and transaction processing systems, fault tolerance and dependable systems, formal methods for system design and correctness reasoning, and hardware architecture and programming languages. Each community presents its position in interpreting the notion of atomicity and the existing state of the art, and each community identifies scientific challenges that should be addressed in future work. In addition, the report discusses common themes across communities and strategic research problems that require multiple communities to team up for a viable solution. The general theme of how to specify, implement, compose, and reason about extended and relaxed notions of atomicity is viewed as a key piece in coping with the pressing issue of building and maintaining highly dependable systems that comprise many components with complex interaction patterns

    Software engineering and middleware: a roadmap (Invited talk)

    Get PDF
    The construction of a large class of distributed systems can be simplified by leveraging middleware, which is layered between network operating systems and application components. Middleware resolves heterogeneity and facilitates communication and coordination of distributed components. Existing middleware products enable software engineers to build systems that are distributed across a local-area network. State-of-the-art middleware research aims to push this boundary towards Internet-scale distribution, adaptive and reconfigurable middleware and middleware for dependable and wireless systems. The challenge for software engineering research is to devise notations, techniques, methods and tools for distributed system construction that systematically build and exploit the capabilities that middleware deliver

    Hardware-Assisted Dependable Systems

    Get PDF
    Unpredictable hardware faults and software bugs lead to application crashes, incorrect computations, unavailability of internet services, data losses, malfunctioning components, and consequently financial losses or even death of people. In particular, faults in microprocessors (CPUs) and memory corruption bugs are among the major unresolved issues of today. CPU faults may result in benign crashes and, more problematically, in silent data corruptions that can lead to catastrophic consequences, silently propagating from component to component and finally shutting down the whole system. Similarly, memory corruption bugs (memory-safety vulnerabilities) may result in a benign application crash but may also be exploited by a malicious hacker to gain control over the system or leak confidential data. Both these classes of errors are notoriously hard to detect and tolerate. Usual mitigation strategy is to apply ad-hoc local patches: checksums to protect specific computations against hardware faults and bug fixes to protect programs against known vulnerabilities. This strategy is unsatisfactory since it is prone to errors, requires significant manual effort, and protects only against anticipated faults. On the other extreme, Byzantine Fault Tolerance solutions defend against all kinds of hardware and software errors, but are inadequately expensive in terms of resources and performance overhead. In this thesis, we examine and propose five techniques to protect against hardware CPU faults and software memory-corruption bugs. All these techniques are hardware-assisted: they use recent advancements in CPU designs and modern CPU extensions. Three of these techniques target hardware CPU faults and rely on specific CPU features: ∆-encoding efficiently utilizes instruction-level parallelism of modern CPUs, Elzar re-purposes Intel AVX extensions, and HAFT builds on Intel TSX instructions. The rest two target software bugs: SGXBounds detects vulnerabilities inside Intel SGX enclaves, and “MPX Explained” analyzes the recent Intel MPX extension to protect against buffer overflow bugs. Our techniques achieve three goals: transparency, practicality, and efficiency. All our systems are implemented as compiler passes which transparently harden unmodified applications against hardware faults and software bugs. They are practical since they rely on commodity CPUs and require no specialized hardware or operating system support. Finally, they are efficient because they use hardware assistance in the form of CPU extensions to lower performance overhead

    Automatic Software Repair: a Bibliography

    Get PDF
    This article presents a survey on automatic software repair. Automatic software repair consists of automatically finding a solution to software bugs without human intervention. This article considers all kinds of repairs. First, it discusses behavioral repair where test suites, contracts, models, and crashing inputs are taken as oracle. Second, it discusses state repair, also known as runtime repair or runtime recovery, with techniques such as checkpoint and restart, reconfiguration, and invariant restoration. The uniqueness of this article is that it spans the research communities that contribute to this body of knowledge: software engineering, dependability, operating systems, programming languages, and security. It provides a novel and structured overview of the diversity of bug oracles and repair operators used in the literature

    Improving the Scalability of DPWS-Based Networked Infrastructures

    Full text link
    The Devices Profile for Web Services (DPWS) specification enables seamless discovery, configuration, and interoperability of networked devices in various settings, ranging from home automation and multimedia to manufacturing equipment and data centers. Unfortunately, the sheer simplicity of event notification mechanisms that makes it fit for resource-constrained devices, makes it hard to scale to large infrastructures with more stringent dependability requirements, ironically, where self-configuration would be most useful. In this report, we address this challenge with a proposal to integrate gossip-based dissemination in DPWS, thus maintaining compatibility with original assumptions of the specification, and avoiding a centralized configuration server or custom black-box middleware components. In detail, we show how our approach provides an evolutionary and non-intrusive solution to the scalability limitations of DPWS and experimentally evaluate it with an implementation based on the the Web Services for Devices (WS4D) Java Multi Edition DPWS Stack (JMEDS).Comment: 28 pages, Technical Repor
    corecore