26,600 research outputs found

    Evaluation of containers as a virtualisation alternative for HEP workloads

    Get PDF
    In this paper the emerging technology of Linux containers is examined and evaluated for use in the High Energy Physics (HEP) community. Key technologies required to enable containerisation will be discussed along with emerging technologies used to manage container images. An evaluation of the requirements for containers within HEP will be made and benchmarking will be carried out to asses performance over a range of HEP workflows. The use of containers will be placed in a broader context and recommendations on future work will be given

    A Study of Concurrency Bugs and Advanced Development Support for Actor-based Programs

    Full text link
    The actor model is an attractive foundation for developing concurrent applications because actors are isolated concurrent entities that communicate through asynchronous messages and do not share state. Thereby, they avoid concurrency bugs such as data races, but are not immune to concurrency bugs in general. This study taxonomizes concurrency bugs in actor-based programs reported in literature. Furthermore, it analyzes the bugs to identify the patterns causing them as well as their observable behavior. Based on this taxonomy, we further analyze the literature and find that current approaches to static analysis and testing focus on communication deadlocks and message protocol violations. However, they do not provide solutions to identify livelocks and behavioral deadlocks. The insights obtained in this study can be used to improve debugging support for actor-based programs with new debugging techniques to identify the root cause of complex concurrency bugs.Comment: - Submitted for review - Removed section 6 "Research Roadmap for Debuggers", its content was summarized in the Future Work section - Added references for section 1, section 3, section 4.3 and section 5.1 - Updated citation

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    Ada (trademark) projects at NASA. Runtime environment issues and recommendations

    Get PDF
    Ada practitioners should use this document to discuss and establish common short term requirements for Ada runtime environments. The major current Ada runtime environment issues are identified through the analysis of some of the Ada efforts at NASA and other research centers. The runtime environment characteristics of major compilers are compared while alternate runtime implementations are reviewed. Modifications and extensions to the Ada Language Reference Manual to address some of these runtime issues are proposed. Three classes of projects focusing on the most critical runtime features of Ada are recommended, including a range of immediately feasible full scale Ada development projects. Also, a list of runtime features and procurement issues is proposed for consideration by the vendors, contractors and the government

    Enabling the “Easy Button” for Broad, Parallel Optimization of Functions Evaluated by Simulation

    Get PDF
    Java Optimization by Simulation (JOBS) is presented: an open-source, object-oriented Java library designed to enable the study, research, and use of optimization for models evaluated by simulation. JOBS includes several novel design features that make it easy for a simulation modeler, without extensive expertise in optimization or parallel computation, to define an optimization model with deterministic and/or stochastic constraints, choose one or more metaheuristics to solve it and run, using massively parallel function evaluation to reduce wall-clock times. JOBS is supported by a new language independent, application programming interface (API) for remote simulation model evaluation and a serverless computing environment to provide massively parallel function evaluation, on demand. Dynamic loop scheduling methods are evaluated in the serverless environment with the opportunity for significant resource contention for master node computing power and network bandwidth. JOBS implements several population-based and single-solution improvement metaheuristics (solvers) for real, discrete, and mixed problems. The object-oriented design is extendible with classes that drastically reduce the amount of code required to implement a new solver and encourage re-use of solvers as building blocks for creating new multi-stage solvers or memetic algorithms

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    Submicron Systems Architecture Project : Semiannual Technical Report

    Get PDF
    The Mosaic C is an experimental fine-grain multicomputer based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM, processor, packet interface, ROM for bootstrap and self-test, and a two-dimensional selftimed router. The chip architecture provides low-overhead and low-latency handling of message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are packaged by tape-automated bonding (TAB) in an 8 x 8 array on circuit boards that can, in turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 x 8 boards are now in prototype production under a subcontract with Hewlett-Packard. We are planning to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic C hardware also includes host-interface boards and high-speed communication cables. The hardware developments and activities of the past eight months are described in section 2.1. The programming system that we are developing for the Mosaic C is based on the same message-passing, reactive-process, computational model that we have used with earlier multicomputers, but the model is implemented for the Mosaic in a way that supports finegrain concurrency. A process executes only in response to receiving a message, and may in execution send messages, create new processes, and modify its persistent variables before it either exits or becomes dormant in preparation for receiving another message. These computations are expressed in an object-oriented programming notation, a derivative of C++ called C+-. The computational model and the C+- programming notation are described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides automatic process placement and highly distributed management of system resources. The Mosaic C runtime system is described in section 2.3
    • …
    corecore