30 research outputs found

    A load-sharing architecture for high performance optimistic simulations on multi-core machines

    Get PDF
    In Parallel Discrete Event Simulation (PDES), the simulation model is partitioned into a set of distinct Logical Processes (LPs) which are allowed to concurrently execute simulation events. In this work we present an innovative approach to load-sharing on multi-core/multiprocessor machines, targeted at the optimistic PDES paradigm, where LPs are speculatively allowed to process simulation events with no preventive verification of causal consistency, and actual consistency violations (if any) are recovered via rollback techniques. In our approach, each simulation kernel instance, in charge of hosting and executing a specific set of LPs, runs a set of worker threads, which can be dynamically activated/deactivated on the basis of a distributed algorithm. The latter relies in turn on an analytical model that provides indications on how to reassign processor/core usage across the kernels in order to handle the simulation workload as efficiently as possible. We also present a real implementation of our load-sharing architecture within the ROme OpTimistic Simulator (ROOT-Sim), namely an open-source C-based simulation platform implemented according to the PDES paradigm and the optimistic synchronization approach. Experimental results for an assessment of the validity of our proposal are presented as well

    A fine-grain time-sharing Time Warp system

    Get PDF
    Although Parallel Discrete Event Simulation (PDES) platforms relying on the Time Warp (optimistic) synchronization protocol already allow for exploiting parallelism, several techniques have been proposed to further favor performance. Among them we can mention optimized approaches for state restore, as well as techniques for load balancing or (dynamically) controlling the speculation degree, the latter being specifically targeted at reducing the incidence of causality errors leading to waste of computation. However, in state of the art Time Warp systems, events’ processing is not preemptable, which may prevent the possibility to promptly react to the injection of higher priority (say lower timestamp) events. Delaying the processing of these events may, in turn, give rise to higher incidence of incorrect speculation. In this article we present the design and realization of a fine-grain time-sharing Time Warp system, to be run on multi-core Linux machines, which makes systematic use of event preemption in order to dynamically reassign the CPU to higher priority events/tasks. Our proposal is based on a truly dual mode execution, application vs platform, which includes a timer-interrupt based support for bringing control back to platform mode for possible CPU reassignment according to very fine grain periods. The latter facility is offered by an ad-hoc timer-interrupt management module for Linux, which we release, together with the overall time-sharing support, within the open source ROOT-Sim platform. An experimental assessment based on the classical PHOLD benchmark and two real world models is presented, which shows how our proposal effectively leads to the reduction of the incidence of causality errors, as compared to traditional Time Warp, especially when running with higher degrees of parallelism

    Time-Sharing Time Warp via Lightweight Operating System Support

    Get PDF
    The order according to which the different tasks are carried out within a Time Warp platform has a direct impact on performance, given that event processing is speculative, thus being subject to the possibility of being rolled-back. It is typically recognized that not-yet-executed events having lower timestamps should be given higher CPU-schedule priority, since this contributes to keep low the amount of rollbacks. However, common Time Warp platforms usually execute events as atomic actions. Hence control is bounced back to the underlying simulation platform only at the end of the current event processing routine. In other words, CPU-scheduling of events resembles classical batch-multitasking scheduling, which is recognized not to promptly react to variations of the priority of pending tasks (e.g. associated with the injection of new events in the system). In this article we present the design and implementation of a time-sharing Time Warp platform, to be run on multi-core machines, where the platform-level software is allowed to take back control on a periodical basis (with fine grain period), and to possibly preempt any ongoing event processing activity in favor of dispatching (along the same thread) any other event that is revealed to have higher priority. Our proposal is based on an ad-hoc kernel module for Linux, which implements a fine grain timer-interrupt mechanism with lightweight management, which is fully integrated with the modern top/bottom-half timer-interrupt Linux architecture, and which does not induce any bias in terms of relative CPU-usage planning across Time Warp vs non-Time Warp threads running on the machine. Our time-sharing architecture has been integrated within the open source ROOT-Sim optimistic simulation package, and we also report some experimental data for an assessment of our proposal

    Experiments in distributed memory time warp

    Get PDF

    Cache-Aware Memory Manager for Optimistic Simulations

    Get PDF
    Parallel Discrete Event Simulation is a well known technique for executing complex general-purpose simulations where models are described as objects the interaction of which is expressed through the generation of impulsive events. In particular, Optimistic Simulation allows full exploitation of the available computational power, avoiding the need to compute safety properties for the events to be executed. Optimistic Simulation platforms internally rely on several data structures, which are meant to support operations aimed at ensuring correctness, inter-kernel communication and/or event scheduling. These housekeeping and management operations access them according to complex patterns, commonly suffering from misuse of memory caching architectures. In particular, operations like log/restore access data structures on a periodic basis, producing the replacement of in-cache buffers related to the actual working set of the application logic, producing a non-negligible performance drop. In this work we propose generally-applicable design principles for a new memory management subsystem targeted at Optimistic Simulation platforms which can face this issue by wisely allocating memory buffers depending on their actual future access patterns, in order to enhance event-execution memory locality. Additionally, an application-transparent implementation within ROOT-Sim, an open-source generalpurpose optimistic simulation platform, is presented along with experimental results testing our proposal

    Master/worker parallel discrete event simulation

    Get PDF
    The execution of parallel discrete event simulation across metacomputing infrastructures is examined. A master/worker architecture for parallel discrete event simulation is proposed providing robust executions under a dynamic set of services with system-level support for fault tolerance, semi-automated client-directed load balancing, portability across heterogeneous machines, and the ability to run codes on idle or time-sharing clients without significant interaction by users. Research questions and challenges associated with issues and limitations with the work distribution paradigm, targeted computational domain, performance metrics, and the intended class of applications to be used in this context are analyzed and discussed. A portable web services approach to master/worker parallel discrete event simulation is proposed and evaluated with subsequent optimizations to increase the efficiency of large-scale simulation execution through distributed master service design and intrinsic overhead reduction. New techniques for addressing challenges associated with optimistic parallel discrete event simulation across metacomputing such as rollbacks and message unsending with an inherently different computation paradigm utilizing master services and time windows are proposed and examined. Results indicate that a master/worker approach utilizing loosely coupled resources is a viable means for high throughput parallel discrete event simulation by enhancing existing computational capacity or providing alternate execution capability for less time-critical codes.Ph.D.Committee Chair: Fujimoto, Richard; Committee Member: Bader, David; Committee Member: Perumalla, Kalyan; Committee Member: Riley, George; Committee Member: Vuduc, Richar

    Energy and Reliability Management in Parallel Real-Time Systems

    Get PDF
    Historically, slack time in real-time systems has been used as temporal redundancy by rollback recovery schemes to increase system reliability in the presence of faults. However, with advancedtechnologies, slack time can also be used by energy management schemes to save energy. For reliable real-time systems where higher levels of reliability are as important as lower levels of energy consumption, centralized management of slack time is desired.For frame-based parallel real-time applications, energy management schemes are first explored. Although the simple static power management that evenly allocates static slack over a schedule isoptimal for uni-processor systems, it is not optimal for parallel systems due to different levels of parallelism in a schedule. Taking parallelism variations into consideration, a parallel static power management scheme is proposed. When dynamic slack is considered,assuming global scheduling strategies, slack shifting and sharing schemes as well as speculation schemes are proposed for moreenergy savings.For simultaneous management of power and reliability, checkpointing techniques are first deployed to efficiently use slack time and theoptimal numbers of checkpoints needed to minimize energy consumption or to maximize system reliability are explored. Then, an energyefficient optimistic modular redundancy scheme is addressed. Finally, a framework that encompasses energy and reliability management isproposed for obtaining optimal redundant configurations. While exploring the trade-off between energy and reliability, the effects ofvoltage scaling on fault rates are considered

    Techniques for Transparent Parallelization of Discrete Event Simulation Models

    Get PDF
    Simulation is a powerful technique to represent the evolution of real-world phenomena or systems over time. It has been extensively used in different research fields (from medicine to biology, to economy, and to disaster rescue) to study the behaviour of complex systems during their evolution (symbiotic simulation) or before their actual realization (what-if analysis). A traditional way to achieve high performance simulations is the employment of Parallel Discrete Event Simulation (PDES) techniques, which are based on the partitioning of the simulation model into Logical Processes (LPs) that can execute events in parallel on different CPUs and/or different CPU cores, and rely on synchronization mechanisms to achieve causally consistent execution of simulation events. As it is well recognized, the optimistic synchronization approach, namely the Time Warp protocol, which is based on rollback for recovering possible timestamp-order violations due to the absence of block-until-safe policies for event processing, is likely to favour speedup in general application/ architectural contexts. However, the optimistic PDES paradigm implicitly relies on a programming model that shifts from traditional sequential-style programming, given that there is no notion of global address space (fully accessible while processing events at any LP). Furthermore, there is the underlying assumption that the code associated with event handlers cannot execute unrecoverable operations given their speculative processing nature. Nevertheless, even though no unrecoverable action is ever executed by event handlers, a means to actually undo the action if requested needs to be devised and implemented within the software stack. On the other hand, sequential-style programming is an easy paradigm for the development of simulation code, given that it does not require the programmer to reason about memory partitioning (and therefore message passing) and speculative (concurrent) processing of the application. In this thesis, we present methodological and technical innovations which will show how it is possible, by developing innovative runtime mechanisms, to allow a programmer to implement its simulation model in a fully sequential way, and have the underlying simulation framework to execute it in parallel according to speculative processing techniques. Some of the approaches we provide show applicability in either shared- or distributed-memory systems, while others will be specifically tailored to multi/many-core architectures. We will clearly show, during the development of these supports, what is the effect on performance of these solutions, which will nevertheless be negligible, allowing a fruitful exploitation of the available computing power. In the end, we will highlight which are the clear benefits on the programming model tha

    Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

    Get PDF
    Thanks to aggressive scaling of transistor dimensions, computers have revolutionized our life. However, the increasing unreliability of devices fabricated in nanoscale technologies emerged as a major threat for the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze and mitigate aging at microarchitecture-level and above

    Qualität und Nutzen - Über den Gebrauch von Zeit-Wert-Funktionen zur Integration qualitäts- und zeit-flexibler Aspekte in einer dynamischen Echtzeit-Einplanungsumgebung

    Get PDF
    Scheduling methodologies for real-time applications have been of keen interest to diverse research communities for several decades. Depending on the application area, algorithms have been developed that are tailored to specific requirements with respect to both the individual components of which an application is made up and the computational platform on which it is to be executed. Many real-time scheduling algorithms base their decisions solely or partly on timing constraints expressed by deadlines which must be met even under worst-case conditions. The increasing complexity of computing hardware means that worst-case execution time analysis becomes increasingly pessimistic. Scheduling hard real-time computations according to their worst-case execution times (which is common practice) will thus result, on average, in an increasing amount of spare capacity. The main goal of flexible real-time scheduling is to exploit this otherwise wasted capacity. Flexible scheduling schemes have been proposed to increase the ability of a real-time system to adapt to changing requirements and nondeterminism in the application behaviour. These models can be categorised as those whose source of flexibility is the quality of computations and those which are flexible regarding their timing constraints. This work describes a novel model which allows to specify both flexible timing constraints and quality profiles for an application. Furthermore, it demonstrates the applicability of this specification method to real-world examples and suggests a set of feasible scheduling algorithms for the proposed problem class.Einplanungsverfahren für Echtzeitanwendungen stehen seit Jahrzehnten im Interesse verschiedener Forschungsgruppen. Abhängig vom Anwendungsgebiet wurden Algorithmen entwickelt, welche an die spezifischen Anforderungen sowohl hinsichtlich der einzelnen Komponenten, aus welchen eine Anwendung besteht, als auch an die Rechnerplattform, auf der diese ausgeführt werden sollen, angepasst sind. Viele Echtzeit-Einplanungsverfahren gründen ihre Entscheidungen ausschließlich oder teilweise auf Zeitbedingungen, welche auch bei Auftreten maximaler Ausführungszeiten eingehalten werden müssen. Die zunehmende Komplexität von Rechner-Hardware bedeutet, dass die Worst-Case-Analyse in steigendem Maße pessimistisch wird. Die Einplanung harter Echtzeit-Berechnungen anhand ihrer maximalen Ausführungszeiten (was die gängige Praxis darstellt) resultiert daher im Regelfall in einer frei verfügbaren Rechenkapazität in steigender Höhe. Das Hauptziel flexibler Echtzeit-Einplanungsverfahren ist es, diese ansonsten verschwendete Kapazität auszunutzen. Flexible Einplanungsverfahren wurden vorgeschlagen, welche die Fähigkeit eines Echtzeitsystems erhöhen, sich an veränderte Anforderungen und Nichtdeterminismus im Verhalten der Anwendung anzupassen. Diese Modelle können unterteilt werden in solche, deren Quelle der Flexibilität die Qualität der Berechnungen ist, und jene, welche flexibel hinsichtlich ihrer Zeitbedingungen sind. Diese Arbeit beschreibt ein neuartiges Modell, welches es erlaubt, sowohl flexible Zeitbedingungen als auch Qualitätsprofile für eine Anwendung anzugeben. Außerdem demonstriert sie die Anwendbarkeit dieser Spezifikationsmethode auf reale Beispiele und schlägt eine Reihe von Einplanungsalgorithmen für die vorgestellte Problemklasse vor
    corecore