293 research outputs found

    RITSim: distributed systemC simulation

    Get PDF
    Parallel or distributed simulation is becoming more than a novel way to speedup design evaluation; it is becoming necessary for simulating modern processors in a reasonable timeframe. As architectural features become faster, smaller, and more complex, designers are interested in obtaining detailed and accurate performance and power estimations. Uniprocessor simulators may not be able to meet such demands. The RITSim project uses SystemC to model a processor microarchitecture and memory subsystem in great detail. SystemC is a C++ library built on a discrete-event simulation kernel. Many projects have successfully implemented parallel discrete-event simulation (PDES) frameworks to distribute simulation among several hosts. The field promises significant simulation speedup, possibly leading to faster turnaround time in design space exploration and commercial production. However, parallel implementation of such simulators is not an easy task. It requires modification of the simulation kernel for effective partitioning and synchronization. This thesis explores PDES techniques and presents a distributed version of the SystemC simulation environment. With minimal user interaction, SystemC models can executed on a cluster of workstations using a message-passing library such as the Message Passing Interface (MPI). The implementation is designed for transparency; distribution and synchronization happen with little intervention by the model author. Modification of SystemC is fashioned to promote maintainability with future releases. Furthermore, only freely available libraries are used for maximum flexibility and portability

    Transparent and efficient shared-state management for optimistic simulations on multi-core machines

    Get PDF
    Traditionally, Logical Processes (LPs) forming a simulation model store their execution information into disjoint simulations states, forcing events exchange to communicate data between each other. In this work we propose the design and implementation of an extension to the traditional Time Warp (optimistic) synchronization protocol for parallel/distributed simulation, targeted at shared-memory/multicore machines, allowing LPs to share parts of their simulation states by using global variables. In order to preserve optimism's intrinsic properties, global variables are transparently mapped to multi-version ones, so to avoid any form of safety predicate verification upon updates. Execution's consistency is ensured via the introduction of a new rollback scheme which is triggered upon the detection of an incorrect global variable's read. At the same time, efficiency in the execution is guaranteed by the exploitation of non-blocking algorithms in order to manage the multi-version variables' lists. Furthermore, our proposal is integrated with the simulation model's code through software instrumentation, in order to allow the application-level programmer to avoid using any specific API to mark or to inform the simulation kernel of updates to global variables. Thus we support full transparency. An assessment of our proposal, comparing it with a traditional message-passing implementation of variables' multi-version is provided as well. © 2012 IEEE

    Shared memory as a basis for conservative distributed architectural simulation

    Get PDF
    Journal ArticleThis paper describes experience in parallelizing an execution-driven architectural simulation system used in the development and evaluation of the Avalanche distributed architecture. It reports on a specific application of conservative distributed simulation on a shared memory platform. Various communication-intensive synchronization algorithms are described and evaluated. Performance results on a bus-based shared memory platform are reported, and extension and scalability of the implementation to larger distributed shared memory configurations are discussed. Also addressed are specific characteristics of architectural simulations that contribute to decisions relating to the conservatism of the approach and to the achievable performance

    Distributed simulation of building systems for legacy software reuse

    Get PDF
    The use of integrated building performance simulation can substantially help in improving a building design with regards to comfort levels and fuel consumption, while reducing emission of greenhouse gasses. However, the traditional tools that are closed for inter-communication, limit the modeler to use of components only available within that particular package. This paper gives an overview of distributed simulation approach that can alleviate above limitation. Each program can represent only a part of a building system that is able to model, exchanging the necessary information during the execution and bridging the gaps between the tools. Several important issues closely connected with its implementation, such as synchronization, are pointed out, and the sensitivity of a model on different coupling strategies is studied. The paper concludes with highlighting the gained flexibility in modeling and simulation of building performance that arises from the distributed approach

    DIstributed VIRtual System (DIVIRS) Project

    Get PDF
    The development of Prospero moved from the University of Washington to ISI and several new versions of the software were released from ISI during the contract period. Changes in the first release from ISI included bug fixes and extensions to support the needs of specific users. Among these changes was a new option to directory queries that allows attributes to be returned for all files in a directory together with the directory listing. This change greatly improves the performance of their server and reduces the number of packets sent across their trans-pacific connection to the rest of the internet. Several new access method were added to the Prospero file method. The Prospero Data Access Protocol was designed, to support secure retrieval of data from systems running Prospero

    Using Java for plasma PIC simulations

    Get PDF
    Plasma particle-in-cell (PIC) simulations model the interactions of charged particles with the surrounding fields. This application has been recognized as one of the grand challenge problems facing the high-performance computing community due to its huge computational requirements. Recently, with the explosive development of Internet, Java is receiving increasing attention and is thought as a potential candidate for high-performance computing. In this paper, we present our approach to developing 2- and 3-dimensional parallel PIC simulations in Java. We also report the execution times for both versions from performance experiments on a symmetric multi-processor (Sun E6500) and a Linux cluster of Pentium III machines. Those results are also compared with benchmark measurements of the corresponding Fortran version of the same algorithm

    Exploiting implicit parallelism in SPARC instruction execution

    Get PDF
    One way to increase the performance of a processing unit is to exploit implicit parallelism. Exploiting this parallelism requires a processor to dynamically select instructions in a serial instruction stream which can be executed in parallel. As operations are computed concurrently, an execution speedup will occur. This thesis studies how effectively implicit parallelism could be exploited in the Scalable Pro cessor Architecture (SPARC)[9], a reduced instruction set architecture developed by Sun Microsystems. First an analysis of SPARC instruction traces will determine the optimal speedup that would be realized by a processor with infinite resources. Next, an analytical model of a parallelizing processor will be developed and used to predict the effects of limited resources on optimal speedup. Lastly, a SPARC simulator will be employed to determine the actual speedup of resource limited configurations, and the results will be correlated with the analytical model

    Asynchronous Validity Resolution in Sequentially Consistent Shared Virtual Memory

    Get PDF
    Shared Virtual Memory (SVM) is an effort to provide a mechanism for a distributed system, such as a cluster, to execute shared memory parallel programs. Unfortunately, SVM has performance problems due to its underlying distributed architecture. Recent developments have increased performance of SVM by reducing communication. Unfortunately this performance gain was only possible by increasing programming complexity and by restricting the types of programs allowed to execute in the system. Validity resolution is the process of resolving the validity of a memory object such as a page. Current SVM systems use synchronous or deferred validity resolution techniques in which user processing is blocked during the validity resolution process. This is the case even when resolving validity of false shared variables. False-sharing occurs when two or more processes access unrelated variables stored within the same shared block of memory and at least one of the processes is writing. False sharing unnecessarily reduces overall performance of SVM systems?because user processing is blocked during validity resolution although no actual data dependencies exist. This thesis presents Asynchronous Validity Resolution (AVR), a new approach to SVM which reduces the performance losses associated with false sharing while maintaining the ease of programming found with regular shared memory parallel programming methodology. Asynchronous validity resolution allows concurrent user process execution and data validity resolution. AVR is evaluated by com-paring performance of an application suite using both an AVR sequentially con-sistent SVM system and a traditional sequentially consistent (SC) SVM system. The results show that AVR can increase performance over traditional sequentially consistent SVM for programs which exhibit false sharing. Although AVR outperforms regular SC by as much as 26%, performance of AVR is dependent on the number of false-sharing vs. true-sharing accesses, the number of pages in the program’s working set, the amount of user computation that completes per page request, and the internodal round-trip message time in the system. Overall, the results show that AVR could be an important member of the arsenal of tools available to parallel programmers
    • …
    corecore