Fast Accurate Simulation of Large Shared Memory Multiprocessors
Fast computer simulation is an essential tool in the design of large parallel computers. We discuss the design and performance of our Fast Accurate Simulation Tool, FAST. We start by summarizing the tradeoffs made in the designs of this and other simulators. The key ideas used in this simulator involve execution-driven simulation techniques that modify the object code of the application program being studied. This produces an augmented version of the code that is directly executed and performs much of the work of the simulation. We extend previous work in execution-driven simulation by introducing several new uses for code augmentation. The result of these techniques is a simulator that is one to two orders of magnitude faster than previous simulators of comparable accuracy.
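The abstract gives no implementation details, so the following is only a rough sketch of the code-augmentation idea: instrumentation calls are inserted into the application so that the augmented program itself advances simulated time and reports its memory references. The hook names (sim_tick, sim_mem_ref) and latencies are invented for illustration and are not taken from FAST.

```cpp
#include <cstdint>

// Hypothetical simulator hooks that an augmentation pass would insert.
// In a real execution-driven simulator these are emitted into the
// application's object code; here they are shown as ordinary calls.
static uint64_t sim_cycles = 0;

inline void sim_tick(uint64_t cycles) {   // account for local computation
    sim_cycles += cycles;
}

inline void sim_mem_ref(const void* addr, bool is_write) {
    // Hand shared-memory references to the simulator so it can model
    // caches, the interconnect, and contention; a fixed latency stands
    // in for that model here.
    (void)addr;
    sim_cycles += is_write ? 2 : 1;
}

// Original application statement:   sum += shared[i];
// Augmented version produced by rewriting the object code:
void augmented_update(int* shared, int i, int& sum) {
    sim_tick(3);                      // cost of the address arithmetic
    sim_mem_ref(&shared[i], false);   // simulated read of shared memory
    sum += shared[i];                 // the real work still executes directly
}
```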
Performance Debugging and Tuning using an Instruction-Set Simulator
Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction-set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic, allowing bugs to be recreated or measurements repeated. Though instruction-set simulators are often viewed as too slow for use as a general programming tool, their performance has improved considerably in the last several years.
We describe SimICS, an instruction-set simulator of SPARC-based multiprocessors developed at SICS, in its role as a general programming tool. We discuss some of the benefits of using a tool such as SimICS to support various tasks in software engineering, including debugging, testing, analysis, and performance tuning. We present in some detail two test cases in which we used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over an order of magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems.
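As background for why an instruction-set simulator can, in principle, model any target, gather any statistic, and remain deterministic, here is a minimal fetch-decode-execute loop sketch. It is not SimICS code; the instruction set and the counters are invented for illustration.

```cpp
#include <cstdint>
#include <vector>

// Minimal sketch of an instruction-set simulator loop (not SimICS code).
// Determinism falls out naturally: the same program and inputs walk the
// same instruction sequence, so bugs and measurements are reproducible.
struct Cpu {
    uint64_t pc = 0;
    uint64_t regs[32] = {};
    uint64_t instr_count = 0;   // example statistic: instructions retired
    uint64_t load_count  = 0;   // example statistic: memory loads
};

enum class Op { ADD, LOAD, BRANCH, HALT };
struct Instr { Op op; int rd, rs1, rs2; int64_t imm; };

void run(Cpu& cpu, const std::vector<Instr>& program, std::vector<uint64_t>& mem) {
    while (cpu.pc < program.size()) {
        const Instr& in = program[cpu.pc];   // fetch + decode
        ++cpu.instr_count;                   // any statistic can be hooked here
        switch (in.op) {                     // execute
            case Op::ADD:
                cpu.regs[in.rd] = cpu.regs[in.rs1] + cpu.regs[in.rs2];
                ++cpu.pc; break;
            case Op::LOAD:
                cpu.regs[in.rd] = mem[cpu.regs[in.rs1] + in.imm];
                ++cpu.load_count; ++cpu.pc; break;
            case Op::BRANCH:
                cpu.pc = cpu.regs[in.rs1] ? static_cast<uint64_t>(in.imm)
                                          : cpu.pc + 1;
                break;
            case Op::HALT:
                return;
        }
    }
}
```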
Accelerating Monte Carlo simulations with an NVIDIA® graphics processor
Modern graphics cards, commonly used in desktop computers, have evolved beyond a simple interface between processor and display to incorporate sophisticated calculation engines that can be applied to general-purpose computing. The Monte Carlo algorithm for modelling photon transport in turbid media has been implemented on an NVIDIA® 8800 GT graphics card using the CUDA toolkit. The Monte Carlo method relies on following the trajectories of millions of photons through the sample, often taking hours or days to complete. The graphics-processor implementation, processing roughly 110 million scattering events per second, was found to run more than 70 times faster than a similar, single-threaded implementation on a 2.67 GHz desktop computer.
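As a rough illustration of the scattering-event loop that such a GPU code parallelizes (typically one thread per photon), the sketch below follows a single photon through a homogeneous turbid medium. The optical coefficients and termination threshold are illustrative assumptions, not values from the paper.

```cpp
#include <cmath>
#include <random>

// Simplified, CPU-side sketch of a photon random walk of the kind GPU
// Monte Carlo codes parallelize over millions of photons.
double simulate_photon(std::mt19937& rng,
                       double mu_s = 10.0,   // scattering coefficient [1/mm], assumed
                       double mu_a = 0.1)    // absorption coefficient [1/mm], assumed
{
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double weight = 1.0, depth = 0.0;
    const double mu_t = mu_s + mu_a;

    while (weight > 1e-4) {                         // terminate low-weight photons
        double step = -std::log(1.0 - u(rng)) / mu_t;  // free path to next event
        depth += step;                              // (direction handling omitted)
        weight *= mu_s / mu_t;                      // survival after partial absorption
        // A full code would also sample a new direction (e.g. Henyey-Greenstein)
        // and tally reflectance/transmittance at the sample boundaries.
    }
    return depth;   // crude proxy for this photon's penetration depth
}
```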
Cache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors
This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs). Our work is motivated by the large asymmetry in cache set usage. CE decouples the physical locations of cache blocks from their addresses in order to reduce misses caused by destructive interference. Temporal pressure at the on-chip last-level cache is continuously collected at a group granularity (each group comprising several cache sets) and periodically recorded at the memory controller to guide the placement process. An incoming block is then placed in the cache group that exhibits the minimum pressure. CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache. Simulation results using a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5%, and by as much as 28.5%, for the benchmark programs we examined. Furthermore, our evaluations show that CE also outperforms related CMP cache designs.
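A toy sketch of the pressure-aware placement decision described above, under stated assumptions: the pressure metric, group bookkeeping, and aging policy are invented for illustration and are not the paper's parameters.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy sketch of pressure-aware block placement across cache groups.
struct CacheGroup {
    uint32_t pressure = 0;   // temporal pressure sampled over the current epoch
};

class PressureDirectory {                 // conceptually kept at the memory controller
    std::vector<CacheGroup> groups_;
public:
    explicit PressureDirectory(std::size_t n_groups) : groups_(n_groups) {}

    void record_access(std::size_t group) { ++groups_[group].pressure; }

    void end_epoch() {                    // periodic recording/aging of pressure
        for (auto& g : groups_) g.pressure /= 2;
    }

    // Place an incoming block in the group currently under least pressure,
    // decoupling its location from its address.
    std::size_t choose_group() const {
        auto it = std::min_element(groups_.begin(), groups_.end(),
            [](const CacheGroup& a, const CacheGroup& b) {
                return a.pressure < b.pressure;
            });
        return static_cast<std::size_t>(it - groups_.begin());
    }
};
```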
Computational methods and software systems for dynamics and control of large space structures
Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves the multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers.
Development of a GPU-based Monte Carlo dose calculation code for coupled electron-photon transport
Monte Carlo simulation is the most accurate method for absorbed dose calculations in radiotherapy. Its efficiency still requires improvement for routine clinical applications, especially for online adaptive radiotherapy. In this paper, we report our recent development of a GPU-based Monte Carlo dose calculation code for coupled electron-photon transport. We have implemented the Dose Planning Method (DPM) Monte Carlo dose calculation package (Sempau et al., Phys. Med. Biol. 45 (2000) 2263-2291) on a GPU architecture under the CUDA platform. The implementation has been tested against the original sequential DPM code on the CPU in phantoms with water-lung-water or water-bone-water slab geometry. A 20 MeV mono-energetic electron point source or a 6 MV photon point source is used in our validation. The results demonstrate adequate accuracy of our GPU implementation for both electron and photon beams in the radiotherapy energy range. Speedup factors of about 5.0 to 6.6 have been observed using an NVIDIA Tesla C1060 GPU card against a 2.27 GHz Intel Xeon CPU processor.
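Purely to illustrate the kind of per-voxel scoring a dose-calculation code performs on a slab phantom, here is a minimal tally sketch. It is not DPM or the authors' GPU code; the names, slab densities, and voxel sizes are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

// Toy dose tally for a 1-D slab phantom (e.g. water-lung-water), showing
// the per-voxel energy scoring that electron-photon transport codes perform.
struct SlabPhantom {
    double voxel_size_cm;                  // voxel thickness along the beam axis
    std::vector<double> density_g_cm3;     // per-voxel mass density
    std::vector<double> dose_MeV_per_g;    // accumulated dose per voxel

    SlabPhantom(std::size_t n_voxels, double dz)
        : voxel_size_cm(dz),
          density_g_cm3(n_voxels, 1.0),    // water by default; lung/bone slabs
          dose_MeV_per_g(n_voxels, 0.0) {} // would override some densities

    // Deposit the energy lost over one particle step into the voxel containing
    // the step midpoint (a full code splits steps at voxel boundaries).
    void deposit(double z_cm, double energy_MeV) {
        std::size_t v = static_cast<std::size_t>(z_cm / voxel_size_cm);
        if (v >= dose_MeV_per_g.size()) return;                   // left the phantom
        double voxel_mass_g = density_g_cm3[v] * voxel_size_cm;   // per unit beam area
        dose_MeV_per_g[v] += energy_MeV / voxel_mass_g;
    }
};
```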
- …