Performance Debugging and Tuning using an Instruction-Set Simulator
Instruction-set simulators give programmers a detailed level of insight into,
and control over, the execution of a program, including parallel programs and
operating systems. In principle, instruction-set simulation can model any
target computer and gather any statistic. Furthermore, such simulators are
usually portable, independent of compiler tools, and deterministic, allowing
bugs to be recreated and measurements repeated. Though often viewed as too
slow for use as a general programming tool, their performance has improved
considerably in the last several years.
We describe SimICS, an instruction-set simulator of SPARC-based
multiprocessors developed at SICS, in its rôle as a general programming tool.
We discuss some of the benefits of using a tool such as SimICS to support
various tasks in software engineering, including debugging, testing, analysis,
and performance tuning. We present in some detail two test cases, where we've
used SimICS to support analysis and performance tuning of two applications,
Penny and EQNTOTT. This work resulted in improved parallelism in, and
understanding of, Penny, as well as a performance improvement for EQNTOTT of
over an order of magnitude. We also present some early work on analyzing
SPARC/Linux, demonstrating the ability of tools like SimICS to analyze
operating systems.
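
The claim that instruction-set simulation can model any target and gather any
statistic follows from the simulator's basic structure: a fetch-decode-execute
loop in which every architectural event passes through software and can be
counted. A minimal C++ sketch of that structure (the toy ISA, opcodes, and
counters below are our own illustration, not SimICS internals):

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Minimal illustrative interpreter core: every instruction passes through
// software, so any statistic (counts, traces, histograms) can be hooked in.
// The tiny 4-opcode ISA below is hypothetical, unrelated to SPARC or SimICS.
struct Sim {
    std::vector<uint32_t> mem;   // unified instruction/data memory
    uint32_t reg[8] = {0};
    uint32_t pc = 0;
    uint64_t icount = 0;         // example statistic: instructions executed
    uint64_t loads = 0;          // example statistic: memory loads

    void run(uint64_t max_steps) {
        for (uint64_t i = 0; i < max_steps; ++i) {
            uint32_t insn = mem[pc++];                       // fetch
            uint32_t op = insn >> 28;                        // decode
            uint32_t rd = (insn >> 24) & 7, rs = (insn >> 20) & 7;
            ++icount;
            switch (op) {                                    // execute
                case 0: return;                              // HALT
                case 1: reg[rd] = insn & 0xFFFFF; break;     // LOADI imm
                case 2: reg[rd] += reg[rs]; break;           // ADD
                case 3: reg[rd] = mem[reg[rs]]; ++loads; break; // LOAD
                default: return;                             // unknown: stop
            }
        }
    }
};

int main() {
    Sim s;
    s.mem = {0x10000002,  // LOADI r0, 2
             0x21000000,  // ADD   r1, r0
             0x00000000}; // HALT
    s.run(100);
    std::printf("executed=%llu loads=%llu r1=%u\n",
                (unsigned long long)s.icount,
                (unsigned long long)s.loads, s.reg[1]);
}
```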
SAPPORO: A way to turn your graphics cards into a GRAPE-6
We present Sapporo, a library for performing high-precision gravitational
N-body simulations on NVIDIA Graphical Processing Units (GPUs). Our library
mimics the GRAPE-6 library, and N-body codes currently running on GRAPE-6 can
switch to Sapporo by a simple relinking of the library. The precision of our
library is comparable to that of GRAPE-6, even though internally the GPU
hardware is limited to single-precision arithmetic. This limitation is
effectively overcome by emulating double precision for calculating the distance
between particles. The performance loss of this operation is small (< 20%)
compared to the advantage of being able to run at high precision. We tested the
library using several GRAPE-6-enabled N-body codes, in particular with Starlab
and phiGRAPE. We measured peak performance of 800 Gflop/s for running with 10^6
particles on a PC with four commercial G92 architecture GPUs (two GeForce
9800GX2). As a production test, we simulated a 32k Plummer model with equal
mass stars well beyond core collapse. The simulation took 41 days, during which
the mean performance was 113 Gflop/s. The GPU did not show any problems from
running in a production environment for such an extended period of time.
Comment: 13 pages, 9 figures, accepted to New Astronomy
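
The double-precision emulation mentioned above is commonly implemented with
"double-single" arithmetic: a value is stored as the unevaluated sum of two
floats, recovering most of double precision on single-precision hardware. A
minimal host-side C++ sketch of the idea (function names are ours, not the
Sapporo API; on a GPU these would be __device__ functions, and the compiler
must not contract or reorder the float operations):

```cpp
#include <cstdio>

// A "double-single" value: hi + lo is the unevaluated sum of two floats.
struct ds { float hi, lo; };

// Knuth's exact two-sum: a + b = s + e, with the rounding error e recovered
// exactly. Requires strict IEEE float arithmetic (no FMA contraction).
static ds two_sum(float a, float b) {
    float s = a + b;
    float v = s - a;
    float e = (a - (s - v)) + (b - v);
    return {s, e};
}

// Promote a double to double-single form.
static ds from_double(double x) {
    float hi = (float)x;
    float lo = (float)(x - (double)hi);
    return {hi, lo};
}

// Double-single addition with one renormalization pass.
static ds ds_add(ds a, ds b) {
    ds s = two_sum(a.hi, b.hi);
    s.lo += a.lo + b.lo;
    return two_sum(s.hi, s.lo);
}

static ds ds_neg(ds a) { return {-a.hi, -a.lo}; }

int main() {
    // Two particle coordinates that differ far below single precision:
    double x1 = 1000.0001, x2 = 1000.0;
    ds dx = ds_add(from_double(x1), ds_neg(from_double(x2)));
    std::printf("plain float: %.7g  double-single: %.7g\n",
                (float)x1 - (float)x2, dx.hi + (double)dx.lo);
}
```

The plain-float separation is off by more than 20% here, while the
double-single result is accurate, which is the point of paying the modest
performance cost in the distance calculation.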
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.
Lock-free Concurrent Data Structures
Concurrent data structures are the data sharing side of parallel programming.
Data structures give the means to the program to store data, but also provide
operations to the program to access and manipulate these data. These operations
are implemented through algorithms that have to be efficient. In the sequential
setting, data structures are crucially important for the performance of the
respective computation. In the parallel programming setting, their importance
becomes even greater because of the increased use of data and resource sharing
to exploit parallelism.
The first and main goal of this chapter is to provide a sufficient background
and intuition to help the interested reader to navigate in the complex research
area of lock-free data structures. The second goal is to give the programmer
enough familiarity with the subject to use truly concurrent methods.
Comment: To appear in "Programming Multi-core and Many-core Computing
Systems", eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and
Distributed Computing
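
As a concrete illustration of the lock-free approach this chapter surveys,
here is a minimal Treiber-style stack in C++; this is our own sketch, not
code from the chapter. Threads make progress through compare-and-swap retry
loops rather than locks. Safe memory reclamation (hazard pointers, epochs) is
deliberately omitted, which real implementations must also solve:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

// Minimal Treiber-style lock-free stack: push/pop retry with
// compare_exchange instead of taking a lock. Nodes are leaked here, which
// also sidesteps the ABA problem that reclamation would reintroduce.
template <typename T>
class LockFreeStack {
    struct Node { T value; Node* next; };
    std::atomic<Node*> head{nullptr};
public:
    void push(T v) {
        Node* n = new Node{v, head.load(std::memory_order_relaxed)};
        // CAS loop: on failure, n->next is refreshed to the current head.
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {}
    }
    bool pop(T& out) {
        Node* n = head.load(std::memory_order_acquire);
        // CAS loop: swing head from n to n->next, or reload n and retry.
        while (n && !head.compare_exchange_weak(n, n->next,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed)) {}
        if (!n) return false;
        out = n->value;   // node intentionally leaked (no safe reclamation)
        return true;
    }
};

int main() {
    LockFreeStack<int> s;
    std::vector<std::thread> ts;
    for (int t = 0; t < 4; ++t)
        ts.emplace_back([&s, t] { for (int i = 0; i < 1000; ++i) s.push(t); });
    for (auto& th : ts) th.join();
    int v, popped = 0;
    while (s.pop(v)) ++popped;
    std::printf("popped %d items\n", popped);  // expect 4000
}
```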
AER Spiking Neuron Computation on GPUs: The Frame-to-AER Generation
Neuro-inspired processing tries to imitate the nervous system and may solve
complex problems, such as visual recognition. The spike-based philosophy is
built on the Address-Event Representation (AER), a neuromorphic inter-chip
communication protocol that allows for massive connectivity between neurons.
Some AER-based systems can achieve very high performance in real-time
applications. This philosophy is very different from standard image
processing, which treats visual information as a succession of frames that
must be processed to extract a result, usually requiring very expensive
operations and high computing resource consumption. Because the field is
still relatively young, AER systems lack cost-effective tools such as
emulators, simulators, testers, and debuggers. In this paper the first
results of a CUDA-based tool focused on the functional processing of AER
spikes are presented, with the aim of helping in the design and testing of
the filters and bus management of these systems.
Ministerio de Educación y Ciencia TEC2009-10639-C04-0
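
The frame-to-AER step in the title turns a conventional intensity frame into
a stream of address events. A common software method, and an assumption on
our part since the paper's exact algorithm may differ, is rate coding: each
pixel emits events with probability proportional to its intensity. A minimal
C++ sketch:

```cpp
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

// One address event: which pixel fired, and when (microseconds).
struct AddressEvent { uint16_t x, y; uint32_t t_us; };

// Rate-coded frame-to-AER conversion (illustrative, not the paper's exact
// algorithm): brighter pixels emit events with higher probability per tick.
std::vector<AddressEvent> frame_to_aer(const std::vector<uint8_t>& frame,
                                       uint16_t w, uint16_t h,
                                       uint32_t ticks, uint32_t tick_us,
                                       std::mt19937& rng) {
    std::uniform_int_distribution<int> coin(0, 255);
    std::vector<AddressEvent> events;
    for (uint32_t t = 0; t < ticks; ++t)
        for (uint16_t y = 0; y < h; ++y)
            for (uint16_t x = 0; x < w; ++x)
                // P(event this tick) = intensity / 256
                if (coin(rng) < frame[y * w + x])
                    events.push_back({x, y, t * tick_us});
    return events;
}

int main() {
    uint16_t w = 4, h = 4;
    std::vector<uint8_t> frame(w * h, 0);
    frame[5] = 255;  // one bright pixel should dominate the event stream
    std::mt19937 rng(42);
    auto ev = frame_to_aer(frame, w, h, /*ticks=*/100, /*tick_us=*/100, rng);
    if (!ev.empty())
        std::printf("%zu events; first at (%u,%u) t=%uus\n",
                    ev.size(), ev[0].x, ev[0].y, ev[0].t_us);
}
```

The per-pixel, per-tick independence of this loop is what makes the
generation step a natural fit for a CUDA thread grid.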
Highly accelerated simulations of glassy dynamics using GPUs: caveats on limited floating-point precision
Modern graphics processing units (GPUs) provide impressive computing
resources, which can be accessed conveniently through the CUDA programming
interface. We describe how GPUs can be used to considerably speed up molecular
dynamics (MD) simulations for system sizes ranging up to about 1 million
particles. Particular emphasis is put on the numerical long-time stability in
terms of energy and momentum conservation, and caveats on limited
floating-point precision are issued. Strict energy conservation over 10^8 MD
steps is obtained by double-single emulation of the floating-point arithmetic
in accuracy-critical parts of the algorithm. For the slow dynamics of a
supercooled binary Lennard-Jones mixture, we demonstrate that the use of
single floating-point precision may result in quantitatively and even
physically wrong results. For simulations of a Lennard-Jones fluid, the
described implementation shows speedup factors of up to 80 compared to a
serial implementation for the CPU, and a single GPU was found to be on par
with a parallelised MD simulation using 64 distributed cores.
Comment: 12 pages, 7 figures, to appear in Comp. Phys. Comm., HALMD package
licensed under the GPL, see http://research.colberg.org/projects/halm
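
One accuracy-critical part of any MD code is the accumulation of many small
contributions (energies, momenta) in single precision over 10^8 steps. A
standard remedy, closely related to the double-single emulation described in
the abstract, is Kahan's compensated summation, sketched here in C++ as our
own illustration rather than HALMD code (compile without -ffast-math, which
would optimize the correction away):

```cpp
#include <cstdio>

// Kahan compensated summation: carries the rounding error of each addition
// in a correction term, so long float accumulations stay accurate.
struct KahanSum {
    float sum = 0.0f, c = 0.0f;  // c holds the running compensation
    void add(float x) {
        float y = x - c;         // apply the previous correction
        float t = sum + y;       // big + small: low bits of y are lost...
        c = (t - sum) - y;       // ...but recovered here
        sum = t;
    }
};

int main() {
    // Sum 10 million tiny contributions of 1e-4 each (exact total: 1000).
    KahanSum ks;
    float naive = 0.0f;
    for (int i = 0; i < 10000000; ++i) {
        ks.add(1e-4f);
        naive += 1e-4f;   // plain float accumulation drifts badly
    }
    std::printf("naive=%.1f kahan=%.1f exact=1000.0\n", naive, ks.sum);
}
```

The naive sum drifts by several percent, illustrating how uncompensated
single precision can break energy conservation over long runs.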