Search CORE

3,568 research outputs found

Performance Debugging and Tuning using an Instruction-Set Simulator

Author: Magnusson Peter S.
Montelius Johan
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/1997
Field of study

Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic-allowing bugs to be recreated or measurements repeated. Though often viewed as being too slow for use as a general programming tool, in the last several years their performance has improved considerably. We describe SIMICS, an instruction set simulator of SPARC-based multiprocessors developed at SICS, in its rôle as a general programming tool. We discuss some of the benefits of using a tool such as SIMICS to support various tasks in software engineering, including debugging, testing, analysis, and performance tuning. We present in some detail two test cases, where we've used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over a magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Fast, accurate and flexible data locality analysis

Author: González Colás Antonio María
Sánchez Navarro F. Jesús
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1998
Field of study

This paper presents a tool based on a new approach for analyzing the locality exhibited by data memory references. The tool is very fast because it is based on a static locality analysis enhanced with very simple profiling information, which results in a negligible slowdown. This feature allows the tool to be used for highly time-consuming applications and to include it as a step in a typical iterative analysis-optimization process. The tool can provide a detailed evaluation of the reuse exhibited by a program, quantifying and qualifying the different types of misses either globally or detailed by program sections, data structures, memory instructions, etc. The accuracy of the tool is validated by comparing its results with those provided by a simulator.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

GPU acceleration of time-domain fluorescence lifetime imaging

Author: Chen Yu
Li David Day-Uei
Nowotny Thomas
Wu Gang
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 06/01/2016
Field of study

Fluorescence lifetime imaging microscopy (FLIM) plays a significant role in biological sciences, chemistry, and medical research. We propose a Graphic Processing Units (GPUs) based FLIM analysis tool suitable for high-speed and flexible time-domain FLIM applications. With a large number of parallel processors, GPUs can significantly speed up lifetime calculations compared to CPU-OpenMP (parallel computing with multiple CPU cores) based analysis. We demonstrate how to implement and optimize FLIM algorithms on GPUs for both iterative and non-iterative FLIM analysis algorithms. The implemented algorithms have been tested on both synthesized and experimental FLIM data. The results show that at the same precision the GPU analysis can be up to 24-fold faster than its CPU-OpenMP counterpart. This means that even for high precision but time-consuming iterative FLIM algorithms, GPUs enable fast or even real-time analysis

Crossref

University of Strathclyde Institutional Repository

Sussex Research Online

PC-CUBE: A Personal Computer Based Hypercube

Author: Breaden Matt
Chang Douglas
Chen Stanley
Cole Terry
Fox Geoffrey
Ho Alex
Snyder Scott
Walker David
Publication venue
Publication date: 01/01/1988
Field of study

PC-CUBE is an ensemble of IBM PCs or close compatibles connected in the hypercube topology with ordinary computer cables. Communication occurs at the rate of 115.2 K-band via the RS-232 serial links. Available for PC-CUBE is the Crystalline Operating System III (CrOS III), Mercury Operating System, CUBIX and PLOTIX which are parallel I/O and graphics libraries. A CrOS performance monitor was developed to facilitate the measurement of communication and computation time of a program and their effects on performance. Also available are CXLISP, a parallel version of the XLISP interpreter; GRAFIX, some graphics routines for the EGA and CGA; and a general execution profiler for determining execution time spent by program subroutines. PC-CUBE provides a programming environment similar to all hypercube systems running CrOS III, Mercury and CUBIX. In addition, every node (personal computer) has its own graphics display monitor and storage devices. These allow data to be displayed or stored at every processor, which has much instructional value and enables easier debugging of applications. Some application programs which are taken from the book Solving Problems on Concurrent Processors (Fox 88) were implemented with graphics enhancement on PC-CUBE. The applications range from solving the Mandelbrot set, Laplace equation, wave equation, long range force interaction, to WaTor, an ecological simulation

Crossref

NASA Technical Reports Server

Design and Simulation of High Performance Parallel Architectures Using the ISAC Language

Author: . Adam Husár
. Dušan Kolář
. Jakub Křoustek
. Karel Masařík
. Tomáš Hruška
. Zdeněk Přikryl
Publication venue: GSTF Journal on Computing (JoC)
Publication date: 13/09/2014
Field of study

Most of modern embedded systems for multimediaand network applications are based on parallel data streamprocessing. The data processing can be done using very longinstruction word processors (VLIW), or using more than onehigh performance application-specific instruction set processor(ASIPs), or even by their combination on single chip.Design and testing of these complex systems is time-consumingand iterative process. Architecture description languages (ADLs)are one of the most effective solutions for single processor design.However, support for description of parallel architectures andmulti-processor systems is very low or completely missing innowadays ADLs. This article presents utilization of newextensions for existing architecture description language ISAC.These extensions are used for easy and fast prototyping andtesting of parallel based systems and processors

GSTF Digital Library (GSTF-DL): Open Journal Systems (Global Science and Technology Forum)