Selective optical broadcasting in reconfigurable multiprocessor interconnects - art. no. 61850J

Author: ARTUNDO I
Dambre Joni
DEBAES C
DESMET L
Heirman Wim
Van Campenhout Jan
Publication date: 01/01/2006

Performance Debugging and Tuning using an Instruction-Set Simulator

Author: Magnusson Peter S.
Montelius Johan
Publication venue: Swedish Institute of Computer Science
Publication date: 01/01/1997

Instruction-set simulators allow programmers a detailed level of insight into, and control over, the execution of a program, including parallel programs and operating systems. In principle, instruction-set simulation can model any target computer and gather any statistic. Furthermore, such simulators are usually portable, independent of compiler tools, and deterministic, allowing bugs to be recreated or measurements repeated. Though often viewed as too slow for use as a general programming tool, their performance has improved considerably in the last several years. We describe SimICS, an instruction-set simulator of SPARC-based multiprocessors developed at SICS, in its rôle as a general programming tool. We discuss some of the benefits of using a tool such as SimICS to support various tasks in software engineering, including debugging, testing, analysis, and performance tuning. We present in some detail two test cases in which we used SimICS to support analysis and performance tuning of two applications, Penny and EQNTOTT. This work resulted in improved parallelism in, and understanding of, Penny, as well as a performance improvement for EQNTOTT of over an order of magnitude. We also present some early work on analyzing SPARC/Linux, demonstrating the ability of tools like SimICS to analyze operating systems.

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Assessing load-sharing within optimistic simulation platforms

Author: PELLEGRINI ALESSANDRO
QUAGLIA Francesco
VITALI Roberto
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

The advent of multi-core machines has led to the need to revise the architecture of modern simulation platforms. One recent proposal we made attempted to explore the viability of load-sharing for optimistic simulators run on top of these types of machines. In this article, we provide an extensive experimental study assessing the effects on run-time dynamics of a load-sharing architecture that has been implemented within the ROOT-Sim package, an open source simulation platform adhering to the optimistic synchronization paradigm. This experimental study is essentially aimed at evaluating possible sources of overhead when supporting load-sharing. It is based on differentiated workloads that allow us to generate different execution profiles in terms of, e.g., granularity/locality of the simulation events. © 2012 IEEE

Crossref

ART

Archivio della ricerca- Università di Roma La Sapienza

Simulation models of shared-memory multiprocessor systems

Author: Coe Paul.
Publication venue: The University of Edinburgh
Publication date: 01/01/2000

Edinburgh Research Archive

Experimental research of a shared memory subsystem with limited queue length for specialized reconfigurable multiprocessor systems

Author: Martens-Atyushev Dmitry S.
Martyshkin Alexey I.
Publication venue: Instituto Federal de Educação, Ciência e Tecnologia de São Paulo (IFSP)
Publication date: 01/01/2022

Recently, reconfigurable systems based on field programmable logic devices (FPLDs) have been widely used in high-performance computing. The paper discusses the experimental research of a shared memory subsystem with a limited queue length for specialized reconfigurable multiprocessor systems, using the developed mathematical modelling method. The paper presents the results of the method proposed by the authors for modelling multiprocessor systems based on open queuing networks with limited queue lengths. Based on these conditions, as well as the architectural features of the investigated processor-memory subsystem, expressions are derived to estimate the exchange time and the resulting delays at each exchange stage. The research focused mainly on the effect of increasing the number of processor nodes in the processor-memory subsystem. The data obtained showed that growth in the number of processors significantly affects the exchange time, creating a significant load on the common bus and increasing delays at the stages where requests are transferred from the processor to the memory. At the same time, the inadequate behaviour of the experimental results and the inaccuracy of their values when using the basic modelling method are clearly visible in the obtained graphs. Computational experiments were carried out to calculate the probabilistic-temporal characteristics of the processor-memory subsystem using the developed mathematical modelling methods. Based on the experimental results, it was determined that the delays occurring in the subsystem's nodes and the time of exchange between the processor and memory modules depend on the query parameters and the architectural characteristics of the processor-memory subsystem.

DIALNET

Independent Journal of Management & Production

Modelling Heterogeneous DSP–FPGA Based System Partitioning with Extensions to the Spinach Simulation Environment

Author: Brogioli Michael
Cavallaro Joseph R.
Publication venue: IEEE
Publication date: 01/01/2005

In this paper we present system-on-a-chip extensions to the Spinach simulation environment for rapidly prototyping heterogeneous DSP/FPGA based architectures, specifically in the embedded domain. This infrastructure has been successfully used to model systems varying from multiprocessor Gigabit Ethernet controllers to Texas Instruments C6x series DSP based systems with tightly coupled FPGA based coprocessors for computational offloading. As an illustrative example of this toolset's functionality, we investigate workload partitioning in heterogeneous DSP/FPGA based embedded environments. Specifically, we focus on computational offloading of matrix multiplication kernels across DSP/FPGA based embedded architectures.

CiteSeerX

Crossref

DSpace at Rice University

System-Level Design Methodologies for Networked Multiprocessor Systems-on-Chip

Author: Virk Kashif Munir
Publication date: 01/11/2008

Online Research Database In Technology

Efficient parallel architecture for highly coupled real-time linear system applications

Author: Barua Soumavo
Carroll Chester C.
Homaifar Abdollah

A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation in multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics.

NASA Technical Reports Server