Search CORE

3,070 research outputs found

Hierarchical clustered register file organization for VLIW processors

Author: Ayguadé Parra Eduard
Llosa Espuny José Francisco
Valero Cortés Mateo
Zalamea León Francisco Javier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003
Field of study

Technology projections indicate that wire delays will become one of the biggest constraints in future microprocessor designs. To avoid long wire delays and therefore long cycle times, processor cores must be partitioned into components so that most of the communication is done locally. In this paper, we propose a novel register file organization for VLIW cores that combines clustering with a hierarchical register file organization. Functional units are organized in clusters, each one with a local first level register file. The local register files are connected to a global second level register file, which provides access to memory. All intercluster communications are done through the second level register file. This paper also proposes MIRS-HC, a novel modulo scheduling technique that simultaneously performs instruction scheduling, cluster selection, inserts communication operations, performs register allocation and spill insertion for the proposed organization. The results show that although more cycles are required to execute applications, the execution time is reduced due to a shorter cycle time. In addition, the combination of clustering and hierarchy provides a larger design exploration space that trades-off performance and technology requirements.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Preliminary basic performance analysis of the Cedar multiprocessor memory system

Author: Gallivan K.
Jalby W.
Turner S.
Veidenbaum A.
Wijshoff H.
Publication venue
Publication date
Field of study

Some preliminary basic results on the performance of the Cedar multiprocessor memory system are presented. Empirical results are presented and used to calibrate a memory system simulator which is then used to discuss the scalability of the system

NASA Technical Reports Server

Analysis of memory use for improved design and compile-time allocation of local memory

Author: Davidson Edward S.
Mcniven Geoffrey D.
Publication venue
Publication date
Field of study

Trace analysis techniques are used to study memory referencing behavior for the purpose of designing local memories and determining how to allocate them for data and instructions. In an attempt to assess the inherent behavior of the source code, the trace analysis system described here reduced the effects of the compiler and host architecture on the trace by using a technical called flattening. The variables in the trace, their associated single-assignment values, and references are histogrammed on the basis of various parameters describing memory referencing behavior. Bounds are developed specifying the amount of memory space required to store all live values in a particular histogram class. The reduction achieved in main memory traffic by allocating local memory is specified for each class

NASA Technical Reports Server

Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

Author: Babich Ronald
Clark Michael A.
Joó Bálint
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the "9g" cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing 2010 (submitted April 12, 2010

arXiv.org e-Print Archive

CiteSeerX

MGSim - Simulation tools for multi-core processor architectures

Author: Fu Jian
Jesshope Chris R.
Lankamp Mike
Poss Raphael
Uddin Irfan
Yang Qiang
Publication venue
Publication date: 01/01/2013
Field of study

MGSim is an open source discrete event simulator for on-chip hardware components, developed at the University of Amsterdam. It is intended to be a research and teaching vehicle to study the fine-grained hardware/software interactions on many-core and hardware multithreaded processors. It includes support for core models with different instruction sets, a configurable multi-core interconnect, multiple configurable cache and memory models, a dedicated I/O subsystem, and comprehensive monitoring and interaction facilities. The default model configuration shipped with MGSim implements Microgrids, a many-core architecture with hardware concurrency management. MGSim is furthermore written mostly in C++ and uses object classes to represent chip components. It is optimized for architecture models that can be described as process networks.Comment: 33 pages, 22 figures, 4 listings, 2 table

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Evaluation of the Cedar memory system: Configuration of 16 by 16

Author: Gallivan K.
Jalby W.
Wijshoff H.
Publication venue
Publication date
Field of study

Some basic results on the performance of the Cedar multiprocessor system are presented. Empirical results on the 16 processor 16 memory bank system configuration, which show the behavior of the Cedar system under different modes of operation are presented

NASA Technical Reports Server