
24 research outputs found

Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs

Author: Aristidis Sotiropoulos
C.-T. King
D. Patterson
E. Hodzic
F. Desprez
G. Goumas
Georgios Tsoukalas
H. R. Arabnia
J. Ramanujam
J. Xue
J.-P. Sheu
K. Hogstedt
M. Kandemir
Maria Athanasaki
N. J. Boden
N. Manjikian
N. Park
Nectarios Koziris
P. Boulet
P. Tsanakas
Panayiotis Tsanakas
S. M. Bhandarkar
T. Andronikos
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Crude geopolitics: territory and governance in post-peak oil imaginaries

Author: Agnew J.
Alexander N.
Bates A.
Blacksell M.
Buck H. J.
Cobb K.
Cranston M.
Curtis C.
Edwards C.
Elden S.
Eschbach A.
Flynn W. R.
Ghosh A.
Grubb A.
Held D.
Hobbes T.
Kaminski F.
Kunstler J. H.
Manjikian M.
McBay A.
McCarthy C.
Mitchell D.
O’hear N.
Paterson M.
Rousseau J.-J.
Said E.
Scarrow A.
Schumacher E.
Seymour J.
Urry J.
Publication venue: 'Informa UK Limited'
Publication date: 08/03/2017

Crossref

Plymouth Electronic Archive and Research Library

A Comparison of Compiler Tiling Algorithms

Author: A. Lebeck
D. Gannon
J. Ferrante
K.S. McKinley
N. Manjikian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1999

Crossref

Locality Enhancement for Large-Scale Shared-Memory Multiprocessors

Author: C. Amza
M. Hall
N. Manjikian
W. Blume
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Analyzing data reuse for cache reconfiguration

Author: Burger D.
Givargis T.
J. Hu
M. J. Irwin
M. Kandemir
Manjikian N.
N. Vijaykrishnan
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Improving Cache Effectiveness through Array Data Layout Manipulation in SAC

Author: C. Grelck
D. F. Bacon
D. Gannon
G. Rivera
K. McKinley
N. Manjikian
S. Ghosh
S.-B. Scholz
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Design and implementation of the NUMAchine multiprocessor

Author: A. Grbic
G. Lemieux
K. Loveless
M. Gusat
M. Stumm
N. Manjikian
R. Grindley
S. Brown
S. Caranci
S. Srbljic
Z. Vranesic
Z. Zilic
Publication venue: ACM
Publication date: 01/01/1998

This paper describes the design and implementation of the NUMAchine multiprocessor. As the market for CC-NUMA multiprocessors expands, this research project provides a timely architectural design and cost-effective prototype. The key to the successful implementation of our 48-processor prototype is the use of off-the-shelf components and programmable logic devices. Since this machine will serve as a research vehicle for parallel software development, a number of hardware features to enhance experimentation have been included in the design.

CiteSeerX

Crossref

The Hector Multiprocessor

Author: A. Elkateeb
A. Grbic
B. Gamsa
G. Lemieux
K. Loveless
K. Sevcik
M. Gusat
M. Stumm
N. Manjikian
O. Krieger
P. Pereira
R. Grindley
S. Brown
S. Caranci
S. Srbljic
T. Abdelrahman
Z. Vranesic
Z. Zilic

NUMAchine is a cache-coherent shared-memory multiprocessor designed to be high-performance, cost-effective, modular, and easy to program for efficient parallel execution. Processors, caches, and memory are distributed across a number of stations interconnected by a hierarchy of unidirectional bit-parallel rings. The simplicity of the interconnection network permits the use of wide datapaths at each node, and a novel scheme for routing packets between stations enables high-speed operation of the rings in order to reduce latency. The ring hierarchy provides useful features, such as efficient multicasting and order-preserving message transfers, which the cache coherence protocol exploits for low-latency invalidation of shared data. The hardware is designed so that cache coherence traffic is restricted to localized sections of the machine whenever possible. NUMAchine is optimized for applications with good locality, and system software is designed to maximize locality. Results from detailed behavioral simulations to evaluate architectural tradeoffs indicate that a prototype implementation will perform well for a variety of parallel applications.

CiteSeerX

The NUMAchine Multiprocessor

Author: A. Grbic
B. Gamsa
D. DeVries
G. Lemieux
K. Loveless
M. Gusat
M. Stumm
N. Manjikian
O. Krieger
P. McHardy
R. Grindley
R. Ho
S. Brown
S. Caranci
S. Srbljic
T. Abdelrahman
Z. Vranesic
Z. Zilic
Publication venue: IEEE Computer Society

Small-scale multiprocessors are becoming increasingly economical and common, whereas larger multiprocessors continue to have higher per-node costs. The NUMAchine multiprocessor project seeks to make large-scale multiprocessors more economical while maintaining high performance by exploring architectural and hardware features for low-cost, modular multiprocessors. To demonstrate our approach, we have implemented a prototype system that is scalable to 128 processors. An efficient directory-based cache coherence protocol exploits our hierarchical ring-based interconnect and supports sequential consistency. This paper documents the design choices and the resulting performance of the system using both simulation results and measurements on the prototype hardware.

CiteSeerX