61 research outputs found

    LU Decomposition on Cell Broadband Engine: An Empirical Study to Exploit Heterogeneous Chip Multiprocessors

    To meet the needs of high-performance computing, the Cell Broadband Engine has many features that differ from traditional processors, such as its multiple synergistic processor elements, large register files, and the ability to hide main-storage latency with concurrent computation and DMA transfers. Exploiting those features requires the programmer to carefully tailor programs while simultaneously dealing with various performance factors, including locality, load balance, communication overhead, and multi-level parallelism. These factors, unfortunately, are interdependent; an optimization that enhances one factor may degrade another. This paper presents our experience optimizing LU decomposition, one of the commonly used linear algebra kernels in scientific computing, on the Cell Broadband Engine. The optimizations exploit task-level, data-level, and communication-level parallelism. We study the effects of different task distribution strategies, prefetching, and software caching, and explore the tradeoffs among performance factors, stressing the interactions between different optimizations. This work offers insights into optimization on heterogeneous multi-core processors, including the selection of programming models, considerations in task distribution, and the holistic perspective that optimization requires.
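    For reference, the kernel being optimized is ordinary LU decomposition. A minimal sequential sketch in C, without pivoting and without any of the Cell-specific tiling, DMA, or SPE task distribution the paper studies, might look like this:

```c
#include <stddef.h>

/* In-place right-looking LU decomposition without pivoting: after the
 * call, the strictly lower triangle of A holds L (unit diagonal implied)
 * and the upper triangle holds U. This is the sequential baseline; on
 * the Cell BE, blocks of the trailing update below are the natural unit
 * of task- and data-level parallelism to distribute across SPEs. */
void lu_decompose(double *A, size_t n)
{
    for (size_t k = 0; k < n; k++) {
        /* Column k of L: divide by the pivot element. */
        for (size_t i = k + 1; i < n; i++)
            A[i * n + k] /= A[k * n + k];

        /* Trailing submatrix update: A(i,j) -= L(i,k) * U(k,j). */
        for (size_t i = k + 1; i < n; i++)
            for (size_t j = k + 1; j < n; j++)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
    }
}
```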

    Dynamic Load Balancing of Matrix-Vector Multiplications on Roadrunner Compute Nodes


    Implementing Fine/Medium Grained TLP Support in a Many-Core Architecture

    We believe that future many-core architectures should support a simple and scalable way to execute the many threads generated by parallel programs. A good candidate for efficient and scalable execution of threads is the DTA (Decoupled Threaded Architecture), which is designed to exploit fine/medium-grained Thread Level Parallelism (TLP) by using a hardware scheduling unit and relying on existing simple cores. In this paper, we present an initial implementation of the DTA concept in a many-core architecture, where it interacts with other architectural components designed from scratch to address the problem of scalability. We present initial results, obtained using a many-core simulator built on SARCSim (a variant of UNISIM) with DTA support, that show the scalability of the solution.
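    DTA's scheduling unit is implemented in hardware, so it has no direct software equivalent; as a rough software analogue only, the C sketch below (task and worker counts are made up) illustrates the fine/medium-grained TLP execution model of many small threads drained from a shared pool:

```c
#include <pthread.h>
#include <stdio.h>

/* Rough software analogue of fine-grained TLP: worker threads pull small
 * tasks from a shared pool. In DTA the equivalent dispatch is performed
 * by a hardware scheduling unit rather than a mutex-protected counter;
 * this sketch only illustrates the execution model. */

#define NUM_TASKS   64
#define NUM_WORKERS 4

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int next_task = 0;            /* index of the next unclaimed task */
static long results[NUM_TASKS];

static void run_task(int id)         /* stand-in for one fine-grained thread */
{
    results[id] = (long)id * id;
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int id = (next_task < NUM_TASKS) ? next_task++ : -1;
        pthread_mutex_unlock(&lock);
        if (id < 0)
            return NULL;             /* pool drained, worker exits */
        run_task(id);
    }
}

int main(void)
{
    pthread_t workers[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(workers[i], NULL);
    printf("last result: %ld\n", results[NUM_TASKS - 1]);
    return 0;
}
```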

    Scalability Evaluation of a Polymorphic Register File: A CG Case Study


    Advances in Simultaneous Multithreading Testcase Generation Methods


    Asymmetry-Aware Scheduling in Heterogeneous Multi-core Architectures

    As threads of execution in a multi-programmed computing environment have different characteristics and hardware resource requirements, heterogeneous multi-core processors can achieve higher performance as well as better power efficiency than homogeneous multi-core processors. To fully tap into that potential, OS schedulers need to be heterogeneity-aware, so they can match threads to cores according to the characteristics of both. We propose two heterogeneity-aware thread schedulers, PBS and LCSS. PBS makes scheduling decisions based on applications' sensitivity to large cores, assigning large cores to the applications that gain the most performance from them. LCSS balances the large-core resource among all applications. We have implemented both schedulers in Linux and evaluated their performance with the PARSEC benchmarks on different heterogeneous architectures. Overall, PBS outperforms the Linux scheduler by 13.3% on average and by up to 18%, while LCSS achieves a speedup over the Linux scheduler of 5.3% on average and up to 6%. PBS also performs well with both asymmetric and symmetric workloads, whereas LCSS is better suited to symmetric workloads. In summary, PBS and LCSS provide repeatable performance measurements and better performance than the Linux OS scheduler.
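    As an illustration of the PBS policy described above (not the authors' implementation; the sensitivity estimates and core count below are invented, with PARSEC benchmark names as placeholders), a greedy sensitivity-based assignment in C could look like this:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical PBS-style policy: give the few large cores to the
 * applications with the highest estimated large-core speedup and leave
 * the rest on small cores. The numbers are invented for illustration. */

struct app {
    const char *name;
    double speedup;        /* estimated performance gain on a large core */
};

static int by_speedup_desc(const void *a, const void *b)
{
    double da = ((const struct app *)a)->speedup;
    double db = ((const struct app *)b)->speedup;
    return (da < db) - (da > db);    /* sort from highest to lowest */
}

int main(void)
{
    struct app apps[] = {            /* made-up sensitivity estimates */
        { "streamcluster", 1.9 },
        { "blackscholes",  1.2 },
        { "ferret",        1.6 },
        { "swaptions",     1.1 },
    };
    int n = (int)(sizeof apps / sizeof apps[0]);
    int large_cores = 2;             /* assumed asymmetric CPU: 2 large cores */

    qsort(apps, (size_t)n, sizeof apps[0], by_speedup_desc);
    for (int i = 0; i < n; i++)
        printf("%-13s -> %s core\n", apps[i].name,
               i < large_cores ? "large" : "small");
    return 0;
}
```

    Sorting by estimated speedup is the simplest way to give large cores to the applications that benefit most; how PBS obtains its sensitivity estimates in practice is not modeled here.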