30 research outputs found

    Loop Distribution and Fusion with Timing and Code Size Optimization for Embedded DSPs

    Full text link
    International Conference on Embedded and Ubiquitous Computing (EUC 2005), Nagasaki, Japan, 6-9 Dec 2005. Loop distribution and loop fusion are two effective loop transformation techniques for optimizing the execution of programs in DSP applications. In this paper, we propose a new technique combining loop distribution with direct loop fusion, which improves timing performance without jeopardizing code size. We first develop loop distribution theorems that state the legality conditions of loop distribution for multi-level nested loops. We show that if the summation of the edge weights of a dependence cycle satisfies a certain condition, then the statements involved in the cycle can be distributed; otherwise, they should be placed in the same loop after loop distribution. We then propose the technique of maximum loop distribution with direct loop fusion. The experimental results show that the execution time of the transformed loops is reduced by 21.0% and their code size by 7.0% on average, compared to the original loops.
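
    To make the transformation concrete, here is a minimal C sketch (illustrative only, not code from the paper): statement S1 participates in a loop-carried dependence on a[], while S2 is independent, so distribution may legally split them, and S2's loop becomes a candidate for direct fusion with other conformable loops. Array names and sizes are hypothetical.

    ```c
    /* Hypothetical illustration of loop distribution; arrays and
     * sizes are assumptions, not taken from the paper. */
    #define N 1024

    void original(float a[N], float b[N], float c[N]) {
        for (int i = 1; i < N; i++) {
            a[i] = a[i - 1] + b[i];  /* S1: loop-carried dependence on a[] */
            c[i] = 2.0f * b[i];      /* S2: independent of S1 */
        }
    }

    void distributed(float a[N], float b[N], float c[N]) {
        /* S1 stays in its own loop because of its dependence cycle. */
        for (int i = 1; i < N; i++)
            a[i] = a[i - 1] + b[i];
        /* S2's loop is now free to be fused with other conformable loops. */
        for (int i = 1; i < N; i++)
            c[i] = 2.0f * b[i];
    }
    ```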

    Evidence of Color Coherence Effects in W+jets Events from ppbar Collisions at sqrt(s) = 1.8 TeV

    Full text link
    We report the results of a study of color coherence effects in ppbar collisions based on data collected by the D0 detector during the 1994-1995 run of the Fermilab Tevatron Collider, at a center-of-mass energy sqrt(s) = 1.8 TeV. Initial-to-final state color interference effects are studied by examining particle distribution patterns in events with a W boson and at least one jet. The data are compared to Monte Carlo simulations with different color coherence implementations and to an analytic modified-leading-logarithm perturbative calculation based on the local parton-hadron duality hypothesis. Comment: 13 pages, 6 figures. Submitted to Physics Letters.

    Compiler Directed Parallelization of Loops in Scale for Shared-Memory Multiprocessors

    No full text

    A Comparison of Compiler Tiling Algorithms

    No full text

    Path-based reuse distance analysis

    No full text
    Profiling can effectively analyze program behavior and provide critical information for feedback-directed or dynamic optimizations. Based on memory profiling, reuse distance analysis has shown much promise in predicting data locality for a program using inputs other than the profiled ones. Both whole-program and instruction-based locality can be accurately predicted by reuse distance analysis. Reuse distance analysis abstracts a cluster of memory references for a particular instruction having similar reuse distance values into a locality pattern. Prior work has shown that a significant number of memory instructions have multiple locality patterns, a property not desirable for many instruction-based memory optimizations. This paper investigates the relationship between locality patterns and execution paths by analyzing the reuse distance distribution along each dynamic path to an instruction. Here a path is defined as the program execution trace from the previous access of a memory location to the current access. By differentiating locality patterns with the context of execution paths, the proposed analysis can expose optimization opportunities tailored only to a specific subset of paths leading to an instruction. In this paper, we present an effective method for path-based reuse distance profiling and analysis. We have observed that a significant percentage of the multiple locality patterns for an instruction can be uniquely related to a particular execution path in the program. In addition, we have also investigated the influence of inputs on the reuse distance distribution for each path/instruction pair. The experimental results show that the path-based reuse distance is highly predictable, as a function of the data size, for a set of SPEC CPU2000 programs.
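
    For intuition about the metric itself, the following C sketch (illustrative only, not the paper's profiler) computes reuse distances over a toy address trace with the classic LRU-stack method: an access's reuse distance is the number of distinct addresses touched since the previous access to the same address. The trace, stack bound, and names are assumptions.

    ```c
    /* LRU-stack sketch of reuse distance; all names and bounds are
     * hypothetical, not taken from the paper. */
    #include <stdio.h>

    #define MAX_DISTINCT 4096

    static long stack[MAX_DISTINCT];  /* addresses in LRU order, 0 = most recent */
    static int depth = 0;

    /* Returns the reuse distance of `addr`, or -1 on a cold (first) access. */
    int reuse_distance(long addr) {
        for (int i = 0; i < depth; i++) {
            if (stack[i] == addr) {
                /* Found at depth i: i distinct addresses were touched since
                 * the previous access.  Move addr back to the top. */
                for (int j = i; j > 0; j--) stack[j] = stack[j - 1];
                stack[0] = addr;
                return i;
            }
        }
        /* Cold miss: push addr on top, evicting the bottom if full. */
        for (int j = (depth < MAX_DISTINCT ? depth : MAX_DISTINCT - 1); j > 0; j--)
            stack[j] = stack[j - 1];
        stack[0] = addr;
        if (depth < MAX_DISTINCT) depth++;
        return -1;
    }

    int main(void) {
        long trace[] = {1, 2, 3, 1, 2, 1};  /* toy address trace */
        for (int k = 0; k < 6; k++)
            printf("addr %ld -> distance %d\n", trace[k], reuse_distance(trace[k]));
        return 0;  /* distances: -1 -1 -1 2 2 1 */
    }
    ```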

    Compiler Optimizations for Improving Data Locality

    No full text
    In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effective when programs exhibit data locality. In this paper, we present compiler optimizations to improve data locality based on a simple yet accurate cost model. The model computes both temporal and spatial reuse of cache lines to find desirable loop organizations. The cost model drives the application of compound transformations consisting of loop permutation, loop fusion, loop distribution, and loop reversal. We demonstrate that these program transformations are useful for optimizing many programs. To validate our optimization strategy, we implemented our algorithms and ran experiments on a large collection of scientific programs and kernels. Experiments with kernels illustrate that our model and algorithm can select and achieve the best performance. For over thirty complete applications, we executed the origi..
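
    As a hedged illustration of one transformation such a cost model drives, the C sketch below shows loop permutation improving spatial reuse of cache lines; the array names and sizes are hypothetical. C stores arrays row-major, so the innermost loop should walk the rightmost subscript.

    ```c
    /* Loop permutation sketch; arrays and bounds are assumptions,
     * not taken from the paper. */
    #define N 1024

    void column_order(double a[N][N], double b[N][N]) {
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                a[i][j] = b[i][j] + 1.0;  /* stride-N: poor spatial locality */
    }

    void row_order(double a[N][N], double b[N][N]) {
        /* After interchanging the i and j loops, consecutive iterations
         * touch adjacent elements of the same cache line. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = b[i][j] + 1.0;  /* stride-1: good spatial locality */
    }
    ```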

    RDVIS: A tool that visualizes the causes of low locality and hints program optimizations

    No full text
    We present RDVIS, a visualization tool that helps the programmer find program transformations to improve temporal data locality. We introduce a number of locality metrics that capture the necessary information, and, based on a cluster analysis of basic block vectors, the tool gives strong hints about which program transformations are needed. The visualizer allowed us to find the necessary transformations for three SPEC2000 programs in just a few minutes; after performing these transformations, the programs run on average 3 times faster on a number of different platforms.

    Loop Transformation Recipes for Code Generation and Auto-Tuning

    No full text
    In this paper, we describe transformation recipes, which provide a high-level interface to the code transformation and code generation capability of a compiler. These recipes can be generated by compiler decision algorithms or savvy software developers. This interface is part of an auto-tuning framework that explores a set of different implementations of the same computation and automatically selects the best-performing implementation. Along with the original computation, a transformation recipe specifies a range of implementations of the computation resulting from composing a set of high-level code transformations. In our system, an underlying polyhedral framework coupled with transformation algorithms takes this set of transformations, composes them and automatically generates correct code. We first describe an abstract interface for transformation recipes, which we propose to facilitate interoperability with other transformation frameworks. We then focus on the specific transformation recipe interface used in our compiler and present performance results on its application to kernel and library tuning and tuning of key computations in high-end applications. We also show how this framework can be used to generate and auto-tune parallel OpenMP or CUDA code from a high-level specification.
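
    The concrete recipe syntax is system-specific and not reproduced here; as a purely hypothetical illustration, the C code below shows the kind of tiled loop nest that a recipe composing two tiling transformations, with tile size TS exposed as an auto-tuning knob, might generate for matrix multiply. All names and parameters are assumptions.

    ```c
    /* Hypothetical output of a recipe tiling the i and j loops of a
     * matrix multiply; not the paper's actual generated code. */
    #define N  1024
    #define TS 64   /* tile size: the knob an auto-tuner would sweep */

    /* Assumes C has been zero-initialized by the caller and N % TS == 0. */
    void matmul_tiled(double C[N][N], double A[N][N], double B[N][N]) {
        for (int ii = 0; ii < N; ii += TS)
            for (int jj = 0; jj < N; jj += TS)
                for (int k = 0; k < N; k++)          /* reduction loop */
                    for (int i = ii; i < ii + TS; i++)
                        for (int j = jj; j < jj + TS; j++)
                            C[i][j] += A[i][k] * B[k][j];
    }
    ```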