53 research outputs found

Formal Verification of Parallel Stream Compaction and Summed-Area Table Algorithms

Author: A Amighi
B Jacobs
D Horn
GE Blelloch
J Boyland
L de Moura
M Harris
M Safari
M Zheng
P Collingbourne
P Müller
PM Kogge
S Blom
Publication venue: Springer
Publication date: 25/11/2020

Crossref

University of Twente Research Information

GPU-Accelerated Large-Eddy Simulation of Turbulent Channel Flows

Author: Antoniou A.S.
Briggs W. L.
Cheng W.
Chorin A.J.
Chung D.
Deardorff J.W.
Driest E.V.
Geveler M.
Griebel M.
Hoyas S.
Jacobsen D.A.
Kerr A.
Kogge P. M.
Meneveau C.
Smagorinsky J.
The Portland Group
Thibault J.C.
Publication venue: IUScholarWorks
Publication date: 09/01/2012

High performance computing clusters augmented with cost- and power-efficient graphics processing units (GPUs) provide new opportunities to broaden the use of the large-eddy simulation technique to study high Reynolds number turbulent flows in fluids engineering applications. In this paper, we extend our earlier work on multi-GPU acceleration of an incompressible Navier-Stokes solver to include a large-eddy simulation (LES) capability. In particular, we implement the Lagrangian dynamic subgrid scale model and compare our results against existing direct numerical simulation (DNS) data of a turbulent channel flow at Reτ = 180. Overall, our LES results match fairly well with the DNS data. Our results show that the Reτ = 180 case can be entirely simulated on a single GPU, whereas higher Reynolds number cases can benefit from a GPU cluster.

Crossref

Boise State University - ScholarWorks

Active memory controller

Author: A Ailamaki
A Gottlieb
A Saulsbury
Ali Ibrahim
C Batten
C Cascaval
D Kim
D Patterson
DH Albonesi
DJ Sorin
DS Nikolopoulos
F Petrini
G Blelloch
G Marin
I Zotov
J Kuskin
J Laudon
J Torrellas
JB Brockman
JH Ahn
JM Mellor-Crummey
John B. Carter
K Keeton
KM Chandy
L Zhang
L Zhao
LA Barroso
Lixin Zhang
M Garzaran
M Hall
M Hao
M Oskin
Michael A. Parker
P Kogge
PA Boncz
R Kalla
RE Kessler
S Chatterjee
S Kumar
S Scott
Sally A. McKee
T Anderson
T von Eicken
V Tipparaju
Xiaowei Jiang
Y Solihin
Z Fang
Zhen Fang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Inability to hide main memory latency has been increasingly limiting the performance of modern processors. The problem is worse in large-scale shared memory systems, where remote memory latencies are hundreds, and soon thousands, of processor cycles. To mitigate this problem, we propose an intelligent memory and cache coherence controller (AMC) that can execute Active Memory Operations (AMOs). AMOs are select operations sent to and executed on the home memory controller of data. AMOs can eliminate a significant number of coherence messages, minimize intranode and internode memory traffic, and create opportunities for parallelism. Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips. In this paper, we present the microarchitecture design of AMC, and the programming model of AMOs. We compare AMOs' performance to that of several other memory architectures on a variety of scientific and commercial benchmarks. Through simulation, we show that AMOs offer dramatic performance improvements for an important set of data-intensive operations, e.g., up to 50x faster barriers, 12x faster spinlocks, 8.5x-15x faster stream/array operations, and 3x faster database queries. We also present an analytical model that can predict the performance benefits of using AMOs with decent accuracy. The silicon cost required to support AMOs is less than 1% of the die area of a typical high performance processor, based on a standard cell implementation.

Crossref

Chalmers Research

Future research directions in design of reliable communication systems

Author: A Avižienis
A Casaca
A Gumaste
A Somani
AMCA Koster
AP Wierzbicki
Arie M. C. A. Koster
B Ahlgren
C Büsing
C Develder
D Alderson
D Bertsimas
D Colle
D Dörner
D-L Truong
Dimitri Staessens
E Marshall
Egemen K. Çetinkaya
G Claßen
G Karagiannis
G Maier
GJ Holzmann
H Haddadi
H Hartenstein
J Blum
J Clímaco
J Gozalvez
J Rak
J Tapolcai
Jacek Rak
James P. G. Sterbenz
Javier Alonso Lopez
JM Coutinho-Rodrigues
JPG Sterbenz
K Kanoun
K Walkowiak
Kishor S. Trivedi
Krzysztof Walkowiak
L Iannone
LM Contreras
LW Beineke
M Fischetti
M Grottke
M Jinno
M Klinkowski
Mario Pickavet
Matthias Gunkel
MC Balbuena
ML Sichitiu
P Kogge
Q Zhang
R Bhandari
R Crane
R Matos
RE Steuer
S De Maesschalck
S Ramamurthy
S Sen
S Zeadally
T Gomes
Teresa Gomes
TJ Xia
V Castelli
W Molisz
W Venters
X Huang
Y Guo
Publication venue: Springer Science and Business Media LLC

Crossref

Parallel Solution of Recurrence Problems

Author: P. M. Kogge
Publication venue: IEEE
Publication date: 01/01/1974

An $m$th-order recurrence problem is defined as the computation of the sequence $x_1, \ldots, x_N$, where $x_i = f(a_i, x_{i-1}, \ldots, x_{i-m})$ and $a_i$ is some vector of parameters. This paper investigates general algorithms for solving such problems on highly parallel computers. We show that if the recurrence function $f$ has associated with it two other functions that satisfy certain composition properties, then we can construct elegant and efficient parallel algorithms that can compute all $N$ elements of the series in time proportional to $\lceil \log_2 N \rceil$. The class of problems having this property includes linear recurrences of all orders, both homogeneous and inhomogeneous, recurrences involving matrix or binary quantities, and various nonlinear problems involving operations such as computation with matrix inverses, exponentiation, and modulo division.
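
The logarithmic-time claim can be illustrated with a minimal sketch of recursive doubling for the simplest case the abstract covers, a first-order inhomogeneous linear recurrence x_i = a_i * x_{i-1} + b_i. This is not taken from the paper: the function names (`compose`, `solve_by_doubling`, `solve_sequentially`) and the use of affine-map composition as the "companion" operation are assumptions chosen for illustration, and the parallel steps are simulated sequentially in Python.

```python
def compose(later, earlier):
    """Compose two affine maps x -> a*x + b; `earlier` is applied first."""
    a2, b2 = later
    a1, b1 = earlier
    return (a2 * a1, a2 * b1 + b2)


def solve_sequentially(coeffs, x0):
    """Reference solution: N dependent steps, one per element."""
    out, x = [], x0
    for a, b in coeffs:
        x = a * x + b
        out.append(x)
    return out


def solve_by_doubling(coeffs, x0):
    """Recursive doubling (hypothetical helper, not the paper's notation).

    coeffs[i] holds (a, b) for the (i+1)-th recurrence step.
    Returns ([x_1, ..., x_N], number_of_doubling_steps).
    """
    n = len(coeffs)
    maps = list(coeffs)
    stride, steps = 1, 0
    while stride < n:
        # Every entry of the new list depends only on the old list, so on a
        # parallel machine all N compositions in this step run at once.
        maps = [
            compose(maps[i], maps[i - stride]) if i >= stride else maps[i]
            for i in range(n)
        ]
        stride *= 2
        steps += 1
    # maps[i] now maps x_0 directly to x_{i+1}.
    return [a * x0 + b for a, b in maps], steps


if __name__ == "__main__":
    coeffs = [(2, 1), (3, 0), (1, 5), (2, 2), (1, 1)]
    xs, steps = solve_by_doubling(coeffs, 1)
    print(xs, steps)
```

The sketch relies on affine maps being closed under composition and on composition being associative; that is the kind of composition property the abstract requires of the recurrence function. The doubling loop runs ceil(log2 N) times for N elements.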

CiteSeerX

The Pascal-XT code generator

Author: K. H. Drechsler
Kogge P. M.
M. P. Stadel
Publication venue: Association for Computing Machinery (ACM)

Crossref

High throughput and low power dissipation in QCA pipelines using Bennett clocking

Author: DeBenedictis E
Kogge P
Lombardi F
Ottavi M
Pontarelli S
Salsano A
Publication date: 01/01/2010

This paper presents a detailed analysis of an architectural pipeline scheme for Quantum-dot Cellular Automata (QCA); this scheme utilizes the so-called Bennett clocking for attaining high throughput and low power dissipation. In this arrangement, computation stages (utilizing Bennett clocking) and memory stages combine the low power dissipation of reversible computing with the high throughput feature of a pipeline. An example of the application of the proposed scheme to an XOR tree circuit (parity generator) is presented; a detailed analysis of throughput and power consumption is provided to show the effectiveness of the proposed architectural solution for QCA.

Crossref

Scipedia

ART