Optimizing hardware function evaluation

Author: Gaffar AA
Lee DU
Luk W
Mencer O
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

Object-oriented domain specific compilers for programming FPGAs

Author: Flynn MJ
Mencer O
Morf M
Platzner M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

Hotspot detection of SPEC CPU 2006 benchmarks with performance event counters

Author: Atasu K
Mencer O
Tavares C
Wu Q
Publication venue: Department of Computing, Imperial College London
Publication date: 01/01/2008

A hotspot is the part of a program where most of the execution time is spent. Detecting hotspots enables targeted optimization of the program. The performance event counters embedded in modern processors provide hardware support for hotspot detection. By sampling the instruction addresses of the running program with performance event counters, the hotspots of the program can be detected statistically. This technical report describes our tool for finding the sections of code that are detected as hotspots using performance event counters. The SPEC CPU 2006 benchmarks are tested with our tool, and the results show the hotspot sections and the overhead of the hotspot detection tool.
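
The statistical step described in this abstract can be illustrated with a minimal sketch. It is not the authors' tool: the symbol table, the sample addresses, and the function names `attribute_samples` and `hotspots` are all hypothetical, and the samples are hard-coded rather than read from performance event counters. The sketch only shows how sampled instruction addresses can be attributed to code sections and ranked by their share of samples.

```python
from bisect import bisect_right
from collections import Counter


def attribute_samples(samples, symbols):
    """Count samples per code section.

    symbols: list of (start_address, name), sorted by start address.
    A sample belongs to the last section whose start is <= the address.
    """
    starts = [start for start, _ in symbols]
    counts = Counter()
    for addr in samples:
        i = bisect_right(starts, addr) - 1
        if i >= 0:  # ignore addresses below the first known section
            counts[symbols[i][1]] += 1
    return counts


def hotspots(samples, symbols, top=3):
    """Return the `top` sections with their fraction of attributed samples."""
    counts = attribute_samples(samples, symbols)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(name, n / total) for name, n in counts.most_common(top)]


# Hypothetical data: three code sections and eight sampled addresses.
SYMBOLS = [(0x1000, "parse"), (0x2000, "compute"), (0x3000, "output")]
SAMPLES = [0x2004, 0x2010, 0x1008, 0x2020, 0x3004, 0x2008, 0x2030, 0x100C]

if __name__ == "__main__":
    for name, share in hotspots(SAMPLES, SYMBOLS):
        print(f"{name}: {share:.1%} of samples")
```

With more samples the estimated shares converge on the true time distribution, which is why sampling can locate hotspots at low overhead.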

Dataflow Design for Optimal Incremental SVM Training

Author: Luk W
Mencer O
Shao S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/12/2016

This paper proposes a new parallel architecture for incremental training of a Support Vector Machine (SVM), which produces an optimal solution based on manipulating the Karush-Kuhn-Tucker (KKT) conditions. Compared to batch training methods, our approach avoids re-training from scratch when the training dataset changes. The proposed architecture is the first to adopt an efficient dataflow organisation. The main novelty is a parametric description of the parallel dataflow architecture, which deploys customisable arithmetic units for the dense linear algebraic operations involved in updating the KKT conditions. The proposed architecture targets on-line SVM training applications. Experimental evaluation with real-world financial data shows that our architecture, implemented on a Stratix-V FPGA, achieves a significant speedup over LIBSVM on a Core i7-4770 CPU.

Designing a Posture Analysis System with Hardware Implementation

Author: B. L. Lo
C. C. Cheung
C. Wren
D. M. Gavrila
G. Stitt
G. Z. Yang
I. Haritaoglu
J. G. F. Coutinho
J. L. Wang
J. Verghese
L. Wang
M. P. T. Juvonen
O. Mencer
S. Ghiasi
W. Luk
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Optimizing logarithmic arithmetic on FPGAs

Author: Fu H
Luk W
Mencer O
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

This paper proposes optimizations of the methods and parameters used in both mathematical approximation and hardware design for logarithmic number system (LNS) arithmetic. First, we introduce a general polynomial approximation approach with an adaptive divide-in-halves segmentation method for evaluation of LNS arithmetic functions. Second, we develop a library generator that automatically generates optimized LNS arithmetic units with a wide bit-width range from 21 to 64 bits, to support LNS application development and design exploration. The basic arithmetic units are tested on practical FPGA boards as well as software simulation. When compared with existing LNS designs, our generated units provide in most case
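
As a rough illustration of divide-in-halves segmentation, the sketch below recursively bisects an interval until a simple approximation meets an error tolerance on each piece. It rests on several assumptions that are not taken from the paper: the approximation is a straight line through the segment endpoints (a stand-in for the general polynomial fit), the error is estimated by probing a fixed number of points, and the target is the LNS addition function log2(1 + 2^x). All function names are hypothetical.

```python
import math


def lns_add(x):
    """LNS addition function log2(1 + 2^x), evaluated in floating point."""
    return math.log2(1.0 + 2.0 ** x)


def max_error(f, lo, hi, probes=64):
    """Estimate the worst error of the chord through (lo, f(lo)), (hi, f(hi))."""
    f_lo, f_hi = f(lo), f(hi)
    slope = (f_hi - f_lo) / (hi - lo)
    worst = 0.0
    for k in range(probes + 1):
        x = lo + (hi - lo) * k / probes
        worst = max(worst, abs(f(x) - (f_lo + slope * (x - lo))))
    return worst


def segment(f, lo, hi, tol):
    """Split [lo, hi] in halves until each piece meets the tolerance."""
    if max_error(f, lo, hi) <= tol:
        return [(lo, hi)]
    mid = (lo + hi) / 2
    return segment(f, lo, mid, tol) + segment(f, mid, hi, tol)


if __name__ == "__main__":
    for lo, hi in segment(lns_add, -16.0, 0.0, 1e-3):
        print(f"[{lo:9.4f}, {hi:9.4f}]  width {hi - lo:.4f}")
```

Because the function is nearly flat for very negative x and more strongly curved near 0, the recursion produces wide segments at the left end and narrow ones near 0, so table entries are spent only where the curvature requires them.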

Parameterized function evaluation for FPGAs

Author: M. Weinhardt
M.J. Schulte
N. Boullis
O. Mencer
W.F. Wong
Publication venue
Publication date: 01/01/2001

This paper presents parameterized module-generators for pipelined function evaluation using lookup tables, adders, shifters and multipliers. We discuss trade-offs involved between (1) full-lookup tables, (2) bipartite (lookup-add) units, (3) lookup-multiply units, and (4) shift-and-add based CORDIC units. For lookup-multiply units we provide equations estimating approximation errors and rounding errors which are used to parameterize the hardware units. The resources and performance of the resulting design can be estimated given the input parameters. The method is implemented as part of the PAM-Blox module generation environment. An example shows that the table-multiply unit produces competitive designs with data widths up to 20 bits when compared with shift-and-add based CORDIC units. Additionally, the table-multiply method can be used for larger data widths when evaluating functions not supported by CORDIC.
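
The lookup-multiply idea can be sketched in software as one table lookup followed by one multiplication and one addition. The sketch below is a floating-point model under stated assumptions, not the PAM-Blox generator: the input is taken to lie in [0, 1), the leading `index_bits` bits select a table entry holding a base value and a slope, and the remaining offset is multiplied by the slope. The names `build_tables` and `lookup_multiply` are hypothetical, and rounding effects of a fixed-point datapath are not modelled.

```python
import math


def build_tables(f, index_bits):
    """Tabulate a base value and a chord slope for each of 2^index_bits segments."""
    n = 1 << index_bits
    step = 1.0 / n
    base = [f(i * step) for i in range(n)]
    slope = [(f((i + 1) * step) - f(i * step)) / step for i in range(n)]
    return base, slope


def lookup_multiply(x, base, slope, index_bits):
    """Approximate f(x) for x in [0, 1) as base[i] + slope[i] * offset."""
    n = 1 << index_bits
    i = min(int(x * n), n - 1)  # leading bits of x select the segment
    dx = x - i / n              # remaining offset within the segment
    return base[i] + slope[i] * dx


if __name__ == "__main__":
    for bits in (4, 6, 8):
        base, slope = build_tables(math.sin, bits)
        err = max(
            abs(math.sin(k / 1000) - lookup_multiply(k / 1000, base, slope, bits))
            for k in range(1000)
        )
        print(f"index_bits={bits}: max observed error {err:.2e}")
```

For a chord through the segment endpoints, the approximation error is bounded by h^2/8 times the maximum of |f''| on the segment, where h is the segment width, so each extra index bit cuts the error by about a factor of four while doubling the table size. This is the kind of trade-off a parameterized generator has to balance.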

Memory mapping for multi-die FPGAs

Author: Gaydadjiev G
Luk W
Mencer O
Quintana P
Voss N
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/03/2019

This paper proposes an algorithm for mapping logical to physical memory resources on Field-Programmable Gate Arrays (FPGAs). Our greedy strategy based algorithm is specifically designed to facilitate timing closure on modern multi-die FPGAs for static-dataflow accelerators utilising most of the on-chip resources. The main objective of the proposed algorithm is to ensure that specific sub-parts of the design under consideration can fully reside within a single die to limit inter-die communication. The above is achieved by performing the memory mapping for each sub-part of the design separately while keeping allocation of the available physical resources balanced. As a result, the number of inter-die connections is reduced on average by 50% compared to an algorithm targeting minimal area usage for real, complex applications using most of the on-chip resources. Additionally, our algorithm is the only one out of the four evaluated approaches which successfully produces place and route results for all 33 applications and benchmarks.

StReAm: Object-Oriented Programming of Stream Architectures Using PAM-Blox

Author: C. Ebeling
M.B. Gokhale
O. Mencer
P. Bellows
R. Laufer
T.J. Callahan
W. Luk
X. Lai
Publication venue: 'Springer Science and Business Media LLC'
Publication date