
    Automatic generation of high-throughput systolic tree-based solvers for modern FPGAs

    Get PDF
    Tree-based models are a class of numerical methods widely used in financial option pricing, whose computational complexity is quadratic with respect to the solution accuracy. Previous research has employed reconfigurable computing with small degrees of parallelism to provide faster hardware solutions than software designs on general-purpose processors. However, due to the nature of their vector hardware architectures, they cannot scale their compute resources efficiently, leaving them with pricing latencies that are quadratic with respect to the problem size, and hence to the solution accuracy. Their solutions are also not productive, as they require hardware engineering effort and can only solve one type of tree problem, the standard American option. This thesis presents a novel methodology in the form of a high-level design framework which can capture any common tree-based problem and automatically generates high-throughput field-programmable gate array (FPGA) solvers based on proposed scalable hardware architectures. The thesis makes three main contributions. First, systolic architectures were proposed for solving binomial and trinomial trees which, thanks to their custom systolic data-movement mechanisms, can scale their compute resources efficiently to provide linear latency scaling for medium-size trees and improved quadratic latency scaling for large trees. Using the proposed systolic architectures, throughput speed-ups of up to 5.6X and 12X were achieved on modern FPGAs, compared to previous vector designs, for medium and large trees, respectively. Second, a productive high-level design framework was proposed that can capture any common binomial and trinomial tree problem, together with a methodology to generate high-throughput systolic solvers with custom data precision, requiring no hardware design effort from the end user. Third, a fully automated tool-chain methodology was proposed that, compared to previous tree-based solvers, improves user productivity by removing the manual engineering effort of applying the design framework to option pricing problems. Using the productive design framework, high-throughput systolic FPGA solvers have been automatically generated from simple end-user C descriptions for several tree problems, such as American, Bermudan, and barrier options.
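    To make the computational pattern concrete, the following is a minimal Python sketch of American put pricing on a Cox-Ross-Rubinstein binomial tree (the function name and parameter values are illustrative only, not part of the thesis's framework). The backward induction visits O(N^2) nodes, which is the quadratic cost discussed above, and each tree level forms a wavefront that a systolic array can evaluate in parallel.

```python
import math

def binomial_american_put(S0, K, r, sigma, T, N):
    """Price an American put on a Cox-Ross-Rubinstein binomial tree.

    The backward induction visits O(N^2) nodes, which is the quadratic
    cost (in tree depth N, i.e. solution accuracy) discussed above.
    """
    dt = T / N
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)

    # Option values at maturity (leaf level of the tree).
    values = [max(K - S0 * u**j * d**(N - j), 0.0) for j in range(N + 1)]

    # Backward induction: each level is a wavefront that a systolic
    # array can evaluate in parallel.
    for i in range(N - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = max(K - S0 * u**j * d**(i - j), 0.0)
            values[j] = max(cont, exercise)  # early-exercise test
    return values[0]

print(binomial_american_put(S0=100, K=100, r=0.05, sigma=0.2, T=1.0, N=500))
```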

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work in this thesis focuses on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism, and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution-search capability of genetic programming. A further benefit of the proposed modelling approach is that it is itself amenable to hardware acceleration; it also includes an early-exit mechanism that achieves large search-space reductions. Results show an improvement over state-of-the-art algorithms, with future potential for even greater savings.
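    As a hedged illustration of what a CMM operation is, and why multiplierless hardware realisations are attractive, the sketch below evaluates y = Cx using only shifts and adds. The constant matrix here is hypothetical (a stand-in for an integer-quantised transform kernel such as a DCT basis), and real CMM optimisers, including the genetic-programming approach above, additionally share intermediate results across constants, which this naive per-constant form does not.

```python
import numpy as np

# Hypothetical constant matrix, standing in for a transform kernel such
# as a DCT basis quantised to integers (an assumption for illustration).
C = np.array([[3,  5],
              [7, -2]])

def shift_add_mul(c, x):
    """Multiply x by the integer constant c using only shifts and adds,
    mirroring how multiplierless CMM hardware realises each product."""
    acc, neg = 0, c < 0
    c = abs(c)
    bit = 0
    while c:
        if c & 1:
            acc += x << bit   # add the shifted operand for each set bit
        c >>= 1
        bit += 1
    return -acc if neg else acc

def cmm(C, x):
    """y = C @ x evaluated with shift-add 'multipliers' only."""
    return [sum(shift_add_mul(int(c), int(xi)) for c, xi in zip(row, x))
            for row in C]

x = np.array([4, 9])
assert cmm(C, x) == list(C @ x)   # matches the ordinary matrix-vector product
print(cmm(C, x))
```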

    TraTSA: A Transprecision Framework for Efficient Time Series Analysis

    Get PDF
    Time series analysis (TSA) comprises methods for extracting information in domains as diverse as medicine, seismology, speech recognition and economics. Matrix Profile (MP) is the state-of-the-art TSA technique, which provides the most similar neighbor to each subsequence of the time series. However, this computation requires a huge number of floating-point (FP) operations, which are a major contributor (around 50%) to the energy consumption of modern computing platforms. In this context, transprecision computing has recently emerged as a promising approach to improve energy efficiency and performance by using fewer bits in FP operations while still providing accurate results. In this work, we present TraTSA, the first transprecision framework for efficient time series analysis based on MP. TraTSA allows the user to deploy a high-performance and energy-efficient computing solution with the exact precision required by the TSA application. To this end, we first propose implementations of TraTSA for both commodity CPU and FPGA platforms. Second, we propose an accuracy metric to compare the results with the double-precision MP. Third, we study MP's accuracy when using a transprecision approach. Finally, our evaluation shows that, while producing sufficiently accurate results, the FPGA transprecision MP (i) is 22.75x faster than a 72-core server, and (ii) consumes up to 3.3x less energy than the double-precision executions. This work has been supported by the Government of Spain under project PID2019-105396RB-I00, and Junta de Andalucía under projects P18-FR-3433 and UMA18-FEDERJA-197. Funding for open access charge: Universidad de Málaga / CBUA.
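    The brute-force Python sketch below (not TraTSA itself, and far slower than production MP algorithms) illustrates what the Matrix Profile computes and how a simple transprecision experiment can compare a reduced-precision run against the double-precision baseline.

```python
import numpy as np

def matrix_profile(ts, m, dtype=np.float64):
    """Brute-force Matrix Profile: for every length-m subsequence, the
    z-normalised Euclidean distance to its nearest neighbour, excluding
    trivial matches near the query itself."""
    ts = ts.astype(dtype)
    n = len(ts) - m + 1
    # z-normalise every subsequence
    subs = np.lib.stride_tricks.sliding_window_view(ts, m)
    subs = (subs - subs.mean(axis=1, keepdims=True)) / subs.std(axis=1, keepdims=True)
    mp = np.full(n, np.inf, dtype=dtype)
    excl = m // 2  # trivial-match exclusion zone
    for i in range(n):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        d[max(0, i - excl):i + excl + 1] = np.inf  # skip trivial matches
        mp[i] = d.min()
    return mp

rng = np.random.default_rng(0)
ts = rng.standard_normal(512)
mp64 = matrix_profile(ts, m=32, dtype=np.float64)  # double-precision baseline
mp32 = matrix_profile(ts, m=32, dtype=np.float32)  # transprecision run
print("max abs deviation from double precision:", np.abs(mp64 - mp32).max())
```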

    Serial-data computation in VLSI

    Get PDF

    Venous hemodynamics in neurological disorders: an analytical review with hydrodynamic analysis.

    Get PDF
    Venous abnormalities contribute to the pathophysiology of several neurological conditions. This paper reviews the literature regarding venous abnormalities in multiple sclerosis (MS), leukoaraiosis, and normal-pressure hydrocephalus (NPH). The review is supplemented with hydrodynamic analysis to assess the effects on cerebrospinal fluid (CSF) dynamics and cerebral blood flow (CBF) of venous hypertension in general, and chronic cerebrospinal venous insufficiency (CCSVI) in particular. CCSVI-like venous anomalies seem unlikely to account for reduced CBF in patients with MS, thus other mechanisms must be at work which increase the hydraulic resistance of the cerebral vascular bed in MS. Similarly, hydrodynamic changes appear to be responsible for reduced CBF in leukoaraiosis. The hydrodynamic properties of the periventricular veins make these vessels particularly vulnerable to ischemia and plaque formation. Venous hypertension in the dural sinuses can alter intracranial compliance. Consequently, venous hypertension may change the CSF dynamics, affecting the intracranial windkessel mechanism. MS and NPH appear to share some similar characteristics, with both conditions exhibiting increased CSF pulsatility in the aqueduct of Sylvius. CCSVI appears to be a real phenomenon associated with MS, which causes venous hypertension in the dural sinuses. However, the role of CCSVI in the pathophysiology of MS remains unclear.
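    As a hedged illustration of the kind of relation such a hydrodynamic analysis draws on (the paper's exact model is not reproduced here), the Hagen-Poiseuille law links vessel geometry to hydraulic resistance and flow:

```latex
% Hagen-Poiseuille resistance of a cylindrical vessel segment,
% a standard relation of the kind used in such hydrodynamic analyses:
R = \frac{8 \mu L}{\pi r^{4}},
\qquad
Q = \frac{\Delta P}{R}
```

    Here \mu is the blood viscosity, L the segment length, r its radius, \Delta P the pressure drop and Q the volumetric flow. The r^{-4} dependence means even a small reduction in venous calibre sharply raises resistance and lowers flow at a given pressure gradient, consistent with the particular vulnerability of the small periventricular veins noted above.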

    Experimental survey of FPGA-based monolithic switches and a novel queue balancer

    Get PDF
    This paper studies small to medium-sized monolithic switches for FPGA implementation and presents a novel switch design that achieves high algorithmic performance and FPGA implementation efficiency. Crossbar switches based on virtual output queues (VOQs) and variations have been rather popular for implementing switches on FPGAs, with applications in network switches, memory interconnects, network-on-chip (NoC) routers, etc. The implementation efficiency of crossbar-based switches is well documented on ASICs, though we show that their disadvantages can outweigh their advantages on FPGAs. One of the most important challenges in such input-queued switches is the requirement for iterative scheduling algorithms. In contrast to ASICs, this is more harmful on FPGAs, as the reduced operating frequency and narrower packets cannot “hide” the multiple iterations of scheduling that are required to achieve a modest scheduling performance. Our proposed design uses an output-queued switch internally to simplify scheduling, and a queue-balancing technique to avoid queue fragmentation and reduce the need for memory-sharing VOQs. Its implementation approaches the scheduling performance of a state-of-the-art FPGA-based switch, while requiring considerably fewer resources.
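    The following toy Python model (names and structure are illustrative, not the paper's hardware design) captures the queue-balancing idea in software terms: each output owns a small bank of queues, an arriving packet joins the shortest queue in its output's bank, and the longest non-empty queue is drained first, keeping occupancy even and avoiding fragmented, mostly-empty per-input VOQs.

```python
from collections import deque

class BalancedOutputQueues:
    """Toy software model of an output-queued switch with queue balancing.
    (Illustrative only; the paper's design is FPGA hardware, not Python.)"""

    def __init__(self, num_outputs, queues_per_output=4):
        self.banks = [[deque() for _ in range(queues_per_output)]
                      for _ in range(num_outputs)]

    def enqueue(self, packet, output):
        """An arriving packet joins the least-occupied queue of its
        output's bank, which keeps queue occupancy balanced."""
        bank = self.banks[output]
        min(bank, key=len).append(packet)

    def dequeue(self, output):
        """Drain the longest non-empty queue first, so no single queue
        grows unboundedly while others sit empty (fragmentation)."""
        candidates = [q for q in self.banks[output] if q]
        if not candidates:
            return None
        return max(candidates, key=len).popleft()

switch = BalancedOutputQueues(num_outputs=4)
for pkt in range(10):
    switch.enqueue(pkt, output=pkt % 4)
print(switch.dequeue(0), switch.dequeue(1))
```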

    Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud

    Full text link
    Neural networks (NNs) are growing in importance and complexity. A neural network's performance (and energy efficiency) can be bound either by computation or by memory resources. The processing-in-memory (PIM) paradigm, where computation is placed near or within memory arrays, is a viable solution to accelerate memory-bound NNs. However, PIM architectures vary in form, and different PIM approaches lead to different trade-offs. Our goal is to analyze, discuss, and contrast DRAM-based PIM architectures for NN performance and energy efficiency. To do so, we analyze three state-of-the-art PIM architectures: (1) UPMEM, which integrates processors and DRAM arrays into a single 2D chip; (2) Mensa, a 3D-stack-based PIM architecture tailored for edge devices; and (3) SIMDRAM, which uses the analog principles of DRAM to execute bit-serial operations. Our analysis reveals that PIM greatly benefits memory-bound NNs: (1) UPMEM provides 23x the performance of a high-end GPU when the GPU requires memory oversubscription for a general matrix-vector multiplication kernel; (2) Mensa improves energy efficiency and throughput by 3.0x and 3.1x over the Google Edge TPU for 24 Google edge NN models; and (3) SIMDRAM outperforms a CPU/GPU by 16.7x/1.4x for three binary NNs. We conclude that the ideal PIM architecture for a NN model depends on the model's distinct attributes, due to the inherent architectural design choices. (This is an extended and updated version of a paper published in IEEE Micro, pp. 1-14, 29 Aug. 2022.)
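    A roofline-style back-of-the-envelope check clarifies why kernels such as matrix-vector multiplication (GEMV) are memory-bound and hence good PIM candidates. The machine numbers below are illustrative assumptions, not measurements of any of the three architectures.

```python
def is_memory_bound(flops, bytes_moved, peak_flops, peak_bw_bytes):
    """Roofline-style test: a kernel is memory-bound when its arithmetic
    intensity (FLOPs per byte of DRAM traffic) falls below the machine
    balance point peak_flops / peak_bw."""
    intensity = flops / bytes_moved
    balance = peak_flops / peak_bw_bytes
    return intensity < balance, intensity, balance

# GEMV, y = A x with A of shape n x n: ~2*n^2 FLOPs over ~4*n^2 bytes of
# fp32 matrix traffic, so arithmetic intensity is ~0.5 FLOP/B for any n.
n = 8192
flops = 2 * n * n
bytes_moved = 4 * n * n  # dominant term: streaming the matrix once
# Illustrative machine numbers (assumptions, not a specific product):
bound, ai, balance = is_memory_bound(flops, bytes_moved,
                                     peak_flops=10e12,     # 10 TFLOP/s
                                     peak_bw_bytes=500e9)  # 500 GB/s
print(f"AI={ai:.2f} FLOP/B, balance={balance:.1f} -> memory-bound: {bound}")
```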

    Geneva Health Forum 2020 Poster Book

    Get PDF
    From 16 to 18 November 2020, the eighth edition of the Geneva Health Forum, which took place in the difficult context of the COVID-19 pandemic, hosted 165 posters. The present collection brings together 65 of those posters, covering a wide range of the topics discussed.