Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles
The combination of machine learning and heterogeneous embedded platforms opens new potential for developing sophisticated control concepts applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions, ultimately implementing fast predictions using neural networks (NNs) on field-programmable gate arrays (FPGAs) and graphics processing units (GPUs), and applies them to a challenging application: torque vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints of the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms to complex control problems. In this particular case, we target enhanced vehicle dynamics on a multi-motor electric vehicle, benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose an NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and an FPGA, following an analysis of the theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform.
Consequently, having shown the applicability of the proposed solutions, this work also contributes valuable enablers for further developments following similar fundamental principles.
Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from the ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union's Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, and Lithuania. This work was also partly supported by the project ENABLES3, which received funding from the ECSEL Joint Undertaking under grant agreement No. 692455-2
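The batch-prediction scheme described in the abstract can be illustrated with a minimal NumPy sketch: one forward pass of a small fully-connected network evaluating many candidate inputs at once. The layer sizes, activation, and cost-interpretation below are illustrative assumptions, not the authors' actual network.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def batch_predict(x, weights, biases):
    """Forward pass of a small fully-connected NN for a whole batch
    of candidate inputs at once (x has shape batch x features)."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return a @ weights[-1] + biases[-1]  # linear output layer

# Illustrative sizes: 8 inputs (e.g. vehicle state plus a candidate
# torque split), two hidden layers, 1 predicted cost per candidate.
rng = np.random.default_rng(0)
sizes = [8, 16, 16, 1]
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

candidates = rng.standard_normal((256, 8))   # 256 candidates in one batch
costs = batch_predict(candidates, weights, biases)
best = candidates[np.argmin(costs)]          # pick the best candidate
```

Batching turns many small matrix-vector products into a few large matrix-matrix products, which is exactly the shape of work that GPUs and deeply pipelined FPGA datapaths exploit.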
Improving low latency applications for reconfigurable devices
This thesis seeks to improve low latency application performance via architectural improvements in reconfigurable devices. This is achieved by improving resource utilisation and access, and by exploiting the different environments within which reconfigurable devices are deployed.
Our first contribution leverages devices deployed at the network level to enable the low latency processing of financial market data feeds. Financial exchanges transmit messages via two identical data feeds to reduce the chance of message loss. We present an approach to arbitrate these redundant feeds at the network level using a Field-Programmable Gate Array (FPGA). With support for any messaging protocol, we evaluate our design using the NASDAQ TotalView-ITCH, OPRA, and ARCA data feed protocols, and provide two simultaneous outputs: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods.
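A minimal software sketch of the line-arbitration idea follows; the message format and the first-copy-wins policy are illustrative assumptions, and the thesis implements this in FPGA logic at the network level, not in Python.

```python
def arbitrate(arrivals):
    """Arbitrate two redundant market-data feeds.

    `arrivals` is the interleaved stream of (feed_id, seq, payload)
    tuples in wire-arrival order. The first copy of each sequence
    number to arrive is forwarded (low-latency output); later
    duplicates from the other feed are dropped.
    """
    seen = set()
    out = []
    for feed_id, seq, payload in arrivals:
        if seq not in seen:
            seen.add(seq)
            out.append(payload)
    return out

# Feed B delivers seq 1 first; the late B copy of seq 0 is dropped.
arrivals = [("A", 0, "m0"), ("B", 1, "m1"), ("B", 0, "m0"),
            ("A", 1, "m1"), ("A", 2, "m2")]
print(arbitrate(arrivals))   # ['m0', 'm1', 'm2']
```

A reliability-prioritising output, as in the thesis, would instead wait on a configurable window before declaring a sequence number lost.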
Our second contribution is a new ring-based architecture for low latency, parallel access to FPGA memory. Traditional FPGA memory is formed by grouping block memories (BRAMs) together and accessing them as a single device. Our architecture accesses these BRAMs independently and in parallel. Targeting memory-based computing, which stores pre-computed function results in memory, we benefit low latency applications that rely on: highly-complex functions; iterative computation; or many parallel accesses to a shared resource. We assess square root, power, trigonometric, and hyperbolic functions within the FPGA, and provide a tool to convert Python functions to our new architecture.
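The memory-based computing idea (storing pre-computed function results and replacing computation with a lookup) can be sketched as follows. The table resolution, input range, and indexing scheme are illustrative assumptions; on the FPGA the table would live in independently addressed BRAMs serving parallel reads.

```python
import math

# Pre-compute sqrt over [0, 4) at a fixed resolution. At run time a
# "function call" becomes a single table read instead of an
# iterative computation.
RESOLUTION = 1024
SCALE = RESOLUTION / 4.0
SQRT_TABLE = [math.sqrt(i / SCALE) for i in range(RESOLUTION)]

def sqrt_lut(x):
    """Look up sqrt(x) for x in [0, 4); accuracy is set by RESOLUTION."""
    return SQRT_TABLE[int(x * SCALE)]

assert abs(sqrt_lut(2.0) - math.sqrt(2.0)) < 0.01
```

The pay-off grows with function complexity: a lookup costs the same whether the stored function is a square root or a deeply nested expression.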
Our third contribution extends the ring-based architecture to support any FPGA processing element. We unify E heterogeneous processing elements within compute pools, with each element implementing the same function, and the pool serving D parallel function calls. Our implementation-agnostic approach supports processing elements with different latencies, implementations, and pipeline lengths, as well as non-deterministic latencies. Compute pools evenly balance access to processing elements across the entire application, and are evaluated by implementing eight different neural network activation functions within an FPGA.
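A software analogue of the compute-pool dispatch might look like the following; the round-robin balancing policy and the element implementations are illustrative assumptions, since the thesis targets hardware processing elements with differing pipeline depths rather than Python callables.

```python
import itertools
import math

class ComputePool:
    """Pool of functionally identical processing elements serving
    parallel calls; a round-robin pointer balances access evenly
    regardless of how each element is implemented internally."""
    def __init__(self, elements):
        self.elements = elements
        self._rr = itertools.cycle(range(len(elements)))

    def call(self, x):
        idx = next(self._rr)          # next element in round-robin order
        return self.elements[idx](x)

# Two "implementations" of the same activation function: different
# internals, identical results, as required of pool members.
pool = ComputePool([math.tanh, lambda x: math.tanh(x)])
results = [pool.call(0.5) for _ in range(4)]
```

The key property mirrored here is that callers never address a specific element; the pool owns the mapping from the D call sites to the E elements.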
Memory and information processing in neuromorphic systems
A striking difference between brain-inspired neuromorphic processors and current von Neumann processor architectures is the way in which memory and processing are organized. As Information and Communication Technologies continue to address the need for increased computational power through the increase of cores within a digital processor, neuromorphic engineers and scientists can complement this need by building processor architectures where memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones, and from purely digital systems to mixed analog/digital systems which implement more biologically plausible models of neurons and synapses together with a suite of adaptation and learning mechanisms analogous to the ones found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems.
Comment: Submitted to Proceedings of the IEEE; review of recently proposed neuromorphic computing platforms and systems.
Automated Trading Systems Statistical and Machine Learning Methods and Hardware Implementation: A Survey
Automated trading, also known as algorithmic trading, is a method of using a predesigned computer program to submit a large number of trading orders to an exchange. It is essentially a real-time decision-making system within the scope of Enterprise Information Systems (EIS). With the rapid development of telecommunication and computer technology, the mechanisms underlying automated trading systems have become increasingly diversified. Considerable effort has been exerted by both academia and trading firms towards mining potential factors that may generate significantly higher profits. In this paper, we review studies on trading systems built using various methods and empirically evaluate those methods by grouping them into three types: technical analyses, textual analyses and high-frequency trading. We then evaluate the advantages and disadvantages of each method and assess their future prospects.
Low power and high performance heterogeneous computing on FPGAs
The abstract is in the attachment.
High-Performance Computing for SKA Transient Search: Use of FPGA based Accelerators -- a brief review
This paper presents the High-Performance Computing efforts with FPGAs for the accelerated pulsar/transient search for the SKA. Case studies are presented from within the SKA and pathfinder telescopes, highlighting future opportunities. It reviews the scenario that has shifted from offline processing of radio telescope data to digitizing several hundreds or thousands of antenna outputs over huge bandwidths, forming several hundreds of beams, and processing the data in the SKA real-time pulsar search pipelines. A brief account of the different architectures of the accelerators, primarily the new-generation Field-Programmable Gate Array based accelerators, is presented here, showing their critical roles in achieving high-performance computing and in handling the enormous data volume problems of the SKA. It also presents the power-performance efficiency of this emerging technology and potential future scenarios.
Comment: Accepted for JoAA, Special issue on SKA (2022).
Making the case: The role of FPGAs for efficiency-driven quantitative financial modelling
Aiming at low latency, Field-Programmable Gate Arrays (FPGAs) have a long history in High-Frequency Trading (HFT) and have traditionally been used for operations such as networking, routing and market feed handling. While the low-latency benefits of FPGAs have been obvious in HFT, there have been high barriers to programming such devices, and the specialised knowledge required of developers has slowed the adoption of FPGAs in the wider community. That was until recently, when vendors such as Intel and Xilinx invested heavily both in creating highly performant generations of FPGAs, such as Intel's Stratix line and Xilinx's generation of Alveo cards, and, more importantly, in crafting toolchains and software environments to lower entry barriers for software developers. Operating at lower clock frequencies than traditional computing hardware such as CPUs and GPUs, and therefore requiring less power, FPGAs are now programmable directly from higher-level programming languages; software developers can write software code to configure such devices, making the performance and energy-efficiency advantages of FPGAs available to a wider user group such as the quantitative finance community. The challenge now lies in designing the algorithms to best suit the FPGA, moving away significantly from the CPU version. We illustrate the usability, performance and energy-efficiency advantages of FPGAs with three financial use cases, developed on a Xilinx Alveo U280 FPGA card: our Credit Default Swap (CDS) engine achieves a 1.5 times performance increase, whilst increasing power efficiency by 7.1 times, compared to the parallel version on a 24-core Intel Xeon CPU. Our Black-Scholes hedging strategy, operating over discrete time intervals on the same Alveo U280 FPGA, achieves 69.9 times the performance of the ubiquitous reference version, and our FPGA option price discovery algorithm performs between 1.5 and 8 times faster than on the CPU, delivering between 8.1 and 185.1 times improvement in energy efficiency, respectively.
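As a point of reference for the hedging use case, the standard Black-Scholes call price and delta (the hedge ratio rebalanced at each discrete interval) can be written compactly; the parameter values below are illustrative and unrelated to the paper's benchmarks.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call_price_delta(S, K, T, r, sigma):
    """Black-Scholes European call price and delta.

    Delta, N(d1), is the quantity of the underlying held when the
    hedge is rebalanced at each discrete time interval."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    price = S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return price, norm_cdf(d1)

price, delta = bs_call_price_delta(S=100, K=100, T=1.0, r=0.05, sigma=0.2)
```

The formula itself is cheap; the FPGA advantage reported in the paper comes from evaluating it, and the surrounding rebalancing logic, over many paths and intervals in parallel.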
Accelerating Reconfigurable Financial Computing
This thesis proposes novel approaches to the design, optimisation, and management of reconfigurable computer accelerators for financial computing. There are three contributions. First, we propose novel reconfigurable designs for derivative pricing using both Monte-Carlo and quadrature methods. Such designs involve exploring techniques such as control variate optimisation for Monte-Carlo, and multi-dimensional analysis for quadrature methods. Significant speedups and energy savings are achieved using our Field-Programmable Gate Array (FPGA) designs over both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) designs. Second, we propose a framework for distributing computing tasks on multi-accelerator heterogeneous clusters. In this framework, different computational devices including FPGAs, GPUs and CPUs work collaboratively on the same financial problem based on a dynamic scheduling policy. The trade-off in speed and in energy consumption of different accelerator allocations is investigated. Third, we propose a mixed precision methodology for optimising Monte-Carlo designs, and a reduced precision methodology for optimising quadrature designs. These methodologies enable us to optimise the throughput of reconfigurable designs by using datapaths with minimised precision, while maintaining the same accuracy of the results as in the original designs.
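The control variate optimisation mentioned for the Monte-Carlo designs can be illustrated in software: below, the terminal asset price under geometric Brownian motion serves as the control variate, since its expectation is known in closed form. The model, the choice of control, and all parameter values are illustrative assumptions, not the thesis's actual designs.

```python
import math
import random

def mc_call_cv(S0, K, T, r, sigma, n_paths, seed=0):
    """Monte-Carlo European call price with a control variate.

    Control: the terminal price S_T, whose expectation is known
    exactly, E[S_T] = S0 * exp(r*T); subtracting beta times the
    control's sampling error reduces the estimator's variance."""
    rng = random.Random(seed)
    disc = math.exp(-r * T)
    payoffs, controls = [], []
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        ST = S0 * math.exp((r - 0.5 * sigma**2) * T
                           + sigma * math.sqrt(T) * z)
        payoffs.append(disc * max(ST - K, 0.0))
        controls.append(ST)
    mean_p = sum(payoffs) / n_paths
    mean_c = sum(controls) / n_paths
    cov = sum((p - mean_p) * (c - mean_c)
              for p, c in zip(payoffs, controls)) / n_paths
    var_c = sum((c - mean_c) ** 2 for c in controls) / n_paths
    beta = cov / var_c   # variance-minimising coefficient
    return mean_p - beta * (mean_c - S0 * math.exp(r * T))

price = mc_call_cv(S0=100, K=100, T=1.0, r=0.05, sigma=0.2, n_paths=20000)
```

Because the correction shrinks the estimator's variance, fewer paths (or, in the thesis's setting, lower-precision datapaths) achieve the same accuracy as the plain estimator.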