Exploring the acceleration of the Met Office NERC Cloud model using FPGAs
The use of Field Programmable Gate Arrays (FPGAs) to accelerate computational
kernels has the potential to be of great benefit to scientific codes and the
HPC community in general. With the recent developments in FPGA programming
technology, the ability to port kernels is becoming far more accessible.
However, to gain reasonable performance from this technology it is not enough
to simply transfer a code onto the FPGA; instead, the algorithm must be
rethought and recast in a data-flow style to suit the target architecture. In
this paper we describe the porting, via HLS, of one of the most computationally
intensive kernels of the Met Office NERC Cloud model (MONC), an atmospheric
model used by climate and weather researchers, onto an FPGA. We describe in
detail the steps taken to adapt the algorithm to make it suitable for the
architecture and the impact this has on kernel performance. Using a PCIe
mounted FPGA with on-board DRAM, we consider the integration of this kernel
within a larger infrastructure and explore the performance characteristics of
our approach in contrast to Intel CPUs that are popular in modern HPC machines,
over problem sizes involving very large grids. The result of this work is an
experience report detailing the challenges faced and lessons learnt in porting
this complex computational kernel to FPGAs, as well as exploring the role that
FPGAs can play and their fundamental limits in accelerating traditional HPC
workloads.
Comment: Preprint of article in proceedings, ISC High Performance 2019.
Lecture Notes in Computer Science, vol 1188
Securing the IoT ecosystem: ASIC-based hardware realization of Ascon lightweight cipher
Internet of Things (IoT) nodes consist of sensors that collect environmental data and exchange it with surrounding nodes and gateways. Cybersecurity attacks threaten the security of the data transmitted in any IoT network. Cryptographic primitives are widely adopted to address these threats; however, their substantial computational demands limit their applicability in the IoT ecosystem. In addition, IoT nodes vary in their area and throughput (TP) requirements, and thus demand flexible implementations of the encryption/decryption processes. To solve these issues, this work implements the NIST lightweight cryptography standard, Ascon, on a SAED 32 nm process design kit (PDK) library by employing loop-folded, loop-unrolled and fully unrolled architectures. The fully unrolled architecture achieves the highest TP, but at the cost of higher area utilisation. Unrolling by a lower factor results in lower-area implementations, enabling exploration of the design space to manage the trade-off between area and TP. The implementation results show that, for the loop-folded architecture, Ascon-128 and Ascon-128a require 36.7k μm² and 38.5k μm² of chip area, respectively, compared to 277.1k μm² and 306.6k μm² for their fully unrolled implementations. The proposed implementation strategies can adjust the number of rounds to accommodate the varied requirements of IoT ecosystems. An implementation with an open-source 45 nm PDK library is also undertaken for enhanced generalization and reproducibility of the results.
Performance and energy-efficient implementation of a smart city application on FPGAs
The continuous growth of modern cities and the demand for a better quality of life, coupled with the increased availability of computing resources, have led to increased attention to smart city services. Smart cities promise to deliver a better life to their inhabitants while simultaneously reducing resource requirements and pollution. They are thus perceived as a key enabler of sustainable growth. Among many issues, one of the major concerns for most cities in the world is traffic, which leads to a huge waste of time and energy, and to increased pollution. To optimize traffic in cities, one of the first steps is to get accurate real-time information about the traffic flows in the city. This can be achieved by applying automated video analytics to the video streams provided by a set of cameras distributed throughout the city. Image sequence processing can be performed both peripherally and centrally. In this paper, we argue that, since centralized processing has several advantages in terms of availability, maintainability and cost, it is a very promising strategy to enable effective traffic management even in large cities. However, the computational costs are enormous, and thus require an energy-efficient High-Performance Computing approach. Field Programmable Gate Arrays (FPGAs) provide computational resources comparable to CPUs and GPUs, yet require much less energy per operation (lower by factors of around 6 and 10, respectively, for the application considered in this case study). They are thus preferred resources for reducing both energy supply and cooling costs in the huge datacenters that Smart Cities will need. In this paper, we describe efficient implementations of high-performance algorithms that can process traffic camera image sequences to provide traffic flow information in real time at a low energy and power cost.