4 research outputs found
An FPGA accelerator of the wavefront algorithm for genomics pairwise alignment
In recent years, advances in next-generation sequencing technologies have enabled the proliferation of genomic applications that guide personalized medicine. These applications have an enormous computational cost due to the large amount of genomic data they process. The first step in many of these applications consists in aligning reads against a reference genome. Very recently, the wavefront alignment (WFA) algorithm was introduced, significantly reducing the execution time of the read alignment process. This paper presents the first FPGA-based hardware/software co-designed accelerator of this algorithm. Compared to the reference WFA CPU-only implementation, the proposed FPGA accelerator achieves performance speedups of up to 13.5× while consuming up to 14.6× less energy.
This work has been supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB-C21/AEI/10.13039/501100011033), by the Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328), by the IBM/BSC Deep Learning Center initiative, and by the DRAC project, which is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total eligible cost. Ll. Alvarez has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva Formacion fellowship No. FJCI-2016-30984. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship No. RYC-2016-21104.
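The wavefront idea behind the accelerated algorithm can be sketched for the simplest case, unit-cost edit distance: instead of filling a full dynamic-programming matrix, each wavefront stores, per diagonal, the furthest-reaching offset attainable with a given score, and runs of exact matches are extended for free. This is only a minimal illustrative sketch (the WFA paper targets gap-affine penalties, and all names here are made up for illustration):

```python
def wfa_edit_distance(pattern: str, text: str) -> int:
    """Edit distance via wavefronts: per diagonal k (= text pos - pattern pos),
    track the furthest text offset h reachable with the current score."""
    n, m = len(pattern), len(text)
    wf = {0: 0}          # wavefront for score 0: diagonal -> furthest offset
    score = 0
    while True:
        # Extend every diagonal along exact matches (costs nothing).
        for k, h in wf.items():
            v = h - k    # position in pattern implied by diagonal k
            while v < n and h < m and pattern[v] == text[h]:
                v += 1
                h += 1
            wf[k] = h
        # Done when the target diagonal reaches the end of the text.
        if wf.get(m - n, -1) == m:
            return score
        # Compute the next wavefront: mismatch, insertion, deletion.
        score += 1
        nxt = {}
        for k, h in wf.items():
            for k2, h2 in ((k, h + 1), (k + 1, h + 1), (k - 1, h)):
                nxt[k2] = max(nxt.get(k2, -1), h2)
        wf = nxt
```

The key property exploited by both the algorithm and the accelerator is that the wavefront for score s only depends on the wavefront for s - 1, and its size grows with the alignment error rather than with the sequence lengths.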
A High-performance, Energy-efficient Modular DMA Engine Architecture
Data transfers are essential in today's computing systems as latency and
complex memory access patterns are increasingly challenging to manage. Direct
memory access engines (DMAEs) are critically needed to transfer data
independently of the processing elements, hiding latency and achieving high
throughput even for complex access patterns to high-latency memory. With the
prevalence of heterogeneous systems, DMAEs must operate efficiently in
increasingly diverse environments. This work proposes a modular and highly
configurable open-source DMAE architecture called intelligent DMA (iDMA), split
into three parts that can be composed and customized independently. The
front-end implements the control plane binding to the surrounding system. The
mid-end accelerates complex data transfer patterns such as multi-dimensional
transfers, scattering, or gathering. The back-end interfaces with the on-chip
communication fabric (data plane). We assess the efficiency of iDMA in various
instantiations: In high-performance systems, we achieve speedups of up to 15.8×
with only 1 % additional area compared to a base system without a DMAE. We
achieve an area reduction of 10 % while improving ML inference performance by
23 % in ultra-low-energy edge AI systems over an existing DMAE solution. We
provide area, timing, latency, and performance characterization to guide its
instantiation in various systems.
Comment: 14 pages, 14 figures, accepted by an IEEE journal for publication
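The mid-end's role of accelerating complex transfer patterns can be illustrated with a small sketch: a 2D strided transfer is expanded into the sequence of 1D bursts a data-plane back-end would issue. The function name, parameters, and tuple format below are illustrative only, not iDMA's actual interface:

```python
def expand_2d(src, dst, size, src_stride, dst_stride, reps):
    """Expand a 2D strided transfer into `reps` 1D bursts of `size` bytes.
    Burst i copies `size` bytes from src + i*src_stride
    to dst + i*dst_stride."""
    return [(src + i * src_stride, dst + i * dst_stride, size)
            for i in range(reps)]

# Example: copy a 4-row tile with 64 B rows out of a source with a
# 256 B row pitch into a densely packed destination buffer.
bursts = expand_2d(0x1000, 0x8000, 64, 256, 64, 4)
```

Scatter and gather follow the same pattern with per-burst address lists instead of a fixed stride; keeping this expansion in a separate mid-end is what lets the front-end and back-end stay unchanged across systems.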
pAElla: Edge-AI based Real-Time Malware Detection in Data Centers
The increasing use of Internet-of-Things (IoT) devices for monitoring a wide
spectrum of applications, along with the challenges of "big data" streaming
support they often require for data analysis, is nowadays drawing increased
attention to the emerging edge computing paradigm. In particular, smart
approaches to manage and analyze data directly on the network edge are
increasingly investigated, and Artificial Intelligence (AI) powered edge
computing is envisaged to be a promising direction. In this paper, we focus on
Data Centers (DCs) and Supercomputers (SCs), where a new generation of
high-resolution monitoring systems is being deployed, opening new opportunities
for analysis like anomaly detection and security, but introducing new
challenges for handling the vast amount of data it produces. In detail, we
report on a novel lightweight and scalable approach to increase the security of
DCs/SCs, which involves AI-powered edge computing on high-resolution power
consumption measurements. The method -- called pAElla -- targets real-time
Malware Detection (MD), runs on an out-of-band IoT-based monitoring system for
DCs/SCs, and relies on the Power Spectral Density of power measurements, along
with AutoEncoders.
Results are promising, with an F1-score close to 1, and False Alarm and
Malware Miss rates close to 0%. We compare our method with State-of-the-Art MD
techniques and show that, in the context of DCs/SCs, pAElla can cover a wider
range of malware, significantly outperforming SoA approaches in terms of
accuracy. Moreover, we propose a methodology for online training suitable for
DCs/SCs in production, and release an open dataset and code.
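The detection pipeline described above can be sketched in two steps: extract a power spectral density feature vector from a window of power samples, then score it by an autoencoder's reconstruction error. A plain periodogram stands in here for whatever PSD estimator the paper actually uses, and `autoencoder` is any callable that reconstructs its input; all names are illustrative:

```python
import numpy as np

def psd_features(power_trace, fs):
    """Periodogram estimate of the power spectral density of one window of
    power-consumption samples (mean removed to drop the DC component)."""
    x = np.asarray(power_trace, dtype=float)
    x = x - x.mean()
    return np.abs(np.fft.rfft(x)) ** 2 / (fs * len(x))

def anomaly_score(autoencoder, feats):
    """Mean squared reconstruction error; a window whose score exceeds a
    threshold learned on benign behavior would be flagged as malware."""
    return float(np.mean((autoencoder(feats) - feats) ** 2))

# A 50 Hz component in a trace sampled at 1 kHz shows up as a peak at bin 50.
t = np.arange(1000) / 1000.0
feats = psd_features(np.sin(2 * np.pi * 50 * t), fs=1000)
```

The intuition is that malware perturbs the power signature of a node in ways that concentrate in specific spectral bands, so an autoencoder trained only on benign spectra reconstructs them poorly.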