Search CORE

718 research outputs found

ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks

Author: Arora Aman
Bacellar Alan T. L.
Breternitz Jr. Mauricio
de Araujo Leandro S.
Dutra Diego L. C.
Franca Felipe M. G.
John Lizy K.
Katopodis Rafael F.
Lima Priscila M. V.
Miranda Igor D. S.
Susskind Zachary
Villon Luis A. Q.
Publication venue
Publication date: 20/04/2023
Field of study

The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more efficient models. In order to meet the constraints of ultra-low-energy devices, we propose ULEEN, a model architecture based on weightless neural networks. Weightless neural networks (WNNs) are a class of neural model which use table lookups, not arithmetic, to perform computation. The elimination of energy-intensive arithmetic operations makes WNNs theoretically well suited for edge inference; however, they have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by BNNs to make significant strides in improving accuracy and reducing model size. We compare FPGA and ASIC implementations of an inference accelerator for ULEEN against edge-optimized DNN and BNN devices. On a Xilinx Zynq Z-7045 FPGA, we demonstrate classification on the MNIST dataset at 14.3 million inferences per second (13 million inferences/Joule) with 0.21

\mu

s latency and 96.2% accuracy, while Xilinx FINN achieves 12.3 million inferences per second (1.69 million inferences/Joule) with 0.31

\mu

s latency and 95.83% accuracy. In a 45nm ASIC, we achieve 5.1 million inferences/Joule and 38.5 million inferences/second at 98.46% accuracy, while a quantized Bit Fusion model achieves 9230 inferences/Joule and 19,100 inferences/second at 99.35% accuracy. In our search for ever more efficient edge devices, ULEEN shows that WNNs are deserving of consideration.Comment: 14 pages, 14 figures Portions of this article draw heavily from arXiv:2203.01479, most notably sections 5E and 5F.

arXiv.org e-Print Archive

Hardware Acceleration of Network Intrusion Detection System Using FPGA

Author: Hashmi Adeel
Publication venue
Publication date: 01/01/2011
Field of study

This thesis presents new algorithms and hardware designs for Signature-based Network Intrusion Detection System (SB-NIDS) optimisation exploiting a hybrid hardwaresoftware co-designed embedded processing platform. The work describe concentrates on optimisation of a complete SB-NIDS Snort application software on a FPGA based hardware-software target rather than on the implementation of a single functional unit for hardware acceleration. Pattern Matching Hardware Accelerator (PMHA) based on Bloom filter was designed to optimise SB-NIDS performance for execution on a Xilinx MicroBlaze soft-core processor. The Bloom filter approach enables the potentially large number of network intrusion attack patterns to be efficiently represented and searched primarily using accesses to FPGA on-chip memory. The thesis demonstrates, the viability of hybrid hardware-software co-designed approach for SB-NIDS. Future work is required to investigate the effects of later generation FPGA technology and multi-core processors in order to clearly prove the benefits over conventional processor platforms for SB-NIDS. The strengths and weaknesses of the hardware accelerators and algorithms are analysed, and experimental results are examined to determine the effectiveness of the implementation. Experimental results confirm that the PMHA is capable of performing network packet analysis for gigabit rate network traffic. Experimental test results indicate that our SB-NIDS prototype implementation on relatively low clock rate embedded processing platform performance is approximately 1.7 times better than Snort executing on a general purpose processor on PC when comparing processor cycles rather than wall clock time

E-space: Manchester Metropolitan University's Research Repository

OpenGrey Repository

Energy Efficient Hardware Accelerators for Packet Classification and String Matching

Author: Kennedy Alan
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 21/09/2010
Field of study

This thesis focuses on the design of new algorithms and energy efficient high throughput hardware accelerators that implement packet classification and fixed string matching. These computationally heavy and memory intensive tasks are used by networking equipment to inspect all packets at wire speed. The constant growth in Internet usage has made them increasingly difficult to implement at core network line speeds. Packet classification is used to sort packets into different flows by comparing their headers to a list of rules. A flow is used to decide a packet’s priority and the manner in which it is processed. Fixed string matching is used to inspect a packet’s payload to check if it contains any strings associated with known viruses, attacks or other harmful activities. The contributions of this thesis towards the area of packet classification are hardware accelerators that allow packet classification to be implemented at core network line speeds when classifying packets using rulesets containing tens of thousands of rules. The hardware accelerators use modified versions of the HyperCuts packet classification algorithm. An adaptive clocking unit is also presented that dynamically adjusts the clock speed of a packet classification hardware accelerator so that its processing capacity matches the processing needs of the network traffic. This keeps dynamic power consumption to a minimum. Contributions made towards the area of fixed string matching include a new algorithm that builds a state machine that is used to search for strings with the aid of default transition pointers. The use of default transition pointers keep memory consumption low, allowing state machines capable of searching for thousands of strings to be small enough to fit in the on-chip memory of devices such as FPGAs. A hardware accelerator is also presented that uses these state machines to search through the payloads of packets for strings at core network line speeds

Irish Universities

DCU Online Research Access Service

GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

Author: Alser Mohammed
Alserr Nour Almadhoun
Baranwal Akanksha
Cali Damla Senol
Firtina Can
Manglik Aditya
Mao Haiyu
Mutlu Onur
Sadrosadati Mohammad
Publication venue
Publication date: 18/09/2022
Field of study

Nanopore sequencing is a widely-used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally-costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate execution of the two most time-consuming steps inherently leads to (1) significant data movement and (2) redundant computations on the data, slowing down the genome analysis pipeline. This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory fine-grained collaborative execution of the major genome analysis steps in parallel; (2) a new technique for early-rejection of low-quality and unmapped reads to timely stop the execution of genome analysis for such reads, reducing inefficient computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with negligible accuracy loss compared to the state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings.Comment: 17 pages, 13 figure

arXiv.org e-Print Archive

Living IoT: A Flying Wireless Platform on Live Insects

Author: Almarshadi M. H.
Brooks R. A.
Brunet J.
Dudley R.
Hessar M.
Jang T.
Kearns C. A.
Lok M.
Matic J.
Merker B. H
Naderiparizi S.
Osborne J.
Ozkurt A.
Russell W. M. S.
Tenczar P. C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/12/2018
Field of study

Sensor networks with devices capable of moving could enable applications ranging from precision irrigation to environmental sensing. Using mechanical drones to move sensors, however, severely limits operation time since flight time is limited by the energy density of current battery technology. We explore an alternative, biology-based solution: integrate sensing, computing and communication functionalities onto live flying insects to create a mobile IoT platform. Such an approach takes advantage of these tiny, highly efficient biological insects which are ubiquitous in many outdoor ecosystems, to essentially provide mobility for free. Doing so however requires addressing key technical challenges of power, size, weight and self-localization in order for the insects to perform location-dependent sensing operations as they carry our IoT payload through the environment. We develop and deploy our platform on bumblebees which includes backscatter communication, low-power self-localization hardware, sensors, and a power source. We show that our platform is capable of sensing, backscattering data at 1 kbps when the insects are back at the hive, and localizing itself up to distances of 80 m from the access points, all within a total weight budget of 102 mg.Comment: Co-primary authors: Vikram Iyer, Rajalakshmi Nandakumar, Anran Wang, In Proceedings of Mobicom. ACM, New York, NY, USA, 15 pages, 201

arXiv.org e-Print Archive

Crossref

Graphics Processor-based High Performance Pattern Matching Mechanism for Network Intrusion Detection

Author: Hsien-Wen Hsu
Nen-Fu Huang
Yen-Ming Chu
Publication venue: 'IntechOpen'
Publication date: 22/03/2011
Field of study

IntechOpen