679 research outputs found
Smart technologies for effective reconfiguration: the FASTER approach
Current and future computing systems increasingly require that their functionality stays flexible after the system is operational, in order to cope with changing user requirements and improvements in system features, i.e. changing protocols and data-coding standards, evolving demands for support of different user applications, and newly emerging applications in communication, computing and consumer electronics. Therefore, extending the functionality and the lifetime of products requires the addition of new functionality to track and satisfy the customers needs and market and technology trends. Many contemporary products along with the software part incorporate hardware accelerators for reasons of performance and power efficiency. While adaptivity of software is straightforward, adaptation of the hardware to changing requirements constitutes a challenging problem requiring delicate solutions. The FASTER (Facilitating Analysis and Synthesis Technologies for Effective Reconfiguration) project aims at introducing a complete methodology to allow designers to easily implement a system specification on a platform which includes a general purpose processor combined with multiple accelerators running on an FPGA, taking as input a high-level description and fully exploiting, both at design time and at run time, the capabilities of partial dynamic reconfiguration. The goal is that for selected application domains, the FASTER toolchain will be able to reduce the design and verification time of complex reconfigurable systems providing additional novel verification features that are not available in existing tool flows
Adaptive OFDM System Design For Cognitive Radio
Recently, Cognitive Radio has been proposed as a promising technology to improve spectrum utilization. A highly flexible OFDM system is considered to be a good candidate for the Cognitive Radio baseband processing where individual carriers can be switched off for frequencies occupied by a licensed user. In order to support such an adaptive OFDM system, we propose a Multiprocessor System-on-Chip (MPSoC) architecture which can be dynamically reconfigured. However, the complexity and flexibility of the baseband processing makes the MPSoC design a difficult task. This paper presents a design technology for mapping flexible OFDM baseband for Cognitive Radio on a multiprocessor System-on-Chip (MPSoC)
Enabling System-Level Modeling of Variation-Induced Faults in Networks-on-Chip
Process Variation (PV) is increasingly threatening the reliability of Networks-on-Chips. Thus, various resilient router designs have been recently proposed and evaluated. However, these evaluations assume random fault distributions, which result in 52%--81% inaccuracy. We propose an accurate circuit-level fault-modeling tool, which can be plugged into any system-level NoC simulator, quantify the system-level impact of PV-induced faults at runtime, pinpoint fault-prone router components that should be protected, and accurately evaluate alternative resilient multi-core designs.GigaScale Systems Research CenterFocus Center Research Program. Focus Center for Circuit & System Solutions. Semiconductor Research Corporation. Interconnect Focus Cente
Recommended from our members
Securing Network Processors with Hardware Monitors
As an essential part of modern society, the Internet has fundamentally changed our lives during the last decade. Novel applications and technologies, such as online shopping, social networking, cloud computing, mobile networking, etc, have sprung up at an astonishing pace. These technologies not only influence modern life styles but also impact Internet infrastructure. Numerous new network applications and services require better programmability and flexibility for network devices, such as routers and switches. Since traditional fixed function network routers based on application specific integrated circuits (ASICs) have difficulty keeping pace with the growing demands of next-generation Internet applications, there is an ongoing shift in the industry toward implementing network devices using programmable network processors (NPs).
While network processors offer great benefits in terms of flexibility, their reprogrammable nature exposes potential security risks. Similar to network end-systems, such as general-purpose computers, software-based network processors have security vulnerabilities that can be attacked remotely. Recent research has shown that a new type of data plane attack is able to modify the functionality of a network processor and cause a denial-of-service (DoS) attack by sending a single malformed UDP packet. Since this attack relies solely on data plane access and does not need access to the control plane, it can be particularly difficult to control.
Hardware security monitors have been introduced to identify and eliminate these malicious packets before they can propagate and cause devastating effects in the network. However, previous work on hardware monitors only focus on single core systems with static (or very slowly changing) workloads. In network processors that use up to hundreds of parallel processor cores and have processing workloads that can change dynamically based on the network traffic, the realization of a complete multicore hardware monitoring system remains a critical challenge. Our research work in this thesis provides a comprehensive solution to this problem.
Our first contribution is the design and prototype implementation of a Scalable Hardware Monitoring Grid (SHMG). This scalable architecture balances area cost and performance overhead by using a clustered approach for multicore NP systems. In order to adapt to dynamically changing network traffic, a resource reallocation algorithm is designed to reassign the processing resources in SHMG to different network applications at runtime. An evaluation of the prototype SHMG on an Altera DE4 board demonstrates low resource and performance overheads. The functionality and performance of a runtime resource reallocation algorithm are tested using a simulation environment.
A second significant contribution of this work is a network system-level security solution for multicore network processors with hardware monitors. It addresses two key problems: (1) how to securely manage and reprogram processor cores and monitors in a deployed router in the network, and (2) how to prevent the large number of identical router devices in the network from an attack that can circumvent one specific monitoring system and lead to Internet-scale failures. A Secure Dynamic Multicore Hardware Monitoring System (SDMMon) is designed based on cryptographic principles and suitable key management to ensure the secure installation of processor binaries and monitor graphs. We present a Merkle tree based parameterizable high performance hash function that can be configured to perform a variety of functions in different devices via a 32-bit configuration parameter. A prototype system composed of both the SDMMon and the parameterizable hash is implemented and evaluated on an Altera DE4 board.
Finally, a fully-functional, comprehensive Multicore NP Security Platform, which integrates both the SHMG and the SDMMon security features, has been implemented on an Altera DE5 board
Multi-Softcore Architecture on FPGA
To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems using available standards intellectual properties yet maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option to develop and test such systems. This paper describes a FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of making the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and also some parallel applications are used for testing speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication)
EMVS: Embedded Multi Vector-core System
With the increase in the density and performance of digital electronics, the demand for a power-efficient high-performance computing (HPC) system has been increased for embedded applications. The existing embedded HPC systems suffer from issues like programmability, scalability, and portability. Therefore, a parameterizable and programmable high-performance processor system architecture is required to execute the embedded HPC applications. In this work, we proposed an Embedded Multi Vector-core System (EMVS) which executes the embedded application by managing the multiple vectorized tasks and their memory operations. The system is designed and ported on an Altera DE4 FPGA development board. The performance of EMVS is compared with the Heterogeneous Multi-Processing Odroid XU3, Parallela and GPU Jetson TK1 embedded systems. In contrast to the embedded systems, the results show that EMVS improves 19.28 and 10.22 times of the application and system performance respectively and consumes 10.6 times less energy.Peer ReviewedPostprint (author's final draft
Assessing load-sharing within optimistic simulation platforms
The advent of multi-core machines has lead to the need for revising the architecture of modern simulation platforms. One recent proposal we made attempted to explore the viability of load-sharing for optimistic simulators run on top of these types of machines. In this article, we provide an extensive experimental study for an assessment of the effects on run-time dynamics by a load-sharing architecture that has been implemented within the ROOT-Sim package, namely an open source simulation platform adhering to the optimistic synchronization paradigm. This experimental study is essentially aimed at evaluating possible sources of overheads when supporting load-sharing. It has been based on differentiated workloads allowing us to generate different execution profiles in terms of, e.g., granularity/locality of the simulation events. © 2012 IEEE
HULK-V: a Heterogeneous Ultra-low-power Linux capable RISC-V SoC
IoT applications span a wide range in performance and memory footprint, under
tight cost and power constraints. High-end applications rely on power-hungry
Systems-on-Chip (SoCs) featuring powerful processors, large LPDDR/DDR3/4/5
memories, and supporting full-fledged Operating Systems (OS). On the contrary,
low-end applications typically rely on Ultra-Low-Power ucontrollers with a
"close to metal" software environment and simple micro-kernel-based runtimes.
Emerging applications and trends of IoT require the "best of both worlds":
cheap and low-power SoC systems with a well-known and agile software
environment based on full-fledged OS (e.g., Linux), coupled with extreme energy
efficiency and parallel digital signal processing capabilities. We present
HULK-V: an open-source Heterogeneous Linux-capable RISC-V-based SoC coupling a
64-bit RISC-V processor with an 8-core Programmable Multi-Core Accelerator
(PMCA), delivering up to 13.8 GOps, up to 157 GOps/W and accelerating the
execution of complex DSP and ML tasks by up to 112x over the host processor.
HULK-V leverages a lightweight, fully digital memory hierarchy based on
HyperRAM IoT DRAM that exposes up to 512 MB of DRAM memory to the host CPU.
Featuring HyperRAMs, HULK-V doubles the energy efficiency without significant
performance loss compared to featuring power-hungry LPDDR memories, requiring
expensive and large mixed-signal PHYs. HULK-V, implemented in Global Foundries
22nm FDX technology, is a fully digital ultra-low-cost SoC running a 64-bit
Linux software stack with OpenMP host-to-PMCA offload within a power envelope
of just 250 mW.Comment: This paper has been accepted as full paper at DATE23
https://www.date-conference.com/date-2023-accepted-papers#Regular-Paper
Combined Integer and Floating Point Multiplication Architecture(CIFM) for FPGAs and Its Reversible Logic Implementation
In this paper, the authors propose the idea of a combined integer and
floating point multiplier(CIFM) for FPGAs. The authors propose the replacement
of existing 18x18 dedicated multipliers in FPGAs with dedicated 24x24
multipliers designed with small 4x4 bit multipliers. It is also proposed that
for every dedicated 24x24 bit multiplier block designed with 4x4 bit
multipliers, four redundant 4x4 multiplier should be provided to enforce the
feature of self repairability (to recover from the faults). In the proposed
CIFM reconfigurability at run time is also provided resulting in low power. The
major source of motivation for providing the dedicated 24x24 bit multiplier
stems from the fact that single precision floating point multiplier requires
24x24 bit integer multiplier for mantissa multiplication. A reconfigurable,
self-repairable 24x24 bit multiplier (implemented with 4x4 bit multiply
modules) will ideally suit this purpose, making FPGAs more suitable for integer
as well floating point operations. A dedicated 4x4 bit multiplier is also
proposed in this paper. Moreover, in the recent years, reversible logic has
emerged as a promising technology having its applications in low power CMOS,
quantum computing, nanotechnology, and optical computing. It is not possible to
realize quantum computing without reversible logic. Thus, this paper also paper
provides the reversible logic implementation of the proposed CIFM. The
reversible CIFM designed and proposed here will form the basis of the
completely reversible FPGAs.Comment: Published in the proceedings of the The 49th IEEE International
Midwest Symposium on Circuits and Systems (MWSCAS 2006), Puerto Rico, August
2006. Nominated for the Student Paper Award(12 papers are nominated for
Student paper Award among all submissions
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference
Research has shown that convolutional neural networks contain significant
redundancy, and high classification accuracy can be obtained even when weights
and activations are reduced from floating point to binary values. In this
paper, we present FINN, a framework for building fast and flexible FPGA
accelerators using a flexible heterogeneous streaming architecture. By
utilizing a novel set of optimizations that enable efficient mapping of
binarized neural networks to hardware, we implement fully connected,
convolutional and pooling layers, with per-layer compute resources being
tailored to user-provided throughput requirements. On a ZC706 embedded FPGA
platform drawing less than 25 W total system power, we demonstrate up to 12.3
million image classifications per second with 0.31 {\mu}s latency on the MNIST
dataset with 95.8% accuracy, and 21906 image classifications per second with
283 {\mu}s latency on the CIFAR-10 and SVHN datasets with respectively 80.1%
and 94.9% accuracy. To the best of our knowledge, ours are the fastest
classification rates reported to date on these benchmarks.Comment: To appear in the 25th International Symposium on Field-Programmable
Gate Arrays, February 201
- …