956 research outputs found

    A dynamically reconfigurable pattern matcher for regular expressions on FPGA

    Get PDF
    In this article we describe how to expand a partially dynamic reconfig- urable pattern matcher for regular expressions presented in previous work by Di- vyasree and Rajashekar [2]. The resulting, extended, pattern matcher is fully dynamically reconfigurable. First, the design is adapted for use with parameterisable configurations, a method for Dynamic Circuit Specialization. Using parameteris- able configurations allows us to achieve the same area gains as the hand crafted reconfigurable design, with the benefit that parameterisable configurations can be applied automatically. This results in a design that is more easily adaptable to spe- cific applications and allows for an easier design exploration. Additionally, the pa- rameterisable configuration implementation is also generated automatically, which greatly reduces the design overhead of using dynamic reconfiguration. Secondly, we propose a number of expansions to the original design to overcome several limitations in the original design that constrain the dynamic reconfigurability of the pattern matcher. We propose two different solutions to dynamically change the character that is matched in a certain block. The resulting pattern matcher, after these changes, is fully dynamically reconfigurable, all aspects of the implemented regular expression can be changed at run-time

    Internet of Things Based Reconfigurable SIMD Processor for High-Speed End Devices in FPGA

    Get PDF
    This research article proposed the reconfigurable Single Instruction Multi Data (SIMD) processor design to speed up the accelerated computing task in IoT operations. Single Instruction Multi Data models leverage the parallel real source to speed up computing accelerated tasks. It proposes the utilization of reconfigurable Kogge Stone-dependent hybrid adder structures, now referred to as KS-CPA, in which reconfiguration occurs during the addition operation. The Least Significant Bits (LSB) are processed using a carry propagate adder, while the Most Significant Bits (MSB) are computed using the Kogge Stone adder. Depending on the data width and device-accessible energy resources, the hybrid configuration of the adder offers the 4-bit, 8-bit, and 16-bit addition. The adder form is identified by a shift in the configuration of its Carry Look-ahead and then by a Kogge Stone Adder (KSA). Throughout the activity, the KS-CLA crossbreed configuration is used to attain the fastest speed and low energy usage. The effectiveness, including its proposed hybrid adder, is evaluated by looking at the speed, energy, and area parameters, including a suitable area use during rapid applications in which both less delay and low power adders are required. Considering these, we are structuring an IoT processor that can be reconfigured to gain from SIMD. We have demonstrated that our hybrid adder-enhanced processor saves energy up to 13% and reduces 27% latency. The proposed 16 and 32-bit adders will boost time, power, and Area Delay Product (ADP) by almost 18-24% and 13-19% respectively

    An Energy-Efficient Reconfigurable Circuit Switched Network-on-Chip

    Get PDF
    Network-on-Chip (NoC) is an energy-efficient on-chip communication architecture for multi-tile System-on-Chip (SoC) architectures. The SoC architecture, including its run-time software, can replace inflexible ASICs for future ambient systems. These ambient systems have to be flexible as well as energy-efficient. To find an energy-efficient solution for the communication network we analyze three wireless applications. Based on their communication requirements we observe that revisiting of the circuit switching techniques is beneficial. In this paper we propose a new energy-efficient reconfigurable circuit-switched Network-on-Chip. By physically separating the concurrent data streams we reduce the overall energy consumption. The circuit-switched router has been synthesized and analyzed for its power consumption in 0.13 ¿m technology. A 5-port circuit-switched router has an area of 0.05 mm2 and runs at 1075 MHz. The proposed architecture consumes 3.5 times less energy compared to its packet-switched equivalen

    Performance Improvement for Reconfigurable Processor System Design in IoT Health Care Monitoring Applications

    Get PDF
    This research focuses on critical hardware components of an Internet of Things (IoT) system for reconfigurable processing systems. Single-Instruction Multiple-Data (SIMD) processors have recently been utilized to preprocess data at energy-constrained sensor nodes or IoT gateways, saving significant energy and bandwidth for transmission. Using traditional CPU-based systems to implement machine learning algorithms is inefficient in terms of energy consumption. In the proposed method Single-Instruction Multiple-Data (SIMD) processors are assembled by scaling the largest possible operand value subunits into direct access to the internal memory, where the carry output of each unit is conditionally fed into the next unit based on the implementation of the SIMD Processor design for Internet of Things applications. Each method has evaluated sub-operations that contribute considerably to the overall potential of the design. If the single register file can complete the intended action, a zero (one)-signal is applied to each unit\u27s carry input. Multiplexers combine two or more adders, sending the carry signal from one unit into another if additional units are necessary to compute the sum. The outcome results compare high-speed end device techniques in terms of area and power consumption. The proposed SIMD processor-based IoT healthcare monitoring system with a MIMD processor\u27s performance analysis of comparison clearly demonstrates that the system produces decent outcomes. The suggested system has an area overhead of 85 m2, a power usage of 4.10 W, and a time delay of 20 ns

    ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION

    Get PDF
    Modern central processing units (CPUs) employ arithmetic logic units (ALUs) that support statically defined precisions, often adhering to industry standards. Although CPU manufacturers highly optimize their ALUs, industry standard precisions embody accuracy and performance compromises for general purpose deployment. Hence, optimizing ALU precision holds great potential for improving speed and energy efficiency. Previous research on multiple precision ALUs focused on predefined, static precisions. Little previous work addressed ALU architectures with customized, dynamically defined precision. This dissertation presents approaches for developing dynamic precision ALU architectures for both fixed-point and floating-point to enable better performance, energy efficiency, and numeric accuracy. These new architectures enable dynamically defined precision, including support for vectorization. The new architectures also prevent performance and energy loss due to applying unnecessarily high precision on computations, which often happens with statically defined standard precisions. The new ALU architectures support different precisions through the use of configurable sub-blocks, with this dissertation including demonstration implementations for floating point adder, multiply, and fused multiply-add (FMA) circuits with 4-bit sub-blocks. For these circuits, the dynamic precision ALU speed is nearly the same as traditional ALU approaches, although the dynamic precision ALU is nearly twice as large

    AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

    Full text link
    CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to reprogram the FPGAs for flexible acceleration of many workloads. Nonetheless, this advantage is often overshadowed by the poor programmability of FPGAs whose programming is conventionally a RTL design practice. Although recent advances in high-level synthesis (HLS) significantly improve the FPGA programmability, it still leaves programmers facing the challenge of identifying the optimal design configuration in a tremendous design space. This paper aims to address this challenge and pave the path from software programs towards high-quality FPGA accelerators. Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs. Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space. Also, we introduce an analytical model to capture the performance and resource trade-offs among different design configurations of the CPP microarchitecture, which lays the foundation for fast design space exploration. On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated. AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture. Our experiments show that the AutoAccel-generated accelerators outperform their corresponding software implementations by an average of 72x for a broad class of computation kernels

    LTE implementation on CGRA based SiLago Platform

    Get PDF
    Abstract. This thesis implements long term evolution (LTE) transmission layer on a coarse grained reconfigurable called, dynamically reconfigurable resource array (DRRA). Specifically, we implement physical downlink shared channel baseband signal processing blocks (PDSCH) at high level. The overall implementation follows silicon large grain object (SiLago) design methodology. The methodology employs SiLago blocks instead of mainstream standard cells. The main ambition of this thesis was to prove that a standard as complex as LTE can be implemented using the in-house SiLago framework. The work aims to prove that customized design with efficiency close to application specific integrated circuit (ASIC) for LTE can be generated with the programming ease of MATLAB. During this thesis, we have generated a completely parametrizable LTE standard at high level
    corecore