477 research outputs found

    Empowering a helper cluster through data-width aware instruction selection policies

    Narrow values, which can be represented with fewer bits than the full machine width, occur very frequently in programs. On the other hand, clustering mechanisms enable cost- and performance-effective scaling of processor back-end features. These attributes can be combined synergistically to design special clusters operating on narrow values (a.k.a. helper clusters), potentially providing performance benefits. We complement a 32-bit monolithic processor with a low-complexity 8-bit helper cluster. Then, as our main focus, we propose various ideas to select suitable instructions to execute in the data-width based clusters. We add data-width information as another instruction steering decision metric and introduce new data-width based selection algorithms which also consider dependency, inter-cluster communication and load imbalance. Utilizing those techniques, the performance of a wide range of workloads is substantially increased; the helper cluster achieves an average speedup of 11% for a wide range of 412 apps. When focusing on integer applications, the speedup can be as high as 22% on average. Peer Reviewed. Postprint (published version).
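The data-width check behind such steering can be illustrated with a minimal sketch. It assumes two's-complement 32-bit values and an 8-bit helper cluster as in the abstract; the function names `fits_in_bits` and `steer`, and the simple load comparison, are hypothetical and stand in for the paper's actual selection algorithms, which also weigh dependencies and inter-cluster communication.

```python
def fits_in_bits(value: int, width: int = 8, machine_width: int = 32) -> bool:
    """Return True if a machine_width-bit two's-complement value fits in `width` bits."""
    v = value & ((1 << machine_width) - 1)
    # Reinterpret the raw bit pattern as a signed integer.
    if v >= 1 << (machine_width - 1):
        v -= 1 << machine_width
    return -(1 << (width - 1)) <= v < (1 << (width - 1))


def steer(src_values, helper_load: int, main_load: int, width: int = 8) -> str:
    """Toy steering rule (an assumption, not the paper's policy):
    use the helper cluster only when every source operand is narrow
    and the helper cluster is not busier than the main cluster."""
    all_narrow = all(fits_in_bits(v, width) for v in src_values)
    if all_narrow and helper_load <= main_load:
        return "helper"
    return "main"
```

A real policy would also have to handle results that overflow the narrow width, which this sketch ignores.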

    Internet of Things Based Reconfigurable SIMD Processor for High-Speed End Devices in FPGA

    This research article proposes a reconfigurable Single Instruction Multiple Data (SIMD) processor design to speed up accelerated computing tasks in IoT operations. SIMD models exploit parallelism to speed up accelerated computing tasks. The article proposes reconfigurable Kogge-Stone-dependent hybrid adder structures, referred to as KS-CPA, in which reconfiguration occurs during the addition operation. The Least Significant Bits (LSB) are processed using a carry propagate adder, while the Most Significant Bits (MSB) are computed using the Kogge-Stone adder. Depending on the data width and device-accessible energy resources, the hybrid configuration of the adder offers 4-bit, 8-bit, and 16-bit addition. The adder form is identified by a shift in the configuration of its Carry Look-ahead and then by a Kogge-Stone Adder (KSA). Throughout the activity, the KS-CLA hybrid configuration is used to attain the fastest speed and low energy usage. The effectiveness of the proposed hybrid adder is evaluated by looking at the speed, energy, and area parameters, including suitable area use in high-speed applications in which both low-delay and low-power adders are required. Considering these, we are structuring an IoT processor that can be reconfigured to gain from SIMD. We have demonstrated that our hybrid adder-enhanced processor saves up to 13% energy and reduces latency by 27%. The proposed 16- and 32-bit adders improve time, power, and Area Delay Product (ADP) by almost 18-24% and 13-19% respectively
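The LSB/MSB split described in this abstract can be modelled in software as a rough sketch: the low bits go through a bit-serial carry-propagate (ripple) adder, and its carry-out feeds a Kogge-Stone parallel-prefix adder for the high bits. The function names, the `low_bits` split point, and the bit-vector encoding as Python integers are assumptions for illustration; the paper's actual circuit and reconfiguration logic are not given in the abstract.

```python
def ripple_carry_add(a: int, b: int, width: int, carry_in: int = 0):
    """Carry-propagate adder: the carry ripples through one bit at a time."""
    result, carry = 0, carry_in
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return result, carry


def kogge_stone_add(a: int, b: int, width: int, carry_in: int = 0):
    """Kogge-Stone adder: carries are computed in log2(width) prefix stages."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    p = a ^ b                      # per-bit propagate
    g = (a & b) | (p & carry_in)   # per-bit generate, carry-in folded into bit 0
    G, P, d = g, p, 1
    while d < width:
        G = (G | (P & (G << d))) & mask   # uses the previous stage's P
        P = (P & (P << d)) & mask
        d <<= 1
    carries = ((G << 1) | carry_in) & mask  # carry into each bit position
    return p ^ carries, (G >> (width - 1)) & 1


def hybrid_add(a: int, b: int, width: int = 16, low_bits: int = 4):
    """Ripple adder on the low bits, Kogge-Stone on the high bits."""
    low_mask = (1 << low_bits) - 1
    low_sum, c = ripple_carry_add(a & low_mask, b & low_mask, low_bits)
    high_sum, cout = kogge_stone_add(a >> low_bits, b >> low_bits,
                                     width - low_bits, c)
    return (high_sum << low_bits) | low_sum, cout
```

For example, `hybrid_add(200, 100, width=8, low_bits=3)` returns the 8-bit sum together with the carry-out of the top bit.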

    Increasing adder efficiency by exploiting input statistics

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2008. Includes bibliographical references (p. 49-50). Current techniques for characterizing the power consumption of adders rely on assuming that the inputs are completely random. However, the inputs generated by realistic applications are not random, and in fact include a great deal of structure. Input bits are more likely to remain in the same logical states from addition to addition than would be expected by chance, and bits, especially the most significant bits, are very likely to be in the same state as their neighbors. Taking this data, I look at ways that it can be used to improve the design of adders. The first method I look at involves looking at how different adder architectures respond to the different characteristics of input data from the more significant and less significant bits of the adder, and trying to use these responses to create a hybrid adder. Unfortunately the differences are not sufficient for this approach to be effective. I next look at the implications of the data I collected for the optimization of Kogge-Stone adder trees, and find that in certain circumstances the use of experimentally derived activity maps rather than ones based on simple assumptions can increase adder performance by as much as 30%. By Andrew Lawrence Clough. M.Eng.

    Design and Implementation of Hybrid Multiplier Using ZFC

    Research has recently been driven to build systems with low power consumption and high speed due to the increasing number of portable devices. The rapid development of semiconductor technology has contributed to a growing need for portable and embedded digital signal processing (DSP) devices. In all DSP applications, multipliers are essential components. For high-speed DSP, low-power, high-speed multipliers are therefore required. All current commercial DSP processors have at least one dedicated multiplier unit, since the capacity to compute at a quicker pace is necessary to achieve excellent performance in many DSP and graphics processing algorithms. Researchers have developed a number of multipliers, including array, Booth, modified Booth, carry save, and Wallace tree multipliers. However, today's computational circuits, such as high-performance processors, digital signal processing, and cryptographic algorithms, require highly efficient, high-speed multipliers. Hence, in this work, the Design and Implementation of a Hybrid Multiplier using ZFC (Zero Finding Logic) is presented. This Hybrid Multiplier is the combination of a Finite Field Multiplier and a Modified Kogge-Stone Multiplier. The Zero Finding Logic is used to identify the zeros in the resultant product

    High speed modified carry save adder using a structure of multiplexers

    Adders are the heart of data path circuits for any processor in digital computer and signal processing systems. Growth in technology keeps supporting efficient design of binary adders for high speed applications. In this paper, a fast and area-efficient modified carry save adder (CSA) is presented. A multiplexer-based design of the full adder is proposed to implement the structure of the CSA. The proposed full adder design is employed in all stages of the traditional CSA. By modifying the design of the full adder in the CSA, the complexity and area of the design can be reduced, resulting in reduced delay time. The VHDL implementations of the CSA adders (including the proposed version, the traditional CSA, and modified CSAs presented in the literature) are simulated using the Quartus II synthesis software tool with the Altera FPGA EP2C5T144C6 device (Cyclone II). Simulation results of 64-bit adder designs demonstrate average improvements of 17.75%, 1.60%, and 8.81% respectively for the worst case time, thermal power dissipation and number of FPGA logic elements
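The abstract does not give the proposed multiplexer structure, so the sketch below uses a textbook multiplexer-only full adder as a stand-in and wires it into a single carry-save stage. The names `mux2`, `full_adder_mux` and `carry_save_add` are hypothetical, and bits are modelled as Python integers 0 or 1.

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def full_adder_mux(a: int, b: int, cin: int):
    """Full adder built only from 2-to-1 multiplexers and inverted inputs."""
    x = mux2(a, b, 1 - b)        # a XOR b
    s = mux2(x, cin, 1 - cin)    # (a XOR b) XOR cin
    cout = mux2(x, a, cin)       # a == b: carry is a; otherwise carry is cin
    return s, cout


def carry_save_add(x: int, y: int, z: int, width: int):
    """One carry-save stage: reduces three operands to a sum vector and
    a carry vector with no carry propagation between bit positions."""
    sum_vec, carry_vec = 0, 0
    for i in range(width):
        s, c = full_adder_mux((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        sum_vec |= s << i
        carry_vec |= c << (i + 1)   # carries are weighted one position higher
    return sum_vec, carry_vec
```

Adding the two returned vectors with an ordinary adder gives `x + y + z`; that final carry-propagating step is where a complete CSA design spends its remaining delay.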

    The use of reversible logic gates in the design of residue number systems

    Reversible computing is an emerging technique to achieve ultra-low-power circuits. Reversible arithmetic circuits allow for achieving energy-efficient high-performance computational systems. Residue number systems (RNS) provide parallel and fault-tolerant additions and multiplications without carry propagation between residue digits. The parallelism and fault-tolerance features of RNS can be leveraged to achieve high-performance reversible computing. This paper proposed RNS full reversible circuits, including forward converters, modular adders and multipliers, and reverse converters used for a class of RNS moduli sets with the composite form {2^k, 2^p-1}. Modulo 2^n-1, 2^n, and 2^n+1 adders and multipliers were designed using reversible gates. Besides, reversible forward and reverse converters for the 3-moduli set {2^n-1, 2^(n+k), 2^n+1} have been designed. The proposed RNS-based reversible computing approach has been applied for consecutive multiplications with an improvement of above 15% in quantum cost after the twelfth iteration, and above 27% in quantum depth after the ninth iteration. The findings show that the use of the proposed RNS-based reversible computing in convolution results in a significant improvement in quantum depth in comparison to conventional methods based on weighted binary adders and multipliers
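To illustrate the carry-free arithmetic that RNS offers, here is a minimal sketch of forward conversion, residue-wise addition and multiplication, and reverse conversion by the Chinese Remainder Theorem. It assumes the classical moduli set {2^n-1, 2^n, 2^n+1} (the three moduli named for the adders and multipliers) and requires Python 3.8 or later for `math.prod` and the modular-inverse form of `pow`. It models only the number system, not the reversible-gate circuits; all function names are hypothetical.

```python
from math import prod


def moduli_set(n: int):
    """Pairwise-coprime moduli {2^n - 1, 2^n, 2^n + 1}."""
    return (2**n - 1, 2**n, 2**n + 1)


def to_rns(x: int, moduli):
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli):
    """Each residue channel adds independently; no carries cross channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli):
    """Each residue channel multiplies independently."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(residues, moduli):
    """Reverse conversion via the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m) is the modular inverse
    return total % M
```

With `n = 4` the dynamic range is 15 * 16 * 17 = 4080, so any result below 4080 converts back exactly; larger results wrap modulo that range.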

    Performance Improvement for Reconfigurable Processor System Design in IoT Health Care Monitoring Applications

    This research focuses on critical hardware components of an Internet of Things (IoT) system for reconfigurable processing systems. Single-Instruction Multiple-Data (SIMD) processors have recently been utilized to preprocess data at energy-constrained sensor nodes or IoT gateways, saving significant energy and bandwidth for transmission. Using traditional CPU-based systems to implement machine learning algorithms is inefficient in terms of energy consumption. In the proposed method Single-Instruction Multiple-Data (SIMD) processors are assembled by scaling the largest possible operand value subunits into direct access to the internal memory, where the carry output of each unit is conditionally fed into the next unit based on the implementation of the SIMD Processor design for Internet of Things applications. Each method has evaluated sub-operations that contribute considerably to the overall potential of the design. If the single register file can complete the intended action, a zero (one)-signal is applied to each unit's carry input. Multiplexers combine two or more adders, sending the carry signal from one unit into another if additional units are necessary to compute the sum. The outcome results compare high-speed end device techniques in terms of area and power consumption. The proposed SIMD processor-based IoT healthcare monitoring system with a MIMD processor's performance analysis of comparison clearly demonstrates that the system produces decent outcomes. The suggested system has an area overhead of 85 m2, a power usage of 4.10 W, and a time delay of 20 ns
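The conditional carry path between adder units can be sketched as a partitioned SIMD adder. The 8-bit unit size, the 32-bit datapath, and the function name `simd_add` are assumptions for illustration. At each lane boundary the modelled multiplexer selects a zero carry-in instead of the previous unit's carry-out; only addition (the zero-signal case) is modelled.

```python
def simd_add(a: int, b: int, lane_bits: int,
             unit_bits: int = 8, total_bits: int = 32) -> int:
    """Add packed lanes of width lane_bits using chained unit_bits-wide adders."""
    assert lane_bits % unit_bits == 0 and total_bits % lane_bits == 0
    unit_mask = (1 << unit_bits) - 1
    result, carry = 0, 0
    for u in range(total_bits // unit_bits):
        shift = u * unit_bits
        if shift % lane_bits == 0:
            carry = 0   # lane boundary: the mux feeds 0, not the previous carry
        s = ((a >> shift) & unit_mask) + ((b >> shift) & unit_mask) + carry
        result |= (s & unit_mask) << shift
        carry = s >> unit_bits   # carry-out offered to the next unit
    return result
```

The same hardware units therefore serve four 8-bit, two 16-bit, or one 32-bit addition, depending only on where the carry chain is cut.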