9 research outputs found

    DESIGNING A FLEXIBLE ACCELERATOR (FA) FOR HARDWARE ACCELERATION FOR DSP

    Get PDF
    Carry-save (CS) representation has been widely used to design fast arithmetic circuits due to its inherent advantage of eliminating long carry-propagation chains. Hardware acceleration has proved to be a very promising implementation strategy for the digital signal processing (DSP) domain. However, research activities have shown that arithmetic optimizations at higher abstraction levels than the structural circuit level significantly impact datapath performance. Rather than adopting a monolithic application-specific integrated circuit design approach, in this brief we present a novel accelerator architecture comprising flexible computational units that support the execution of a large set of operation templates found in DSP kernels. We differentiate from previous works on flexible accelerators by enabling computations to be aggressively performed with carry-save (CS) formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are utilized, enabling CS optimizations to be performed in a larger scope than in previous approaches. Extensive experimental evaluations show that the proposed accelerator architecture delivers average gains of up to 61.91% in area-delay product and 54.43% in energy consumption compared with state-of-the-art flexible datapaths
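    The carry-save idea in the abstract above can be sketched briefly. The following is an illustrative model, not code from the paper: each intermediate result is kept as a (sum, carry) pair of bit vectors, so no carry ripples until a single final carry-propagate addition.

```python
# Illustrative sketch of carry-save arithmetic (assumed, not the paper's design).

def csa(a, b, c):
    """3:2 compressor: reduce three operands to a (sum, carry) pair."""
    s = a ^ b ^ c                      # bitwise sums, no carry propagation
    cy = (a & b) | (a & c) | (b & c)   # generated carries, weighted one bit up
    return s, cy << 1

def cs_add(operands):
    """Accumulate a list of integers in carry-save form."""
    s, cy = 0, 0
    for x in operands:
        s, cy = csa(s, cy, x)
    return s + cy                      # the only carry-propagate addition

print(cs_add([11, 22, 33, 44]))  # 110
```

    In hardware, each `csa` stage costs only one full-adder delay regardless of word length, which is why deferring the carry-propagate step speeds up long addition chains.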

    VLSI DESIGN FOR CARRY-SAVE FORMATTED DATA

    Get PDF
    Research activities have shown that arithmetic optimizations at higher abstraction levels than the structural circuit level significantly impact datapath performance. Carry-save (CS) representation has been widely used to design fast arithmetic circuits due to its inherent advantage of eliminating long carry-propagation chains. Hardware acceleration has proved to be a very promising implementation strategy for the digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific integrated circuit design approach, in this brief we present a novel accelerator architecture comprising flexible computational units that support the execution of a large set of operation templates found in DSP kernels. We differentiate from previous works on flexible accelerators by enabling computations to be aggressively performed with carry-save (CS) formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are utilized, enabling CS optimizations to be performed in a larger scope than in previous approaches. Extensive experimental evaluations show that the proposed accelerator architecture delivers average gains of up to 61.91% in area-delay product and 54.43% in energy consumption compared with state-of-the-art flexible datapaths

    IMPLEMENTATION OF LOW POWER AND DELAY SCALABLE CHANNEL PARALLEL NAND FLASH MEMORY CONTROLLER ARCHITECTURE USING ALU

    Get PDF
    RISC stands for Reduced Instruction Set Computer: a computer whose processor provides a reduced (simple) instruction set for performing the necessary operations. Any chip considered a processor should be capable of arithmetic, logical, control, and data-transfer operations. To perform these operations, a processor should contain some major blocks: a control unit (CU), a flexible computational unit (FCU), a program counter (PC), an accumulator, an instruction register, memory, and additional logic. RISC enhances processor performance through a simple architecture and instruction set, instructions that are easy to decode, and a simplified control architecture. This paper proposes a simple 32-bit RISC processor using Peres reversible logic gates, which is expected to reduce the size compared with the conventional architecture based on the carry-save adder approach. Synthesis and simulation are carried out using Xilinx ISE 12.3i, and the HDL is developed in Verilog
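    The Peres gate mentioned above is a reversible 3-input/3-output gate with outputs P = a, Q = a XOR b, R = (a AND b) XOR c. A small Python truth-table model (illustrative only; the paper implements this in HDL) shows that with c = 0 a single Peres gate acts as a half adder:

```python
# Truth-table model of the Peres reversible gate (illustrative sketch).

def peres(a, b, c):
    """Peres gate: P = a, Q = a XOR b, R = (a AND b) XOR c."""
    return a, a ^ b, (a & b) ^ c

# With c = 0, Q is the half-adder sum bit and R the carry bit.
for a in (0, 1):
    for b in (0, 1):
        _, s, cy = peres(a, b, 0)
        assert s == (a ^ b) and cy == (a & b)
print("Peres gate acts as a half adder when c = 0")
```

    Because the mapping of inputs to outputs is a bijection, the gate is reversible, which is the property exploited for low-power logic in such designs.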

    OPTIMIZING HIGH SPEED AND POWER CARRY SAVE ARITHMETIC CIRCUITS USING RISC PROCESSOR

    Get PDF
    RISC stands for Reduced Instruction Set Computer: a computer whose processor provides a reduced (simple) instruction set for performing the necessary operations. Any chip considered a processor should be capable of arithmetic, logical, control, and data-transfer operations. To perform these operations, a processor should contain some major blocks: a control unit (CU), a flexible computational unit (FCU), a program counter (PC), an accumulator, an instruction register, memory, and additional logic. RISC enhances processor performance through a simple architecture and instruction set, instructions that are easy to decode, and a simplified control architecture. This paper proposes a simple 32-bit RISC processor using Peres reversible logic gates, which is expected to reduce the size compared with the conventional architecture based on the carry-save adder approach. Synthesis and simulation are carried out using Xilinx ISE 12.3i, and the HDL is developed in VHDL

    Implementation of RISC Processor for DSP Accelerator Architecture Exploiting Carry Save Arithmetic

    Get PDF
    Hardware acceleration has been proved an extremely promising implementation strategy for the digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific integrated circuit design approach, in this brief, we present a novel accelerator architecture comprising flexible computational units that support the execution of a large set of operation templates found in DSP kernels. We differentiate from previous works on flexible accelerators by enabling computations to be aggressively performed with carry-save (CS) formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are utilized, enabling CS optimizations to be performed in a larger scope than in previous approaches. Extensive experimental evaluations show that the proposed accelerator architecture delivers average gains of up to 61.91% in area-delay product and 54.43% in energy consumption compared with state-of-the-art flexible datapaths. In that paper, the concentration is on 16-bit operations, but in the proposed scheme the focus is on 32-bit operations. Hardware acceleration basically refers to the use of computer hardware to perform some functions faster than is possible in software running on a general-purpose CPU. RISC, or Reduced Instruction Set Computer, is a design philosophy that has become mainstream in scientific and engineering applications. The main objective of this paper is to design and implement a 32-bit RISC (Reduced Instruction Set Computer) processor for a flexible DSP accelerator architecture. The design will help to improve the speed of the processor and to deliver higher performance. The most important feature of the RISC processor is that it is very simple and supports a load/store architecture. The important components of this processor include the arithmetic logic unit, shifter, rotator, and control unit. The module functionality and performance issues such as area, power dissipation, and propagation delay are analyzed.
Therefore, here we meet some of the main constraints, such as the complexity of the instruction set; reducing it cuts the amount of space, time, cost, power, heat, and other resources it takes to implement the instruction-set part of a processor. As the time of execution decreases, the speed of execution automatically increases
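    The shifter and rotator components named above can be modelled in a few lines of Python for reference (an illustrative sketch only; the paper implements them in HDL, and the function names here are my own):

```python
# Behavioural model of 32-bit rotate operations (illustrative, not the paper's HDL).

MASK32 = 0xFFFFFFFF  # keep results within a 32-bit word

def rol32(x, n):
    """Rotate a 32-bit word left by n positions."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def ror32(x, n):
    """Rotate a 32-bit word right by n positions."""
    return rol32(x, (32 - n) % 32)

print(hex(rol32(0x80000001, 1)))  # 0x3
```

    A hardware barrel shifter realizes the same behaviour combinationally, selecting among pre-shifted copies of the operand in log2(32) = 5 multiplexer stages.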

    Data-flow transformations to maximize the use of carry-save representation in arithmetic circuits

    No full text
    The increasing importance of datapath circuits in complex systems-on-chip calls for special arithmetic optimizations. The goal is to automatically achieve the handcrafted results that escape classic logic optimizations. Some work has been done in recent years to infer the use of the carry-save representation in the synthesis of arithmetic circuits. Yet, many cases of practical interest cannot be handled due to the scattering of logic operations among the arithmetic ones, particularly in arithmetic computations originally described at the bit level in high-level languages such as C. We therefore introduce an algorithm to restructure dataflow graphs so that they can be synthesized as high-quality arithmetic circuits, close to those that an expert designer would conceive. On typical embedded software benchmarks that could be advantageously implemented with hardware accelerators, our technique always tangibly reduces the critical path, by up to 46%, and generally achieves the quality of manual implementations. In many cases, our algorithm also manages to reduce the cell area by 10%-20%
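    A hypothetical illustration of the kind of data-flow restructuring described above (not the paper's actual algorithm): pushing identical shifts past additions clusters the adds together, so a synthesizer can merge them into one carry-save tree with a single final carry-propagate adder.

```python
# Sketch of a data-flow transformation that unscatters logic ops from adds.
import random

def original(a, b, c):
    # Additions separated by bit-level shifts, as written in C-like source.
    return ((a << 1) + c) + (b << 1)

def restructured(a, b, c):
    # Equivalent form: one clustered addition chain, one shift.
    return ((a + b) << 1) + c

# The two forms are arithmetically identical for all inputs.
for _ in range(1000):
    a, b, c = (random.getrandbits(16) for _ in range(3))
    assert original(a, b, c) == restructured(a, b, c)
print("forms are equivalent")
```

    After the rewrite, `a + b + ...` forms a single addition cluster that can stay in carry-save form, which is exactly the opportunity the scattered original hides from synthesis.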

    Closing the Gap between FPGA and ASIC: Balancing Flexibility and Efficiency

    Get PDF
    Despite many advantages of Field-Programmable Gate Arrays (FPGAs), they fail to take over the IC design market from Application-Specific Integrated Circuits (ASICs) for high-volume and even medium-volume applications, as FPGAs come with significant cost in area, delay, and power consumption. There are two main reasons that FPGAs have a huge efficiency gap with ASICs: (1) FPGAs are extremely flexible, as they have fully programmable soft-logic blocks and routing networks, and (2) FPGAs have hard-logic blocks that are usable by only a subset of applications. In other words, current FPGAs have a heterogeneous structure comprised of the flexible soft-logic and the efficient hard-logic blocks, which suffer from inefficiency and inflexibility, respectively. The inefficiency of the soft-logic is a challenge for any application mapped to FPGAs, and the lack of flexibility in the hard-logic results in wasted resources when an application cannot use the hard-logic. In this thesis, we approach the inefficiency problem of FPGAs by bridging the efficiency/flexibility gap of the hard- and soft-logic. The main goal of this thesis is to compromise on efficiency of the hard-logic for flexibility, on the one hand, and to compromise on flexibility of the soft-logic for efficiency, on the other hand. In other words, this thesis deals with two issues: (1) adding more generality to the hard-logic of FPGAs, and (2) improving the soft-logic by adapting it to the generic requirements of applications. In the first part of the thesis, we introduce new techniques that expand the functionality of FPGAs' hard-logic. The hard-logic includes the dedicated resources that are tightly coupled with the soft-logic, i.e., adder circuitry and carry chains, as well as the stand-alone ones, i.e., DSP blocks.
    These specialized resources are intended to accelerate critical arithmetic operations that appear in the pre-synthesis representation of applications; we introduce mapping and architectural solutions that enable both types of hard-logic to support additional arithmetic operations. We first present a mapping technique that extends the application of FPGAs' carry chains to carry-save arithmetic, and then, to increase the generality of the hard-logic, we introduce novel architectures; using these architectures, more applications can take advantage of FPGAs' hard-logic. In the second part of the thesis, we improve the efficiency of FPGAs' soft-logic by exploiting the circuit patterns that emerge after logic synthesis, i.e., connection and logic patterns. Using these patterns, we design new soft-logic blocks that have less flexibility, but more efficiency, than current ones. In this part, we first introduce logic chains: fixed connections integrated between the soft-logic blocks of FPGAs that are well suited to the long chains of logic that appear post-synthesis. Logic chains provide fast and low-cost connectivity, increase the bandwidth of the logic blocks without changing their interface with the routing network, and improve the logic density of soft-logic blocks. In addition to logic chains, and as a complementary contribution, we present a non-LUT soft-logic block that comprises simple, pre-connected cells. The structure of this logic block is inspired by the logic patterns that appear post-synthesis. This block has a complexity that is only linear in the number of inputs, it sports the potential for multiple independent outputs, and its delay is only logarithmic in the number of inputs. Although this new block is less flexible than a LUT, we show (1) that effective mapping algorithms exist, (2) that, due to their simplicity, poor utilization is less of an issue than with LUTs, and (3) that a few LUTs can still be used in extreme unfortunate cases.
    In summary, to bridge the gap between FPGAs and ASICs, we approach the problem from two complementary directions, which balance the flexibility and efficiency of the logic blocks of FPGAs. However, we were able to explore only a few design points in this thesis, and future work could focus on further exploration of the design space
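    The linear-versus-logarithmic delay claim above can be checked with back-of-envelope arithmetic (my own sketch, not taken from the thesis): a chain of 2-input cells over n inputs has depth n - 1, while a balanced binary reduction tree has depth ceil(log2(n)).

```python
# Depth comparison for a serial chain vs. a balanced tree of 2-input cells.
import math

def chain_depth(n):
    """Levels of logic when n inputs are combined one after another."""
    return n - 1

def tree_depth(n):
    """Levels of logic when n inputs are combined in a balanced binary tree."""
    return math.ceil(math.log2(n))

for n in (4, 16, 64):
    print(n, chain_depth(n), tree_depth(n))
```

    The gap widens quickly with input count, which is why a pre-connected tree-structured block can beat chained LUTs on delay even with far less flexibility.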