2 research outputs found

    Low power design of 16-bit synchronous counter by introducing effective clock monitoring circuits

    Most system-level designs contain sequential circuits, and power optimization of these circuits at many levels is required to build a portable device with a long battery life. In this work, a dynamic clock gating technique was used to reduce the power and temperature of a 16-bit counter. The simulation was performed in Cadence SCL 180 nm technology, at a supply voltage of 1.8 V and a frequency of 500 MHz. With the proposed approach, a 77.16% power reduction was achieved at the cost of a 14.83% area overhead. Moreover, the layout of the circuits was also designed in the Innovus tool to obtain a more accurate silicon area and gate count. The Innovus output files (".flp" and ".pptrace") were used as inputs to the HotSpot tool to determine the absolute temperature of the integrated circuits (ICs). The obtained temperature results were compared with those of an ordinary 16-bit counter, and it was found that the proposed approach reduced the temperature by 14.34%.
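    The idea behind dynamic clock gating can be sketched behaviourally: when the counter does not need to advance, the clock edge is blocked before it reaches the flip-flops, so no switching power is dissipated. The Python model below is a hypothetical illustration of this principle only; the paper's actual design is a gate-level circuit with a clock-monitoring cell, and the names and activity pattern here are assumptions.

    ```python
    class GatedCounter:
        """Behavioural model of a 16-bit up-counter with a gated clock."""

        WIDTH = 16

        def __init__(self):
            self.count = 0
            self.clock_edges = 0  # proxy for dynamic power: fewer edges, less power

        def tick(self, enable: bool):
            """One clock cycle. The gating cell (conceptually gated_clk = clk AND
            enable) blocks the edge when enable is low, so the flip-flops do not
            toggle and consume no switching power that cycle."""
            if enable:
                self.clock_edges += 1
                self.count = (self.count + 1) % (1 << self.WIDTH)

    ctr = GatedCounter()
    for cycle in range(1000):
        ctr.tick(enable=(cycle % 4 == 0))  # counter active only 25% of the time

    print(ctr.count)        # 250
    print(ctr.clock_edges)  # 250 gated edges vs 1000 ungated: 75% fewer toggles
    ```

    With a 25% activity factor, the gated counter sees 75% fewer clock edges than a free-running one, which is the mechanism behind the power savings reported above (the exact 77.16% figure additionally reflects circuit-level effects not captured by this toy model).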

    Rethinking FPGA Architectures for Deep Neural Network applications

    The prominence of machine learning-powered solutions has instituted an unprecedented trend of integration into virtually all applications, with a broad range of deployment constraints from tiny embedded systems to large-scale warehouse computing machines. While recent research confirms the advantages of using contemporary FPGAs to deploy or accelerate machine learning applications, especially where latency and energy consumption are strictly limited, their architectures, optimised before the machine learning era, remain a barrier to overall efficiency and performance. Recognising this shortcoming, this thesis presents an architectural study aimed at solutions that unlock hidden potential in FPGA technology, primarily for machine learning algorithms. In particular, it shows how slight alterations to state-of-the-art architectures could make FPGAs significantly more machine learning-friendly while maintaining near-promised performance for the rest of the applications. It then presents a novel systematic approach to deriving new block architectures, guided by design limitations and the characteristics of machine learning algorithms, through benchmarking. First, through three modifications to Xilinx DSP48E2 blocks, an enhanced digital signal processing (DSP) block for important computations in embedded deep neural network (DNN) accelerators is described. Then, two tiers of modifications to the FPGA logic cell architecture are explained that deliver a variety of performance and utilisation benefits with only minor area overheads. Finally, with the goal of exploring this new design space in a methodical manner, a problem formulation involving computing nested loops over multiply-accumulate (MAC) operations is first proposed. A quantitative methodology for deriving efficient coarse-grained compute block architectures from benchmarks is then suggested, together with a family of new embedded blocks, called MLBlocks.
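    The "nested loops over MAC operations" formulation mentioned above is the standard way to express the kernels such compute blocks accelerate. As a minimal sketch (the function name and loop bounds are hypothetical, not taken from the thesis), a dense matrix product written this way exposes the innermost multiply-accumulate that an embedded block like a DSP slice or MLBlock would implement in hardware:

    ```python
    def mac_nest(A, B, M, N, K):
        """M×K by K×N matrix product as nested loops over MAC operations."""
        C = [[0] * N for _ in range(M)]
        for m in range(M):
            for n in range(N):
                acc = 0
                for k in range(K):  # innermost loop: one multiply-accumulate per step
                    acc += A[m][k] * B[k][n]
                C[m][n] = acc
        return C

    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(mac_nest(A, B, 2, 2, 2))  # [[19, 22], [43, 50]]
    ```

    Benchmarking how such loop nests map onto candidate block architectures (loop order, unrolling, accumulator width) is the kind of quantitative signal the methodology described above would draw on.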