562 research outputs found

    To Develop and Implement Low Power, High Speed VLSI for Processing Signals using Multirate Techniques

    Multirate techniques are necessary for systems whose input and output sampling rates differ. Recent advances in mobile computing and communication applications demand low-power, high-speed VLSI DSP systems [4]. This paper presents multirate filtering modules for signal processing in wireless communication systems. Many architectures have been developed for low-complexity, bit-parallel Multiple Constant Multiplication (MCM), the operation that dominates the complexity of DSP systems. However, present approaches are either too costly or not efficient enough. MCM combined with digit-serial adders offers an alternative low-complexity design, since digit-serial architectures occupy less area and are independent of the data word length [1][10]. MCM is an efficient way to reduce the number of additions and subtractions in a polyphase filter implementation. The multirate design methodology is systematic and applicable to many problems. This paper gives attention to MCM and digit-serial architectures that use shift-and-add techniques to lower the complexity of operations. It also covers multirate signal processing modules under voltage and technology scaling. Reducing power consumption is important for VLSI systems and has become one of the most critical design parameters. Transistor-level multirate modules, full-custom designs with different circuit topologies and optimization levels, are simulated on the Cadence platform. The modules use AMI 0.6 μm, TSMC 0.35 μm, and TSMC 0.25 μm technologies at different supply voltages. The presented methodology provides a systematic way to derive circuit techniques for high-speed operation at a low supply voltage. Multirate polyphase interpolators and decimators are also designed and optimized at the architectural level in order to analyze power consumption, area, and speed. DOI: 10.17762/ijritcc2321-8169.150314
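
The shift-and-add idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the paper's design: it assumes a canonical signed-digit (CSD) recoding of each constant, and the function names `csd_digits` and `shift_add_multiply` are hypothetical. A full MCM block would additionally share partial terms between constants, which this sketch does not attempt.

```python
def csd_digits(c):
    """Return canonical signed-digit (CSD) digits of a non-negative
    constant c, least significant first, each in {-1, 0, 1}."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)   # +1 if c mod 4 == 1, -1 if c mod 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x, c):
    """Multiply x by the constant c using only shifts, adds and subtracts."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


def adder_count(c):
    """Adders/subtractors needed for one constant: nonzero digits minus one."""
    return max(sum(1 for d in csd_digits(c) if d) - 1, 0)
```

For example, the constant 7 is recoded as 8 - 1, so `shift_add_multiply(x, 7)` uses one subtraction instead of the two additions that the plain binary form 4 + 2 + 1 would need.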

    A high-speed integrated circuit with applications to RSA Cryptography

    Merged with duplicate record 10026.1/833 on 01.02.2017 by CS (TIS). The rapid growth in the use of computers and networks in government, commercial and private communications systems has led to an increasing need for these systems to be secure against unauthorised access and eavesdropping. To this end, modern computer security systems employ public-key ciphers, of which probably the best known is the RSA cipher system, to provide both secrecy and authentication facilities. The basic RSA cryptographic operation is a modular exponentiation where the modulus and exponent are integers typically greater than 500 bits long. Obtaining reasonable encryption rates with the RSA cipher therefore requires that it be implemented in hardware. This thesis presents the design of a high-performance VLSI device, called the WHiSpER chip, that can perform the modular exponentiations required by the RSA cryptosystem for moduli and exponents up to 506 bits long. The design has an expected throughput in excess of 64 kbit/s, making it attractive for use both as a general RSA processor within the security function provider of a security system and for direct use on moderate-speed public communication networks such as ISDN. The thesis investigates the low-level techniques used for implementing high-speed arithmetic hardware in general, and reviews the methods used by designers of existing modular multiplication/exponentiation circuits with respect to circuit speed and efficiency. A new modular multiplication algorithm, MMDDAMMM, based on Montgomery arithmetic, together with an efficient multiplier architecture, is proposed to remove the speed bottleneck of previous designs. Finally, the implementation of the new algorithm and architecture within the WHiSpER chip is detailed, along with a discussion of the application of the chip to ciphering and key generation.
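
To make the core operation concrete, here is a minimal Python sketch of modular exponentiation built on textbook Montgomery multiplication. It is not the MMDDAMMM algorithm named in the abstract, whose details are not given here; the function names are hypothetical, the modulus is assumed to be odd and greater than 1, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def mont_mul(a, b, n, k, n_prime):
    """Return a * b * 2^(-k) mod n for a, b < n < 2^k, n odd."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # makes t + m*n divisible by 2^k
    u = (t + m * n) >> k                # exact division by R = 2^k
    return u - n if u >= n else u       # single conditional subtraction


def mont_pow(base, exp, n):
    """Compute base^exp mod n by left-to-right square-and-multiply,
    keeping all intermediate values in Montgomery form."""
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r      # -n^(-1) mod R
    x = (base % n) * r % n              # base in Montgomery form
    acc = r % n                         # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)  # leave Montgomery form
```

The point of the Montgomery form is that the reduction step needs only masking and shifting by a power of two rather than a trial division by the modulus, which is what makes it attractive for hardware.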

    Hardware Implementations for Symmetric Key Cryptosystems

    The use of the global communications network to support new electronic applications is growing. Many applications provided over this network involve the exchange of security-sensitive information between different entities. Often, the communicating entities are located at different places around the globe. This demands mechanisms that provide secure communication channels between these entities. For this purpose, many of today's electronic applications use cryptographic algorithms to maintain security. Cryptographic algorithms provide a set of primitives for achieving different security goals such as confidentiality, data integrity, authenticity, and non-repudiation. In general, two main categories of cryptographic algorithms can be used to accomplish any of these security goals, namely asymmetric key algorithms and symmetric key algorithms. The security of asymmetric key algorithms is based on the hardness of the underlying computational problems, which usually incur large space and time overheads. On the other hand, the security of symmetric key algorithms is based on non-linear transformations and permutations, which allow more efficient implementations than the asymmetric key ones. Therefore, it is common to use asymmetric key algorithms for key exchange, while their symmetric key counterparts are deployed to secure the communication sessions. This thesis focuses on finding efficient hardware implementations for symmetric key cryptosystems targeting mobile communications and resource-constrained applications. First, efficient lightweight hardware implementations of two members of the Welch-Gong (WG) family of stream ciphers, WG(29,11) and WG-16, are considered for the mobile communications domain. Optimizations of the WG(29,11) stream cipher are considered when the GF(2^29) elements are represented in either the optimal normal basis type II (ONB-II) or the polynomial basis (PB). For WG-16, optimizations are considered only for PB representations of the GF(2^16) elements. For both ciphers, the optimizations are accomplished mainly at the arithmetic level by reducing the number of field multipliers, based on novel trace properties. Other optimization techniques, such as serialization and pipelining, are also considered. The thesis then explores efficient hardware implementations for digit-level multiplication over binary extension fields GF(2^m). Efficient digit-level GF(2^m) multiplications are advantageous for ultra-lightweight implementations, not only in symmetric key algorithms but also in asymmetric key algorithms. The thesis introduces new architectures for digit-level GF(2^m) multipliers considering the Gaussian normal basis (GNB) and PB representations of the field elements. The new digit-level GF(2^m) single multipliers, considered for both the GNB and PB, do not require the two input field elements to be loaded before computation starts. This feature results in high-throughput, fast multiplication in resource-constrained applications with limited input data-path capacity. In addition, for the GNB representation, new architectures for digit-level GF(2^m) hybrid-double and hybrid-triple multipliers are introduced. The new hybrid-double and hybrid-triple GNB multipliers accomplish the multiplication of three and four field elements, respectively, using the latency required for multiplying two field elements. Furthermore, a new hardware architecture for the eight-ary exponentiation scheme is proposed by utilizing the new digit-level GF(2^m) hybrid-triple GNB multipliers.
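
As a rough illustration of what "digit-level" multiplication over GF(2^m) means in the polynomial basis, the following Python sketch processes one operand d bits at a time, most significant digit first. It is a generic textbook-style formulation, not one of the thesis's architectures (in particular it does not model the GNB variants or the on-the-fly input loading); the function name and parameter layout are assumptions made for this example.

```python
def gf2m_mul_digit_serial(a, b, poly, m, d):
    """Multiply a and b in GF(2^m), polynomial basis, consuming b in
    d-bit digits, most significant digit first.
    `poly` is the irreducible polynomial including its x^m term."""
    acc = 0
    num_digits = -(-m // d)                      # ceil(m / d)
    for i in reversed(range(num_digits)):
        digit = (b >> (i * d)) & ((1 << d) - 1)
        # acc <- acc * x^d mod poly (d single-bit shifts with reduction)
        for _ in range(d):
            acc <<= 1
            if (acc >> m) & 1:
                acc ^= poly
        # partial <- a * digit as a carry-less product (degree <= m + d - 2)
        partial = 0
        for j in range(d):
            if (digit >> j) & 1:
                partial ^= a << j
        # reduce the partial product modulo poly, highest bit first
        for pos in range(m + d - 2, m - 1, -1):
            if (partial >> pos) & 1:
                partial ^= poly << (pos - m)
        acc ^= partial
    return acc
```

With digit size d the loop runs about m/d times, so d trades latency against the width of the per-step logic. As a check, in the AES field (m = 8, poly = 0x11B) the product of 0x57 and 0x83 is 0xC1 for any digit size.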

    Evolutionary design of digital VLSI hardware

    Versatile Montgomery Multiplier Architectures

    Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (160 to 4096 bits) as their core arithmetic operation. General-purpose processors are not always the best choice for performing this operation reasonably fast, which is why specialized hardware in the form of cryptographic co-processors has become more attractive. Based on an analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word-based algorithm for Montgomery's method has been realized using high-radix bit-parallel multipliers working with two different types of finite fields (a unified architecture for GF(p) and GF(2^n)). Previous approaches have relied mostly on bit-serial multiplication in combination with massive pipelining, or on radix-8 multiplication limited to a single type of finite field. Our approach is centered on the notion that the optimal delay of bit-parallel multipliers grows logarithmically with the operand size n, O(log_{3/2} n), while the delay of bit-serial implementations grows linearly, O(n). Our design has been implemented in VHDL, simulated, and synthesized in 0.5μ CMOS technology. The synthesized netlist has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption.
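
A word-based Montgomery multiplication of the kind this abstract refers to can be sketched in Python as follows. This is a generic high-radix formulation for the GF(p) case only, under the assumptions that the modulus n is odd and that both operands are below n; the name `mont_mul_words` and its parameters (word size `w`, word count `s`) are hypothetical and do not describe the thesis's architecture. The GF(2^n) side of the unified design is not shown.

```python
def mont_mul_words(a, b, n, w, s):
    """Return a * b * 2^(-w*s) mod n, scanning b one w-bit word at a time.
    Requires n odd, a < n, b < n and n < 2^(w*s)."""
    mask = (1 << w) - 1
    n0_inv = (-pow(n, -1, 1 << w)) & mask   # -n^(-1) mod 2^w (Python 3.8+)
    t = 0
    for i in range(s):
        b_i = (b >> (w * i)) & mask         # next word of b
        t += a * b_i                        # word-by-operand partial product
        q = ((t & mask) * n0_inv) & mask    # quotient digit for this word
        t = (t + q * n) >> w                # low word is now zero; drop it
    return t - n if t >= n else t
```

Each loop iteration corresponds to one step that a hardware data path would perform per word, so the word size `w` sets the radix and the iteration count `s` shrinks as the radix grows. The intermediate value stays below 2n throughout, which is why a single final subtraction is enough.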

    High Speed and Low-Complexity Hardware Architectures for Elliptic Curve-Based Crypto-Processors

    Elliptic curve cryptography (ECC) has been identified as an efficient scheme for public-key cryptography. This thesis studies the efficient implementation of ECC crypto-processors on hardware platforms in a bottom-up approach. We first study efficient and low-complexity architectures for finite field multiplication over the Gaussian normal basis (GNB). We propose three new low-complexity digit-level architectures for finite field multiplication. The architectures are modified to make them more suitable for hardware implementation, focusing especially on reducing area usage. Then, for the first time, we propose a hybrid digit-level multiplier architecture that performs two multiplications together (double-multiplication) in the same number of clock cycles as a single multiplication. We propose a new hardware architecture for point multiplication on the newly introduced binary Edwards and generalized Hessian curves. We investigate higher-level parallelization and lower-level scheduling for point multiplication on these curves. We also propose a highly parallel architecture for point multiplication on Koblitz curves by modifying the addition formulation. Several FPGA implementations exploiting these modifications are presented in this thesis. We employ the proposed hybrid multiplier architecture to reduce the latency of point multiplication in ECC crypto-processors as well as of double-exponentiation. This scheme is the first known method to increase the speed of point multiplication whenever parallelization fails due to data dependencies among lower-level arithmetic computations. Our comparison results show that the proposed multiplier architectures outperform their counterparts in the literature. Furthermore, fast computation of point multiplication on different binary elliptic curves is achieved.
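
Point multiplication, the operation these architectures accelerate, can be sketched with the basic double-and-add method. The Python example below deliberately swaps in a tiny short-Weierstrass curve over a prime field (y^2 = x^3 + 2x + 2 mod 17, a common textbook example) rather than the binary Edwards, generalized Hessian, or Koblitz curves studied in the thesis, so it only illustrates the control flow and the data dependency between successive doublings and additions. All names are hypothetical.

```python
def point_add(p1, p2, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b over GF(p).
    The point at infinity is represented by None."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = infinity
    if p1 == p2:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def scalar_mult(k, point, a, p):
    """Compute k * point by right-to-left double-and-add."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend, a, p)
        addend = point_add(addend, addend, a, p)      # doubling step
        k >>= 1
    return result
```

On this toy curve, `scalar_mult(2, (5, 1), 2, 17)` returns `(6, 3)`. Every field multiplication and inversion inside `point_add` is the kind of lower-level operation that the multiplier architectures described above are meant to speed up.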

    Studies on Implementation of . . . High Throughput and Low Power Consumption

    In this thesis we discuss the design and implementation of frequency-selective digital filters with high throughput and low power consumption. The thesis includes proposed arithmetic transformations of lattice wave digital filters that aim at increasing the throughput and reducing the power consumption of the filter implementation. The thesis also includes two case studies where digital filters with high throughput and low power consumption are required. One method for obtaining high throughput as well as reduced power consumption in digital filters is arithmetic transformation of the filter structure. In this thesis, arithmetic transformations of first- and second-order Richards' allpass sections composed of symmetric two-port adaptors and implemented using carry-save arithmetic are proposed. Such filter sections can be used for the implementation of lattice wave digital filters and bireciprocal lattice wave digital filters. The latter structures are efficient for the implementation of interpolators and decimators by factors of two. Th

    Design of High-Performance Computation Units for On-Device Convolutional Neural Network Accelerators

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2020. Taewhan Kim. Optimizing computing units for an on-device neural network accelerator can reduce energy and latency, increase throughput, and might enable unprecedented new applications. This dissertation studies two specific optimization opportunities for the multiply-accumulate (MAC) unit of an on-device neural network accelerator that stem from precision quantization methodology. First, we propose an enhanced MAC processing unit structure that efficiently processes mixed-precision models in which the majority of operations are in low precision. Specifically, the two essential contributions are: (1) a MAC unit structure supporting two precision modes, designed to fully utilize its computation logic when processing lower-precision data, which brings more computation efficiency for mixed-precision models whose major operations are in lower precision; (2) for a set of input CNNs, a formulation of the exploration of the size of a single internal multiplier in the MAC unit to derive an economical instance, in terms of computation and energy cost, of the MAC unit structure across all network layers. Experimental results with two well-known CNN models, AlexNet and VGG-16, and two experimental precision settings show that the proposed units reduce computational cost per multiplication by 4.68% to 30.3% and save energy cost by 43.3% on average over conventional units. Second, we propose an acceleration technique for processing multiplication operations using stochastic computing (SC). MUX-FSM based SC, which employs a MUX controlled by an FSM to generate a bit sequence of a binary number to count up for a MAC operation, considerably reduces the hardware cost of implementing MAC operations compared with traditional stochastic number generator (SNG) based SC.
Nevertheless, the existing MUX-FSM based SC still does not meet the multiplication processing time required for wide adoption of on-device neural networks in practice, even though it offers a very economical hardware implementation. Conventional enhancements also have limitations such as sub-maximal cycle reduction and parameter conversion cost. This work proposes a way to further speed up conventional MUX-FSM based SC. Specifically, we analyze the bit counting pattern produced by the MUX-FSM and replace the counting redundancy with a shift operation, which significantly reduces the length of the required bit sequence and theoretically speeds up the worst-case multiplication processing time by 2X or more. Experiments show that our enhanced SC technique shortens the average processing time by 38.8% over conventional MUX-FSM based SC.
Contents:
1 Introduction
  1.1 Neural network accelerator and its optimizations
  1.2 Necessity of optimizing computational block of neural network accelerator
  1.3 Contributions of This Dissertation
2 MAC Design Considering Mixed Precision
  2.1 Motivation
  2.2 Internal Multiplier Size Determination
  2.3 Proposed hardware structure
  2.4 Experiments
    2.4.1 Implementation of Reference MAC units
    2.4.2 Area, Wirelength, Power, Energy, and Performance of MAC units for AlexNet
    2.4.3 Area, Wirelength, Power, Energy, and Performance of MAC units for VGG-16
    2.4.4 Power Saving by Clock Gating
3 Speeding up MUX-FSM based Stochastic Computing Unit Design
  3.1 Motivations
    3.1.1 MUX-FSM based SC and previous enhancements
  3.2 The Proposed MUX-FSM based SC
    3.2.1 Refined Algorithm for Stochastic Computing
  3.3 The Supporting Hardware Architecture
    3.3.1 Bit Counter with shift operation
    3.3.2 Controller
    3.3.3 Combining with preceding architectures
  3.4 Experiments
    3.4.1 Experiments Setup
    3.4.2 Generating input bit selection pattern
    3.4.3 Performance Comparison
    3.4.4 Hardware Area and Energy Comparison
4 Conclusions
  4.1 MAC Design Considering Mixed Precision
  4.2 Speeding up MUX-FSM based Stochastic Computing Unit Design
Abstract (In Korean)
Docto
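
The baseline MUX-FSM style of stochastic multiplication described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the dissertation's design: it assumes the FSM selects the most significant bit of the n-bit input every second cycle, the next bit every fourth cycle, and so on, and that the product is obtained by counting the selected bits over a number of cycles equal to the other operand. The function names are hypothetical, and the proposed shift-based speed-up is not modeled.

```python
def mux_fsm_bit(x, n, t):
    """Bit of the n-bit value x that the MUX outputs at cycle t (t >= 1).
    Assumed pattern: a cycle with j trailing zero bits selects bit n-1-j."""
    tz = (t & -t).bit_length() - 1      # number of trailing zeros of t
    if tz >= n:
        return 0                        # no input bit is selected
    return (x >> (n - 1 - tz)) & 1


def sc_multiply(x, w, n):
    """Approximate x * w / 2^n by counting the selected bits of x
    over the first w cycles (0 <= w <= 2^n)."""
    return sum(mux_fsm_bit(x, n, t) for t in range(1, w + 1))
```

Under this assumed pattern, running the full 2^n cycles returns x exactly, and stopping after w cycles gives a count within n/2 of x * w / 2^n. The cost is latency: the loop needs as many cycles as the value of w, up to 2^n in the worst case, which is the processing-time problem the abstract says the proposed technique targets.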