19 research outputs found

    HIGH-SPEED CO-PROCESSORS BASED ON REDUNDANT NUMBER SYSTEMS

    There is a growing demand for high-speed arithmetic co-processors for use in applications with computationally intensive tasks. For instance, Fast Fourier Transform (FFT) co-processors are used in real-time multimedia services, and financial applications use decimal co-processors to perform large amounts of decimal computation. Using redundant number systems to eliminate word-wide carry propagation within interim operations is a well-known technique for increasing the speed of arithmetic hardware. Redundant number systems are most useful in applications where many consecutive arithmetic operations are performed before the final result is needed, which makes them particularly advantageous for arithmetic co-processors. This thesis discusses the implementation of two popular arithmetic co-processors based on redundant number systems: the binary FFT co-processor and the decimal arithmetic co-processor. FFT co-processors consist of several consecutive multipliers and adders over complex numbers. FFT architectures are implemented using either fixed-point or floating-point arithmetic. The main advantage of floating-point over fixed-point arithmetic is its wide dynamic range; it also avoids numerical issues such as scaling and overflow/underflow, at the expense of higher cost. Furthermore, a floating-point implementation allows an FFT co-processor to work alongside general-purpose processors, offloading computationally intensive tasks from the primary processor. The first part of this thesis, devoted to FFT co-processors, proposes a new FFT architecture that uses a new binary signed-digit (BSD) carry-limited adder, a new floating-point BSD multiplier, and a new floating-point BSD three-operand adder. Finally, a new unit, the Fused Dot-Product-Add (FDPA) unit, is designed to compute AB + CD + E over floating-point BSD operands. The second part of the thesis discusses decimal arithmetic operations implemented in hardware using redundant number systems; these operations are widely used in decimal floating-point co-processors. A new signed-digit decimal adder is proposed, along with a sequential decimal multiplier that uses redundant number systems to increase the operating frequency of the multiplier. New redundant decimal division and square-root units are also proposed. All of the architectures proposed in this thesis were implemented in a hardware description language (Verilog) and synthesized with Synopsys Design Compiler. The evaluation results demonstrate the speed improvement of the new arithmetic units over previous pertinent works. Consequently, the FFT and decimal co-processors designed in this thesis operate at least 10% faster than previous designs. These architectures are meant to meet the demand for high-speed co-processors in applications such as multimedia services and financial computations.
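    As a rough illustration of the carry-limited addition idea this abstract relies on, the following Python sketch models the textbook radix-2 signed-digit addition rule; it is only an assumed, generic model, not the thesis's BSD adder circuit.

    # Radix-2 binary signed-digit (BSD) addition with digits in {-1, 0, 1}.
    # The transfer into any position is decided from the two neighbouring
    # positions only, so no carry ever ripples across the full word -- the
    # property that makes redundant representations attractive for chains
    # of operations.

    def bsd_to_int(digits):
        # digits[0] is the least significant signed digit
        return sum(d << i for i, d in enumerate(digits))

    def bsd_add(x, y):
        # carry-limited addition of two equal-length BSD digit vectors
        n = len(x)
        p = [x[i] + y[i] for i in range(n)]        # position sums, each in {-2..2}
        w = [0] * n                                # interim sum digits
        t = [0] * (n + 1)                          # transfers; t[i] flows into position i
        for i in range(n):
            below = p[i - 1] if i > 0 else 0       # peek at the next lower position
            if p[i] == 2:
                w[i], t[i + 1] = 0, 1
            elif p[i] == 1:
                w[i], t[i + 1] = (-1, 1) if below >= 0 else (1, 0)
            elif p[i] == -1:
                w[i], t[i + 1] = (-1, 0) if below >= 0 else (1, -1)
            elif p[i] == -2:
                w[i], t[i + 1] = 0, -1
        # the final digits never leave {-1, 0, 1}, so no further propagation occurs
        return [w[i] + t[i] for i in range(n)] + [t[n]]

    a = [1, -1, 0, 1]                              # 1 - 2 + 8 = 7
    b = [1, 1, 1, -1]                              # 1 + 2 + 4 - 8 = -1
    assert bsd_to_int(bsd_add(a, b)) == 6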

    Single-Precision and Double-Precision Merged Floating-Point Multiplication and Addition Units on FPGA

    Floating-point (FP) operations defined in the IEEE 754-2008 Standard for Floating-Point Arithmetic can provide a wider dynamic range and higher precision than fixed-point operations. Many scientific computations and multimedia applications adopt FP operations. Among all the FP operations, addition and multiplication are the most frequent. In this thesis, merged single-precision (SP) and double-precision (DP) FP multiplier and FP adder architectures are proposed. The proposed efficient iterative FP multiplier is designed around the Karatsuba algorithm and implemented with a pipelined architecture. It can accomplish two parallel SP multiplication operations in one iteration with a latency of 6 clock cycles, or one DP multiplication operation in two iterations with a latency of 9 clock cycles. Implemented on a Xilinx Virtex-5 (xc5vlx155ff1760-3) FPGA device, the proposed multiplier runs at 348 MHz using 6 DSP48E blocks, 1117 LUTs, and 1370 FFs. Compared to previous FPGA-based multiple-precision FP multipliers, the proposed design runs at a 4% faster clock frequency while using 33% fewer DSP blocks and reducing latency by 17% for SP multiplication and 28% for DP multiplication. The proposed high-performance FP adder is designed based on the two-path FP addition algorithm. With a fully pipelined architecture, the proposed adder can accomplish one DP or two parallel SP addition/subtraction operations in 6 clock cycles. The proposed adder architecture is implemented on both Altera and Xilinx 65 nm process FPGA devices. The proposed adder runs at up to 336 MHz using 1694 FFs and 1420 LUTs on a Xilinx Virtex-5 (xc5vlx155ff1760-3) FPGA device. Compared to a combination of one DP and two SP adders built with the Xilinx FP operator, the proposed adder has an 11.3% faster clock frequency. On an Altera Stratix-III (EP3SL340F1760C2) FPGA device, the maximum clock frequency of the proposed adder reaches 358 MHz, occupying 1686 ALUTs and 1556 registers. The proposed adder is 11.6% faster than the combination of one DP and two SP adders built with the Altera FP megafunction. For the reference of other researchers, the implementation results of the proposed FP multiplier and FP adder on the latest Xilinx Virtex-7 and Altera Arria 10 devices are also provided.
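    The Karatsuba decomposition the multiplier is built around can be sketched in a few lines of Python. This models only the arithmetic identity (splitting one wide significand product into three narrower products); the mapping onto DSP48E blocks and the two-iteration pipeline described in the abstract are not modelled.

    import random

    def karatsuba_53x53(x, y, half=27):
        # multiply two 53-bit significands with three narrower multiplications
        mask = (1 << half) - 1
        xh, xl = x >> half, x & mask
        yh, yl = y >> half, y & mask
        hi  = xh * yh                                  # high half x high half
        lo  = xl * yl                                  # low half  x low half
        mid = (xh + xl) * (yh + yl) - hi - lo          # cross terms, one extra multiply
        return (hi << (2 * half)) + (mid << half) + lo

    # sanity check against a full-width multiplication of normalized significands
    for _ in range(1000):
        a, b = random.getrandbits(53) | 1 << 52, random.getrandbits(53) | 1 << 52
        assert karatsuba_53x53(a, b) == a * b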

    IEEE Compliant Double-Precision FPU and 64-bit ALU with Variable Latency Integer Divider

    Together, the arithmetic logic unit (ALU) and floating-point unit (FPU) perform all of the mathematical and logic operations of a computer processor. Because they are used so prominently, they fall in the critical path of the central processing unit, often becoming the bottleneck, or limiting factor, for performance. As such, the design of a high-speed ALU and FPU is vital to creating a processor capable of performing up to the demanding standards of today's computer users. In this work, both a 64-bit ALU and a 64-bit FPU are designed based on a reduced instruction set computer (RISC) architecture. The ALU performs the four basic mathematical operations - addition, subtraction, multiplication, and division - in both unsigned and two's complement format, as well as basic logic operations and shifting. The division algorithm takes a novel approach, using a comparison-multiples-based SRT divider to create a variable-latency integer divider. The floating-point unit performs the double-precision floating-point operations add, subtract, multiply, and divide in accordance with the IEEE 754 standard for number representation and rounding. The ALU and FPU were implemented in VHDL, simulated in ModelSim, and constrained and synthesized using Synopsys Design Compiler (2006.06) targeting TSMC 0.13 um CMOS technology. The timing, power, and area synthesis results were recorded and, where applicable, compared to those of the corresponding DesignWare components. The ALU synthesis reported an area of 122,215 gates, a power of 384 mW, and a delay of 2.89 ns - a frequency of 346 MHz. The FPU synthesis reported an area of 84,440 gates, a delay of 2.82 ns, and an operating frequency of 355 MHz, with a maximum dynamic power of 153.9 mW.
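    The flavour of SRT division with a redundant quotient-digit set can be illustrated with a short Python sketch of the generic radix-2 textbook recurrence; this is an assumed, simplified model for illustration only, not the comparison-multiples, variable-latency divider of this thesis.

    import random

    def srt2_divide(n, d, frac_bits, frac_len):
        # radix-2 SRT-style recurrence with quotient digits in {-1, 0, 1};
        # n and d are fixed-point fractions scaled by 2**frac_len, with
        # 0 <= n < d and d normalized so that 1/2 <= d / 2**frac_len < 1
        half = 1 << (frac_len - 1)
        rem, q = n, 0
        for _ in range(frac_bits):
            rem <<= 1
            q <<= 1
            if rem >= half:       # digit selection compares against a constant,
                q += 1            # not against d itself; the redundant digit set
                rem -= d          # absorbs the resulting slack
            elif rem <= -half:
                q -= 1
                rem += d
        if rem < 0:               # fold any negative remainder at the end
            q -= 1
            rem += d
        return q, rem             # n * 2**frac_bits == q * d + rem, 0 <= rem < d

    for _ in range(1000):
        d = random.randrange(1 << 7, 1 << 8)           # normalized 8-bit divisor
        n = random.randrange(0, d)
        q, r = srt2_divide(n, d, 16, 8)
        assert q == (n << 16) // d and (n << 16) == q * d + r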

    Implementation of artificial neural networks on adaptive hardware structures (Yapay sinir ağlarının uyarlanabilir donanımsal yapılarda gerçeklenmesi)

    The full text has been made available in accordance with the amendment to the Higher Education Law published in the Official Gazette No. 30352 of 06.03.2018 and the directive of 18.06.2018 on the electronic collection, organization, and opening of access to graduate theses. Artificial Neural Networks (ANNs) are electronic models based on the biological nervous system. ANNs are made up of interconnected artificial neurons that process values coming from the inputs. These architectures can be implemented either in software or in hardware. The advantage of a software implementation of an ANN is that the designer does not need to know the inner workings of the ANN components. However, in real-time applications, software-based ANNs are slower than hardware-based ANNs. ANN computations are carried out in parallel, and special hardware devices are required for parallel processing. Researchers from many disciplines have performed ANN hardware implementations to solve problems in optimization, classification, control, image processing, and other areas. These implementations have been realized on different types of devices to take advantage of the parallel nature inherent to ANNs. FPGA implementations of ANNs have aroused great interest during the last two decades due to their reconfigurable structure and parallel architecture.
    In this thesis, a hardware implementation of a trainable Multi-Layer Neural Network (MLNN) structure is realized on an FPGA (Field Programmable Gate Array) as entirely combinational logic using Quartus II schematic design. The back-propagation algorithm, which uses gradient descent, is implemented to train the neural network. The IEEE single-precision floating-point format is used for numerical representation. This study also presents the hardware designs of a fast floating-point adder, a parallel multiplier, and a sigmoid activation function block that are fully compliant with the IEEE single-precision floating-point format. The adder, parallel multiplier, and activation function block are designed as entirely combinational logic and perform their operations in parallel. In this novel design, tri-state buffer series are used for shifting operations instead of shift registers in order to reduce latency. Because tri-state buffer series are used, no clock pulse is required for shifting, and the result is therefore generated in a single cycle. The proposed design, whose only cost is gate delay, is suitable for hardware implementations of ANNs.
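    A rough numerical model of what one trained neuron computes in this number format is sketched below in Python/NumPy, with float32 standing in for the combinational single-precision adder, multiplier, and sigmoid blocks; the function names, learning rate, and toy data are illustrative assumptions, not taken from the thesis.

    import numpy as np

    def sigmoid(x):
        # sigmoid activation evaluated in IEEE single precision
        return np.float32(1) / (np.float32(1) + np.exp(np.float32(-x)))

    def train_step(w, b, x, target, lr=np.float32(0.5)):
        # one gradient-descent (back-propagation) update for a single neuron
        y = sigmoid(np.float32(np.dot(w, x)) + b)     # forward pass
        err = y - target                              # output error
        grad = err * y * (np.float32(1) - y)          # derivative through the sigmoid
        w -= lr * grad * x                            # weight update
        b -= lr * grad                                # bias update
        return w, b, np.float32(0.5) * err * err      # squared-error loss

    # toy usage: drive the output toward 1 for the input [1, 0]
    w, b = np.zeros(2, dtype=np.float32), np.float32(0)
    x, t = np.array([1, 0], dtype=np.float32), np.float32(1)
    for _ in range(100):
        w, b, loss = train_step(w, b, x, t)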

    Many-core architectures with time-predictable execution: support for hard real-time applications

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 183-193). Hybrid control systems are a growing application domain: they are pervasive, and their complexity is increasing rapidly. Distributed control systems for future "intelligent grid" and renewable energy generation systems demand high performance, hard real-time computation, and greater programmability. General-purpose computer systems are primarily designed to process data, not to interact with physical processes as these systems require. Generic general-purpose architectures, even with real-time operating systems, fail to meet the hard real-time constraints of hybrid system dynamics. ASIC, FPGA, or traditional embedded design approaches to these systems often result in expensive, complicated systems that are hard to program, reuse, or maintain. In this thesis, we propose a domain-specific architecture template targeting hybrid control system applications. Using power electronics control applications, we present new modeling techniques, synthesis methodologies, and a parameterizable computer architecture for these large distributed control systems. We propose a new system modeling approach, called the Adaptive Hybrid Automaton, based on previous work in control system theory, that uses mixed-model abstractions and lends itself well to digital processing. Building on this model, we develop a domain-specific architecture, called MARTHA, that uses heterogeneous processing units and predictable execution. We develop a hard-real-time-aware router architecture to enable deterministic communication over the on-chip interconnect network. We present several algorithms for scheduling task-based applications onto these types of heterogeneous architectures. We create Heracles, an open-source, functional, parameterized, synthesizable many-core system design toolkit that can be used to explore future multi-/many-core processors with different topologies, routing schemes, processing elements or cores, and memory system organizations. Using the Heracles design tool, we build a prototype of the proposed architecture on a state-of-the-art FPGA-based platform and deploy and test it in actual physical power electronics systems. We also develop and release an open-source, small but representative set of power electronics system applications that can be used for hard real-time application benchmarking. By Michel A. Kinsy. Ph.D.
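    As a purely generic illustration of the mapping problem such schedulers address, the Python sketch below greedily places tasks onto heterogeneous processing units; the task and unit names are made up, it is not one of the thesis's algorithms, and a real hard real-time schedule would rely on worst-case execution times and deadline analysis rather than this simple makespan heuristic.

    def greedy_schedule(tasks, units):
        # tasks: {task: {unit_type: execution_time}}, units: {unit: unit_type}
        finish = {u: 0 for u in units}                 # running finish time per unit
        placement = {}
        # place the longest tasks first so they do not pile up at the end
        for task, cost in sorted(tasks.items(), key=lambda kv: -min(kv[1].values())):
            best = min(units, key=lambda u: finish[u] + cost[units[u]])
            finish[best] += cost[units[best]]
            placement[task] = best
        return placement, max(finish.values())         # assignment and makespan

    units = {"core0": "general", "core1": "general", "dsp0": "accelerator"}
    tasks = {"fft":      {"general": 9, "accelerator": 3},
             "ctrl_law": {"general": 4, "accelerator": 6},
             "logging":  {"general": 2, "accelerator": 5}}
    placement, makespan = greedy_schedule(tasks, units)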

    Communication platform for inter-satellite links in distributed satellite systems
