39 research outputs found

    The implementation and applications of multiple-valued logic

    Get PDF
    Multiple-Valued Logic (MVL) takes two major forms. Multiple-valued circuits can implement the logic directly by using multiple-valued signals, or the logic can be implemented indirectly with binary circuits, by using more than one binary signal to represent a single multiple-valued signal. Techniques such as carry-save addition can be viewed as indirectly implemented MVL. Both direct and indirect techniques have been shown in the past to provide advantages over conventional arithmetic and logic techniques in algorithms required widely in computing for applications such as image and signal processing. It is possible to implement basic MVL building blocks at the transistor level. However, these circuits are difficult to design due to their non binary nature. In the design stage they are more like analogue circuits than binary circuits. Current integrated circuit technologies are biased towards binary circuitry. However, in spite of this, there is potential for power and area savings from MVL circuits, especially in technologies such as BiCMOS. This thesis shows that the use of voltage mode MVL will, in general not provide bandwidth increases on circuit buses because the buses become slower as the number of signal levels increases. Current mode MVL circuits however do have potential to reduce power and area requirements of arithmetic circuitry. The design of transistor level circuits is investigated in terms of a modern production technology. A novel methodology for the design of current mode MVL circuits is developed. The methodology is based upon the novel concept of the use of non-linear current encoding of signals, providing the opportunity for the efficient design of many previously unimplemented circuits in current mode MVL. This methodology is used to design a useful set of basic MVL building blocks, and fabrication results are reported. The creation of libraries of MVL circuits is also discussed. The CORDIC algorithm for two dimensional vector rotation is examined in detail as an example for indirect MVL implementation. The algorithm is extended to a set of three dimensional vector rotators using conventional arithmetic, redundant radix four arithmetic, and Taylor's series expansions. These algorithms can be used for two dimensional vector rotations in which no scale factor corrections are needed. The new algorithms are compared in terms of basic VLSI criteria against previously reported algorithms. A pipelined version of the redundant arithmetic algorithm is floorplanned and partially laid out to give indications of wiring overheads, and layout densities. An indirectly implemented MVL algorithm such as the CORDIC algorithm described in this thesis would clearly benefit from direct implementation in MVL

    Mixed radix design flow for security applications

    Get PDF
    The purpose of secure devices, such as smartcards, is to protect sensitive information against software and hardware attacks. Implementation of the appropriate protection techniques often implies non-standard methods that are not supported by the conventional design tools. In the recent decade the designers of secure devices have been working hard on customising the workflow. The presented research aims at collecting the up-to-date experiences in this area and create a generic approach to the secure design flow that can be used as guidance by engineers. Well-known countermeasures to hardware attacks imply the use of specific signal encodings. Therefore, multi-valued logic has been considered as a primary aspect of the secure design. The choice of radix is crucial for multi-valued logic synthesis. Practical examples reveal that it is not always possible to find the optimal radix when taking into account actual physical parameters of multi-valued operations. In other words, each radix has its advantages and disadvantages. Our proposal is to synthesise logic in different radices, so it could benefit from their combination. With respect to the design opportunities of the existing tools and the possibilities of developing new tools that would fill the gaps in the flow, two distinct design approaches have been formed: conversion driven design and pre-synthesis. The conversion driven design approach takes the outputs of mature and time-proven electronic design automation (EDA) synthesis tools to generate mixed radix datapath circuits in an endeavour to investigate the added relative advantages or disadvantages. An algorithm underpinning the approach is presented and formally described together with secure gate-level implementations. The obtained results are reported showing an increase in power consumption, thus giving further motivation for the second approach. The pre-synthesis approach is aimed at improving the efficiency by using multivalued logic synthesis techniques to produce an abstract component-level circuit before mapping it into technology libary. Reed-Muller expansions over Galois field arithmetic have been chosen as a theoretical foundation for this approach. In order to enable the combination of radices at the mathematical level, the multi-valued Reed-Muller expansions have been developed into mixed radix Reed-Muller expansions. The goals of the work is to estimate the potential of the new approach and to analyse its impact on circuit parameters down to the level of physical gates. The benchmark results show the approach extends the search space for optimisation and provides information on how the implemented functions are related to different radices. The theory of two-level radix models and corresponding computation methods are the primary theoretical contribution. It has been implemented in RMMixed tool and interfaced to the standard EDA tools to form a complete security-aware design flow.EThOS - Electronic Theses Online ServiceEPSRCGBUnited Kingdo

    Algorithms and architectures for decimal transcendental function computation

    Get PDF
    Nowadays, there are many commercial demands for decimal floating-point (DFP) arithmetic operations such as financial analysis, tax calculation, currency conversion, Internet based applications, and e-commerce. This trend gives rise to further development on DFP arithmetic units which can perform accurate computations with exact decimal operands. Due to the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic includes it in its specifications. The basic decimal arithmetic unit, such as decimal adder, subtracter, multiplier, divider or square-root unit, as a main part of a decimal microprocessor, is attracting more and more researchers' attentions. Recently, the decimal-encoded formats and DFP arithmetic units have been implemented in IBM's system z900, POWER6, and z10 microprocessors. Increasing chip densities and transistor count provide more room for designers to add more essential functions on application domains into upcoming microprocessors. Decimal transcendental functions, such as DFP logarithm, antilogarithm, exponential, reciprocal and trigonometric, etc, as useful arithmetic operations in many areas of science and engineering, has been specified as the recommended arithmetic in the IEEE 754-2008 standard. Thus, virtually all the computing systems that are compliant with the IEEE 754-2008 standard could include a DFP mathematical library providing transcendental function computation. Based on the development of basic decimal arithmetic units, more complex DFP transcendental arithmetic will be the next building blocks in microprocessors. In this dissertation, we researched and developed several new decimal algorithms and architectures for the DFP transcendental function computation. These designs are composed of several different methods: 1) the decimal transcendental function computation based on the table-based first-order polynomial approximation method; 2) DFP logarithmic and antilogarithmic converters based on the decimal digit-recurrence algorithm with selection by rounding; 3) a decimal reciprocal unit using the efficient table look-up based on Newton-Raphson iterations; and 4) a first radix-100 division unit based on the non-restoring algorithm with pre-scaling method. Most decimal algorithms and architectures for the DFP transcendental function computation developed in this dissertation have been the first attempt to analyze and implement the DFP transcendental arithmetic in order to achieve faithful results of DFP operands, specified in IEEE 754-2008. To help researchers evaluate the hardware performance of DFP transcendental arithmetic units, the proposed architectures based on the different methods are modeled, verified and synthesized using FPGAs or with CMOS standard cells libraries in ASIC. Some of implementation results are compared with those of the binary radix-16 logarithmic and exponential converters; recent developed high performance decimal CORDIC based architecture; and Intel's DFP transcendental function computation software library. The comparison results show that the proposed architectures have significant speed-up in contrast to the above designs in terms of the latency. The algorithms and architectures developed in this dissertation provide a useful starting point for future hardware-oriented DFP transcendental function computation researches

    Optimisations arithmétiques et synthèse de haut niveau

    Get PDF
    High-level synthesis (HLS) tools offer increased productivity regarding FPGA programming.However, due to their relatively young nature, they still lack many arithmetic optimizations.This thesis proposes safe arithmetic optimizations that should always be applied.These optimizations are simple operator specializations, following the C semantic.Other require to a lift the semantic embedded in high-level input program languages, which are inherited from software programming, for an improved accuracy/cost/performance ratio.To demonstrate this claim, the sum-of-product of floating-point numbers is used as a case study. The sum is performed on a fixed-point format, which is tailored to the application, according to the context in which the operator is instantiated.In some cases, there is not enough information about the input data to tailor the fixed-point accumulator.The fall-back strategy used in this thesis is to generate an accumulator covering the entire floating-point range.This thesis explores different strategies for implementing such a large accumulator, including new ones.The use of a 2's complement representation instead of a sign+magnitude is demonstrated to save resources and to reduce the accumulation loop delay.Based on a tapered precision scheme and an exact accumulator, the posit number systems claims to be a candidate to replace the IEEE floating-point format.A throughout analysis of posit operators is performed, using the same level of hardware optimization as state-of-the-art floating-point operators.Their cost remains much higher that their floating-point counterparts in terms of resource usage and performance. Finally, this thesis presents a compatibility layer for HLS tools that allows one code to be deployed on multiple tools.This library implements a strongly typed custom size integer type along side a set of optimized custom operators.À cause de la nature relativement jeune des outils de synthèse de haut-niveau (HLS), de nombreuses optimisations arithmétiques n'y sont pas encore implémentées. Cette thèse propose des optimisations arithmétiques se servant du contexte spécifique dans lequel les opérateurs sont instanciés.Certaines optimisations sont de simples spécialisations d'opérateurs, respectant la sémantique du C.D'autres nécéssitent de s'éloigner de cette sémantique pour améliorer le compromis précision/coût/performance.Cette proposition est démontré sur des sommes de produits de nombres flottants.La somme est réalisée dans un format en virgule-fixe défini par son contexte.Quand trop peu d’informations sont disponibles pour définir ce format en virgule-fixe, une stratégie est de générer un accumulateur couvrant l'intégralité du format flottant.Cette thèse explore plusieurs implémentations d'un tel accumulateur.L'utilisation d'une représentation en complément à deux permet de réduire le chemin critique de la boucle d'accumulation, ainsi que la quantité de ressources utilisées. Un format alternatif aux nombres flottants, appelé posit, propose d'utiliser un encodage à précision variable.De plus, ce format est augmenté par un accumulateur exact.Pour évaluer précisément le coût matériel de ce format, cette thèse présente des architectures d'opérateurs posits, implémentés avec le même degré d'optimisation que celui de l'état de l'art des opérateurs flottants.Une analyse détaillée montre que le coût des opérateurs posits est malgré tout bien plus élevé que celui de leurs équivalents flottants.Enfin, cette thèse présente une couche de compatibilité entre outils de HLS, permettant de viser plusieurs outils avec un seul code. Cette bibliothèque implémente un type d'entiers de taille variable, avec de plus une sémantique strictement typée, ainsi qu'un ensemble d'opérateurs ad-hoc optimisés

    Algorithms and VLSI architectures for parametric additive synthesis

    Get PDF
    A parametric additive synthesis approach to sound synthesis is advantageous as it can model sounds in a large scale manner, unlike the classical sinusoidal additive based synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a super class of form ant synthesis which extends the sinusoidal parameters with a spread parameter known as band width. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter based approach to modelling a formant limits the computational work-load. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation which can cause the filter to become chaotic. A filter which is more complex than a second order section is required. Temporal resolution of an impulse generator is achieved by using a two stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant’s parameters. A resource manager is discussed to overcome the possibility of all sub- banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An explorative study of number systems for DSP algorithms and their architectures is investigated. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. A review of synthesis algorithms and VLSI architectures are discussed in a case study between a filter based bit-serial and a CORDIC based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable

    Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays

    Get PDF
    The aim of this thesis is to compare the implementation of parameterisable LNS (logarithmic number system) and floating-point high dynamic range number systems on FPGA. The Virtex/Virtex-II range of FPGAs from Xilinx, which are the most popular FPGA technology, are used to implement the designs. The study focuses on using the low level primitives of the technology in an efficient way and so initially the design issues in implementing fixed-point operators are considered. The four basic operations of addition, multiplication, division and square root are considered. Carry- free adders, ripple-carry adders, parallel multipliers and digit recurrence division and square root are discussed. The floating-point operators use the word format and exceptions as described by the IEEE std-754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next. An overview of current FPGA methods is given and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section the LNS word format, accuracy and exceptions are described and two new LNS addition/subtraction function approximations are described. The algorithms for performing multiplication, division and powering in the LNS domain are also described and are compared with other designs in the open literature. Parameterisable conversion algorithms to convert to/from the fixed-point domain from/to the LNS and floating-point domain are described and implementation results given. In the next chapter MATLAB bit-true software models are given that have the exact functionality as the hardware models. The interfaces of the models are given and a serial communication system to perform low speed system tests is described. A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared and conclusions are drawn. The results show that when the LNS is implemented with a 6-bit or less characteristic it is superior to floating-point. However, for larger characteristic lengths the floating-point system is more efficient due to the delay and exponential area increase of the LNS addition operator. The LNS is beneficial for larger characteristics than 6-bits only for specialist applications that require a high portion of division, multiplication, square root, powering operations and few additions

    Energy efficient hardware acceleration of multimedia processing tools

    Get PDF
    The world of mobile devices is experiencing an ongoing trend of feature enhancement and generalpurpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being their limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks Based on the survey that this thesis presents on modem video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work m this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high level techniques such as redundant computation elimination, parallelism and low switching computation structures. Both architectures compare favourably against the relevant pnor art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation - namely, both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions .Results show an improvement on state of the art algorithms with future potential for even greater savings

    ASIC implementations of the Viterbi Algorithm

    Get PDF

    Evolutionary design of digital VLSI hardware

    Get PDF
    corecore