145 research outputs found

    Properties of two's complement floating point notations

    Get PDF
    International audienceFew designs, mostly those of Texas Instruments, continue to use tworsquos complement floating point units. Such units are simpler to build and to validate, but they do not comply to the dominant IEEE standard for floating point arithmetic. We compare some properties of the two systems in this text. Some features are lost, but others remain unchanged. One strong example is the case of Sterbenzrsquos theorem and our recent extension. We show in the paper that the theorem and its extension hold for the tworsquos complement architecture. Still, users should ensure that results are large enough on circuits that do not implement gradual underflow. Theorems have been proven and validated using the Coq automatic proof checker

    A New Representation of the Rational Numbers for Fast Easy Arithmetic

    Full text link

    Formalization of Fixed-Point Arithmetic in HOL

    Get PDF
    This paper addresses the formalization in higher-order logic of fixed-point arithmetic. We encoded the fixed-point number system and specified the different quantization modes in fixed-point arithmetic such as the directed and even quantization modes. We also considered the formalization of exceptions detection and their handling like overflow and invalid operation. An error analysis is then performed to check the correctness of the quantized result after carrying out basic arithmetic operations, such as addition, subtraction, multiplication and division against their mathematical counterparts. Finally, we showed by an example how this formalization can be used to enable the verification of the transition from floating-point to fixed-point algorithmic level in the signal processing design flow

    Finite worldlength effects in fixed-point implementations of linear systems

    Get PDF
    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.Includes bibliographical references (p. 173-194).by Vinay Mohta.M.Eng

    On the design of IEEE compliant floating point units and their quantitative analysis

    Get PDF
    Abstract this thesis addresses the question of which are the important issues in the design of a high-speed floating-point unit (FPU) that is fully compliant with the IEEE floating-point standard 754-1985 [19]. There are a few choices that need to be made when designing an IEEE compliant FPU, among them: the internal representation of floating-point numbers, the rounding algorithms, handling of denormal results, usage of the same rounding hardware for different units (e.g. adder, multiplier, divider), and the implementations of the adder, the multiplier and the divider. These choices influence both the cost and the performance of the FPU. Nevertheless, these issues have not been discussed in the open literature todate. This work begins to fill this gap by designing, analyzing and comparing 18 different IEEE compliant FPU implementations, that consider design options regarding: (a) the internal representation of floating-point numbers; (b) the rounding algorithms; (c) sharing of a rounding unit, the implementation of gradual step rounding or the implementation of dedicated rounding units for each functional unit; (d) the implementation of the floating-point multiplier; and (e) the implementation of the floating-point divider. The presented FPU designs make also use of the following innovations, that were developed in the context of this work: (a) a fast implementation of variable position rounding integrated into a FP multiplier [37]; (b) to the best of our knowledge the fastest integrated FP addition and rounding algorithm published todate [40], (c) the fastest FP multiplication rounding algorithm published todate [11, 12] and (d) the fastest linear reciprocal approximation implementation published todate. [36, 39]; (e) an efficient integration of single and double precision rounding [9]; (f) a Booth encoded adder-tree with an improved cost formula [30]. All the FPUs designed in this work are fully compliant with the IEEE standard for all implemented operations, support both single and double precision, and deal with denormal values and special cases in hardware. Because to design an IEEE compliant FPU is a complex and error-prone task, all the FPU designs are specified in full detail at gate level and the correctness of the FPU designs (in particular the compliance with the IEEE standard) is proven. The proposed FPU implementations are analyed and compared regarding the hardware cost, the cycle time and the performance that they achieve on traces of the SPECfp92 benchmark suite [17] integrated into a pipelined RISC processor from [23]. In this quantitative analysis [38] it is demonstrated that the choice of the rounding architecture in the FPU has a larger impact on the performance of the microprocessor than the choice of the FP multiplication or the FP division implementation. In comparison to this the impact of the rounding architecture choice on the cost is relatively small. The rounding architecture that uses dedicated rounding units provides the best performance with only small additional cost, so that this rounding architecture seems to be the best choice in floating-point implementations. The fast implementation of this rounding architecture is only made possible by the fast variable position rounding implementation for multipliers from [37]. This underlines the importance of this technique.In dieser Arbeit wird der Frage nachgegangen, welches die wichtigsten Designentscheidungen bei der Implementierung einer schnellen Gleitkommaeinheit (FPU), die dem IEEE Standard 754-1985 [19] genügt, sind. Es gibt verschiedene Entscheidungen, die beim Entwurf einer IEEE konformen FPU getroffen werden müssen, darunter: die internen Darstellungen der Gleitkomma-(FP) Zahlen, die Rundungsalgorithmen, die Art der Behandlung von denormalisierten Ergebnissen, die Mehrfachverwendung von Teilen der Hardware, wie z.B. die Benutzung derselben Rundungshardware für verschiedene Einheiten, und die Implementierungen des FP Addierers, des FP Multiplizierers und des FP Dividierers. Diese Entscheidungen beeinflussen sowohl die Kosten alsauch die Leistung der FPU. Nichtsdestotrotz wurden diese Entscheidungen bislang nicht in der Literatur diskutiert. Die vorliegende Arbeit setzt in dieser Lücke an. Es werden 18 unterschiedliche FPUs vorgestellt, analysiert und verglichen, die Optionen zu den folgenden Entscheidungen betrachten: (a) interne Darstellung der FP Zahlen; (b) Rundungsalgorithmen; (c) Gemeinsame Nutzung einer allgemeinen Rundungseinheit, Aufteilen des Rundens in mehrere Schritte und gemeinsame Realisierung einer Teilmenge dieser Schritte oder vollständige eigene Implementierung des Rundens für jede Funktionseinheit; (d) Implementierung des FP Multiplizierers; (e) Implementierung des FP Dividierers. Die vorgestellten FPU Designs benutzen darüberhinaus folgende Neuerungen, die im Rahmen dieser Arbeit entstanden sind: (a) eine schnelle Rundungsimplementierung für den FP Multiplizierer mit variabler Rundungsposition [37]; (b) nach unserem besten Wissen den bisher schnellsten publizierten Algorithmus zum Addieren und Runden von FP Zahlen [40], (c) den bisher schnellsten publizierten Algorithmus zum Runden bei der FP Multiplikation [11, 12] und (d) die bisher schnellste publizierte Implementierung einer linearen Approximation von Reziproken [36, 39]; (e) eine effiziente Integration des Rundens in single precision und double precision [9]; (f) einen Booth-Multiplizierer mit verringerten Kosten [30]. Alle entworfenen FPUs sind für alle implementierten Operationen vollständig konform zum IEEE FP Standard 754, unterstützen sowohl single alsauch double precision Zahlen, und behandeln selbst denormalisierte Ergebnisse und Spezialfälle in Hardware. Weil der Entwurf von IEEE konformen FPUs eine komplexe und fehleranfällige Aufgabe ist, werden sämtliche entworfenen FPUs detailiert auf Gatterebene spezifiziert und ihre Korrektheit (insbesondere die Konformität zum IEEE FP Standard 754) bewiesen. Die vorgestellten FPU Implementierungen werden bezüglich der Hardwarekosten, der Zykluszeit und der Leistung, die sie integriert in einen gepipelinten RISC Processor aus [23] auf Traces der SPECfp92 Benchmark Suite erbringen, analysiert und verglichen. In dieser quantitativen Analyse (siehe auch [38]) wird demonstriert, daß die Auswahl der Rundungs-Architektur einer FPU einen größeren Einfluß auf die Prozessorleistung hat als die Auswahl der Implementierung der FP Multiplikation oder der FP Division. Im Gegensatz dazu ist der Einfluß der Auswahl einer Rundungs-Architektur der FPU auf die Hardwarekosten vergleichsweise gering. Die Rundungs-Architektur, die vollständige eigene Rundungsimplementierungen für jede Funktionseinheit benutzt, liefert bei weitem die beste Leistung und ist lediglich geringfügig teurer als Varianten mit anderen Rundungs-Architekturen. Demzufolge scheint diese Rundungs-Architektur die beste Wahl in FP Implementierungen zu sein. Die schnelle Implementierung dieser Rundungs-Architektur wurde erst durch die schnelle Rundungsimplementierung für FP Multiplizierer mit variabler Rundungsposition nach [37] ermöglicht. Das unterstreicht die Bedeutung dieser Technik

    Optimization of Lyapunov invariants in analysis and implementation of safety-critical software systems

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2008.Includes bibliographical references (leaves 168-176).This dissertation contributes to two major research areas in safety-critical software systems, namely, software analysis, and software implementation. In reference to the software analysis problem, the main contribution of the dissertation is the development of a novel framework, based on Lyapunov invariants and convex optimization, for verification of various safety and performance specifications for software systems. The enabling elements of the framework for software analysis are: (i) dynamical system interpretation and modeling of computer programs, (ii) Lyapunov invariants as behavior certificates for computer programs, and (iii) a computational procedure for finding the Lyapunov invariants. (i) The view in this dissertation is that software defines a rule for iterative modification of the operating memory at discrete instances of time. Hence, it can be modeled as a discrete-time dynamical system with the program variables as the state variables, and the operating memory as the state space. Three specific modeling languages are introduced which can represent a broad range of computer programs of interest to the control community. These are: Mixed Integer-Linear Models, Graph Models, and Linear Models with Conditional Switching. (ii) Inspired by the concept of Lyapunov functions in stability analysis of nonlinear dynamical systems, Lyapunov invariants are introduced and proposed for analysis of behavioral properties, and verification of various safety and performance specifications for computer programs. In the same spirit as standard Lyapunov functions, a Lyapunov invariant is an appropriately defined function of the state which satisfies a difference inequality along the trajectories. It is shown that variations of Lyapunov invariants satisfying certain technical conditions can be formulated for verification of several common specifications.(cont.) These include but are not limited to: absence of overflow, absence of division-by-zero, termination in finite time, and certain user-specified program assertions. (iii) A computational procedure based on convex relaxation techniques and numerical optimization is proposed for finding the Lyapunov invariants that prove the specifications. The framework is complemented by the introduction of a notion of optimality for the graph models. This notion can be used for constructing efficient graph models that improve the analysis in a systematic way. It is observed that the application of the framework to (graph models of) programs that are semantically identical but syntactically different does not produce identical results. This suggests that the success or failure of the method is contingent on the choice of the graph model. Based on this observation, the concepts of graph reduction, irreducible graphs, and minimal and maximal realizations of graph models are introduced. Several new theorems that compare the performance of the original graph model of a computer program and its reduced offsprings are presented. In reference to the software implementation problem for safety-critical systems, the main contribution of the dissertation is the introduction of an algorithm, based on optimization of quadratic Lyapunov functions and semidefinite programming, for computing optimal state space implementations for digital filters. The particular implementation that is considered is a finite word-length implementation on a fixed-point processor with quantization before or after multiplication. The objective is to minimize the effects of finite word-length constraints on performance deviation while respecting the overflow limits. The problem is first formulated as a special case of controller synthesis where the controller has a specific structure, which is known to be a hard non-convex problem in general.(cont.) It is then shown that this special case can be convexified exactly and the optimal implementation can be computed by solving a semidefinite optimization problem. It is observed that the optimal state space implementation of a digital filter on a machine with finite memory, does not necessarily define the same transfer function as that of an ideal implementation.by Mardavij Roozbehani.Ph.D

    Extended update plans

    Get PDF
    Formal methods are gaining popularity as a way of increasing the reliability of systems through the use of mathematically based techniques. Their domain is no longer restricted to purely academic environments and examples, as they are slowly moving into industrial settings. The slow rate at which this transition takes place is mainly due to the perceived difficulty of formalising the behaviour of systems. While this is undoubtedly true, it is not the case with all formal methods. Update Plans are a powerful formalism for the description of computer architectures and intermediate to low-level languages. They are a declarative specification language with an underlying imperative machine model. The descriptions using Update Plans are clear, compact, intuitive, unambiguous and simple to read. These characteristics allow for the minimisation of possible errors at early stages of the development process even before a verification takes place. In this thesis an overview of the Update Plans formalism is given and a number of realworld applications is shown. The investigation of the application area focuses on computer architectures for which various specifications already exist. The comparison of Update Plan specifications to other specifications provides a useful insight into the strengths and shortcomings of the formalism. The shortcomings, in particular the lack of synchronisation primitives and modularity, are addressed by the development and evaluation of several syntactic and semantic extensions described in this thesis. The extended formalism is also compared to other specification languages and conclusions are drawn

    No Excess Babbage - Design Considerations for the Interface to a Systolic Matrix Processor

    Get PDF
    Pages 174-179 removed for reasons of commercial confidentiality (including print copy)Computational systems used for the solution of large matrix-based numerical problems are rapidly converging on the limits of technology, and novel architectures are now being sought to improve performance. In signal processing and control theory, high performance systems are required which do not contain the physical size penalty of current supercomputers. To achieve performance comparable with current supercomputers, a systolic processing array specifically targeted at matrix applications has been developed at the University of Adelaide. The work of this thesis involves the problem of delivering and receiving the data moving between the processing array and the memory subsystem. This involves the reformatting of existing algorithms to map efficiently onto the matrix array, the design and VLSI layout of a matrix address generation unit using signed digit arithmetic for enhanced performance, and the block level description of a multiport cached memory system. Performance estimates predict a modest configuration will perform selected matrix routines in excess of three GigaFLOPs.Thesis (MESc.) -- University of Adelaide, Department of Electrical and Electronic Engineering, 199

    A computer-aided design for digital filter implementation

    Get PDF
    Imperial Users onl

    Design and implementation of a downlink MC-CDMA receiver

    Get PDF
    Cette thèse présente une étude d'un système complet de transmission en liaison descendante utilisant la technologie multi-porteuse avec l'accès multiple par division de code (Multi-Carrier Code Division Multiple Access, MC-CDMA). L'étude inclut la synchronisation et l'estimation du canal pour un système MC-CDMA en liaison descendante ainsi que l'implémentation sur puce FPGA d'un récepteur MC-CDMA en liaison descendante en bande de base. Le MC-CDMA est une combinaison de la technique de multiplexage par fréquence orthogonale (Orthogonal Frequency Division Multiplexing, OFDM) et de l'accès multiple par répartition de code (CDMA), et ce dans le but d'intégrer les deux technologies. Le système MC-CDMA est conçu pour fonctionner à l'intérieur de la contrainte d'une bande de fréquence de 5 MHz pour les modèles de canaux intérieur/extérieur pédestre et véhiculaire tel que décrit par le "Third Genaration Partnership Project" (3GPP). La composante OFDM du système MC-CDMA a été simulée en utilisant le logiciel MATLAB dans le but d'obtenir des paramètres de base. Des codes orthogonaux à facteur d'étalement variable (OVSF) de longueur 8 ont été choisis comme codes d'étalement pour notre système MC-CDMA. Ceci permet de supporter des taux de transmission maximum jusquà 20.6 Mbps et 22.875 Mbps (données non codées, pleine charge de 8 utilisateurs) pour les canaux intérieur/extérieur pédestre et véhiculaire, respectivement. Une étude analytique des expressions de taux d'erreur binaire pour le MC-CDMA dans un canal multivoies de Rayleigh a été réalisée dans le but d'évaluer rapidement et de façon précise les performances. Des techniques d'estimation de canal basées sur les décisions antérieures ont été étudiées afin d'améliorer encore plus les performances de taux d'erreur binaire du système MC-CDMA en liaison descendante. L'estimateur de canal basé sur les décisions antérieures et utilisant le critère de l'erreur quadratique minimale linéaire avec une matrice' de corrélation du canal de taille 64 x 64 a été choisi comme étant un bon compromis entre la performance et la complexité pour une implementation sur puce FPGA. Une nouvelle séquence d'apprentissage a été conçue pour le récepteur dans la configuration intérieur/extérieur pédestre dans le but d'estimer de façon grossière le temps de synchronisation et le décalage fréquentiel fractionnaire de la porteuse dans le domaine du temps. Les estimations fines du temps de synchronisation et du décalage fréquentiel de la porteuse ont été effectués dans le domaine des fréquences à l'aide de sous-porteuses pilotes. Un récepteur en liaison descendante MC-CDMA complet pour le canal intérieur /extérieur pédestre avec les synchronisations en temps et en fréquence en boucle fermée a été simulé avant de procéder à l'implémentation matérielle. Le récepteur en liaison descendante en bande de base pour le canal intérieur/extérieur pédestre a été implémenté sur un système de développement fabriqué par la compagnie Nallatech et utilisant le circuit XtremeDSP de Xilinx. Un transmetteur compatible avec le système de réception a également été réalisé. Des tests fonctionnels du récepteur ont été effectués dans un environnement sans fil statique de laboratoire. Un environnement de test plus dynamique, incluant la mobilité du transmetteur, du récepteur ou des éléments dispersifs, aurait été souhaitable, mais n'a pu être réalisé étant donné les difficultés logistiques inhérentes. Les taux d'erreur binaire mesurés avec différents nombres d'usagers actifs et différentes modulations sont proches des simulations sur ordinateurs pour un canal avec bruit blanc gaussien additif
    • …
    corecore