2,011 research outputs found

    Optimisations arithmétiques et synthèse de haut niveau (Arithmetic optimizations and high-level synthesis)

    Get PDF
    High-level synthesis (HLS) tools offer increased productivity for FPGA programming. However, because these tools are still relatively young, they lack many arithmetic optimizations. This thesis proposes safe arithmetic optimizations that should always be applied. Some of these optimizations are simple operator specializations that follow the C semantics. Others require lifting the semantics embedded in high-level input languages, which are inherited from software programming, in exchange for an improved accuracy/cost/performance ratio.
    To demonstrate this claim, the sum of products of floating-point numbers is used as a case study. The sum is performed in a fixed-point format tailored to the application, according to the context in which the operator is instantiated. In some cases, there is not enough information about the input data to tailor the fixed-point accumulator. The fallback strategy used in this thesis is to generate an accumulator covering the entire floating-point range. This thesis explores different strategies, including new ones, for implementing such a large accumulator. Using a two's complement representation instead of sign+magnitude is shown to save resources and to reduce the delay of the accumulation loop.
    Based on a tapered precision scheme and an exact accumulator, the posit number system claims to be a candidate to replace the IEEE floating-point formats. A thorough analysis of posit operators is performed, using the same level of hardware optimization as state-of-the-art floating-point operators. Their cost remains much higher than that of their floating-point counterparts in terms of resource usage and performance.
    Finally, this thesis presents a compatibility layer for HLS tools that allows a single code base to target multiple tools. This library implements a strongly typed, custom-size integer type alongside a set of optimized custom operators.
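    The fixed-point accumulation idea above can be illustrated with a minimal C++ software model. This is only a sketch of the general technique under assumed parameters (a 40-bit fraction and a running sum bounded by 2^23 in magnitude, so everything fits in a 64-bit two's-complement word); it is not the HLS operator generated in the thesis, and to_fixed, sum_of_products, and FRAC_BITS are illustrative names.

        // Sketch: sum of float products accumulated in a two's-complement
        // fixed-point format tailored to an assumed value range.
        #include <cstdint>
        #include <cmath>
        #include <cstdio>

        constexpr int FRAC_BITS = 40;   // assumed fraction width of the fixed-point format

        int64_t to_fixed(double x) {
            // Scale to the fixed-point grid and round; two's complement handles
            // negative terms without a separate sign+magnitude datapath.
            return static_cast<int64_t>(std::llround(std::ldexp(x, FRAC_BITS)));
        }

        double sum_of_products(const float* a, const float* b, int n) {
            int64_t acc = 0;                  // single two's-complement accumulator
            for (int i = 0; i < n; ++i)
                acc += to_fixed(static_cast<double>(a[i]) * b[i]);   // exact fixed-point adds
            return std::ldexp(static_cast<double>(acc), -FRAC_BITS); // one final conversion
        }

        int main() {
            float a[] = {1.5f, -2.25f, 3.0f};
            float b[] = {4.0f, 0.5f, -1.0f};
            std::printf("%f\n", sum_of_products(a, b, 3));   // expected: 1.875000
        }

    Keeping the running sum in one wide fixed-point register means only the final conversion back to floating point rounds, which is the property the thesis exploits when tailoring (or, in the fallback case, maximizing) the accumulator width.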

    VLSI design of high-speed adders for digital signal processing applications.

    Get PDF

    Low Power Elliptic Curve Cryptography

    Get PDF
    This M.S. thesis introduces new modulus scaling techniques for transforming a class of primes into special forms that enable efficient arithmetic. The scaling technique may be used to improve multiplication and inversion in finite fields. We present an efficient inversion algorithm that exploits the structure of a scaled modulus. Our inversion algorithm outperforms the Euclidean algorithm and lends itself to efficient hardware implementation due to its simplicity. Using the scaled modulus technique and our specialized inversion algorithm, we develop an elliptic curve processor architecture. The resulting architecture successfully uses a redundant representation of elements in GF(p) and provides a low-power, high-speed, small-footprint specialized elliptic curve implementation. We also introduce a unified Montgomery multiplier architecture working on the fields GF(p), GF(2^n), and GF(3^m). With the increasing research activity on identity-based encryption schemes, there has been a growing need for arithmetic in GF(3^m). Since our research targets low-power, small-footprint applications, we designed a unified architecture rather than separate hardware for GF(3^m). To the best of our knowledge, this is the first unified architecture that operates on these three field types.
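    As context for the unified multiplier mentioned above, the following C++ sketch shows generic radix-2 Montgomery multiplication over a toy 31-bit prime. It is a textbook routine, not the thesis's unified GF(p)/GF(2^n)/GF(3^m) datapath or its scaled-modulus inversion; monmul, the toy prime, and the bit width are assumptions made for illustration.

        // Radix-2 Montgomery multiplication: returns a*b*2^{-n} mod p,
        // assuming p is odd, p < 2^32, and a, b < p.
        #include <cstdint>
        #include <cstdio>

        uint64_t monmul(uint64_t a, uint64_t b, uint64_t p, int n) {
            uint64_t r = 0;
            for (int i = 0; i < n; ++i) {
                if ((a >> i) & 1) r += b;    // add b when bit i of a is set
                if (r & 1) r += p;           // make r even so halving stays exact mod p
                r >>= 1;                     // accumulates the 2^{-n} factor
            }
            return r >= p ? r - p : r;       // final reduction into [0, p)
        }

        int main() {
            const uint64_t p = 2147483647ULL;   // 2^31 - 1, a toy prime
            const int n = 31;
            uint64_t a = 123456789, b = 987654321;
            uint64_t m = monmul(a, b, p, n);    // = a*b*2^{-31} mod p
            // Multiplying by 2^31 mod p undoes the Montgomery factor.
            uint64_t check  = (m << n) % p;
            uint64_t expect = (a * b) % p;
            std::printf("%s\n", check == expect ? "ok" : "mismatch");
        }

    In hardware, the per-iteration additions map to short adder chains, which is one reason a single Montgomery datapath is a natural place to unify arithmetic over several field types.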

    Implementação eficiente da Curve25519 para microcontroladores ARM (Efficient implementation of Curve25519 for ARM microcontrollers)

    Get PDF
    Advisor: Diego de Freitas Aranha. Master's dissertation (mestrado), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: With the advent of ubiquitous computing, the Internet of Things will see numerous devices connected to one another, exchanging data that is often sensitive by nature. Breaching the secrecy of this data may cause irreparable damage. This raises concerns about the security of the communication and of the devices themselves, which usually lack tamper-resistance mechanisms or physical protection and ship with few or no security measures. While developing efficient and secure cryptography as a means to provide information security services is not a new problem, this new environment, with its wide attack surface, imposes new challenges on cryptographic engineering. A safe approach to this problem is to reuse well-known and thoroughly analyzed building blocks, such as the Transport Layer Security (TLS) protocol. In the latest version of this standard, the Elliptic Curve Cryptography (ECC) options were expanded beyond government-backed parameters to include the Curve25519 proposal and related cryptographic protocols. This work investigates efficient and secure implementations of Curve25519 to build a key exchange protocol on an ARM Cortex-M4 microcontroller, along with the related signature scheme Ed25519 and a digital signature scheme proposal called qDSA. As a result, performance-critical operations, such as the 256-bit multiplier, are greatly optimized; in this particular case, a 50% speedup is achieved, improving the performance of higher-level protocols.
    Master's degree in Computer Science. Funding: CAPES, Funcam
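    The 256-bit multiplication highlighted above as performance-critical can be sketched portably as operand-scanning schoolbook multiplication on 32-bit limbs; on a Cortex-M4 this inner loop is what typically gets hand-scheduled in assembly. The routine below is a generic illustration under assumed conventions (little-endian 32-bit limbs, a hypothetical mul256 name), not the implementation developed in the dissertation.

        // 256 x 256 -> 512-bit schoolbook multiplication on 32-bit limbs.
        #include <cstdint>
        #include <cstdio>

        // c[0..15] = a[0..7] * b[0..7], limbs stored least-significant first.
        void mul256(uint32_t c[16], const uint32_t a[8], const uint32_t b[8]) {
            for (int i = 0; i < 16; ++i) c[i] = 0;
            for (int i = 0; i < 8; ++i) {
                uint64_t carry = 0;
                for (int j = 0; j < 8; ++j) {
                    // 32x32 -> 64 product, plus the limb already there, plus the carry.
                    uint64_t t = (uint64_t)a[i] * b[j] + c[i + j] + carry;
                    c[i + j] = (uint32_t)t;
                    carry    = t >> 32;
                }
                c[i + 8] = (uint32_t)carry;   // top limb produced by this row
            }
        }

        int main() {
            uint32_t a[8] = {0xFFFFFFFFu, 0, 0, 0, 0, 0, 0, 0};   // 2^32 - 1
            uint32_t b[8] = {2, 0, 0, 0, 0, 0, 0, 0};
            uint32_t c[16];
            mul256(c, a, b);
            std::printf("%08x %08x\n", (unsigned)c[1], (unsigned)c[0]);   // expect: 00000001 fffffffe
        }

    Hand-written assembly versions usually keep several limbs in registers and fold the accumulation into multiply-accumulate instructions, which is one source of the kind of speedup reported above.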

    Decimal Floating-point Fused Multiply Add with Redundant Number Systems

    Get PDF
    The IEEE standard for decimal floating-point arithmetic was officially released in 2008. The new decimal floating-point (DFP) format and arithmetic can be applied to remedy the conversion error caused by representing decimal numbers in binary floating-point format and to improve the performance of decimal processing in commercial and financial applications. Many architectures and algorithms for individual decimal floating-point arithmetic functions have been proposed and investigated (e.g., addition, multiplication, division, and square root). However, because decimal numbers are represented less efficiently on binary devices, the area consumption and performance of DFP arithmetic units are not comparable with their binary counterparts.
    IBM introduced a binary fused multiply-add (FMA) instruction in the POWER series of processors to improve the performance of floating-point computations and to reduce the complexity of hardware design in reduced instruction set computing (RISC) systems. Such an instruction has also proven suitable for efficiently implementing not only stand-alone addition and multiplication, but also division, square root, and other transcendental functions. Additionally, unconventional number systems, including alternative digit sets and encodings, have shown advantages in performance and area efficiency in many computer arithmetic applications.
    In this research, by analyzing typical binary floating-point FMA designs and the design strategies of unconventional number systems, a high-performance decimal floating-point fused multiply-add (DFMA) unit with redundant internal encodings is proposed. First, the fixed-point components inside the DFMA (i.e., addition and multiplication) were studied as the basis of the FMA architecture. Specific number systems were also applied to improve the basic decimal fixed-point arithmetic. The superiority of redundant number systems in stand-alone decimal fixed-point addition and multiplication is supported by the synthesis results. Afterwards, a new DFMA architecture that exploits the redundant internal operands is proposed.
    Overall, the chosen number systems improve not only the efficiency of the fixed-point addition and multiplication inside the FMA, but also the architecture and algorithms used to build the FMA itself. Division, square root, reciprocal, reciprocal square root, and many other functions that rely on Newton's method or similar iterations can benefit from the proposed DFMA architecture. With few on-chip memory resources (e.g., look-up tables), or even with software routines alone, these functions can be implemented on top of the hardwired FMA. The proposed DFMA can therefore be implemented on chip as a single key component that reduces hardware cost. Additionally, this research on decimal arithmetic with unconventional number systems broadens the ways of performing other high-performance decimal arithmetic (e.g., stand-alone division and square root) on top of basic binary devices (i.e., AND gates, OR gates, and binary full adders). The proposed techniques are also expected to be helpful for other non-binary applications.
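    The motivation stated above, avoiding the error introduced by representing decimal fractions in binary, can be seen with a toy C++ decimal type that keeps an exact coefficient-and-exponent pair through a fused multiply-add. This sketch only illustrates that motivation and the single-rounding idea behind an FMA; the Dec type and dec_fma are hypothetical and unrelated to the redundant-digit encodings or the hardware architecture proposed in the thesis.

        // Toy decimal value: value = coeff * 10^exp, kept exact in integers.
        #include <cstdint>
        #include <cstdio>

        struct Dec { int64_t coeff; int exp; };

        // Exact a*b + c for small toy operands. A real DFMA would round the
        // exact result once, to the destination precision, at this point.
        Dec dec_fma(Dec a, Dec b, Dec c) {
            Dec p { a.coeff * b.coeff, a.exp + b.exp };          // exact product
            for (int e = c.exp; e > p.exp; --e) c.coeff *= 10;   // align c (assumes c.exp >= p.exp)
            return { p.coeff + c.coeff, p.exp };
        }

        int main() {
            // 0.07 * 0.1 + 0.003 = 0.010, exactly representable in decimal.
            Dec r = dec_fma({7, -2}, {1, -1}, {3, -3});
            std::printf("decimal: %lld * 10^%d\n", (long long)r.coeff, r.exp);  // 10 * 10^-3
            double d = 0.07 * 0.1 + 0.003;      // binary doubles cannot represent these exactly
            std::printf("binary : %.17g\n", d); // shows the accumulated representation error
        }

    The hardware question the thesis addresses is how to make such decimal datapaths competitive in area and delay, which is where the redundant internal encodings come in.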

    Monetary policy with interest on reserves

    Get PDF
    Since the fall of 2008, the amount of outstanding reserves on the Federal Reserve's balance sheet has increased from about 100 billion dollars to more than 1 trillion dollars. There is some concern that the magnitude of outstanding reserves might affect the ability of the Federal Reserve to conduct monetary policy through an interest rate policy. In this article I argue that the ability of the Federal Reserve to pay interest on reserves, also introduced in the fall of 2008, should lessen this concern. For an appropriately modified baseline model of money, I show that, with the payment of interest on reserves, the interaction of monetary and fiscal policy in the determination of the price level is not affected in a quantitatively meaningful way by the amount of outstanding reserves.
    Keywords: Inflation (Finance); Monetary policy

    Automated Game Design Learning

    Full text link
    While general game playing is an active field of research, the learning of game design has tended to be either a secondary goal of such research or solely the domain of humans. We propose a field of research, Automated Game Design Learning (AGDL), with the direct purpose of learning game designs through interaction with games in the mode that most people experience games: via play. We detail existing work that touches the edges of this field, describe current successful projects in AGDL and the theoretical foundations that enable them, point to promising applications enabled by AGDL, and discuss next steps for this exciting area of study. The key moves of AGDL are to use game programs as the ultimate source of truth about their own design, and to make these design properties available to other systems and avenues of inquiry.
    Comment: 8 pages, 2 figures. Accepted for CIG 201

    ARITHMETIC LOGIC UNIT ARCHITECTURES WITH DYNAMICALLY DEFINED PRECISION

    Get PDF
    Modern central processing units (CPUs) employ arithmetic logic units (ALUs) that support statically defined precisions, often adhering to industry standards. Although CPU manufacturers highly optimize their ALUs, industry-standard precisions embody accuracy and performance compromises made for general-purpose deployment. Hence, optimizing ALU precision holds great potential for improving speed and energy efficiency. Previous research on multiple-precision ALUs focused on predefined, static precisions; little prior work addressed ALU architectures with customized, dynamically defined precision. This dissertation presents approaches for developing dynamic-precision ALU architectures for both fixed-point and floating-point arithmetic to enable better performance, energy efficiency, and numeric accuracy. The new architectures support dynamically defined precision, including support for vectorization, and avoid the performance and energy loss caused by applying unnecessarily high precision to computations, as often happens with statically defined standard precisions. The different precisions are supported through configurable sub-blocks; this dissertation includes demonstration implementations of floating-point adder, multiplier, and fused multiply-add (FMA) circuits built from 4-bit sub-blocks. For these circuits, the dynamic-precision ALU is nearly as fast as traditional ALU designs, although it is nearly twice as large.
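    The 4-bit sub-block idea described above can be modeled at the bit level in C++: an adder datapath is split into 4-bit blocks whose carry links can be opened or closed at run time, so the same hardware acts as one wide adder or as several narrow vector lanes. The model below is a behavioral sketch under assumed parameters (a 16-bit datapath, a carry_links mask), not the dissertation's adder, multiplier, or FMA circuits.

        // 16-bit adder built from 4-bit sub-blocks with run-time configurable carry links.
        #include <cstdint>
        #include <cstdio>

        // Bit k of carry_links (k = 1..3) set => carry propagates from block k-1 into block k.
        uint16_t dynadd16(uint16_t a, uint16_t b, unsigned carry_links) {
            uint16_t result = 0;
            unsigned carry = 0;
            for (int blk = 0; blk < 4; ++blk) {
                if (blk > 0 && !((carry_links >> blk) & 1))
                    carry = 0;                               // open link: this block starts a new lane
                unsigned s = ((a >> (4 * blk)) & 0xF) + ((b >> (4 * blk)) & 0xF) + carry;
                result |= (uint16_t)((s & 0xFu) << (4 * blk));
                carry = s >> 4;                              // carry toward the next sub-block
            }
            return result;
        }

        int main() {
            uint16_t a = 0x1234, b = 0x00FF;
            std::printf("%04x\n", (unsigned)dynadd16(a, b, 0xE));   // links closed: one 16-bit add -> 1333
            std::printf("%04x\n", (unsigned)dynadd16(a, b, 0x0));   // links open: four 4-bit lanes  -> 1223
        }

    Widening or narrowing the effective precision then amounts to reconfiguring which links are closed, rather than instantiating a different ALU.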
    • 
