
    Versatile Montgomery Multiplier Architectures

    Several algorithms for Public Key Cryptography (PKC), such as RSA, Diffie-Hellman, and Elliptic Curve Cryptography, require modular multiplication of very large operands (sizes from 160 to 4096 bits) as their core arithmetic operation. General-purpose processors are not always the best choice for performing this operation reasonably fast, which is why specialized hardware, in the form of cryptographic co-processors, becomes more attractive. Based upon an analysis of recent publications on hardware design for modular multiplication, this M.S. thesis presents a new architecture that is scalable with respect to word size and pipelining depth. To our knowledge, this is the first time a word-based algorithm for Montgomery's method has been realized using high-radix bit-parallel multipliers working with two different types of finite fields (a unified architecture for GF(p) and GF(2^n)). Previous approaches have relied mostly on bit-serial multiplication in combination with massive pipelining, or on radix-8 multiplication limited to a single type of finite field. Our approach is centered around the observation that the optimal delay in bit-parallel multipliers grows only polylogarithmically with the operand size n, as O(log^{3/2} n), while the delay of bit-serial implementations grows linearly, as O(n). Our design has been implemented in VHDL, simulated, and synthesized in 0.5 μm CMOS technology. The synthesized netlist has been verified in back-annotated timing simulations and analyzed in terms of performance and area consumption.
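    For readers who want to see the mechanism concretely, the following is a minimal single-word C sketch of Montgomery reduction (radix R = 2^32), the building block that word-based, high-radix hardware designs such as this one scale up to multi-word operands. All names and constants are illustrative assumptions, not taken from the thesis.

```c
#include <stdint.h>
#include <stdio.h>

/* Compute -N^{-1} mod 2^32 for odd N by Newton iteration. */
static uint32_t neg_inv32(uint32_t n) {
    uint32_t x = n;              /* correct to 3 bits, since n*n == 1 (mod 8) */
    for (int i = 0; i < 5; i++)
        x *= 2 - n * x;          /* each step doubles the number of correct bits */
    return 0u - x;               /* -n^{-1} mod 2^32 */
}

/* REDC: given t < N*R, return t * R^{-1} mod N (N odd, N < 2^31 to avoid overflow). */
static uint32_t redc(uint64_t t, uint32_t N, uint32_t Nprime) {
    uint32_t m = (uint32_t)t * Nprime;         /* m = t * (-N^{-1}) mod R      */
    uint64_t u = (t + (uint64_t)m * N) >> 32;  /* exact shift: low word cancels */
    return (u >= N) ? (uint32_t)(u - N) : (uint32_t)u;
}

/* Montgomery product: inputs in Montgomery form a = x*R mod N, b = y*R mod N. */
static uint32_t mont_mul(uint32_t a, uint32_t b, uint32_t N, uint32_t Np) {
    return redc((uint64_t)a * b, N, Np);       /* = x*y*R mod N */
}

int main(void) {
    const uint32_t N  = 1000003u;              /* odd modulus, illustrative */
    const uint32_t Np = neg_inv32(N);
    uint64_t r  = ((uint64_t)1 << 32) % N;     /* R mod N   */
    uint32_t r2 = (uint32_t)((r * r) % N);     /* R^2 mod N */
    uint32_t a = 123456u, b = 654321u;
    uint32_t am = mont_mul(a, r2, N, Np);      /* a*R mod N */
    uint32_t bm = mont_mul(b, r2, N, Np);      /* b*R mod N */
    uint32_t c  = redc(mont_mul(am, bm, N, Np), N, Np);  /* a*b mod N */
    printf("%u == %u\n", c, (uint32_t)(((uint64_t)a * b) % N));
    return 0;
}
```

    The point of the representation is that the division by R at each step is a plain word-sized shift rather than a true division, which is exactly what makes high-radix bit-parallel multipliers attractive for this method.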

    Architectures and implementations for the Polynomial Ring Engine over small residue rings

    This work considers VLSI implementations for the recently introduced Polynomial Ring Engine (PRE) using small residue rings. To allow for a comprehensive approach to the implementation of the PRE mappings for DSP algorithms, this dissertation introduces novel techniques ranging from system-level architectures to transistor-level considerations. The Polynomial Ring Engine combines both classical residue mappings and new polynomial mappings. This dissertation develops a systematic approach for generating pipelined systolic/semi-systolic structures for the PRE mappings. An example architecture is constructed and simulated to illustrate the properties of the new architectures. To simultaneously achieve large computational dynamic range and high throughput rate, the basic building blocks of the PRE architecture use transistor size profiling. Transistor sizing software is developed for profiling the Switching Tree dynamic logic used to build the basic modulo blocks. The software handles complex nFET structures using a simple iterative algorithm. Issues such as convergence of the iterative technique and validity of the sizing formulae have been treated with an appropriate mathematical analysis. As an illustration of the use of PRE architectures for modern DSP computational problems, a Wavelet Transform for HDTV image compression is implemented. An interesting use is made of the PRE technique of using polynomial indeterminates as 'placeholders' for components of the processed data. In this case we use an indeterminate to symbolically handle the irrational number √3 of the Daubechies mother wavelet for N = 4. Finally, a multi-level fault-tolerant PRE architecture is developed by combining the classical redundant residue approach and the circuit parity check approach. The proposed architecture uses syndromes to correct faulty residue channels and an embedded parity check to correct faulty computational channels. The architecture offers superior fault detection and correction without online data interruption.
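    As a toy illustration of the placeholder technique (a sketch under assumed parameters, not code from the dissertation): arithmetic can be carried out on pairs p + q·x in Z_m[x]/(x² − 3), so the irrational √3 of the Daubechies N = 4 coefficients is carried exactly and only substituted numerically at the output. The modulus and names below are illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define M 257u  /* small residue-ring modulus (illustrative assumption) */

/* A value p + q*x in Z_M[x]/(x^2 - 3); x stands symbolically for sqrt(3). */
typedef struct { uint32_t p, q; } elem;

static uint32_t addm(uint32_t a, uint32_t b) { return (a + b) % M; }
static uint32_t mulm(uint32_t a, uint32_t b) { return (a * b) % M; }

static elem add(elem a, elem b) {
    return (elem){ addm(a.p, b.p), addm(a.q, b.q) };
}

static elem mul(elem a, elem b) {
    /* (p1 + q1 x)(p2 + q2 x) = (p1 p2 + 3 q1 q2) + (p1 q2 + q1 p2) x,
       using x^2 = 3, so the result stays a pair of residues. */
    return (elem){
        addm(mulm(a.p, b.p), mulm(3u, mulm(a.q, b.q))),
        addm(mulm(a.p, b.q), mulm(a.q, b.p))
    };
}

int main(void) {
    elem a = {1, 1};               /* represents 1 + sqrt(3) */
    elem s = mul(a, a);            /* (1 + sqrt(3))^2 = 4 + 2*sqrt(3) */
    printf("%u + %u*x (mod %u)\n", s.p, s.q, M);   /* prints 4 + 2*x */
    return 0;
}
```

    Because x² reduces to the integer 3, multiplication stays closed over the pairs, which is what lets the residue channels process wavelet coefficients without any rounding error from the irrational constant.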

    Development of Cryptography since Shannon

    This paper presents the development of cryptography since Shannon's seminal paper “Communication Theory of Secrecy Systems” in 1949.

    Secure Cloud Computing Using Fully Homomorphic Encryption

    University of Tsukuba, 201

    Computer arithmetic based on the Continuous Valued Number System


    Proceedings of the 7th Conference on Real Numbers and Computers (RNC'7)

    These are the proceedings of RNC7.

    NONLINEAR OPERATORS FOR IMAGE PROCESSING: DESIGN, IMPLEMENTATION AND MODELING TECHNIQUES FOR POWER ESTIMATION

    In the past few years, multimedia applications have grown very fast, finding use in a large variety of fields. Applications such as video conferencing, medical diagnostics, mobile telephony, and military systems require handling large amounts of data at high rates. Image and voice processing are therefore very important and have been the subject of considerable effort to find ever faster and more effective algorithms. Among image processing algorithms, we believe that rational operators play an important role, thanks to their versatility and effectiveness in data processing. In recent years, several algorithms have been proposed, demonstrating that these operators are well suited to different applications, with very good results.
The aim of this work is to implement some of these algorithms and thereby demonstrate that rational filters, in particular, can be implemented without requiring large systems and can operate at very high frequencies. Once the basic building block of a rational-operator-based system has been implemented, it can be successfully reused in many other applications. From the designer's point of view, it is important to have a general framework that makes it possible to study the various configurations of the system to be implemented and to analyze the trade-offs among the design variables. In particular, to meet the need for versatile tools for power estimation, we developed a new macro-modelling technique, which allows the designer to estimate the power dissipated by a circuit quickly and accurately. The thesis is organized as follows. Chapter 1 presents some of the algorithms studied for implementation; only a brief overview is given, with references to the literature for the interested reader. Chapter 2 discusses the basic architectures used for the implementations: pipelined structures have mainly been used, but an overview of the approaches available today for timing optimization is also presented. Chapter 3 presents two of the implementations designed for this thesis, one ASIC-driven and one FPGA-driven; they require different techniques and different solutions for the design of the system, so it is interesting to see what can be done in each case. Finally, Chapter 4 describes our macro-modelling technique for power estimation, giving a brief overview of previously proposed techniques and showing the advantages our method brings to the design.
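    To make the notion of a rational operator concrete, here is a minimal C sketch (illustrative, not the thesis' hardware design) of a classic detail-preserving rational smoother: the correction added to each sample is a ratio of polynomials in the neighboring samples, so the denominator grows with the local gradient and smoothing is automatically damped near edges. The coefficients k and A are illustrative choices.

```c
#include <stdio.h>

/* Rational filter of the Ramponi type: y = x0 + Laplacian / (k*gradient^2 + A).
   Flat regions get smoothed; at edges the denominator grows and the
   correction is suppressed, preserving the transition. */
static double rational_smooth(double xm1, double x0, double xp1,
                              double k, double A) {
    double num = xm1 - 2.0 * x0 + xp1;            /* discrete Laplacian  */
    double den = k * (xm1 - xp1) * (xm1 - xp1) + A;  /* edge-sensing term */
    return x0 + num / den;
}

int main(void) {
    double x[] = {10, 10, 11, 10, 50, 50, 49, 50};   /* step edge at index 4 */
    for (int i = 1; i < 7; i++)
        printf("%.2f ", rational_smooth(x[i-1], x[i], x[i+1], 0.01, 2.0));
    printf("\n");
    return 0;
}
```

    The appeal for hardware is visible in the structure: one small multiply-add network for the numerator, one for the denominator, and a single division, which is why such a block can be pipelined and reused across applications.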

    Resiliency Mechanisms for In-Memory Column Stores

    The key objective of database systems is to reliably manage data, while high query throughput and low query latency are core requirements. To date, database research has mostly concentrated on the latter. However, due to the constant shrinking of transistor feature sizes, integrated circuits are becoming more and more unreliable, and transient hardware errors in the form of multi-bit flips are becoming more prominent. A recent study (2013) of a large high-performance cluster with around 8,500 nodes measured a failure rate of 40 FIT per DRAM device. For that system, this means a single- or multi-bit flip occurs every 10 hours, which is unacceptably high for enterprise and HPC scenarios. Causes include cosmic rays, heat, and electrical crosstalk, with the latter being actively exploited through the RowHammer attack. It has been shown that memory cells are more prone to bit flips than logic gates, and several surveys have found multi-bit flip events in the main memory modules of today's data centers. Due to the shift towards in-memory data management systems, where all business-related data and query intermediate results are kept solely in fast main memory, such systems are in great danger of delivering corrupt results to their users. Hardware techniques cannot be scaled to compensate for the exponentially increasing error rates. In other domains, there is increasing interest in software-based solutions to this problem, but the proposed methods carry huge runtime and/or storage overheads, which are unacceptable for in-memory data management systems. In this thesis, we investigate how to integrate bit flip detection mechanisms into in-memory data management systems. To achieve this goal, we first build an understanding of bit flip detection techniques and select two error codes, AN codes and XOR checksums, suited to the requirements of in-memory data management systems. The most important requirement is the effectiveness of the codes in detecting bit flips. We meet this goal through AN codes, which exhibit better and adaptable error detection capabilities compared to those found in today's hardware. The second most important goal is efficiency in terms of coding latency. We meet this by introducing fundamental performance improvements to AN codes and by vectorizing both chosen codes' operations. We integrate bit flip detection mechanisms into the lowest storage layer and the query processing layer in such a way that the rest of the data management system and the user can remain oblivious of any error detection. This covers both base columns and pointer-heavy index structures such as the ubiquitous B-Tree. Additionally, our approach allows adaptable, on-the-fly bit flip detection during query processing, with very little impact on query latency; AN coding allows intermediate results to be recoded with virtually no performance penalty. We support our claims with exhaustive runtime and throughput measurements throughout the thesis and with an end-to-end evaluation using the Star Schema Benchmark. To the best of our knowledge, we are the first to present such holistic and fast bit flip detection in a large software infrastructure such as an in-memory data management system.
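    As a minimal sketch of the AN-coding idea described above (the constant A below is an arbitrary illustrative choice, not one of the thesis' empirically selected "golden As"): a value n is hardened to A·n, every valid code word is a multiple of A, and detection is a single modulo check, so a bit flip is caught whenever it does not land on another multiple of A.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define A 641ULL  /* illustrative odd constant, not a thesis-chosen "golden A" */

static uint64_t an_encode(uint64_t n) { return n * A; }   /* harden      */
static bool     an_check (uint64_t c) { return c % A == 0; } /* detect   */
static uint64_t an_decode(uint64_t c) { return c / A; }   /* soften      */

int main(void) {
    uint64_t code = an_encode(123456);
    printf("stored ok:     %d\n", an_check(code));   /* 1: valid code word */
    code ^= 1ULL << 17;                              /* inject a bit flip  */
    /* 0: the flip changes the word by 2^17, which is not a multiple of A */
    printf("after flip ok: %d\n", an_check(code));
    (void)an_decode; /* decoding used on verified words in a real pipeline */
    return 0;
}
```

    The division in decoding can be replaced by a multiplication with the modular multiplicative inverse of A, one of the performance optimizations the thesis treats (see Section 3.5.1 in the table of contents below).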
    Finally, most of the source code fragments used to obtain the results in this thesis are open source and freely available.

    Table of contents:
    1 INTRODUCTION
      1.1 Contributions of this Thesis
      1.2 Outline
    2 PROBLEM DESCRIPTION AND RELATED WORK
      2.1 Reliable Data Management on Reliable Hardware
      2.2 The Shift Towards Unreliable Hardware
      2.3 Hardware-Based Mitigation of Bit Flips
      2.4 Data Management System Requirements
      2.5 Software-Based Techniques For Handling Bit Flips
        2.5.1 Operating System-Level Techniques
        2.5.2 Compiler-Level Techniques
        2.5.3 Application-Level Techniques
      2.6 Summary and Conclusions
    3 ANALYSIS OF CODING TECHNIQUES
      3.1 Selection of Error Codes
        3.1.1 Hamming Coding
        3.1.2 XOR Checksums
        3.1.3 AN Coding
        3.1.4 Summary and Conclusions
      3.2 Probabilities of Silent Data Corruption
        3.2.1 Probabilities of Hamming Codes
        3.2.2 Probabilities of XOR Checksums
        3.2.3 Probabilities of AN Codes
        3.2.4 Concrete Error Models
        3.2.5 Summary and Conclusions
      3.3 Throughput Considerations
        3.3.1 Test Systems Descriptions
        3.3.2 Vectorizing Hamming Coding
        3.3.3 Vectorizing XOR Checksums
        3.3.4 Vectorizing AN Coding
        3.3.5 Summary and Conclusions
      3.4 Comparison of Error Codes
        3.4.1 Effectiveness
        3.4.2 Efficiency
        3.4.3 Runtime Adaptability
      3.5 Performance Optimizations for AN Coding
        3.5.1 The Modular Multiplicative Inverse
        3.5.2 Faster Softening
        3.5.3 Faster Error Detection
        3.5.4 Comparison to Original AN Coding
        3.5.5 The Multiplicative Inverse Anomaly
      3.6 Summary
    4 BIT FLIP DETECTING STORAGE
      4.1 Column Store Architecture
        4.1.1 Logical Data Types
        4.1.2 Storage Model
        4.1.3 Data Representation
        4.1.4 Data Layout
        4.1.5 Tree Index Structures
        4.1.6 Summary
      4.2 Hardened Data Storage
        4.2.1 Hardened Physical Data Types
        4.2.2 Hardened Lightweight Compression
        4.2.3 Hardened Data Layout
        4.2.4 UDI Operations
        4.2.5 Summary and Conclusions
      4.3 Hardened Tree Index Structures
        4.3.1 B-Tree Verification Techniques
        4.3.2 Justification For Further Techniques
        4.3.3 The Error Detecting B-Tree
      4.4 Summary
    5 BIT FLIP DETECTING QUERY PROCESSING
      5.1 Column Store Query Processing
      5.2 Bit Flip Detection Opportunities
        5.2.1 Early Onetime Detection
        5.2.2 Late Onetime Detection
        5.2.3 Continuous Detection
        5.2.4 Miscellaneous Processing Aspects
        5.2.5 Summary and Conclusions
      5.3 Hardened Intermediate Results
        5.3.1 Materialization of Hardened Intermediates
        5.3.2 Hardened Bitmaps
      5.4 Summary
    6 END-TO-END EVALUATION
      6.1 Prototype Implementation
        6.1.1 AHEAD Architecture
        6.1.2 Diversity of Physical Operators
        6.1.3 One Concrete Operator Realization
        6.1.4 Summary and Conclusions
      6.2 Performance of Individual Operators
        6.2.1 Selection on One Predicate
        6.2.2 Selection on Two Predicates
        6.2.3 Join Operators
        6.2.4 Grouping and Aggregation
        6.2.5 Delta Operator
        6.2.6 Summary and Conclusions
      6.3 Star Schema Benchmark Queries
        6.3.1 Query Runtimes
        6.3.2 Improvements Through Vectorization
        6.3.3 Storage Overhead
        6.3.4 Summary and Conclusions
      6.4 Error Detecting B-Tree
        6.4.1 Single Key Lookup
        6.4.2 Key Value-Pair Insertion
      6.5 Summary
    7 SUMMARY AND CONCLUSIONS
      7.1 Future Work
    A APPENDIX
      A.1 List of Golden As
      A.2 More on Hamming Coding
        A.2.1 Code examples
        A.2.2 Vectorization
    BIBLIOGRAPHY
    LIST OF FIGURES
    LIST OF TABLES
    LIST OF LISTINGS
    LIST OF ACRONYMS
    LIST OF SYMBOLS
    LIST OF DEFINITIONS

    Digital Signal Processing (Second Edition)

    This book provides an account of the mathematical background, computational methods, and software engineering associated with digital signal processing. The aim has been to provide the reader with the mathematical methods required for signal analysis, which are then used to develop models and algorithms for processing digital signals, and finally to encourage the reader to design software solutions for Digital Signal Processing (DSP). In this way, the reader is invited to develop a small DSP library that can then be expanded further with a focus on his/her research interests and applications. There are of course many excellent books and software systems available on this subject area. However, in many of these publications the relationship between the mathematical methods associated with signal analysis and the software available for processing data is not always clear. Either the publications concentrate on mathematical aspects that are not focused on practical programming solutions, or they elaborate on the software development of solutions in terms of working ‘black boxes’ without covering the mathematical background and analysis associated with the design of these software solutions. Thus, this book has been written with the aim of giving the reader a technical overview of the mathematics and software associated with the ‘art’ of developing numerical algorithms and designing software solutions for DSP, all of which is built on firm mathematical foundations. For this reason, the work is, by necessity, rather lengthy and covers a wide range of subjects organized into four principal parts. Part I provides the mathematical background for the analysis of signals; Part II considers the computational techniques (principally those associated with linear algebra and the linear eigenvalue problem) required for array processing and associated analysis (error analysis, for example); Part III introduces the reader to the essential elements of software engineering using the C programming language, tailored to those features that are used for developing C functions or modules for building a DSP library. The material associated with Parts I, II, and III is then used to build up a DSP system by defining a number of ‘problems’ and then addressing the solutions in terms of presenting an appropriate mathematical model, undertaking the necessary analysis, developing an appropriate algorithm, and then coding the solution in C. This material forms the basis for Part IV of this work. In most chapters, a series of tutorial problems is given for the reader to attempt, with answers provided in Appendix A. These problems include theoretical, computational, and programming exercises. Part II of this work is relatively long and arguably contains too much material on the computational methods for linear algebra. However, this material, together with the complementary material on vector and matrix norms, forms the computational basis for many methods of digital signal processing. Moreover, this important and widely researched subject area forms the foundation not only of digital signal processing and control engineering, for example, but also of numerical analysis in general. The material presented in this book is based on the lecture notes and supplementary material developed by the author for an advanced Masters course ‘Digital Signal Processing’, first established at Cranfield University, Bedford in 1990 and modified when the author moved to De Montfort University, Leicester in 1994.
The programmes are still operating at these universities, and the material has been used by some 700+ graduates since its establishment and development in the early 1990s. The material was enhanced and developed further when the author moved to the Department of Electronic and Electrical Engineering at Loughborough University in 2003 and now forms part of the Department’s postgraduate programmes in Communication Systems Engineering. The original Masters programme included a taught component covering a period of six months based on two semesters, each composed of four modules. The material in this work covers the first semester, and its four parts reflect the four modules delivered. The material delivered in the second semester is published as a companion volume to this work entitled Digital Image Processing, Horwood Publishing, 2005, which covers the mathematical modelling of imaging systems and the techniques that have been developed to process and analyse the data such systems provide. Since the publication of the first edition of this work in 2003, a number of minor changes and some additions have been made. The material on programming and software engineering in Chapters 11 and 12 has been extended, and further solved and supplementary questions have been included throughout the text. Nevertheless, it is worth pointing out that while every effort has been made by the author and publisher to provide a work that is error free, it is inevitable that typing errors and various ‘bugs’ will occur. If so, and in particular if the reader starts to suffer from a lack of comprehension over certain aspects of the material (due to errors or otherwise), then he/she should not assume that there is something wrong with themselves, but with the author.
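    To give a flavor of the kind of C module the reader is encouraged to build (an illustrative sketch; the interface is an assumption, not the book's actual library code): a direct-form FIR filter computing the discrete convolution y[n] = Σ_k h[k] x[n−k].

```c
#include <stddef.h>
#include <stdio.h>

/* Direct-form FIR filter: y[n] = sum over k of h[k] * x[n-k],
   truncated at the start of the signal (zero initial conditions). */
static void fir_filter(const double *x, size_t nx,
                       const double *h, size_t nh,
                       double *y) {
    for (size_t n = 0; n < nx; n++) {
        double acc = 0.0;
        for (size_t k = 0; k < nh && k <= n; k++)
            acc += h[k] * x[n - k];
        y[n] = acc;
    }
}

int main(void) {
    double x[] = {1, 2, 3, 4, 5};
    double h[] = {0.25, 0.5, 0.25};   /* simple smoothing kernel */
    double y[5];
    fir_filter(x, 5, h, 3, y);
    for (int i = 0; i < 5; i++) printf("%.2f ", y[i]);
    printf("\n");
    return 0;
}
```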