2,482 research outputs found

    Design of a Flexible Schoenhage-Strassen FFT Polynomial Multiplier with High-Level Synthesis

    Homomorphic Encryption (HE) is a promising field because it allows encrypted data to be sent to and operated on by untrusted parties without the risk of privacy compromise. The benefits and applications of HE are far reaching, especially in regard to cloud computing. However, current HE solutions require resource-intensive arithmetic operations such as high-precision, high-degree polynomial multiplication, resulting in a minimum computational complexity of O(n log(n)) on standard CPUs through application of the Fast Fourier Transform (FFT). These operations result in poor overall performance for HE schemes in software and would benefit greatly from hardware acceleration. This work aims to accelerate the multi-precision arithmetic operations used in HE, with specific focus on an implementation of the Schönhage-Strassen FFT-based multiplication algorithm. It is to be incorporated into a larger HE library of arithmetic functions tuned for High Level Synthesis (HLS) that enables flexible solutions for hardware/software systems on reconfigurable cloud resources. Although this project was inspired by HE, it could be incorporated within a generic mathematical library and support other domains. The developed FFT-based polynomial multiplier exhibits flexibility in the selection of security parameters, facilitating its use in a wide range of HE schemes and applications. The design also displayed substantial speedup over the polynomial multiplication functions implemented in the Number Theory Library (NTL) utilized by software-based HE solutions.
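
To illustrate the O(n log(n)) multiplication the abstract refers to, the following is a minimal sketch of FFT-based polynomial multiplication. It is not the Schönhage-Strassen design described in the work: it uses a floating-point complex FFT rather than exact arithmetic in a ring with roots of unity, and the function names `fft` and `poly_mul` are chosen here for illustration only.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey transform; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def poly_mul(p, q):
    """Multiply two integer-coefficient polynomials (lowest degree first)."""
    result_len = len(p) + len(q) - 1
    size = 1
    while size < result_len:
        size *= 2
    # Zero-pad, transform, multiply pointwise, then transform back.
    fp = fft(list(p) + [0] * (size - len(p)))
    fq = fft(list(q) + [0] * (size - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # The inverse transform above is unscaled, so divide by size here.
    return [round(v.real / size) for v in prod[:result_len]]
```

Rounding recovers exact integers only while coefficients stay small relative to double precision, which is one reason exact transforms are preferred for the large operands used in HE.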

    Using Reduced Graphs for Efficient HLS Scheduling

    High-Level Synthesis (HLS) is the process of inferring a digital circuit from a high-level algorithmic description provided as a software implementation, usually in C/C++. HLS tools will parse the input code and then perform three main steps: allocation, scheduling, and binding. This results in a hardware architecture which can then be represented as a Register-Transfer Level (RTL) model using a Hardware Description Language (HDL), such as VHDL or Verilog. Allocation determines the amount of resources needed, scheduling finds the order in which operations should occur, and binding maps operations onto the allocated hardware resources. Two main challenges of scheduling are in its computational complexity and memory requirements. Finding an optimal schedule is an NP-hard problem, so many tools use elaborate heuristics to find a solution which satisfies prescribed implementation constraints. These heuristics require the Control/Data Flow Graph (CDFG), a representation of all operations and their dependencies, which must be stored in its entirety and therefore use large amounts of memory. This thesis presents a new scheduling approach for use in the HLS tool chain. The new technique schedules operations using an algorithm which operates on a reduced representation of the graph, which does not need to retain individual dependency information in order to generate a schedule. By using the simplified graph, the complexity of scheduling is significantly reduced, resulting in improved memory usage and lower computational effort. This new scheduler is implemented and compared to the existing scheduler in the open source version of the LegUp HLS tool. The results demonstrate that an average of 16 times speedup on the time required to determine the schedule can be achieved, with just a fraction of the memory usage (1/5 on average). All of this is achieved with 0 to 6% of added cost on the final hardware execution time
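
As a point of reference for the scheduling step described above, here is a minimal sketch of classic as-soon-as-possible (ASAP) scheduling over a full dependency graph. It is not the reduced-graph scheduler the thesis proposes, nor LegUp's scheduler; the operation names and latencies in the usage note are hypothetical.

```python
from collections import deque


def asap_schedule(deps, latency):
    """Assign each operation the earliest start cycle allowed by its dependencies.

    deps maps every operation to the list of operations it depends on;
    latency maps every operation to its duration in cycles.
    """
    users = {op: [] for op in deps}
    pending = {op: len(pre) for op, pre in deps.items()}
    for op, pre in deps.items():
        for p in pre:
            users[p].append(op)

    start = {op: 0 for op in deps}
    ready = deque(op for op, count in pending.items() if count == 0)
    while ready:
        op = ready.popleft()
        finish = start[op] + latency[op]
        for user in users[op]:
            # A user cannot start before all of its inputs have finished.
            start[user] = max(start[user], finish)
            pending[user] -= 1
            if pending[user] == 0:
                ready.append(user)
    return start
```

For example, with `deps = {"a": [], "b": [], "mul": ["a", "b"], "add": ["mul", "a"]}` and latencies of 1 cycle for every operation except a 3-cycle `mul`, the multiply is scheduled after both loads and the add after the multiply. Note that this approach keeps every individual dependency edge in memory, which is the cost the abstract says the reduced representation avoids.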

    Fast and Clean: Auditable high-performance assembly via constraint solving

    Handwritten assembly is a widely used tool in the development of high-performance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice. In this work, we present SLOTHY (Super (Lazy) Optimization of Tricky Handwritten assemblY), a framework for the automated superoptimization of assembly with respect to instruction scheduling, register allocation, and loop optimization (software pipelining): With SLOTHY, the developer controls and focuses on algorithm and instruction selection, providing a readable “base” implementation in assembly, while SLOTHY automatically finds optimal and traceable instruction scheduling and register allocation strategies with respect to a model of the target (micro)architecture. We demonstrate the flexibility of SLOTHY by instantiating it with models of the Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72 microarchitectures, implementing the Armv8.1-M+Helium and AArch64+Neon architectures. We use the resulting tools to optimize three workloads: First, for Cortex-M55 and Cortex-M85, a radix-4 complex Fast Fourier Transform (FFT) in fixed-point and floating-point arithmetic, fundamental in Digital Signal Processing. Second, on Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72, the instances of the Number Theoretic Transform (NTT) underlying CRYSTALS-Kyber and CRYSTALS-Dilithium, two recently announced winners of the NIST Post-Quantum Cryptography standardization project. Third, for Cortex-A55, the scalar multiplication for the elliptic curve key exchange X25519. The SLOTHY-optimized code matches or beats the performance of prior art in all cases, while maintaining compactness and readability

    CoFHEE: A Co-processor for Fully Homomorphic Encryption Execution

    The migration of computation to the cloud has raised privacy concerns as sensitive data becomes vulnerable to attacks since it needs to be decrypted for processing. Fully Homomorphic Encryption (FHE) mitigates this issue as it enables meaningful computations to be performed directly on encrypted data. Nevertheless, FHE is orders of magnitude slower than unencrypted computation, which hinders its practicality and adoption. Therefore, improving FHE performance is essential for its real world deployment. In this paper, we present a year-long effort to design, implement, fabricate, and post-silicon validate a hardware accelerator for Fully Homomorphic Encryption dubbed CoFHEE. With a design area of $12\,\text{mm}^2$, CoFHEE aims to improve performance of ciphertext multiplications, the most demanding arithmetic FHE operation, by accelerating several primitive operations on polynomials, such as polynomial additions and subtractions, Hadamard product, and Number Theoretic Transform. CoFHEE supports polynomial degrees of up to $n = 2^{14}$ with a maximum coefficient size of 128 bits, while it is capable of performing ciphertext multiplications entirely on chip for $n \leq 2^{13}$. CoFHEE is fabricated in 55nm CMOS technology and achieves 250 MHz with our custom-built low-power digital PLL design. In addition, our chip includes two communication interfaces to the host machine: UART and SPI. This manuscript presents all steps and design techniques in the ASIC development process, ranging from RTL design to fabrication and validation. We evaluate our chip with performance and power experiments and compare it against state-of-the-art software implementations and other ASIC designs. Developed RTL files are available in an open-source repository.

    Compact Ring-LWE Cryptoprocessor

    In this paper we propose an efficient and compact processor for a ring-LWE based encryption scheme. We present three optimizations for the Number Theoretic Transform (NTT) used for polynomial multiplication: we avoid pre-processing in the negative wrapped convolution by merging it with the main algorithm, we reduce the fixed computation cost of the twiddle factors and propose an advanced memory access scheme. These optimization techniques reduce both the cycle and memory requirements. Finally, we also propose an optimization of the ring-LWE encryption system that reduces the number of NTT operations from five to four resulting in a 20% speed-up. We use these computational optimizations along with several architectural optimizations to design an instruction-set ring-LWE cryptoprocessor. For dimension 256, our processor performs encryption/decryption operations in 20/9 µs on a Virtex 6 FPGA and only requires 1349 LUTs, 860 FFs, 1 DSP-MULT and 2 BRAMs. Similarly for dimension 512, the processor takes 48/21 µs for performing encryption/decryption operations and only requires 1536 LUTs, 953 FFs, 1 DSP-MULT and 3 BRAMs. Our processors are therefore more than three times smaller than the current state of the art hardware implementations, whilst running somewhat faster.
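
The negative wrapped convolution mentioned above can be sketched as follows. This is the textbook form, in which scaling by powers of a 2n-th root of unity `psi` is a separate pre- and post-processing step; the paper's contribution is to merge that step into the transform, which this sketch does not attempt. The toy parameters in the usage note (n = 4, q = 17, psi = 2) are assumptions chosen for readability and offer no security.

```python
def ntt(a, omega, q):
    """Recursive radix-2 NTT; omega must be a primitive len(a)-th root of unity mod q."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def negacyclic_mul(a, b, psi, q):
    """Multiply a and b in Z_q[x]/(x^n + 1); psi is a primitive 2n-th root of unity mod q."""
    n = len(a)
    omega = psi * psi % q
    psi_inv = pow(psi, -1, q)  # modular inverse via pow needs Python 3.8+
    n_inv = pow(n, -1, q)
    # Pre-processing: scale coefficient i by psi^i.
    a_hat = ntt([a[i] * pow(psi, i, q) % q for i in range(n)], omega, q)
    b_hat = ntt([b[i] * pow(psi, i, q) % q for i in range(n)], omega, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]
    c = ntt(c_hat, pow(omega, -1, q), q)  # inverse transform, still unscaled
    # Post-processing: scale by n^-1 and psi^-i.
    return [c[i] * n_inv * pow(psi_inv, i, q) % q for i in range(n)]
```

A call such as `negacyclic_mul([1, 2, 3, 4], [5, 6, 7, 8], psi=2, q=17)` should agree with schoolbook multiplication followed by reduction modulo x^4 + 1 and 17, without zero-padding the inputs to length 2n.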

    Power network and smart grids analysis from a graph theoretic perspective

    The growing size and complexity of power systems has given rise to the use of complex network theory in their modelling, analysis, and synthesis. Though most of the previous studies in this area have focused on distributed control through well established protocols like synchronization and consensus, recently, a few fundamental concepts from graph theory have also been applied, for example in symmetry-based cluster synchronization. Among the existing notions of graph theory, graph symmetry is the focus of this proposal. However, the study also includes other developments around concepts from complex network theory, such as graph clustering. In spite of the widespread applications of symmetry concepts in many real world complex networks, one can rarely find an article exploiting the symmetry in power systems. In addition, no study has been conducted in analysing controllability and robustness for a power network employing graph symmetry. It has been verified that graph symmetry promotes robustness but impedes controllability. A largely absent work, even in other fields outside power systems, is the simultaneous investigation of the symmetry effect on controllability and robustness. The thesis can be divided into two sections. The first section, including Chapters 2-3, establishes the major theoretical development around the applications of graph symmetry in power networks. A few important topics in power systems and smart grids such as controllability and robustness are addressed using the symmetry concept. These topics are directed toward solving specific problems in complex power networks. The controllability analysis will lead to new algorithms elaborating current controllability benchmarks such as the maximum matching and the minimum dominant set. The resulting algorithms will optimize the number of required driver nodes indicated as FACTS devices in power networks. 
The second topic, robustness, will be tackled by the symmetry analysis of the network to investigate three aspects of network robustness: robustness of controllability, disturbance decoupling, and fault tolerance against failure in a network element. In the second section, including Chapters 4-8, in addition to theoretical development, a few novel applications are proposed for the theoretical development proposed in both sections one and two. In Chapter 4, an application for the proposed approaches is introduced and developed. The placement of flexible AC transmission systems (FACTS) is investigated where the cybersecurity of the associated data exchange under the wide area power networks is also considered. A new notion of security, i.e. moderated-k-symmetry, is introduced to leverage the symmetry characteristics of the network to obscure the network data from the adversary perspective. In Chapters 5-8, graph theory, and in particular graph symmetry and centrality, is adapted for the complex network of charging stations. In Chapter 5, the placement and sizing of charging stations (CSs) of the network of electric vehicles are addressed by proposing a novel complex network model of the charging stations. The problems of placement and sizing are then reformulated in a control framework and the impact of symmetry on the number and locations of charging stations is also investigated. These results are extended in Chapters 6-7 to robust placement and sizing of charging stations for the Tesla network of Sydney, where the problem of extending the capacity given a set of pre-existing CSs is addressed. The role of centrality in placement of CSs is investigated in Chapter 8. Finally, concluding remarks and future works are presented in Chapter 9.
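
The maximum-matching benchmark named above can be sketched as follows. In the standard structural-controllability formulation, a maximum matching is computed on the directed network and the nodes left without a matched incoming edge are taken as driver nodes. This sketch shows only that classical baseline, using a simple augmenting-path search; it is not the symmetry-based algorithm developed in the thesis, and the function name `driver_nodes` is hypothetical.

```python
def driver_nodes(n, edges):
    """Return one minimum set of driver nodes for a directed graph on nodes 0..n-1."""
    succ = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)

    # matched_from[v] = u means the edge u -> v belongs to the matching.
    matched_from = [None] * n

    def augment(u, seen):
        """Try to match u's outgoing side, re-routing earlier matches if needed."""
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if matched_from[v] is None or augment(matched_from[v], seen):
                matched_from[v] = u
                return True
        return False

    for u in range(n):
        augment(u, set())

    unmatched = [v for v in range(n) if matched_from[v] is None]
    # With a perfect matching a single driver still suffices; node 0 is an arbitrary pick.
    return unmatched or [0]
```

A directed chain needs a single driver at its head, whereas a star with one hub feeding three leaves needs three. The size of the returned set is determined by the matching, but which nodes appear in it is generally not unique.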

    Challenges and opportunities for non-antibody scaffold drugs

    The first candidates from the promising class of small non-antibody protein scaffolds are now moving into clinical development and practice. Challenges remain, and scaffolds will need to be further tailored toward applications where they provide real advantages over established therapeutics to succeed in a rapidly evolving drug development landscape

    Dagstuhl News January - December 2005

    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic