187 research outputs found

    Decryption Failure Attacks on Post-Quantum Cryptography

    Get PDF
    This dissertation discusses mainly new cryptanalytical results related to issues of securely implementing the next generation of asymmetric cryptography, or Public-Key Cryptography (PKC).PKC, as it has been deployed until today, depends heavily on the integer factorization and the discrete logarithm problems.Unfortunately, it has been well-known since the mid-90s, that these mathematical problems can be solved due to Peter Shor's algorithm for quantum computers, which achieves the answers in polynomial time.The recently accelerated pace of R&D towards quantum computers, eventually of sufficient size and power to threaten cryptography, has led the crypto research community towards a major shift of focus.A project towards standardization of Post-quantum Cryptography (PQC) was launched by the US-based standardization organization, NIST. PQC is the name given to algorithms designed for running on classical hardware/software whilst being resistant to attacks from quantum computers.PQC is well suited for replacing the current asymmetric schemes.A primary motivation for the project is to guide publicly available research toward the singular goal of finding weaknesses in the proposed next generation of PKC.For public key encryption (PKE) or digital signature (DS) schemes to be considered secure they must be shown to rely heavily on well-known mathematical problems with theoretical proofs of security under established models, such as indistinguishability under chosen ciphertext attack (IND-CCA).Also, they must withstand serious attack attempts by well-renowned cryptographers both concerning theoretical security and the actual software/hardware instantiations.It is well-known that security models, such as IND-CCA, are not designed to capture the intricacies of inner-state leakages.Such leakages are named side-channels, which is currently a major topic of interest in the NIST PQC project.This dissertation focuses on two things, in general:1) how does the low but non-zero probability of decryption failures affect the cryptanalysis of these new PQC candidates?And 2) how might side-channel vulnerabilities inadvertently be introduced when going from theory to the practice of software/hardware implementations?Of main concern are PQC algorithms based on lattice theory and coding theory.The primary contributions are the discovery of novel decryption failure side-channel attacks, improvements on existing attacks, an alternative implementation to a part of a PQC scheme, and some more theoretical cryptanalytical results

    Memory Safety Acceleration on RISC-V for C Programming Language

    Full text link
    Memory corruption vulnerabilities can lead to memory attacks. Three of the top ten most dangerous weaknesses in computer security are memory-related. Memory attack is one of a computer system’s oldest but everlasting problems. Companies and the government lost billions of dollars due to memory security breaches. Memory safety is paramount to securing memory systems. Pointer-based memory safety protection has been shown as a promising solution covering both out-of-bounds and use-after-free errors. However, pointer-based memory safety relies on additional information (metadata) to check validity when a pointer is dereferenced. Such operations on the metadata introduce significant performance overhead to the system. Existing hardware/software implementations are primarily limited to proprietary closed-source microprocessors, simulation-only studies, or require changes to the input source code. In order to provide the need for memory security, we created a memory-safe RISC-V platform with low-performance overhead. In this thesis, a novel hardware/software co-design methodology consisting of a RISC-V based processor is extended with new instructions and microarchitecture enhancements, enabling complete memory safety in the C programming language and faster memory safety checks. Furthermore, a compiler is instrumented to provide security operations considering the changes to the processor. Moreover, a design exploration framework is proposed to provide an in-depth search for optimal hardware/software configuration for application-specific workloads regarding performance overhead, security coverage, area cost, and critical path latency. The entire system is realized by enhancing a RISC-V Rocket-chip system-on-chip (SoC). The resultant processor SoC is implemented on an FPGA and evaluated with applications from SPEC 2006 (for generic applications), MiBench (for embedded applications), and Olden benchmark suites for performance. The system, including the RISC-V CHISEL, compiler, profiling and analysis tool-chain, is fully available and open-source to the public

    An Extensible Theorem Proving Frontend

    Get PDF
    Interaktive Theorembeweiser sind Softwarewerkzeuge zum computergestützten Beweisen, d.h. sie können entsprechend kodierte Beweise von logischen Aussagen sowohl verifizieren als auch beim Erstellen dieser unterstützen. In den letzten Jahren wurden weitreichende Formalisierungsprojekte über Mathematik sowie Programmverifikation mit solchen Theorembeweisern bewältigt. Der Theorembeweiser Lean insbesondere wurde nicht nur erfolgreich zum Verifizieren lange bekannter mathematischer Theoreme verwendet, sondern auch zur Unterstützung von aktueller mathematischer Forschung. Das Ziel des Lean-Projekts ist nichts weniger als die Arbeitsweise von Mathematikern grundlegend zu verändern, indem mit dem Computer formalisierte Beweise eine praktible Alternative zu solchen mit Stift und Papier werden sollen. Aufwändige manuelle Gutachten zur Korrektheit von Beweisen wären damit hinfällig und gleichzeitig wäre garantiert, dass alle nötigen Beweisschritte exakt erfasst sind, statt der Interpretation und dem Hintergrundwissen des Lesers überlassen zu sein. Um dieses Ziel zu erreichen, sind jedoch noch weitere Fortschritte hinsichtlich Effizienz und Nutzbarkeit von Theorembeweisern nötig. Als Schritt in Richtung dieses Ziels beschreibt diese Dissertation eine neue, vollständig erweiterbare Theorembeweiser-Benutzerschnittstelle ("frontend") im Rahmen von Lean 4, der nächsten Version von Lean. Aufgabe dieser Benutzerschnittstelle ist die textuelle Beschreibung und Entgegennahme der Beweiseingabe in einer Syntax, die mehrere teils widersprüchliche Ziele optimieren sollte: Kompaktheit, Lesbarkeit für menschliche Benutzer und Eindeutigkeit in der Interpretation durch den Theorembeweiser. Da in der geschriebenen Mathematik eine umfangreiche Menge an verschiedenen Notationen existiert, die von Jahr zu Jahr weiter wächst und sich gleichzeitig zwischen verschiedenen Feldern, Autoren oder sogar einzelnen Arbeiten unterscheiden kann, muss solch eine Schnittstelle es Benutzern erlauben, sie jederzeit mit neuen, ausdrucksfähigen Notationen zu erweitern und ihnen mit flexiblen Regeln Bedeutung zuzuschreiben. Dieser Wunsch nach Flexibilität der Eingabesprache lässt sich weiterhin auch auf der Ebene der einzelnen Beweisschritte ("Taktiken") sowie höheren Ebenen der Beweis- und Programmorganisation wiederfinden. Den Kernteil dieser gewünschten Erweiterbarkeit habe ich mit einem ausdrucksstarken Makrosystem für Lean realisiert, mit dem sich sowohl einfach Syntaxtransformationen ("syntaktischer Zucker") also auch komplexe, typgesteuerte Übersetzung in die Kernsprache des Beweisers ausdrücken lassen. Das Makrosystem basiert auf einem neuartigen Algorithmus für Makrohygiene, basierend auf dem der Lisp-Sprache Racket und von mir an die spezifischen Anforderungen von Theorembeweisern angepasst, dessen Aufgabe es ist zu gewährleisten, dass lexikalische Geltungsbereiche von Bezeichnern selbst für komplexe Makros wie intuitiv erwartet funktionieren. Besonders habe ich beim Entwurf des Makrosystems darauf geachtet, das System einfach zugänglich zu gestalten, indem mehrere Abstraktionsebenen bereitgestellt werden, die sich in ihrer Ausdrucksstärke unterscheiden, aber auf den gleichen fundamentalen Prinzipien wie der erwähnten Makrohygiene beruhen. Als ein Anwendungsbeispiel des Makrosystems beschreibe ich eine Erweiterung der aus Haskell bekannten "do"-Notation um weitere imperative Sprachfeatures. Die erweiterte Syntax ist in Lean 4 eingeflossen und hat grundsätzlich die Art und Weise verändert, wie sowohl Entwickler als auch Benutzer monadischen, aber auch puren Code schreiben. Das Makrosystem stellt das "Herz" des erweiterbaren Frontends dar, ist gleichzeitig aber auch eng mit anderen Softwarekomponenten innerhalb der Benutzerschnittstelle verknüpft oder von ihnen abhängig. Ich stelle das gesamte Frontend und das umgebende Lean-System vor mit Fokus auf Teilen, an denen ich maßgeblich mitgewirkt habe. Schließlich beschreibe ich noch ein effizientes Referenzzählungsschema für funktionale Programmierung, welches eine Neuimplementierung von Lean in Lean selbst und damit das erweiterbare Frontend erst ermöglicht hat. Spezifische Optimierungen darin zur Wiederverwendung von Allokationen vereinen, ähnlich wie die erweiterte do-Notation, die Vorteile von imperativer und pur funktionaler Programmierung in einem neuen Paradigma, das ich "pure imperative Programmierung" nenne

    Scalable and fault-tolerant data stream processing on multi-core architectures

    Get PDF
    With increasing data volumes and velocity, many applications are shifting from the classical “process-after-store” paradigm to a stream processing model: data is produced and consumed as continuous streams. Stream processing captures latency-sensitive applications as diverse as credit card fraud detection and high-frequency trading. These applications are expressed as queries of algebraic operations (e.g., aggregation) over the most recent data using windows, i.e., finite evolving views over the input streams. To guarantee correct results, streaming applications require precise window semantics (e.g., temporal ordering) for operations that maintain state. While high processing throughput and low latency are performance desiderata for stateful streaming applications, achieving both poses challenges. Computing the state of overlapping windows causes redundant aggregation operations: incremental execution (i.e., reusing previous results) reduces latency but prevents parallelization; at the same time, parallelizing window execution for stateful operations with precise semantics demands ordering guarantees and state access coordination. Finally, streams and state must be recovered to produce consistent and repeatable results in the event of failures. Given the rise of shared-memory multi-core CPU architectures and high-speed networking, we argue that it is possible to address these challenges in a single node without compromising window semantics, performance, or fault-tolerance. In this thesis, we analyze, design, and implement stream processing engines (SPEs) that achieve high performance on multi-core architectures. To this end, we introduce new approaches for in-memory processing that address the previous challenges: (i) for overlapping windows, we provide a family of window aggregation techniques that enable computation sharing based on the algebraic properties of aggregation functions; (ii) for parallel window execution, we balance parallelism and incremental execution by developing abstractions for both and combining them to a novel design; and (iii) for reliable single-node execution, we enable strong fault-tolerance guarantees without sacrificing performance by reducing the required disk I/O bandwidth using a novel persistence model. We combine the above to implement an SPE that processes hundreds of millions of tuples per second with sub-second latencies. These results reveal the opportunity to reduce resource and maintenance footprint by replacing cluster-based SPEs with single-node deployments.Open Acces

    Design and Code Optimization for Systems with Next-generation Racetrack Memories

    Get PDF
    With the rise of computationally expensive application domains such as machine learning, genomics, and fluids simulation, the quest for performance and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies and their suitability for the next-generation system is also questionable. This has led to the emergence and rise of nonvolatile memory ( NVM ) technologies. Today, in different development stages, several NVM technologies are competing for their rapid access to the market. Racetrack memory ( RTM ) is one such nonvolatile memory technology that promises SRAM -comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, racetrack memory ( RTM ) is sequential in nature, i.e., data in an RTM cell needs to be shifted to an access port before it can be accessed. These shift operations incur performance and energy penalties. An ideal RTM , requiring at most one shift per access, can easily outperform SRAM . However, in the worst-cast shifting scenario, RTM can be an order of magnitude slower than SRAM . This thesis presents an overview of the RTM device physics, its evolution, strengths and challenges, and its application in the memory subsystem. We develop tools that allow the programmability and modeling of RTM -based systems. For shifts minimization, we propose a set of techniques including optimal, near-optimal, and evolutionary algorithms for efficient scalar and instruction placement in RTMs . For array accesses, we explore schedule and layout transformations that eliminate the longer overhead shifts in RTMs . We present an automatic compilation framework that analyzes static control flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We develop a simulation framework called RTSim that models various RTM parameters and enables accurate architectural level simulation. Finally, to demonstrate the RTM potential in non-Von-Neumann in-memory computing paradigms, we exploit its device attributes to implement logic and arithmetic operations. As a concrete use-case, we implement an entire hyperdimensional computing framework in RTM to accelerate the language recognition problem. Our evaluation shows considerable performance and energy improvements compared to conventional Von-Neumann models and state-of-the-art accelerators

    Computer Aided Verification

    Get PDF
    This open access two-volume set LNCS 13371 and 13372 constitutes the refereed proceedings of the 34rd International Conference on Computer Aided Verification, CAV 2022, which was held in Haifa, Israel, in August 2022. The 40 full papers presented together with 9 tool papers and 2 case studies were carefully reviewed and selected from 209 submissions. The papers were organized in the following topical sections: Part I: Invited papers; formal methods for probabilistic programs; formal methods for neural networks; software Verification and model checking; hyperproperties and security; formal methods for hardware, cyber-physical, and hybrid systems. Part II: Probabilistic techniques; automata and logic; deductive verification and decision procedures; machine learning; synthesis and concurrency. This is an open access book

    Automated Reasoning

    Get PDF
    This volume, LNAI 13385, constitutes the refereed proceedings of the 11th International Joint Conference on Automated Reasoning, IJCAR 2022, held in Haifa, Israel, in August 2022. The 32 full research papers and 9 short papers presented together with two invited talks were carefully reviewed and selected from 85 submissions. The papers focus on the following topics: Satisfiability, SMT Solving,Arithmetic; Calculi and Orderings; Knowledge Representation and Jutsification; Choices, Invariance, Substitutions and Formalization; Modal Logics; Proofs System and Proofs Search; Evolution, Termination and Decision Prolems. This is an open access book

    Lossy and Lossless Compression Techniques to Improve the Utilization of Memory Bandwidth and Capacity

    Get PDF
    Main memory is a critical resource in modern computer systems and is in increasing demand. An increasing number of on-chip cores and specialized accelerators improves the potential processing throughput but also calls for higher data rates and greater memory capacity. In addition, new emerging data-intensive applications further increase memory traffic and footprint. On the other hand, memory bandwidth is pin limited and power constrained and is therefore more difficult to scale. Memory capacity is limited by cost and energy considerations.This thesis proposes a variety of memory compression techniques as a means to reduce the memory bottleneck. These techniques target two separate problems in the memory hierarchy: memory bandwidth and memory capacity. In order to reduce transferred data volumes, lossy compression is applied which is able to reach more aggressive compression ratios. A reduction of off-chip memory traffic leads to reduced memory latency, which in turn improves the performance and energy efficiency of the system. To improve memory capacity, a novel approach to memory compaction is presented.The first part of this thesis introduces Approximate Value Reconstruction (AVR), which combines a low-complexity downsampling compressor with an LLC design able to co-locate compressed and uncompressed data. Two separate thresholds limit the error introduced by approximation. For applications that tolerate aggressive approximation in large fractions of their data, in a system with 1GB of 1600MHz DDR4 per core and 1MB of LLC space per core, AVR reduces memory traffic by up to 70%, execution time by up to 55%, and energy costs by up to 20% introducing at most 1.2% error in the application output.The second part of this thesis proposes Memory Squeeze (MemSZ), introducing a parallelized implementation of the more advanced Squeeze (SZ) compression method. Furthermore, MemSZ improves on the error limiting capability of AVR by keeping track of life-time accumulated error. An alternate memory compression architecture is also proposed, which utilizes 3D-stacked DRAM as a last-level cache. In a system with 1GB of 800MHz DDR4 per core and 1MB of LLC space per core, MemSZ improves execution time, energy and memory traffic over AVR by up to 15%, 9%, and 64%, respectively.The third part of the thesis describes L2C, a hybrid lossy and lossless memory compression scheme. L2C applies lossy compression to approximable data, and falls back to lossless if an error threshold is exceeded. In a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, L2C improves on the performance of MemSZ by 9%, and energy consumption by 3%.The fourth and final contribution is FlatPack, a novel memory compaction scheme. FlatPack is able to reduce the traffic overhead compared to other memory compaction systems, thus retaining the bandwidth benefits of compression. Furthermore, FlatPack is flexible to changes in block compressibility both over time and between adjacent blocks. When available memory corresponds to 50% of the application footprint, in a system with 4GB of 800MHz DDR4 per core and 1MB of LLC space per core, FlatPack increases system performance compared to current state-of-the-art designs by 36%, while reducing system energy consumption by 12%

    Programming Languages and Systems

    Get PDF
    This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems
    corecore