
    A Compiler from Array Programs to Vectorized Homomorphic Encryption

    Homomorphic encryption (HE) is a practical approach to secure computation over encrypted data. However, writing programs with efficient HE implementations remains the purview of experts. A difficult barrier to programmability is that efficiency requires operations to be vectorized in non-obvious ways, forcing efficient HE programs to manipulate ciphertexts with complex data layouts and to interleave computations with data movement primitives. We present Viaduct-HE, a compiler that generates efficient vectorized HE programs. Viaduct-HE can generate both the operations and the complex data layouts required for efficient HE programs. The source language of Viaduct-HE is array-oriented, giving the compiler a simple representation of possible vectorization schedules. With such a representation, the compiler searches the space of possible vectorization schedules and finds those with efficient data layouts. After finding a vectorization schedule, Viaduct-HE further optimizes HE programs through term rewriting. The compiler has extension points to customize the exploration of vectorization schedules, to customize the cost model for HE programs, and to add back ends for new HE libraries. Our evaluation of the prototype Viaduct-HE compiler shows that it produces efficient vectorized HE programs with sophisticated data layouts and optimizations comparable to those designed by experts.
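
    To make the scheduling problem concrete, the sketch below is a minimal illustration (under assumed primitives, not Viaduct-HE output) of the programming model such compilers target: HE schemes expose a fixed vector of ciphertext slots with only slot-wise arithmetic and cyclic rotation, so even a dot product must be scheduled as multiply, rotate, and add.

    ```python
    # Minimal sketch of SIMD-style HE vectorization, simulating ciphertext
    # slots with plain lists; the primitives modeled here (slot-wise
    # add/mul and cyclic rotation) are the ones HE libraries expose.

    def slot_add(a, b):
        return [x + y for x, y in zip(a, b)]

    def slot_mul(a, b):
        return [x * y for x, y in zip(a, b)]

    def rotate(a, k):
        # Cyclic left rotation of the slot vector by k positions.
        return a[k:] + a[:k]

    def dot_product(a, b):
        # Slot-wise multiply, then a log-depth rotate-and-sum reduction;
        # the slot count is assumed to be a power of two. Afterwards
        # every slot holds the full sum.
        acc = slot_mul(a, b)
        k = 1
        while k < len(acc):
            acc = slot_add(acc, rotate(acc, k))
            k *= 2
        return acc[0]

    print(dot_product([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
    ```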

    Cautiously Optimistic Program Analyses for Secure and Reliable Software

    Modern computer systems still have various security and reliability vulnerabilities. Well-known dynamic analysis solutions can mitigate them using runtime monitors that serve as lifeguards. But the additional work of enforcing these security and safety properties incurs exorbitant performance costs, and such tools are rarely used in practice. Our work addresses this problem by constructing a novel technique: Cautiously Optimistic Program Analysis (COPA). COPA is optimistic: it infers likely program invariants from dynamic observations, and assumes them in its static reasoning to precisely identify and elide wasteful runtime monitors. The resulting system is fast, but also ensures soundness by recovering to a conservatively optimized analysis in the rare cases where a likely invariant fails at runtime. COPA is also cautious: by carefully restricting optimizations to only safe elisions, the recovery is greatly simplified. It avoids unbounded rollbacks upon recovery, thereby enabling analysis of live production software. We demonstrate the effectiveness of Cautiously Optimistic Program Analyses in three areas. Information-Flow Tracking (IFT) can help prevent security breaches and information leaks, but it is rarely used in practice due to its high performance overhead (>500% for web/email servers); COPA dramatically reduces this cost by eliding wasteful IFT monitors, making IFT practical (9% overhead, 4x speedup). Automatic Garbage Collection (GC) in managed languages (e.g., Java) simplifies programming tasks while ensuring memory safety; however, there is no correct GC for weakly-typed languages (e.g., C/C++), and manual memory management is prone to errors that have been exploited in high-profile attacks. We develop the first sound GC for C/C++, and use COPA to optimize its performance (16% overhead). Sequential Consistency (SC) provides intuitive semantics for concurrent programs that simplify reasoning about their correctness; however, ensuring SC behavior on commodity hardware remains expensive. We use COPA to ensure SC for Java at the language level efficiently, significantly reducing its cost (from 24% down to 5% on x86). COPA provides a way to realize strong software security, reliability, and semantic guarantees at practical costs.
    PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/170027/1/subarno_1.pd
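
    The sketch below illustrates the optimistic elide-and-recover pattern described in the abstract above; the taint-tracking setup and all names are assumed for illustration and are not the dissertation's implementation.

    ```python
    # Hypothetical sketch of the COPA pattern: assume a likely invariant
    # inferred from profiling, elide the runtime monitor on a fast path,
    # and recover to the fully monitored path whenever a cheap guard fails.

    tainted = set()                  # ids of objects carrying sensitive data

    def monitored_copy(dst, src):
        # Slow path: full information-flow tracking on the copy.
        if id(src) in tainted:
            tainted.add(id(dst))
        dst[:] = src

    def optimistic_copy(dst, src):
        # Likely invariant from profiling: `src` is never tainted at this
        # call site, so taint propagation can be elided.
        if id(src) not in tainted:   # cheap guard checks the invariant
            dst[:] = src             # fast path: monitor elided
        else:
            monitored_copy(dst, src) # recovery: conservative but sound

    a, b = [1, 2, 3], [0, 0, 0]
    optimistic_copy(b, a)            # guard holds, fast path taken
    tainted.add(id(a))
    optimistic_copy(b, a)            # guard fails, recovers to slow path
    assert id(b) in tainted
    ```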

    Exploring C semantics and pointer provenance

    The semantics of pointers and memory objects in C has been a vexed question for many years. C values cannot be treated as either purely abstract or purely concrete entities: the language exposes their representations, but compiler optimisations rely on analyses that reason about provenance and initialisation status, not just runtime representations. The ISO WG14 standard leaves much of this unclear, and in some respects differs from de facto standard usage, which is itself difficult to investigate. In this paper we explore the possible source-language semantics for memory objects and pointers, in ISO C and in C as it is used and implemented in practice, focussing especially on pointer provenance. We aim, as far as possible, to reconcile the ISO C standard, mainstream compiler behaviour, and the semantics relied on by the corpus of existing C code. We present two coherent proposals, one tracking provenance via integers and one not; both address many design questions. We highlight some pros and cons and open questions, and illustrate the discussion with a library of test cases. We make our semantics executable as a test oracle, integrating it with the Cerberus semantics for much of the rest of C, which we have made substantially more complete and robust, and equipped with a web-interface GUI. This allows us to experimentally assess our proposals on those test cases. To assess their viability with respect to larger bodies of C code, we analyse the changes required and the resulting behaviour for a port of FreeBSD to CHERI, a research architecture supporting hardware capabilities, which (roughly speaking) traps on the memory-safety violations that our proposals deem undefined behaviour. We also develop a new runtime instrumentation tool to detect possible provenance violations in normal C code, and apply it to some of the SPEC benchmarks. We compare our proposal with a source-language variant of the twin-allocation LLVM semantics proposal of Lee et al. Finally, we describe ongoing interactions with WG14, exploring how our proposals could be incorporated into the ISO standard.
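
    As a rough illustration of what such a semantics tracks, here is a toy executable memory-object model; the encoding and names are assumed, a sketch in the spirit of the proposals rather than the Cerberus semantics itself.

    ```python
    # Toy provenance-tracking memory model: a pointer value is a
    # (provenance, address) pair, and a load checks that the address lies
    # within the live allocation named by the provenance.

    allocations = {}        # provenance id -> (base, size, storage)
    next_base = 0x1000

    def malloc(size):
        global next_base
        prov = len(allocations)
        base, next_base = next_base, next_base + size + 16  # gap between objects
        allocations[prov] = (base, size, bytearray(size))
        return (prov, base)

    def load(ptr):
        prov, addr = ptr
        if prov is None:
            raise RuntimeError("UB: access via pointer with empty provenance")
        base, size, mem = allocations[prov]
        if not base <= addr < base + size:
            raise RuntimeError("UB: address outside the provenance's object")
        return mem[addr - base]

    def ptr_to_int(ptr):
        return ptr[1]       # an integer exposes only the representation

    def int_to_ptr(n, prov=None):
        # Under "provenance via integers" the integer carried provenance
        # along, so it is restored here; under the rival proposal the
        # round-trip yields empty provenance and later accesses are UB.
        return (prov, n)

    p = malloc(8)
    i = ptr_to_int(p)
    print(load(int_to_ptr(i, prov=p[0])))   # ok: provenance preserved
    try:
        load(int_to_ptr(i))                 # trapped: provenance lost
    except RuntimeError as e:
        print(e)
    ```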

    Vectorization system for unstructured codes with a Data-parallel Compiler IR

    With Dennard scaling coming to an end, Single Instruction Multiple Data (SIMD) offers a way to improve the compute throughput of CPUs. One fundamental technique in SIMD code generators is the vectorization of data-parallel code regions, with applications in outer-loop vectorization, whole-function vectorization, and the vectorization of explicitly data-parallel languages. This thesis contributes to the reliable vectorization of data-parallel code regions with unstructured, reducible control flow. Control flow is reducible when every loop has exactly one entry point, which is the common case in practice. We present P-LLVM, a novel, full-featured intermediate representation for vectorizers that gives the code region a semantics at every stage of the vectorization pipeline. Partial control-flow linearization is a novel partial if-conversion scheme, an essential technique for vectorizing divergent control flow. Unlike prior techniques, partial linearization runs in linear time, inserts no additional branches or blocks, and gives proven guarantees on the control flow it retains. Divergence of control induces value divergence at join points in the control-flow graph (CFG). We present a novel control-divergence analysis for directed acyclic graphs with optimal running time, and prove that it is correct and precise under common static assumptions. We extend this technique to obtain a quadratic-time control-divergence analysis for arbitrary reducible CFGs. For this analysis, we show on a range of realistic examples how earlier approaches are either less precise or incorrect. We present a feature-complete divergence analysis for P-LLVM programs; the analysis is the first to analyze stack-allocated objects in an unstructured control setting. Finally, we generalize single-dimensional vectorization of outer loops to multi-dimensional tensorization of loop nests. SIMD targets benefit from tensorization through more opportunities to reuse loaded values and more efficient memory access behavior. The techniques were implemented in the Region Vectorizer (RV) for vectorization and in TensorRV for loop-nest tensorization. Our evaluation validates that the general-purpose RV vectorization system matches the performance of more specialized approaches: RV performs on par with the ISPC compiler, which supports only its structured domain-specific language, on a range of tree-traversal codes with complex control flow, and RV outperforms the loop vectorizers of state-of-the-art compilers, as we show for the SPEC2017 nab_s benchmark and the XSBench proxy application.
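
    The sketch below shows the if-conversion problem that partial linearization refines: when SIMD lanes disagree on a branch, both sides run under masks and a per-lane blend replaces the branch. It is an assumed illustration of full linearization, not the RV algorithm.

    ```python
    # Mask-based if-conversion of a divergent branch, one Python list
    # element per SIMD lane.

    def vectorized_abs_diff(a, b):
        # Scalar source: if a[i] > b[i]: r = a[i]-b[i] else: r = b[i]-a[i]
        mask = [x > y for x, y in zip(a, b)]       # divergent condition
        then_val = [x - y for x, y in zip(a, b)]   # all lanes run 'then'
        else_val = [y - x for x, y in zip(a, b)]   # all lanes run 'else'
        # The blend replaces the branch entirely (full linearization);
        # partial linearization keeps branches whose mask is uniform.
        return [t if m else e for m, t, e in zip(mask, then_val, else_val)]

    print(vectorized_abs_diff([3, 1, 5], [2, 4, 5]))  # [1, 3, 0]
    ```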

    Symbolic execution of verification languages and floating-point code

    The focus of this thesis is a program analysis technique named symbolic execution. We present three main contributions to this field. First, an investigation into comparing several state-of-the-art program analysis tools at the level of an intermediate verification language over a large set of benchmarks, and improvements to the state of the art of symbolic execution for this language. This is explored via a new tool, Symbooglix, that operates on the Boogie intermediate verification language. Second, an investigation into performing symbolic execution of floating-point programs via a standardised theory of floating-point arithmetic that is supported by several existing constraint solvers. This is investigated via two independent extensions of the KLEE symbolic execution engine to support reasoning about floating-point operations (with one tool developed by the thesis author). Third, an investigation into the use of coverage-guided fuzzing as a means of solving constraints over finite data types, inspired by the difficulties associated with solving floating-point constraints. The associated prototype tool, JFS, which builds on the LibFuzzer project, can at present be applied to a wide range of SMT queries over bit-vector and floating-point variables, and shows promise on floating-point constraints.
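
    The idea behind JFS can be sketched as follows; the predicate and the unguided random search are assumptions standing in for the compiled SMT query and LibFuzzer's coverage-guided search.

    ```python
    # Solving a constraint by fuzzing: turn the query into a predicate
    # over concrete inputs and search for an input that satisfies it.
    import random

    def predicate(x):
        # Stand-in for an SMT query over doubles: x > 0 and x*x ~= 2.
        return x > 0 and abs(x * x - 2.0) < 1e-3

    def fuzz_solve(pred, tries=100_000):
        # JFS guides this search with LibFuzzer's coverage feedback;
        # plain random sampling stands in for that here.
        for _ in range(tries):
            x = random.uniform(0.0, 4.0)
            if pred(x):
                return x     # a satisfying assignment, i.e. "sat"
        return None          # budget exhausted: unknown, not "unsat"

    print(fuzz_solve(predicate))   # e.g. 1.4142...
    ```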

    SoK: Fully Homomorphic Encryption Accelerators

    Fully Homomorphic Encryption (FHE) is a key technology enabling privacy-preserving computing. However, the fundamental challenge of FHE is its inefficiency, due primarily to the underlying polynomial computations, which have high computational complexity, and to extremely time-consuming ciphertext maintenance operations. To tackle this challenge, various FHE accelerators have recently been proposed by both the research and industrial communities. This paper takes the first initiative to conduct a systematic study of 14 FHE accelerators: cuHE/cuFHE, nuFHE, HEAT, HEAX, HEXL, HEXL-FPGA, 100×, F1, CraterLake, BTS, ARK, Poseidon, FAB and TensorFHE. We first make observations on the evolution trajectory of these existing FHE accelerators to establish a qualitative connection between them. Then, we perform testbed evaluations of representative open-source FHE accelerators to provide a quantitative comparison of them. Finally, with the insights learned from both the qualitative and quantitative studies, we discuss potential directions to inform the future design and implementation of FHE accelerators.
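
    A core kernel behind all of these designs is multiplication in the polynomial ring Z_q[X]/(X^n + 1). The sketch below shows the naive O(n^2) negacyclic convolution for illustration; the accelerators surveyed instead parallelize number-theoretic transforms (NTTs) to bring this to O(n log n).

    ```python
    # Negacyclic polynomial multiplication modulo X^n + 1 and modulo q,
    # the arithmetic underlying RLWE-based FHE ciphertext operations.

    def polymul_negacyclic(a, b, q):
        n = len(a)
        c = [0] * n
        for i in range(n):
            for j in range(n):
                k = i + j
                if k < n:
                    c[k] = (c[k] + a[i] * b[j]) % q
                else:
                    # X^n = -1 in Z_q[X]/(X^n + 1): wrap with a sign flip.
                    c[k - n] = (c[k - n] - a[i] * b[j]) % q
        return c

    # (1 + 2X)(3 + 4X) = 3 + 10X + 8X^2 = -5 + 10X, since X^2 = -1.
    print(polymul_negacyclic([1, 2], [3, 4], q=97))   # [92, 10]
    ```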

    Towards A Practical High-Assurance Systems Programming Language

    Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make matters worse, formally reasoning about its correctness properties introduces yet another level of complexity to the task, requiring considerable expertise in both systems programming and formal verification. Without appropriate tools that provide abstraction and automation, development can be extremely costly due to the sheer complexity of the systems and the nuances in them. Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code. To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof via a data-layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users with a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally, we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process.
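
    To illustrate what controlling the memory representation of an algebraic data type buys, the example below is hypothetical (not Cogent syntax): a Result-style tagged union is packed to an explicit byte layout so that compiled code can line up with a C struct exactly.

    ```python
    # A tagged union `Result = Ok u32 | Err u8` given an explicit layout:
    # 1-byte tag (0 = Ok, 1 = Err), 3 bytes padding, 4-byte LE payload.
    import struct

    LAYOUT = "<B3xI"

    def encode(tag, payload):
        return struct.pack(LAYOUT, tag, payload)

    def decode(buf):
        tag, payload = struct.unpack(LAYOUT, buf)
        return ("Ok", payload) if tag == 0 else ("Err", payload & 0xFF)

    print(decode(encode(0, 4096)))   # ('Ok', 4096), an 8-byte record
    ```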

    Verifying Information Flow Control Libraries

    Information Flow Control (IFC) is a principled approach to protecting the confidentiality and integrity of data in software systems. Intuitively, IFC systems associate data with security labels that track and restrict flows of information throughout a program in order to enforce security. Most IFC techniques require developers to use specific programming languages and tools that take substantial effort to develop or adopt. To avoid redundant work and lower the threshold for adopting secure languages, IFC has been embedded in general-purpose languages through software libraries that promote security-by-construction with their APIs. This thesis makes several contributions to state-of-the-art static (MAC) and dynamic (LIO) IFC libraries in three areas: expressive power, theoretical IFC foundations, and protection against covert channels. Firstly, the thesis gives a functor algebraic structure to sensitive data, so that it can be processed through classic functional programming patterns that do not incur security checks. It then establishes the formal security guarantees of MAC, using the standard proof technique of term erasure enriched with two-step erasure, a novel idea that simplifies reasoning about advanced programming features such as exceptions, mutable references, and concurrency. Secondly, the thesis demonstrates that the lightweight but coarse-grained enforcement of dynamic IFC libraries (e.g., LIO) can be as precise and permissive as the fine-grained but heavyweight approach of fully-fledged IFC languages. Lastly, the thesis contributes to the design of secure runtime systems that protect IFC libraries, and IFC languages as well, against internal- and external-timing covert channels that leak information through certain runtime system resources and features, such as lazy evaluation and parallelism. The results of this thesis are supported by extensive machine-checked proof scripts, consisting of 12,000 lines of code developed in the Agda proof assistant.
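
    A minimal sketch of the coarse-grained, floating-label discipline that LIO-style libraries enforce, including the functor structure over labeled data; the two-point lattice and all names are assumed for illustration.

    ```python
    # Dynamic IFC with a floating current label: reads raise the label,
    # writes are gated by it. Lattice: Low flows to High, not vice versa.

    HIGH, LOW = "High", "Low"
    current_label = LOW

    def can_flow(frm, to):
        return frm == LOW or to == HIGH

    class Labeled:
        def __init__(self, label, value):
            self.label, self._value = label, value
        def fmap(self, f):
            # The functor structure: pure computation over the payload
            # needs no security check and preserves the label.
            return Labeled(self.label, f(self._value))

    def unlabel(lv):
        # Reading floats the current label up to the join of the labels.
        global current_label
        if not can_flow(lv.label, current_label):
            current_label = lv.label
        return lv._value

    def output(channel_label, value):
        if not can_flow(current_label, channel_label):
            raise RuntimeError("IFC violation: " + current_label +
                               " data on a " + channel_label + " channel")
        print(value)

    secret = Labeled(HIGH, 42)
    doubled = secret.fmap(lambda x: x * 2)   # no check, label preserved
    x = unlabel(doubled)                     # current label floats to High
    output(HIGH, x)                          # allowed
    try:
        output(LOW, x)                       # blocked: would leak
    except RuntimeError as e:
        print(e)
    ```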