224 research outputs found
Instruction-Level Abstraction (ILA): A Uniform Specification for System-on-Chip (SoC) Verification
Modern Systems-on-Chip (SoC) designs are increasingly heterogeneous and
contain specialized semi-programmable accelerators in addition to programmable
processors. In contrast to the pre-accelerator era, when the ISA played an
important role in verification by enabling a clean separation of concerns
between software and hardware, verification of these "accelerator-rich" SoCs
presents new challenges. From the perspective of hardware designers, there is a
lack of a common framework for the formal functional specification of
accelerator behavior. From the perspective of software developers, there exists
no unified framework for reasoning about software/hardware interactions of
programs that interact with accelerators. This paper addresses these challenges
by providing a formal specification and high-level abstraction for accelerator
functional behavior. It formalizes the concept of an Instruction Level
Abstraction (ILA), developed informally in our previous work, and shows its
application in modeling and verification of accelerators. This formal ILA
extends the familiar notion of instructions to accelerators and provides a
uniform, modular, and hierarchical abstraction for modeling software-visible
behavior of both accelerators and programmable processors. We demonstrate the
applicability of the ILA through several case studies of accelerators (for
image processing, machine learning, and cryptography), and a general-purpose
processor (RISC-V). We show how the ILA model facilitates equivalence checking
between two ILAs, and between an ILA and its hardware finite-state machine
(FSM) implementation. Further, this equivalence checking supports accelerator
upgrades using the notion of ILA compatibility, similar to processor upgrades
using ISA compatibility.Comment: 24 pages, 3 figures, 3 table
Modular Verification of Interrupt-Driven Software
Interrupts have been widely used in safety-critical computer systems to
handle outside stimuli and interact with the hardware, but reasoning about
interrupt-driven software remains a difficult task. Although a number of static
verification techniques have been proposed for interrupt-driven software, they
often rely on constructing a monolithic verification model. Furthermore, they
do not precisely capture the complete execution semantics of interrupts such as
nested invocations of interrupt handlers. To overcome these limitations, we
propose an abstract interpretation framework for static verification of
interrupt-driven software that first analyzes each interrupt handler in
isolation as if it were a sequential program, and then propagates the result to
other interrupt handlers. This iterative process continues until results from
all interrupt handlers reach a fixed point. Since our method never constructs
the global model, it avoids the up-front blowup in model construction that
hampers existing, non-modular, verification techniques. We have evaluated our
method on 35 interrupt-driven applications with a total of 22,541 lines of
code. Our results show the method is able to quickly and more accurately
analyze the behavior of interrupts.Comment: preprint of the ASE 2017 pape
Cyber-security for embedded systems: methodologies, techniques and tools
L'abstract è presente nell'allegato / the abstract is in the attachmen
Fast and Clean: Auditable high-performance assembly via constraint solving
Handwritten assembly is a widely used tool in the development of high-performance cryptography: By providing full control over instruction selection, instruction scheduling, and register allocation, highest performance can be unlocked. On the flip side, developing handwritten assembly is not only time-consuming, but the artifacts produced also tend to be difficult to review and maintain – threatening their suitability for use in practice.
In this work, we present SLOTHY (Super (Lazy) Optimization of Tricky Handwritten assemblY), a framework for the automated superoptimization of assembly with respect to instruction scheduling, register allocation, and loop optimization (software pipelining): With SLOTHY, the developer controls and focuses on algorithm and instruction selection, providing a readable “base” implementation in assembly, while SLOTHY automatically finds optimal and traceable instruction scheduling and register allocation strategies with respect to a model of the target (micro)architecture.
We demonstrate the flexibility of SLOTHY by instantiating it with models of the Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72 microarchitectures, implementing the Armv8.1-M+Helium and AArch64+Neon architectures. We use the resulting tools to optimize three workloads: First, for Cortex-M55 and Cortex-M85, a radix-4 complex Fast Fourier Transform (FFT) in fixed-point and floating-point arithmetic, fundamental in Digital Signal Processing. Second, on Cortex-M55, Cortex-M85, Cortex-A55 and Cortex-A72, the instances of the Number Theoretic Transform (NTT) underlying CRYSTALS-Kyber and CRYSTALS-Dilithium, two recently announced winners of the NIST Post-Quantum Cryptography standardization project. Third, for Cortex-A55, the scalar multiplication for the elliptic curve key exchange X25519. The SLOTHY-optimized code matches or beats the performance of prior art in all cases, while maintaining compactness and readability
Bounded Model Checking of Concurrent Data Types on Relaxed Memory Models: A Case Study
Many multithreaded programs employ concurrent data types to safely share data among threads. However, highly-concurrent algorithms for even seemingly simple data types are difficult to implement correctly, especially when considering the relaxed memory ordering models commonly employed by today’s multiprocessors. The formal verification of such implementations is challenging as well because the high degree of concurrency leads to a large number of possible executions. In this case study, we develop a SAT-based bounded verification method and apply it to a representative example, a well-known two-lock concurrent queue algorithm. We first formulate a correctness criterion that specifically targets failures caused by concurrency; it demands that all concurrent executions be observationally equivalent to some serial execution. Next, we define a relaxed memory model that conservatively approximates several common shared-memory multiprocessors. Using commit point specifications, a suite of finite symbolic tests, a prototype encoder, and a standard SAT solver, we successfully identify two failures of a naive implementation that can be observed only under relaxed memory models. We eliminate these failures by inserting appropriate memory ordering fences into the code. The experiments confirm that our approach provides a valuable aid for desigining and implementing concurrent data types
Thread-Modular Static Analysis for Relaxed Memory Models
We propose a memory-model-aware static program analysis method for accurately
analyzing the behavior of concurrent software running on processors with weak
consistency models such as x86-TSO, SPARC-PSO, and SPARC-RMO. At the center of
our method is a unified framework for deciding the feasibility of inter-thread
interferences to avoid propagating spurious data flows during static analysis
and thus boost the performance of the static analyzer. We formulate the
checking of interference feasibility as a set of Datalog rules which are both
efficiently solvable and general enough to capture a range of hardware-level
memory models. Compared to existing techniques, our method can significantly
reduce the number of bogus alarms as well as unsound proofs. We implemented the
method and evaluated it on a large set of multithreaded C programs. Our
experiments showthe method significantly outperforms state-of-the-art
techniques in terms of accuracy with only moderate run-time overhead.Comment: revised version of the ESEC/FSE 2017 pape
Towards Automated Detection of Single-Trace Side-Channel Vulnerabilities in Constant-Time Cryptographic Code
Although cryptographic algorithms may be mathematically secure, it is often
possible to leak secret information from the implementation of the algorithms.
Timing and power side-channel vulnerabilities are some of the most widely
considered threats to cryptographic algorithm implementations. Timing
vulnerabilities may be easier to detect and exploit, and all high-quality
cryptographic code today should be written in constant-time style. However,
this does not prevent power side-channels from existing. With constant time
code, potential attackers can resort to power side-channel attacks to try
leaking secrets. Detecting potential power side-channel vulnerabilities is a
tedious task, as it requires analyzing code at the assembly level and needs
reasoning about which instructions could be leaking information based on their
operands and their values. To help make the process of detecting potential
power side-channel vulnerabilities easier for cryptographers, this work
presents Pascal: Power Analysis Side Channel Attack Locator, a tool that
introduces novel symbolic register analysis techniques for binary analysis of
constant-time cryptographic algorithms, and verifies locations of potential
power side-channel vulnerabilities with high precision. Pascal is evaluated on
a number of implementations of post-quantum cryptographic algorithms, and it is
able to find dozens of previously reported single-trace power side-channel
vulnerabilities in these algorithms, all in an automated manner
On static execution-time analysis
Proving timeliness is an integral part of the verification of safety-critical real-time systems. To this end, timing analysis computes upper bounds on the execution times of programs that execute on a given hardware platform. Modern hardware platforms commonly exhibit counter-intuitive timing behaviour: a locally slower execution can lead to a faster overall execution. Such behaviour challenges efficient timing analysis. In this work, we present and discuss a hardware design, the strictly in-order pipeline, that behaves monotonically w.r.t. the progress of a program's execution. Based on monotonicity, we prove the absence of the aforementioned counter-intuitive behaviour. At least since multi-core processors have emerged, timing analysis separates concerns by analysing different aspects of the system's timing behaviour individually. In this work, we validate the underlying assumption that a timing bound can be soundly composed from individual contributions. We show that even simple processors exhibit counter-intuitive behaviour - a locally slow execution can lead to an even slower overall execution - that impedes the soundness of the composition. We present the compositional base bound analysis that accounts for any such amplifying effects within its timing contribution. This enables a sound compositional analysis even for complex processors. Furthermore, we discuss hardware modifications that enable efficient compositional analyses.Echtzeitsysteme müssen unter allen Umständen beweisbar pünktlich arbeiten. Zum Beweis errechnet die Zeitanalyse obere Schranken der für die Ausführung von Programmen auf einer Hardware-Plattform benötigten Zeit. Moderne Hardware-Plattformen sind bekannt für unerwartetes Zeitverhalten bei dem eine lokale Verzögerung in einer global schnelleren Ausführung resultiert. Solches Zeitverhalten erschwert eine effiziente Analyse. Im Rahmen dieser Arbeit diskutieren wir das Design eines Prozessors mit eingeschränkter Fließbandverarbeitung (strictly in-order pipeline), der sich bzgl. des Fortschritts einer Programmausführung monoton verhält. Wir beweisen, dass Monotonie das oben genannte unerwartete Zeitverhalten verhindert. Spätestens seit dem Einsatz von Mehrkernprozessoren besteht die Zeitanalyse aus einzelnen Teilanalysen welche nur bestimmte Aspekte des Zeitverhaltens betrachten. Eine zentrale Annahme ist hierbei, dass sich die Teilergebnisse zu einer korrekten Zeitschranke zusammensetzen lassen. Im Rahmen dieser Arbeit zeigen wir, dass diese Annahme selbst für einfache Prozessoren ungültig ist, da eine lokale Verzögerung zu einer noch größeren globalen Verzögerung führen kann. Für bestehende Prozessoren entwickeln wir eine neuartige Teilanalyse, die solche verstärkenden Effekte berücksichtigt und somit eine korrekte Komposition von Teilergebnissen erlaubt. Für zukünftige Prozessoren beschreiben wir Modifikationen, die eine deutlich effizientere Zeitanalyse ermöglichen
Experiments on Model-Based Software Energy Consumption Analysis Involving Sorting Algorithms
Although energy has become an important aspect in software development, little support exists for creating energy-efficient programs. One reason for that is the lack of abstractions and tools to enable the analysis of relevant properties involving energy consumption. This paper presents the results of some experiments involving the gathering, modelling, and analysis of energy-related information, in particular, the costs of executing certain parts of a software. We combine some existing free and open-source tools to carry out the experiments, extending one of them to handle energy information. Our experiments consider a comparison of energy consumption of Java implementations of the Bubble Sort, Insertion Sort and Selection Sort algorithms using different data structures. We show how to combine an energy measurement tool and a model analysis tool to carry such a comparison. Based on this support and on our experiments, we believe this is a first step to allow developers to start creating more energy-efficient software
- …