Formal verification of a software countermeasure against instruction skip attacks
Fault attacks against embedded circuits have opened up many new attack paths against secure circuits. Each attack path relies on a specific fault model, which defines the type of faults the attacker can perform. On embedded processors, a fault model consisting of an assembly instruction skip is very useful to an attacker and has been obtained with several fault injection techniques. To counter this threat, countermeasure schemes relying on temporal redundancy have been proposed. Nevertheless, double fault injection within a long enough time interval is practical and can bypass those countermeasure schemes. Fine-grained countermeasure schemes have also been proposed for specific instructions. However, to the best of our knowledge, no approach that secures a generic assembly program so as to make it fault-tolerant to instruction skip attacks has been formally proven yet. In this paper, we provide a fault-tolerant replacement sequence for almost all instructions of the Thumb-2 instruction set and formally verify this fault tolerance. This simple transformation adds a reasonably good security level to an embedded program and makes practical fault injection attacks much harder to achieve.
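To give an intuition of what such a replacement can look like, here is a minimal sketch assuming an ARM target and GCC-style inline assembly; the function name and operands are our own illustration, not one of the paper's verified sequences. An idempotent instruction, whose destination register is not one of its source registers, can simply be executed twice: if a fault skips either copy, the remaining one still produces the correct architectural state.

    /* Illustrative sketch only (assumes an ARM/Thumb-2 target and a
     * GCC-compatible compiler); not the paper's verified sequences. */
    #include <stdint.h>

    uint32_t protected_add(uint32_t a, uint32_t b)
    {
        uint32_t r;
        /* "add r, a, b" is idempotent because the destination is not a
         * source, so executing it twice is harmless; skipping one copy
         * leaves the result intact.  The early-clobber constraint keeps
         * the output register distinct from the inputs. */
        __asm__ volatile (
            "add %[r], %[a], %[b]\n\t"
            "add %[r], %[a], %[b]\n\t"   /* redundant copy */
            : [r] "=&r" (r)
            : [a] "r" (a), [b] "r" (b));
        return r;
    }

Non-idempotent instructions generally need more elaborate replacement sequences, for instance involving an extra scratch register, which is precisely where a formal argument about fault tolerance becomes valuable.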
Experimental evaluation of two software countermeasures against fault attacks
Injection of transient faults can be used as a way to attack embedded systems. On embedded processors such as microcontrollers, several studies have shown that such transient fault injection with glitches or electromagnetic pulses can corrupt either the data loaded from memory or the assembly instructions executed by the circuit. Countermeasure schemes relying on temporal redundancy have been proposed to handle this issue; several of them add this redundancy at the assembly instruction level. In this paper, we perform a practical evaluation of two of those countermeasure schemes using a pulsed electromagnetic fault injection process on a 32-bit microcontroller. We provide necessary conditions for an efficient implementation of those countermeasure schemes in practice. We also evaluate their efficiency and highlight their limitations. To the best of our knowledge, no experimental evaluation of the security of such instruction-level countermeasure schemes has been published yet.
Comment: 6 pages, 2014 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), Arlington, United States (2014).
Automatic Detection of Performance Anomalies in Task-Parallel Programs
To efficiently exploit the resources of new many-core architectures,
integrating dozens or even hundreds of cores per chip, parallel programming
models have evolved to expose massive amounts of parallelism, often in the form
of fine-grained tasks. Task-parallel languages, such as OpenStream, X10,
Habanero Java and C or StarSs, simplify the development of applications for new
architectures, but tuning task-parallel applications remains a major challenge.
Performance bottlenecks can occur at any level of the implementation, from the
algorithmic level (e.g., lack of parallelism or over-synchronization), to
interactions with the operating and runtime systems (e.g., data placement on
NUMA architectures), to inefficient use of the hardware (e.g., frequent cache
misses or misaligned memory accesses); detecting such issues and determining
the exact cause is a difficult task.
In previous work, we developed Aftermath, an interactive tool for trace-based
performance analysis and debugging of task-parallel programs and run-time
systems. In contrast to other trace-based analysis tools, such as Paraver or
Vampir, Aftermath offers native support for tasks, i.e., visualization,
statistics and analysis tools adapted for performance debugging at task
granularity. However, the tool currently does not provide support for the
automatic detection of performance bottlenecks and it is up to the user to
investigate the relevant aspects of program execution by focusing the
inspection on specific slices of a trace file. In this paper, we present
ongoing work on two extensions that guide the user through this process.
Comment: Presented at the 1st Workshop on Resource Awareness and Adaptivity in Multi-Core Computing (Racing 2014) (arXiv:1405.2281).
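As an illustration of the kind of aggregate metric such trace-based analysis can compute automatically, the sketch below derives a simple per-worker load-imbalance indicator from task durations recorded in a trace; all names and structures are our own and do not correspond to Aftermath's actual data model.

    /* Hypothetical sketch: derive a load-imbalance indicator from
     * per-task durations grouped by worker; illustrative names only. */
    #include <stdio.h>
    #include <stddef.h>

    struct task_event {
        int worker;            /* worker (core) that executed the task */
        double duration_ms;    /* task execution time in milliseconds  */
    };

    /* Imbalance = maximum busy time / average busy time: 1.0 means a
     * perfectly balanced run, larger values mean overloaded workers.  */
    double load_imbalance(const struct task_event *events, size_t n_events,
                          int n_workers)
    {
        double busy[n_workers];
        for (int w = 0; w < n_workers; w++)
            busy[w] = 0.0;
        for (size_t i = 0; i < n_events; i++)
            busy[events[i].worker] += events[i].duration_ms;

        double max = 0.0, sum = 0.0;
        for (int w = 0; w < n_workers; w++) {
            if (busy[w] > max)
                max = busy[w];
            sum += busy[w];
        }
        return (sum > 0.0) ? max / (sum / n_workers) : 1.0;
    }

    int main(void)
    {
        struct task_event trace[] = {
            { 0, 12.0 }, { 0, 8.0 }, { 1, 3.0 }, { 1, 2.5 }, { 2, 20.0 }
        };
        printf("imbalance: %.2f\n",
               load_imbalance(trace, sizeof trace / sizeof trace[0], 3));
        return 0;
    }

A value well above 1.0 over a region of the trace is the kind of symptom the planned extensions could flag automatically instead of leaving it to manual inspection.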
Impact of Code Compression on Estimated Worst-Case Execution Times
Code compression techniques might be useful to meet code size constraints in embedded systems. In the average case, the impact of code compression on performance is double-edged: on the one hand, the number of accesses to the memory hierarchy is reduced because several instructions are encoded in a single word, which is likely to reduce the execution time; on the other hand, the decompression penalty increases the processing time of compressed instructions. Nevertheless, experimental results show that the execution time might be lowered by code compression. In this paper, our goal is to analyze the impact of code compression on the estimated Worst-Case Execution Time (WCET) of critical tasks that must meet both code size constraints and timing deadlines. Changes in the access patterns to the instruction cache are indeed likely to alter the accuracy of the cache analysis within the process of determining the WCET. Experimental results show that, besides reducing the code size, our code compression scheme also improves the WCET estimates in most cases.
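To make the trade-off concrete, the small model below uses invented parameters (all constants are ours, purely for illustration): compression reduces instruction cache misses because more instructions fit per fetched word, while every compressed instruction pays a decompression penalty.

    /* Hypothetical timing model illustrating the double-edged effect of
     * code compression; all constants are invented for illustration.  */
    #include <stdio.h>

    /* cycles = hits * hit latency + misses * miss latency
     *        + compressed instructions * decompression penalty        */
    static double estimate_cycles(double fetches, double miss_rate,
                                  double hit_lat, double miss_lat,
                                  double compressed, double decomp_penalty)
    {
        double misses = fetches * miss_rate;
        double hits = fetches - misses;
        return hits * hit_lat + misses * miss_lat
             + compressed * decomp_penalty;
    }

    int main(void)
    {
        /* Uncompressed code: 1e6 fetches, 5% miss rate, no penalty.   */
        double base = estimate_cycles(1e6, 0.05, 1.0, 30.0, 0.0, 0.0);
        /* Compressed code: lower miss rate (denser cache lines), but a
         * 0.5-cycle decompression penalty per instruction.            */
        double comp = estimate_cycles(1e6, 0.03, 1.0, 30.0, 1e6, 0.5);
        printf("uncompressed: %.0f cycles, compressed: %.0f cycles\n",
               base, comp);
        return 0;
    }

Whether the miss-rate reduction outweighs the decompression penalty depends on the cache behaviour of the program, which is exactly why the cache analysis inside the WCET estimation has to be revisited.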
Fault attacks on two software countermeasures
Short version of the article "Experimental evaluation of two software countermeasures against fault attacks" presented at the 2014 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST) in May 2014.
Injection of transient faults can be used as a way to attack embedded systems. On embedded processors such as microcontrollers, several studies have shown that such transient fault injection can corrupt either the data loaded from memory or the assembly instructions executed by the circuit. Countermeasure schemes relying on temporal redundancy have been proposed to handle this issue; several of them add this redundancy at the assembly instruction level. In this paper, we perform a practical evaluation of two of those countermeasure schemes using a pulsed electromagnetic fault injection process on a 32-bit microcontroller.
Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management
Dynamic task-parallel programming models are popular on shared-memory systems, promising enhanced scalability, load balancing and locality. Yet these promises are undermined by non-uniform memory access (NUMA). We show that using NUMA-aware task and data placement, it is possible to preserve the uniform abstraction of both computing and memory resources for task-parallel programming models while achieving high data locality. Our data placement scheme guarantees that all accesses to task output data target the local memory of the accessing core. The complementary task placement heuristic improves the locality of task input data on a best effort basis. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability by eliminating false dependences and enabling fine-grained dynamic control over data placement. The algorithms are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences readily available in the run-time system and placement information from the operating system. We achieve 94% of local memory accesses on a 192-core system with 24 NUMA nodes, up to 5× higher performance than NUMA-aware hierarchical work-stealing, and even 5.6× compared to static interleaved allocation. Finally, we show that state-of-the-art dynamic page migration by the operating system cannot catch up with frequent affinity changes between cores and data and thus fails to accelerate task-parallel applications.
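The core idea of the data placement scheme, allocating a task's output buffer on the NUMA node of the core that will consume it, can be sketched with libnuma as follows; this is a simplified illustration under our own assumptions (the consumer CPU is hard-coded), not the run-time system's actual code.

    /* Simplified sketch of NUMA-aware output-data placement using
     * libnuma (link with -lnuma).  In the run-time system described
     * above, such decisions are derived automatically from inter-task
     * data dependences; here the consumer CPU is simply assumed.      */
    #include <numa.h>
    #include <stdio.h>

    /* Allocate a task's output buffer on the NUMA node local to the
     * CPU expected to execute the consumer task. */
    static void *alloc_output_near(size_t size, int consumer_cpu)
    {
        int node = numa_node_of_cpu(consumer_cpu);
        if (node < 0)
            node = 0;                 /* unknown CPU: fall back to node 0 */
        return numa_alloc_onnode(size, node);
    }

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }
        /* Assume the consumer task is known to run on CPU 12. */
        size_t size = 1024 * sizeof(double);
        double *out = alloc_output_near(size, 12);
        if (out == NULL)
            return 1;
        printf("output buffer placed near CPU 12\n");
        numa_free(out, size);
        return 0;
    }

In the actual scheme, the consumer core is not assumed but derived from the task graph, which is what allows all output accesses to be guaranteed local.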
Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages
We present a joint scheduling and memory allocation algorithm for efficient execution of task-parallel programs on non-uniform memory architecture (NUMA) systems. Task and data placement decisions are based on a static description of the memory hierarchy and on runtime information about inter-task communication. Existing locality-aware scheduling strategies for fine-grained tasks have strong limitations: they are specific to some class of machines or applications, they do not handle task dependences, they require manual program annotations, or they rely on fragile profiling schemes. By contrast, our solution makes no assumption on the structure of programs or on the layout of data in memory. Experimental results, based on the OpenStream language, show that locality of accesses to main memory of scientific applications can be increased significantly on a 64-core machine, resulting in a speedup of up to 1.63× compared to a state-of-the-art work-stealing scheduler.
MAFIA: Protecting the Microarchitecture of Embedded Systems Against Fault Injection Attacks
Fault injection attacks represent an effective threat to embedded systems. Recently, Laurent et al. have reported that fault injection attacks can leverage faults inside the microarchitecture. However, state-of-the-art countermeasures, whether hardware-only or with hardware support, do not consider the integrity of the microarchitectural control signals that are the target of these faults.
We present MAFIA, a microarchitecture protection against fault injection attacks. MAFIA ensures the integrity of pipeline control signals through a signature-based mechanism, and provides fine-grained control-flow integrity with complete indirect branch support and code authenticity. We analyse the security properties of two implementations with different security/overhead trade-offs: one based on a CBC-MAC/Prince signature function, and another based on a CRC32. We present our implementation of MAFIA in a RISC-V processor, supported by a dedicated compiler toolchain based on LLVM/Clang. We report a hardware area overhead of 23.8% and 6.5% for the CBC-MAC/Prince and CRC32 variants, respectively. The average code size and execution time overheads are 29.4% and 18.4% for the CRC32 implementation, and 50% and 39% for the CBC-MAC/Prince.
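To give an intuition of the signature-based mechanism, the sketch below accumulates a CRC32 over a sequence of control words and compares it against a reference value; this is our own software simplification, whereas MAFIA performs the equivalent check in hardware over pipeline control signals, with reference signatures produced by the compiler toolchain.

    /* Software illustration of a running CRC32 signature check; the
     * control words and the in-place reference are purely illustrative. */
    #include <stdint.h>
    #include <stdio.h>

    /* One CRC32 step (reflected polynomial 0xEDB88320), bit by bit. */
    static uint32_t crc32_step(uint32_t crc, uint32_t word)
    {
        crc ^= word;
        for (int i = 0; i < 32; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
        return crc;
    }

    int main(void)
    {
        /* Illustrative "control words" observed for one basic block. */
        const uint32_t words[] = { 0x00000013u, 0x00A58593u, 0x0000006Fu };
        uint32_t sig = 0xFFFFFFFFu;

        for (size_t i = 0; i < sizeof words / sizeof words[0]; i++)
            sig = crc32_step(sig, words[i]);

        /* In a real deployment the reference would be precomputed by
         * the toolchain; here it is recomputed just to show the check. */
        uint32_t reference = sig;
        if (sig != reference) {
            fprintf(stderr, "signature mismatch: fault detected\n");
            return 1;   /* a mismatch would trigger an alarm or reset */
        }
        printf("signature OK: 0x%08X\n", sig);
        return 0;
    }

Any skipped or corrupted control word changes the accumulated signature, so the final comparison detects the fault, subject to the usual CRC collision caveats; a keyed signature function such as CBC-MAC/Prince, used in the stronger variant, is much harder for an attacker to forge.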
Language-Centric Performance Analysis of OpenMP Programs with Aftermath
We present a new set of tools for the language-centric performance analysis and debugging of OpenMP programs that allows programmers to relate dynamic information from parallel execution to OpenMP constructs. Users can visualize execution traces, examine aggregate metrics on parallel loops and tasks, such as load imbalance or synchronization overhead, and obtain detailed information on specific events, such as the partitioning of a loop's iteration space, its distribution to workers according to the scheduling policy, and fine-grained synchronization. Our work is based on the Aftermath performance analysis tool and a ready-to-use, instrumented version of the LLVM/clang OpenMP run-time with negligible tracing overhead. By analyzing the performance of the MG application from the NPB suite, we show that language-centric performance analysis in general, and our tools in particular, can help improve the performance of large-scale OpenMP applications significantly.
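As an example of one such event class, the sketch below records how a static schedule partitions a loop's iteration space across threads; it is our own illustration of the concept, not part of Aftermath or its instrumented run-time.

    /* Illustration of static iteration-space partitioning in OpenMP
     * (compile with -fopenmp); not part of Aftermath itself.          */
    #include <omp.h>
    #include <stdio.h>

    #define N        100
    #define MAX_THR  64

    int main(void)
    {
        int first[MAX_THR], last[MAX_THR];
        for (int t = 0; t < MAX_THR; t++) { first[t] = -1; last[t] = -1; }

        /* schedule(static): one contiguous chunk of roughly
         * N / num_threads iterations per thread.                      */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++) {
            int t = omp_get_thread_num();
            if (t < MAX_THR) {
                if (first[t] < 0)
                    first[t] = i;
                last[t] = i;
            }
        }

        for (int t = 0; t < MAX_THR; t++)
            if (first[t] >= 0)
                printf("thread %d: iterations [%d, %d]\n",
                       t, first[t], last[t]);
        return 0;
    }

A trace-based tool recovers the same mapping from run-time events instead of requiring the loop body to be instrumented by hand.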