882 research outputs found
Dynamic Binary Translation for Embedded Systems with Scratchpad Memory
Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as a platform where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (i.e., performance, memory and energy consumption and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well.
In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead.
This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-Flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has similar access latency as a hardware cache, but consumes less power and chip area.
StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code among the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs. Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements
A Survey of Techniques for Architecting TLBs
“Translation lookaside buffer” (TLB) caches virtual to physical address translation information and is used
in systems ranging from embedded devices to high-end servers. Since TLB is accessed very frequently
and a TLB miss is extremely costly, prudent management of TLB is important for improving performance
and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and
managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and
distinctions. We believe that this paper will be useful for chip designers, computer architects and system
engineers
Ubiquitous Memory Introspection (Preliminary Manuscript)
Modern memory systems play a critical role in the performance ofapplications, but a detailed understanding of the application behaviorin the memory system is not trivial to attain. It requires timeconsuming simulations of the memory hierarchy using long traces, andoften using detailed modeling. It is increasingly possible to accesshardware performance counters to measure events in the memory system,but the measurements remain coarse grained, better suited forperformance summaries than providing instruction level feedback. Theavailability of a low cost, online, and accurate methodology forderiving fine-grained memory behavior profiles can prove extremelyuseful for runtime analysis and optimization of programs.This paper presents a new methodology for Ubiquitous MemoryIntrospection (UMI). It is an online and lightweight mini-simulationmethodology that focuses on simulating short memory access tracesrecorded from frequently executed code regions. The simulations arefast and can provide profiling results at varying granularities, downto that of a single instruction or address. UMI naturally complementsruntime optimizations techniques and enables new opportunities formemory specific optimizations.In this paper, we present a prototype implementation of a runtimesystem implementing UMI. The prototype is readily deployed oncommodity processors, requires no user intervention, and can operatewith stripped binaries and legacy software. The prototype operateswith an average runtime overhead of 20% but this slowdown is only 6%slower than a state of the art binary instrumentation tool. We used32 benchmarks, including the full suite of SPEC2000 benchmarks, forour evaluation. We show that the mini-simulation results accuratelyreflect the cache performance of two existing memory systems, anIntel Pentium~4 and an AMD Athlon MP (K7) processor. We alsodemonstrate that low level profiling information from the onlinesimulation can serve to identify high-miss rate load instructions with a77% rate of accuracy compared to full offline simulations thatrequired days to complete. The online profiling results are used atruntime to implement a simple software prefetching strategy thatachieves a speedup greater than 60% in the best case
Transparently Mixing Undo Logs and Software Reversibility for State Recovery in Optimistic PDES
The rollback operation is a fundamental building block to support the correct execution of a speculative Time Warp-based Parallel Discrete Event Simulation. In the literature, several solutions to reduce the execution cost of this operation have been proposed, either based on the creation of a checkpoint of previous simulation state images, or on the execution of negative copies of simulation events which are able to undo the updates on the state. In this paper, we explore the practical design and implementation of a state recoverability technique which allows to restore a previous simulation state either relying on checkpointing or on the reverse execution of the state updates occurred while processing events in forward mode. Differently from other proposals, we address the issue of executing backward updates in a fully-transparent and event granularity-independent way, by relying on static software instrumentation (targeting the x86 architecture and Linux systems) to generate at runtime reverse update code blocks (not to be confused with reverse events, proper of the reverse computing approach). These are able to undo the effects of a forward execution while minimizing the cost of the undo operation. We also present experimental results related to our implementation, which is released as free software and fully integrated into the open source ROOT-Sim (ROme OpTimistic Simulator) package. The experimental data support the viability and effectiveness of our proposal
Processor Microarchitecture Security
As computer systems grow more and more complicated, various optimizations can unintentionally introduce security vulnerabilities in these systems. The vulnerabilities can lead to user information and data being compromised or stolen. In particular, the ending of both Moore\u27s law and Dennard scaling motivate the design of more exotic microarchitectural optimizations to extract more performance -- further exacerbating the security vulnerabilities. The performance optimizations often focus on sharing or re-using of hardware components within a processor, between different users or programs. Because of the sharing of the hardware, unintentional information leakage channels, through the shared components, can be created. Microarchitectural attacks, such as the high-profile Spectre and Meltdown attacks or the cache covert channels that they leverage, have demonstrated major vulnerabilities of modern computer architectures due to the microarchitectural~optimizations. Key components of processor microarchitectures are processor caches used for achieving high memory bandwidth and low latency for frequently accessed data. With frequently accessed data being brought and stored in caches, memory latency can be significantly reduced when data is fetched from the cache, as opposed to being fetched from the main memory. With limited processor chip area, however, the cache size cannot be very large. Thus, modern processors adopt a cache hierarchy with multiple levels of caches, where the cache close to processor is faster but smaller, and the cache far from processor is slower but larger. This leads to a fundamental property of modern processors: {\em the latency of accessing data in different cache levels and in main memory is different}. As a result, the timing of memory operations when fetching data from different cache levels, e.g., the timing of fetching data from closest-to-processor L1 cache vs. from main memory, can reveal secret-dependent information if attacker is able to observe the timing of these accesses and correlate them to the operation of the victim\u27s code. Further, due to limited size of the caches, memory accesses by a victim may displace attacker\u27s data from the cache, and with knowledge, or reverse-engineering, of the cache architecture, the attacker can learn some information about victim\u27s data based on the modifications to the state of the cache -- which can be observed by the timing~measurements. Caches are not only structures in the processor that can suffer from security vulnerabilities. As an essential mechanism to achieving high performance, cache-like structures are used pervasively in various processor components, such as the translation lookaside buffer (TLB) and processor frontend. Consequently, the vulnerabilities due to timing differences of accessing data in caches or cache-like structures affect many components of the~processor. The main goal of this dissertation is the {\em design of high performance and secure computer architectures}. Since the sophisticated hardware components such as caches, TLBs, value predictors, and processor frontend are critical to ensure high performance, realizing this goal requires developing fundamental techniques to guarantee security in the presence of timing differences of different processor operations. Furthermore, effective defence mechanisms can be only developed after developing a formal and systematic understanding of all the possible attacks that timing side-channels can lead to. To realize the research goals, the main main contributions of this dissertation~are: \begin{itemize}[noitemsep] \item Design and evaluation of a novel three-step cache timing model to understand theoretical vulnerabilities in caches \item Development of a benchmark suite that can test if processor caches or secure cache designs are vulnerable to certain theoretical vulnerabilities. \item Development of a timing vulnerability model to test TLBs and design of hardware defenses for the TLBs to address newly found vulnerabilities. \item Analysis of value predictor attacks and design of defenses for value predictors. \item Evaluation of vulnerabilities in processor frontends based on timing differences in the operation of the frontends. \item Development of a design-time security verification framework for secure processor architectures, using information flow tracking methods. \end{itemize} \newpage This dissertation combines the theoretical modeling and practical benchmarking analysis to help evaluate susceptibility of different architectures and microarchitectures to timing attacks on caches, TLBs, value predictors and processor frontend. Although cache timing side-channel attacks have been studied for more than a decade, there is no evidence that the previously-known attacks exhaustively cover all possible attacks. One of the initial research directions covered by this dissertation was to develop a model for cache timing attacks, which can help lead towards discovering all possible cache timing attacks. The proposed three-step cache timing vulnerability model provides a means to enumerate all possible interactions between the victim and attacker who are sharing a cache-like structure, producing the complete set of theoretical timing vulnerabilities. This dissertation also covers new theoretical cache timing attacks that are unknown prior to being found by the model. To make the advances in security not only theoretical, this dissertation also covers design of a benchmarking suite that runs on commodity processors and helps evaluate their cache\u27s susceptibility to attacks, as well as can run on simulators to test potential or future cache designs. As the dissertation later demonstrates, the three-step timing vulnerability model can be naturally applied to any cache-like structures such as TLBs, and the dissertation encompasses a three-step model for TLBs, uncovering of theoretical new TLB attacks, and proposals for defenses. Building on success of analyzing caches and TLBs for new timing attacks, this dissertation then discusses follow-on research on evaluation and uncovering of new timing vulnerabilities in processor frontends. Since security analysis should be applied not just to existing processor microarchitectural features, the dissertation further analyzes possible future features such as value predictors. Although not currently in use, value predictors are actively being researched and proposed for addition into future microarchitectures. This dissertation shows, however, that they are vulnerable to attacks. Lastly, based on findings of the security issues with existing and proposed processor features, this dissertation explores how to better design secure processors from ground up, and presents a design-time security verification framework for secure processor architectures, using information flow tracking methods
Cyber-security protection techniques to mitigate memory errors exploitation
Tesis por compendio[EN] Practical experience in software engineering has demonstrated that the goal of
building totally fault-free software systems, although desirable, is impossible
to achieve. Therefore, it is necessary to incorporate mitigation techniques in
the deployed software, in order to reduce the impact of latent faults.
This thesis makes contributions to three memory corruption mitigation
techniques: the stack smashing protector (SSP), address space layout
randomisation (ASLR) and automatic software diversification.
The SSP is a very effective protection technique used against stack buffer
overflows, but it is prone to brute force attacks, particularly the dangerous
byte-for-byte attack. A novel modification, named RenewSSP, has been proposed
which eliminates brute force attacks, can be used in a completely transparent
way with existing software and has negligible overheads. There are two
different kinds of application for which RenewSSP is especially beneficial:
networking servers (tested in Apache) and application launchers (tested on
Android).
ASLR is a generic concept with multiple designs and implementations. In this
thesis, the two most relevant ASLR implementations of Linux have been analysed
(Vanilla Linux and PaX patch), and several weaknesses have been found. Taking
into account technological improvements in execution support (compilers and
libraries), a new ASLR design has been proposed, named ASLR-NG, which
maximises entropy, effectively addresses the fragmentation issue and removes a
number of identified weaknesses. Furthermore, ASLR-NG is transparent to
applications, in that it preserves binary code compatibility and does not add
overheads. ASLR-NG has been implemented as a patch to the Linux kernel 4.1.
Software diversification is a technique that covers a wide range of faults,
including memory errors. The main problem is how to create variants,
i.e. programs which have identical behaviours on normal inputs but
where faults manifest differently. A novel form of automatic variant
generation has been proposed, using multiple cross-compiler suites and
processor emulators.
One of the main goals of this thesis is to create applicable results.
Therefore, I have placed particular emphasis on the development of real
prototypes in parallel with the theoretical study. The results of this thesis
are directly applicable to real systems; in fact, some of the results have
already been included in real-world products.[ES] La creación de software supone uno de los retos más complejos para el
ser humano ya que requiere un alto grado de abstracción. Aunque se ha
avanzado mucho en las metodologías para la prevención de los fallos
software, es patente que el software resultante dista mucho de ser
confiable, y debemos asumir que el software que se produce no está
libre de fallos. Dada la imposibilidad de diseñar o implementar
sistemas libres de fallos, es necesario incorporar técnicas de
mitigación de errores para mejorar la seguridad.
La presente tesis realiza aportaciones en tres de las principales
técnicas de mitigación de errores de corrupción de memoria: Stack
Smashing Protector (SSP), Address Space Layout Randomisation (ASLR) y
Automatic Software Diversification.
SSP es una técnica de protección muy efectiva contra
ataques de desbordamiento de buffer en pila, pero es sensible a ataques de
fuerza bruta, en particular al peligroso ataque denominado byte-for-byte.
Se ha propuesto una novedosa modificación del SSP, llamada RenewSSP,
la cual elimina los ataques de fuerza bruta. Puede ser usada
de manera completamente transparente con los programas existentes sin
introducir sobrecarga. El RenewSSP es especialmente beneficioso en dos áreas de
aplicación: Servidores de red (probado en Apache) y
lanzadores de aplicaciones eficientes (probado en Android).
ASLR es un concepto genérico, del cual hay multitud de diseños e
implementaciones. Se han analizado las dos implementaciones más
relevantes de Linux (Vanilla Linux y PaX patch), encontrándose en
ambas tanto debilidades como elementos mejorables. Teniendo en cuenta
las mejoras tecnológicas en el soporte a la ejecución (compiladores y
librerías), se ha propuesto un nuevo diseño del ASLR, llamado
ASLR-NG, el cual: maximiza la entropía, soluciona el problema de la
fragmentación y elimina las debilidades encontradas. Al igual que la
solución propuesta para el SSP, la nueva propuesta de ASLR es
transparente para las aplicaciones y compatible a nivel
binario sin introducir sobrecarga. ASLR-NG ha sido implementado como
un parche del núcleo de Linux para la versión 4.1.
La diversificación software es una técnica que cubre una amplia gama
de fallos, incluidos los errores de memoria. La principal dificultad
para aplicar esta técnica radica en la generación de las
"variantes", que son programas que tienen un comportamiento idéntico
entre ellos ante entradas normales, pero tienen un comportamiento
diferenciado en presencia de entradas anormales. Se ha propuesto una
novedosa forma de generar variantes de forma automática a partir de un
mismo código fuente, empleando la emulación de sistemas.
Una de las máximas de esta investigación ha sido la aplicabilidad de
los resultados, por lo que se ha hecho especial hincapié en el
desarrollo de prototipos sobre sistemas reales a la par que se llevaba
a cabo el estudio teórico. Como resultado, las propuestas de esta
tesis son directamente aplicables a sistemas reales, algunas de ellas
ya están siendo explotadas en la práctica.[CA] La creació de programari suposa un dels reptes més complexos per al ser humà ja
que requerix un alt grau d'abstracció. Encara que s'ha avançat molt en les
metodologies per a la prevenció de les fallades de programari, és palès que el
programari resultant dista molt de ser confiable, i hem d'assumir que el
programari que es produïx no està lliure de fallades. Donada la impossibilitat
de dissenyar o implementar sistemes lliures de fallades, és necessari
incorporar tècniques de mitigació d'errors per a millorar la seguretat.
La present tesi realitza aportacions en tres de les principals tècniques de
mitigació d'errors de corrupció de memòria: Stack Smashing Protector (SSP),
Address Space Layout Randomisation (ASLR) i Automatic Software
Diversification.
SSP és una tècnica de protecció molt efectiva contra atacs de desbordament de
buffer en pila, però és sensible a atacs de força bruta, en particular al
perillós atac denominat byte-for-byte.
S'ha proposat una nova modificació del SSP, RenewSSP, la qual elimina els atacs
de força bruta. Pot ser usada de manera completament transparent amb els
programes existents sense introduir sobrecàrrega. El RenewSSP és especialment
beneficiós en dos àrees d'aplicació: servidors de xarxa (provat en Apache) i
llançadors d'aplicacions eficients (provat en Android).
ASLR és un concepte genèric, del qual hi ha multitud de dissenys i
implementacions. S'han analitzat les dos implementacions més rellevants de
Linux (Vanilla Linux i PaX patch), trobant-se en ambdues tant debilitats com
elements millorables. Tenint en compte les millores tecnològiques en el suport
a l'execució (compiladors i llibreries), s'ha proposat un nou disseny de
l'ASLR: ASLR-NG, el qual, maximitza l'entropia, soluciona el problema de
la fragmentació i elimina les debilitats trobades. Igual que la solució
proposada per al SSP, la nova proposta d'ASLR és transparent per a les
aplicacions i compatible a nivell binari sense introduir sobrecàrrega. ASLR-NG
ha sigut implementat com un pedaç del nucli de Linux per a la versió 4.1.
La diversificació de programari és una tècnica que cobrix una àmplia gamma de
fa\-llades, inclosos els errors de memòria. La principal dificultat per a aplicar
esta tècnica radica en la generació de les "variants", que són programes que
tenen un comportament idèntic entre ells davant d'entrades normals, però tenen
un comportament diferenciat en presència d'entrades anormals. S'ha proposat una
nova forma de generar variants de forma automàtica a partir d'un mateix codi
font, emprant l'emulació de sistemes.
Una de les màximes d'esta investigació ha sigut l'aplicabilitat dels resultats,
per la qual cosa s'ha fet especial insistència en el desenrotllament de
prototips sobre sistemes reals al mateix temps que es duia a terme l'estudi
teòric. Com a resultat, les propostes d'esta tesi són directament aplicables
a sistemes reals, algunes d'elles ja estan sent explotades en la pràctica.Marco Gisbert, H. (2015). Cyber-security protection techniques to mitigate memory errors exploitation [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/57806TESISCompendi
- …