882 research outputs found

    Virtual machine monitors: current technology and future trends

    Full text link

    Dynamic Binary Translation for Embedded Systems with Scratchpad Memory

    Get PDF
    Embedded software development has recently changed with advances in computing. Rather than fully co-designing software and hardware to perform a relatively simple task, nowadays embedded and mobile devices are designed as a platform where multiple applications can be run, new applications can be added, and existing applications can be updated. In this scenario, traditional constraints in embedded systems design (i.e., performance, memory and energy consumption and real-time guarantees) are more difficult to address. New concerns (e.g., security) have become important and increase software complexity as well. In general-purpose systems, Dynamic Binary Translation (DBT) has been used to address these issues with services such as Just-In-Time (JIT) compilation, dynamic optimization, virtualization, power management and code security. In embedded systems, however, DBT is not usually employed due to performance, memory and power overhead. This dissertation presents StrataX, a low-overhead DBT framework for embedded systems. StrataX addresses the challenges faced by DBT in embedded systems using novel techniques. To reduce DBT overhead, StrataX loads code from NAND-Flash storage and translates it into a Scratchpad Memory (SPM), a software-managed on-chip SRAM with limited capacity. SPM has similar access latency as a hardware cache, but consumes less power and chip area. StrataX manages SPM as a software instruction cache, and employs victim compression and pinning to reduce retranslation cost and capture frequently executed code in the SPM. To prevent performance loss due to excessive code expansion, StrataX minimizes the amount of code inserted by DBT to maintain control of program execution. When a hardware instruction cache is available, StrataX dynamically partitions translated code among the SPM and main memory. With these techniques, StrataX has low performance overhead relative to native execution for MiBench programs. Further, it simplifies embedded software and hardware design by operating transparently to applications without any special hardware support. StrataX achieves sufficiently low overhead to make it feasible to use DBT in embedded systems to address important design goals and requirements

    A Survey of Techniques for Architecting TLBs

    Get PDF
    “Translation lookaside buffer” (TLB) caches virtual to physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of TLB is important for improving performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects and system engineers

    Ubiquitous Memory Introspection (Preliminary Manuscript)

    Get PDF
    Modern memory systems play a critical role in the performance ofapplications, but a detailed understanding of the application behaviorin the memory system is not trivial to attain. It requires timeconsuming simulations of the memory hierarchy using long traces, andoften using detailed modeling. It is increasingly possible to accesshardware performance counters to measure events in the memory system,but the measurements remain coarse grained, better suited forperformance summaries than providing instruction level feedback. Theavailability of a low cost, online, and accurate methodology forderiving fine-grained memory behavior profiles can prove extremelyuseful for runtime analysis and optimization of programs.This paper presents a new methodology for Ubiquitous MemoryIntrospection (UMI). It is an online and lightweight mini-simulationmethodology that focuses on simulating short memory access tracesrecorded from frequently executed code regions. The simulations arefast and can provide profiling results at varying granularities, downto that of a single instruction or address. UMI naturally complementsruntime optimizations techniques and enables new opportunities formemory specific optimizations.In this paper, we present a prototype implementation of a runtimesystem implementing UMI. The prototype is readily deployed oncommodity processors, requires no user intervention, and can operatewith stripped binaries and legacy software. The prototype operateswith an average runtime overhead of 20% but this slowdown is only 6%slower than a state of the art binary instrumentation tool. We used32 benchmarks, including the full suite of SPEC2000 benchmarks, forour evaluation. We show that the mini-simulation results accuratelyreflect the cache performance of two existing memory systems, anIntel Pentium~4 and an AMD Athlon MP (K7) processor. We alsodemonstrate that low level profiling information from the onlinesimulation can serve to identify high-miss rate load instructions with a77% rate of accuracy compared to full offline simulations thatrequired days to complete. The online profiling results are used atruntime to implement a simple software prefetching strategy thatachieves a speedup greater than 60% in the best case

    Transparently Mixing Undo Logs and Software Reversibility for State Recovery in Optimistic PDES

    Get PDF
    The rollback operation is a fundamental building block to support the correct execution of a speculative Time Warp-based Parallel Discrete Event Simulation. In the literature, several solutions to reduce the execution cost of this operation have been proposed, either based on the creation of a checkpoint of previous simulation state images, or on the execution of negative copies of simulation events which are able to undo the updates on the state. In this paper, we explore the practical design and implementation of a state recoverability technique which allows to restore a previous simulation state either relying on checkpointing or on the reverse execution of the state updates occurred while processing events in forward mode. Differently from other proposals, we address the issue of executing backward updates in a fully-transparent and event granularity-independent way, by relying on static software instrumentation (targeting the x86 architecture and Linux systems) to generate at runtime reverse update code blocks (not to be confused with reverse events, proper of the reverse computing approach). These are able to undo the effects of a forward execution while minimizing the cost of the undo operation. We also present experimental results related to our implementation, which is released as free software and fully integrated into the open source ROOT-Sim (ROme OpTimistic Simulator) package. The experimental data support the viability and effectiveness of our proposal

    Processor Microarchitecture Security

    Get PDF
    As computer systems grow more and more complicated, various optimizations can unintentionally introduce security vulnerabilities in these systems. The vulnerabilities can lead to user information and data being compromised or stolen. In particular, the ending of both Moore\u27s law and Dennard scaling motivate the design of more exotic microarchitectural optimizations to extract more performance -- further exacerbating the security vulnerabilities. The performance optimizations often focus on sharing or re-using of hardware components within a processor, between different users or programs. Because of the sharing of the hardware, unintentional information leakage channels, through the shared components, can be created. Microarchitectural attacks, such as the high-profile Spectre and Meltdown attacks or the cache covert channels that they leverage, have demonstrated major vulnerabilities of modern computer architectures due to the microarchitectural~optimizations. Key components of processor microarchitectures are processor caches used for achieving high memory bandwidth and low latency for frequently accessed data. With frequently accessed data being brought and stored in caches, memory latency can be significantly reduced when data is fetched from the cache, as opposed to being fetched from the main memory. With limited processor chip area, however, the cache size cannot be very large. Thus, modern processors adopt a cache hierarchy with multiple levels of caches, where the cache close to processor is faster but smaller, and the cache far from processor is slower but larger. This leads to a fundamental property of modern processors: {\em the latency of accessing data in different cache levels and in main memory is different}. As a result, the timing of memory operations when fetching data from different cache levels, e.g., the timing of fetching data from closest-to-processor L1 cache vs. from main memory, can reveal secret-dependent information if attacker is able to observe the timing of these accesses and correlate them to the operation of the victim\u27s code. Further, due to limited size of the caches, memory accesses by a victim may displace attacker\u27s data from the cache, and with knowledge, or reverse-engineering, of the cache architecture, the attacker can learn some information about victim\u27s data based on the modifications to the state of the cache -- which can be observed by the timing~measurements. Caches are not only structures in the processor that can suffer from security vulnerabilities. As an essential mechanism to achieving high performance, cache-like structures are used pervasively in various processor components, such as the translation lookaside buffer (TLB) and processor frontend. Consequently, the vulnerabilities due to timing differences of accessing data in caches or cache-like structures affect many components of the~processor. The main goal of this dissertation is the {\em design of high performance and secure computer architectures}. Since the sophisticated hardware components such as caches, TLBs, value predictors, and processor frontend are critical to ensure high performance, realizing this goal requires developing fundamental techniques to guarantee security in the presence of timing differences of different processor operations. Furthermore, effective defence mechanisms can be only developed after developing a formal and systematic understanding of all the possible attacks that timing side-channels can lead to. To realize the research goals, the main main contributions of this dissertation~are: \begin{itemize}[noitemsep] \item Design and evaluation of a novel three-step cache timing model to understand theoretical vulnerabilities in caches \item Development of a benchmark suite that can test if processor caches or secure cache designs are vulnerable to certain theoretical vulnerabilities. \item Development of a timing vulnerability model to test TLBs and design of hardware defenses for the TLBs to address newly found vulnerabilities. \item Analysis of value predictor attacks and design of defenses for value predictors. \item Evaluation of vulnerabilities in processor frontends based on timing differences in the operation of the frontends. \item Development of a design-time security verification framework for secure processor architectures, using information flow tracking methods. \end{itemize} \newpage This dissertation combines the theoretical modeling and practical benchmarking analysis to help evaluate susceptibility of different architectures and microarchitectures to timing attacks on caches, TLBs, value predictors and processor frontend. Although cache timing side-channel attacks have been studied for more than a decade, there is no evidence that the previously-known attacks exhaustively cover all possible attacks. One of the initial research directions covered by this dissertation was to develop a model for cache timing attacks, which can help lead towards discovering all possible cache timing attacks. The proposed three-step cache timing vulnerability model provides a means to enumerate all possible interactions between the victim and attacker who are sharing a cache-like structure, producing the complete set of theoretical timing vulnerabilities. This dissertation also covers new theoretical cache timing attacks that are unknown prior to being found by the model. To make the advances in security not only theoretical, this dissertation also covers design of a benchmarking suite that runs on commodity processors and helps evaluate their cache\u27s susceptibility to attacks, as well as can run on simulators to test potential or future cache designs. As the dissertation later demonstrates, the three-step timing vulnerability model can be naturally applied to any cache-like structures such as TLBs, and the dissertation encompasses a three-step model for TLBs, uncovering of theoretical new TLB attacks, and proposals for defenses. Building on success of analyzing caches and TLBs for new timing attacks, this dissertation then discusses follow-on research on evaluation and uncovering of new timing vulnerabilities in processor frontends. Since security analysis should be applied not just to existing processor microarchitectural features, the dissertation further analyzes possible future features such as value predictors. Although not currently in use, value predictors are actively being researched and proposed for addition into future microarchitectures. This dissertation shows, however, that they are vulnerable to attacks. Lastly, based on findings of the security issues with existing and proposed processor features, this dissertation explores how to better design secure processors from ground up, and presents a design-time security verification framework for secure processor architectures, using information flow tracking methods

    Cyber-security protection techniques to mitigate memory errors exploitation

    Full text link
    Tesis por compendio[EN] Practical experience in software engineering has demonstrated that the goal of building totally fault-free software systems, although desirable, is impossible to achieve. Therefore, it is necessary to incorporate mitigation techniques in the deployed software, in order to reduce the impact of latent faults. This thesis makes contributions to three memory corruption mitigation techniques: the stack smashing protector (SSP), address space layout randomisation (ASLR) and automatic software diversification. The SSP is a very effective protection technique used against stack buffer overflows, but it is prone to brute force attacks, particularly the dangerous byte-for-byte attack. A novel modification, named RenewSSP, has been proposed which eliminates brute force attacks, can be used in a completely transparent way with existing software and has negligible overheads. There are two different kinds of application for which RenewSSP is especially beneficial: networking servers (tested in Apache) and application launchers (tested on Android). ASLR is a generic concept with multiple designs and implementations. In this thesis, the two most relevant ASLR implementations of Linux have been analysed (Vanilla Linux and PaX patch), and several weaknesses have been found. Taking into account technological improvements in execution support (compilers and libraries), a new ASLR design has been proposed, named ASLR-NG, which maximises entropy, effectively addresses the fragmentation issue and removes a number of identified weaknesses. Furthermore, ASLR-NG is transparent to applications, in that it preserves binary code compatibility and does not add overheads. ASLR-NG has been implemented as a patch to the Linux kernel 4.1. Software diversification is a technique that covers a wide range of faults, including memory errors. The main problem is how to create variants, i.e. programs which have identical behaviours on normal inputs but where faults manifest differently. A novel form of automatic variant generation has been proposed, using multiple cross-compiler suites and processor emulators. One of the main goals of this thesis is to create applicable results. Therefore, I have placed particular emphasis on the development of real prototypes in parallel with the theoretical study. The results of this thesis are directly applicable to real systems; in fact, some of the results have already been included in real-world products.[ES] La creación de software supone uno de los retos más complejos para el ser humano ya que requiere un alto grado de abstracción. Aunque se ha avanzado mucho en las metodologías para la prevención de los fallos software, es patente que el software resultante dista mucho de ser confiable, y debemos asumir que el software que se produce no está libre de fallos. Dada la imposibilidad de diseñar o implementar sistemas libres de fallos, es necesario incorporar técnicas de mitigación de errores para mejorar la seguridad. La presente tesis realiza aportaciones en tres de las principales técnicas de mitigación de errores de corrupción de memoria: Stack Smashing Protector (SSP), Address Space Layout Randomisation (ASLR) y Automatic Software Diversification. SSP es una técnica de protección muy efectiva contra ataques de desbordamiento de buffer en pila, pero es sensible a ataques de fuerza bruta, en particular al peligroso ataque denominado byte-for-byte. Se ha propuesto una novedosa modificación del SSP, llamada RenewSSP, la cual elimina los ataques de fuerza bruta. Puede ser usada de manera completamente transparente con los programas existentes sin introducir sobrecarga. El RenewSSP es especialmente beneficioso en dos áreas de aplicación: Servidores de red (probado en Apache) y lanzadores de aplicaciones eficientes (probado en Android). ASLR es un concepto genérico, del cual hay multitud de diseños e implementaciones. Se han analizado las dos implementaciones más relevantes de Linux (Vanilla Linux y PaX patch), encontrándose en ambas tanto debilidades como elementos mejorables. Teniendo en cuenta las mejoras tecnológicas en el soporte a la ejecución (compiladores y librerías), se ha propuesto un nuevo diseño del ASLR, llamado ASLR-NG, el cual: maximiza la entropía, soluciona el problema de la fragmentación y elimina las debilidades encontradas. Al igual que la solución propuesta para el SSP, la nueva propuesta de ASLR es transparente para las aplicaciones y compatible a nivel binario sin introducir sobrecarga. ASLR-NG ha sido implementado como un parche del núcleo de Linux para la versión 4.1. La diversificación software es una técnica que cubre una amplia gama de fallos, incluidos los errores de memoria. La principal dificultad para aplicar esta técnica radica en la generación de las "variantes", que son programas que tienen un comportamiento idéntico entre ellos ante entradas normales, pero tienen un comportamiento diferenciado en presencia de entradas anormales. Se ha propuesto una novedosa forma de generar variantes de forma automática a partir de un mismo código fuente, empleando la emulación de sistemas. Una de las máximas de esta investigación ha sido la aplicabilidad de los resultados, por lo que se ha hecho especial hincapié en el desarrollo de prototipos sobre sistemas reales a la par que se llevaba a cabo el estudio teórico. Como resultado, las propuestas de esta tesis son directamente aplicables a sistemas reales, algunas de ellas ya están siendo explotadas en la práctica.[CA] La creació de programari suposa un dels reptes més complexos per al ser humà ja que requerix un alt grau d'abstracció. Encara que s'ha avançat molt en les metodologies per a la prevenció de les fallades de programari, és palès que el programari resultant dista molt de ser confiable, i hem d'assumir que el programari que es produïx no està lliure de fallades. Donada la impossibilitat de dissenyar o implementar sistemes lliures de fallades, és necessari incorporar tècniques de mitigació d'errors per a millorar la seguretat. La present tesi realitza aportacions en tres de les principals tècniques de mitigació d'errors de corrupció de memòria: Stack Smashing Protector (SSP), Address Space Layout Randomisation (ASLR) i Automatic Software Diversification. SSP és una tècnica de protecció molt efectiva contra atacs de desbordament de buffer en pila, però és sensible a atacs de força bruta, en particular al perillós atac denominat byte-for-byte. S'ha proposat una nova modificació del SSP, RenewSSP, la qual elimina els atacs de força bruta. Pot ser usada de manera completament transparent amb els programes existents sense introduir sobrecàrrega. El RenewSSP és especialment beneficiós en dos àrees d'aplicació: servidors de xarxa (provat en Apache) i llançadors d'aplicacions eficients (provat en Android). ASLR és un concepte genèric, del qual hi ha multitud de dissenys i implementacions. S'han analitzat les dos implementacions més rellevants de Linux (Vanilla Linux i PaX patch), trobant-se en ambdues tant debilitats com elements millorables. Tenint en compte les millores tecnològiques en el suport a l'execució (compiladors i llibreries), s'ha proposat un nou disseny de l'ASLR: ASLR-NG, el qual, maximitza l'entropia, soluciona el problema de la fragmentació i elimina les debilitats trobades. Igual que la solució proposada per al SSP, la nova proposta d'ASLR és transparent per a les aplicacions i compatible a nivell binari sense introduir sobrecàrrega. ASLR-NG ha sigut implementat com un pedaç del nucli de Linux per a la versió 4.1. La diversificació de programari és una tècnica que cobrix una àmplia gamma de fa\-llades, inclosos els errors de memòria. La principal dificultat per a aplicar esta tècnica radica en la generació de les "variants", que són programes que tenen un comportament idèntic entre ells davant d'entrades normals, però tenen un comportament diferenciat en presència d'entrades anormals. S'ha proposat una nova forma de generar variants de forma automàtica a partir d'un mateix codi font, emprant l'emulació de sistemes. Una de les màximes d'esta investigació ha sigut l'aplicabilitat dels resultats, per la qual cosa s'ha fet especial insistència en el desenrotllament de prototips sobre sistemes reals al mateix temps que es duia a terme l'estudi teòric. Com a resultat, les propostes d'esta tesi són directament aplicables a sistemes reals, algunes d'elles ja estan sent explotades en la pràctica.Marco Gisbert, H. (2015). Cyber-security protection techniques to mitigate memory errors exploitation [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/57806TESISCompendi
    corecore