965 research outputs found

    Processor Microarchitecture Security

    Get PDF
    As computer systems grow more and more complicated, various optimizations can unintentionally introduce security vulnerabilities in these systems. The vulnerabilities can lead to user information and data being compromised or stolen. In particular, the ending of both Moore\u27s law and Dennard scaling motivate the design of more exotic microarchitectural optimizations to extract more performance -- further exacerbating the security vulnerabilities. The performance optimizations often focus on sharing or re-using of hardware components within a processor, between different users or programs. Because of the sharing of the hardware, unintentional information leakage channels, through the shared components, can be created. Microarchitectural attacks, such as the high-profile Spectre and Meltdown attacks or the cache covert channels that they leverage, have demonstrated major vulnerabilities of modern computer architectures due to the microarchitectural~optimizations. Key components of processor microarchitectures are processor caches used for achieving high memory bandwidth and low latency for frequently accessed data. With frequently accessed data being brought and stored in caches, memory latency can be significantly reduced when data is fetched from the cache, as opposed to being fetched from the main memory. With limited processor chip area, however, the cache size cannot be very large. Thus, modern processors adopt a cache hierarchy with multiple levels of caches, where the cache close to processor is faster but smaller, and the cache far from processor is slower but larger. This leads to a fundamental property of modern processors: {\em the latency of accessing data in different cache levels and in main memory is different}. As a result, the timing of memory operations when fetching data from different cache levels, e.g., the timing of fetching data from closest-to-processor L1 cache vs. from main memory, can reveal secret-dependent information if attacker is able to observe the timing of these accesses and correlate them to the operation of the victim\u27s code. Further, due to limited size of the caches, memory accesses by a victim may displace attacker\u27s data from the cache, and with knowledge, or reverse-engineering, of the cache architecture, the attacker can learn some information about victim\u27s data based on the modifications to the state of the cache -- which can be observed by the timing~measurements. Caches are not only structures in the processor that can suffer from security vulnerabilities. As an essential mechanism to achieving high performance, cache-like structures are used pervasively in various processor components, such as the translation lookaside buffer (TLB) and processor frontend. Consequently, the vulnerabilities due to timing differences of accessing data in caches or cache-like structures affect many components of the~processor. The main goal of this dissertation is the {\em design of high performance and secure computer architectures}. Since the sophisticated hardware components such as caches, TLBs, value predictors, and processor frontend are critical to ensure high performance, realizing this goal requires developing fundamental techniques to guarantee security in the presence of timing differences of different processor operations. Furthermore, effective defence mechanisms can be only developed after developing a formal and systematic understanding of all the possible attacks that timing side-channels can lead to. To realize the research goals, the main main contributions of this dissertation~are: \begin{itemize}[noitemsep] \item Design and evaluation of a novel three-step cache timing model to understand theoretical vulnerabilities in caches \item Development of a benchmark suite that can test if processor caches or secure cache designs are vulnerable to certain theoretical vulnerabilities. \item Development of a timing vulnerability model to test TLBs and design of hardware defenses for the TLBs to address newly found vulnerabilities. \item Analysis of value predictor attacks and design of defenses for value predictors. \item Evaluation of vulnerabilities in processor frontends based on timing differences in the operation of the frontends. \item Development of a design-time security verification framework for secure processor architectures, using information flow tracking methods. \end{itemize} \newpage This dissertation combines the theoretical modeling and practical benchmarking analysis to help evaluate susceptibility of different architectures and microarchitectures to timing attacks on caches, TLBs, value predictors and processor frontend. Although cache timing side-channel attacks have been studied for more than a decade, there is no evidence that the previously-known attacks exhaustively cover all possible attacks. One of the initial research directions covered by this dissertation was to develop a model for cache timing attacks, which can help lead towards discovering all possible cache timing attacks. The proposed three-step cache timing vulnerability model provides a means to enumerate all possible interactions between the victim and attacker who are sharing a cache-like structure, producing the complete set of theoretical timing vulnerabilities. This dissertation also covers new theoretical cache timing attacks that are unknown prior to being found by the model. To make the advances in security not only theoretical, this dissertation also covers design of a benchmarking suite that runs on commodity processors and helps evaluate their cache\u27s susceptibility to attacks, as well as can run on simulators to test potential or future cache designs. As the dissertation later demonstrates, the three-step timing vulnerability model can be naturally applied to any cache-like structures such as TLBs, and the dissertation encompasses a three-step model for TLBs, uncovering of theoretical new TLB attacks, and proposals for defenses. Building on success of analyzing caches and TLBs for new timing attacks, this dissertation then discusses follow-on research on evaluation and uncovering of new timing vulnerabilities in processor frontends. Since security analysis should be applied not just to existing processor microarchitectural features, the dissertation further analyzes possible future features such as value predictors. Although not currently in use, value predictors are actively being researched and proposed for addition into future microarchitectures. This dissertation shows, however, that they are vulnerable to attacks. Lastly, based on findings of the security issues with existing and proposed processor features, this dissertation explores how to better design secure processors from ground up, and presents a design-time security verification framework for secure processor architectures, using information flow tracking methods

    Affordable techniques for dependable microprocessor design

    Get PDF
    As high computing power is available at an affordable cost, we rely on microprocessor-based systems for much greater variety of applications. This dependence indicates that a processor failure could have more diverse impacts on our daily lives. Therefore, dependability is becoming an increasingly important quality measure of microprocessors.;Temporary hardware malfunctions caused by unstable environmental conditions can lead the processor to an incorrect state. This is referred to as a transient error or soft error. Studies have shown that soft errors are the major source of system failures. This dissertation characterizes the soft error behavior on microprocessors and presents new microarchitectural approaches that can realize high dependability with low overhead.;Our fault injection studies using RISC processors have demonstrated that different functional blocks of the processor have distinct susceptibilities to soft errors. The error susceptibility information must be reflected in devising fault tolerance schemes for cost-sensitive applications. Considering the common use of on-chip caches in modern processors, we investigated area-efficient protection schemes for memory arrays. The idea of caching redundant information was exploited to optimize resource utilization for increased dependability. We also developed a mechanism to verify the integrity of data transfer from lower level memories to the primary caches. The results of this study show that by exploiting bus idle cycles and the information redundancy, an almost complete check for the initial memory data transfer is possible without incurring a performance penalty.;For protecting the processor\u27s control logic, which usually remains unprotected, we propose a low-cost reliability enhancement strategy. We classified control logic signals into static and dynamic control depending on their changeability, and applied various techniques including commit-time checking, signature caching, component-level duplication, and control flow monitoring. Our schemes can achieve more than 99% coverage with a very small hardware addition. Finally, a virtual duplex architecture for superscalar processors is presented. In this system-level approach, the processor pipeline is backed up by a partially replicated pipeline. The replication-based checker minimizes the design and verification overheads. For a large-scale superscalar processor, the proposed architecture can bring 61.4% reduction in die area while sustaining the maximum performance

    When a Patch is Not Enough - HardFails: Software-Exploitable Hardware Bugs

    Full text link
    In this paper, we take a deep dive into microarchitectural security from a hardware designer's perspective by reviewing the existing approaches to detect hardware vulnerabilities during the design phase. We show that a protection gap currently exists in practice that leaves chip designs vulnerable to software-based attacks. In particular, existing verification approaches fail to detect specific classes of vulnerabilities, which we call HardFails: these bugs evade detection by current verification techniques while being exploitable from software. We demonstrate such vulnerabilities in real-world SoCs using RISC-V to showcase and analyze concrete instantiations of HardFails. Patching these hardware bugs may not always be possible and can potentially result in a product recall. We base our findings on two extensive case studies: the recent Hack@DAC 2018 hardware security competition, where 54 independent teams of researchers competed world-wide over a period of 12 weeks to catch inserted security bugs in SoC RTL designs, and an in-depth systematic evaluation of state-of-the-art verification approaches. Our findings indicate that even combinations of techniques will miss high-impact bugs due to the large number of modules with complex interdependencies and fundamental limitations of current detection approaches. We also craft a real-world software attack that exploits one of the RTL bugs from Hack@DAC that evaded detection and discuss novel approaches to mitigate the growing problem of cross-layer bugs at design time

    An Effective Verification Solution for Modern Microprocessors.

    Full text link
    Over the past four decades microprocessors have come to be a vital and inseparable part of the modern world, becoming the digital brain of numerous electronic devices and gadgets that make today's lifestyle possible. Processors are capable of performing computation at astonishingly high speeds and are extremely integrated, occupying only a few square centimeters of silicon die. However, this computational power comes at a price: the task of verifying a modern microprocessor and guaranteeing correctness of its operation is increasingly challenging, even for most established processor vendors. Always attempting to deliver higher performance to end-users, processor manufacturers are forced to design progressively more complex circuits and employ immense verification teams to eliminate critical design bugs in a timely manner. Unfortunately, too often size doesn't seem to matter in verification, as schedules continue to slip and microprocessors find their way to the marketplace with design errors. This work describes a novel verification framework targeting specifically today's complex microprocessors. The scope of the work spans many levels of verification and different phases of the processor life-cycle, from validation of individual sub-modules to complete multi-core system, and from pre-silicon design verification to in-the-field hardware patching. In particular, our StressTest and MCjammer approaches enable efficient generation of high-quality tests at the pre-silicon level for individual cores and multi-core systems, respectively, using machine learning techniques and making the process as automatic as possible. On the other hand, Reversi and Dacota enable low cost validation in post-silicon, while delivering even higher coverage than pre-silicon techniques. Finally, the Field-repairable control logic (FRCL) and Caspar techniques allow designers to patch different classes of escaped errors in processors that are deployed in the field. The integrated set of solutions that we introduce with this thesis empowers processor vendors to drastically shorten their development timeline and, at the same time, to deliver more reliable and correct systems to their customers at a lower cost. Altogether, this work has the potential to solve the long-standing challenge of guaranteeing the complete functional correctness of modern microprocessors.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61656/1/ivagner_1.pd

    Energy analysis and optimisation techniques for automatically synthesised coprocessors

    Get PDF
    The primary outcome of this research project is the development of a methodology enabling fast automated early-stage power and energy analysis of configurable processors for system-on-chip platforms. Such capability is essential to the process of selecting energy efficient processors during design-space exploration, when potential savings are highest. This has been achieved by developing dynamic and static energy consumption models for the constituent blocks within the processors. Several optimisations have been identified, specifically targeting the most significant blocks in terms of energy consumption. Instruction encoding mechanism reduces both the energy and area requirements of the instruction cache; modifications to the multiplier unit reduce energy consumption during inactive cycles. Both techniques are demonstrated to offer substantial energy savings. The aforementioned techniques have undergone detailed evaluation and, based on the positive outcomes obtained, have been incorporated into Cascade, a system-on-chip coprocessor synthesis tool developed by Critical Blue, to provide automated analysis and optimisation of processor energy requirements. This thesis details the process of identifying and examining each method, along with the results obtained. Finally, a case study demonstrates the benefits of the developed functionality, from the perspective of someone using Cascade to automate the creation of an energy-efficient configurable processor for system-on-chip platforms

    Real-time trace decoding and monitoring for safety and security in embedded systems

    Get PDF
    Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nível de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensível, para proteção de sistemas embarcados críticos e multi-core

    Real-Time Trace Decoding and Monitoring for Safety and Security in Embedded Systems

    Get PDF
    Integrated circuits and systems can be found almost everywhere in today’s world. As their use increases, they need to be made safer and more perfor mant to meet current demands in processing power. FPGA integrated SoCs can provide the ideal trade-off between performance, adaptability, and energy usage. One of today’s vital challenges lies in updating existing fault tolerance techniques for these new systems while utilizing all available processing capa bilities, such as multi-core and heterogeneous processing units. Control-flow monitoring is one of the primary mechanisms described for error detection at the software architectural level for the highest grade of hazard level clas sifications (e.g., ASIL D) described in industry safety standards ISO-26262. Control-flow errors are also known to compose the majority of detected errors for ICs and embedded systems in safety-critical and risk-susceptible environ ments [5]. Software-based monitoring methods remain the most popular [6–8]. However, recent studies show that the overheads they impose make actual reliability gains negligible [9, 10]. This work proposes and demonstrates a new control flow checking method implemented in FPGA for multi-core embedded systems called control-flow trace checker (CFTC). CFTC uses existing trace and debug subsystems of modern processors to rebuild their execution states. It can iden tify any errors in real-time by comparing executed states to a set of permitted state transitions determined statically. This novel implementation weighs hardware resource trade-offs to target mul tiple independent tasks in multi-core embedded applications, as well as single core systems. The proposed system is entirely implemented in hardware and isolated from all monitored software components, requiring 2.4% of the target FPGA platform resources to protect an execution unit in its entirety. There fore, it avoids undesired overheads and maintains deterministic error detection latencies, which guarantees reliability improvements without impairing the target software system. Finally, CFTC is evaluated under different software i Resumo fault-injection scenarios, achieving detection rates of 100% of all control-flow errors to wrong destinations and 98% of all injected faults to program binaries. All detection times are further analyzed and precisely described by a model based on the monitor’s resources and speed and the software application’s control-flow structure and binary characteristics.Circuitos integrados estão presentes em quase todos sistemas complexos do mundo moderno. Conforme sua frequência de uso aumenta, eles precisam se tornar mais seguros e performantes para conseguir atender as novas demandas em potência de processamento. Sistemas em Chip integrados com FPGAs conseguem prover o balanço perfeito entre desempenho, adaptabilidade, e uso de energia. Um dos maiores desafios agora é a necessidade de atualizar técnicas de tolerância à falhas para estes novos sistemas, aproveitando os novos avanços em capacidade de processamento. Monitoramento de fluxo de controle é um dos principais mecanismos para a detecção de erros em nível de software para sistemas classificados como de alto risco (e.g. ASIL D), descrito em padrões de segurança como o ISO-26262. Estes erros são conhecidos por compor a maioria dos erros detectados em sistemas integrados [5]. Embora métodos de monitoramento baseados em software continuem sendo os mais populares [6–8], estudos recentes mostram que seus custos adicionais, em termos de performance e área, diminuem consideravelmente seus ganhos reais em confiabilidade [9, 10]. Propomos aqui um novo método de monitora mento de fluxo de controle implementado em FPGA para sistemas embarcados multi-core. Este método usa subsistemas de trace e execução de código para reconstruir o estado atual do processador, identificando erros através de com parações entre diferentes estados de execução da CPU. Propomos uma implementação que considera trade-offs no uso de recuros de sistema para monitorar múltiplas tarefas independetes. Nossa abordagem suporta o monitoramento de sistemas simples e também de sistemas multi-core multitarefa. Por fim, nossa técnica é totalmente implementada em hardware, evitando o uso de unidades de processamento de software que possa adicionar custos indesejáveis à aplicação em perda de confiabilidade. Propomos, assim, um mecanismo de verificação de fluxo de controle, escalável e extensível, para proteção de sistemas embarcados críticos e multi-core
    • …
    corecore