183,219 research outputs found

    MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor

    Get PDF
    This paper presents an architecture and implementation details for MORA, a novel coarse grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors, to deliver low cost, high throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of hardware architecture and low-level programming language throughout the design cycle. The implementation details for the single MORA processor, and benchmark evaluation using a cycle accurate simulator are presented

    Racing to hardware-validated simulation

    Get PDF
    Processor simulators rely on detailed timing models of the processor pipeline to evaluate performance. The diversity in real-world processor designs mandates building flexible simulators that expose parts of the underlying model to the user in the form of configurable parameters. Consequently, the accuracy of modeling a real processor relies on both the accuracy of the pipeline model itself, and the accuracy of adjusting the configuration parameters according to the modeled processor. Unfortunately, processor vendors publicly disclose only a subset of their design decisions, raising the probability of introducing specification inaccuracies when modeling these processors. Inaccurately tuning model parameters deviates the simulated processor from the actual one. In the worst case, using improper parameters may lead to imbalanced pipeline models compromising the simulation output. Therefore, simulation models should be hardware-validated before using them for performance evaluation. As processors increase in complexity and diversity, validating a simulator model against real hardware becomes increasingly more challenging and time-consuming. In this work, we propose a methodology for validating simulation models against real hardware. We create a framework that relies on micro-benchmarks to collect performance statistics on real hardware, and machine learning-based algorithms to fine-tune the unknown parameters based on the accumulated statistics. We overhaul the Sniper simulator to support the ARM AArch64 instruction-set architecture (ISA), and introduce two new timing models for ARM-based in-order and out-of-order cores. Using our proposed simulator validation framework, we tune the in-order and out-of-order models to match the performance of a real-world implementation of the Cortex-A53 and Cortex-A72 cores with an average error of 7% and 15%, respectively, across a set of SPEC CPU2017 benchmarks

    Fault- and Yield-Aware On-Chip Memory Design and Management

    Get PDF
    Ever decreasing device size causes more frequent hard faults, which becomes a serious burden to processor design and yield management. This problem is particularly pronounced in the on-chip memory which consumes up to 70% of a processor' s total chip area. Traditional circuit-level techniques, such as redundancy and error correction code, become less effective in error-prevalent environments because of their large area overhead. In this work, we suggest an architectural solution to building reliable on-chip memory in the future processor environment. Our approaches have two parts, a design framework and architectural techniques for on-chip memory structures. Our design framework provides important architectural evaluation metrics such as yield, area, and performance based on low level defects and process variations parameters. Processor architects can quickly evaluate their designs' characteristics in terms of yield, area, and performance. With the framework, we develop architectural yield enhancement solutions for on-chip memory structures including L1 cache, L2 cache and directory memory. Our proposed solutions greatly improve yield with negligible area and performance overhead. Furthermore, we develop a decoupled yield model of compute cores and L2 caches in CMPs, which show that there will be many more L2 caches than compute cores in a chip. We propose efficient utilization techniques for excess caches. Evaluation results show that excess caches significantly improve overall performance of CMPs

    Autonomous satellite orbit determination during the development phases of the global positioning system

    Get PDF
    An onboard navigation system was developed to aid the design and evaluation of algorithms used in autonomous satellite navigation with Global Positioning System (GPS) data. The performance of the algorithms designed for a GPS Receiver/Processor Assembly (R/PA) intended for LANDSAT-D was investigated during the development phases of the GPS (four to six satellites in the constellation). This evaluation emphasized the effects on the orbit determination accuracy of the expected user clock errors, GPS satellite visibility, force model approximations, and state and covariance propagation approximations. Results are presented giving the sensitivity of orbit determination accuracy to these constraints

    Feasibility study for a numerical aerodynamic simulation facility. Volume 1

    Get PDF
    A Numerical Aerodynamic Simulation Facility (NASF) was designed for the simulation of fluid flow around three-dimensional bodies, both in wind tunnel environments and in free space. The application of numerical simulation to this field of endeavor promised to yield economies in aerodynamic and aircraft body designs. A model for a NASF/FMP (Flow Model Processor) ensemble using a possible approach to meeting NASF goals is presented. The computer hardware and software are presented, along with the entire design and performance analysis and evaluation

    Operating system support for overlapping-ISA heterogeneous multi-core architectures

    Full text link
    A heterogeneous processor consists of cores that are asymmetric in performance and functionality. Such a de-sign provides a cost-effective solution for processor man-ufacturers to continuously improve both single-thread per-formance and multi-thread throughput. This design, how-ever, faces significant challenges in the operating system, which traditionally assumes only homogeneous hardware. This paper presents a comprehensive study of OS support for heterogeneous architectures in which cores have asym-metric performance and overlapping, but non-identical in-struction sets. Our algorithms allow applications to trans-parently execute and fairly share different types of cores. We have implemented these algorithms in the Linux 2.6.24 kernel and evaluated them on an actual heterogeneous plat-form. Evaluation results demonstrate that our designs effi-ciently manage heterogeneous hardware and enable signifi-cant performance improvements for a range of applications.

    Processor Microarchitecture Security

    Get PDF
    As computer systems grow more and more complicated, various optimizations can unintentionally introduce security vulnerabilities in these systems. The vulnerabilities can lead to user information and data being compromised or stolen. In particular, the ending of both Moore\u27s law and Dennard scaling motivate the design of more exotic microarchitectural optimizations to extract more performance -- further exacerbating the security vulnerabilities. The performance optimizations often focus on sharing or re-using of hardware components within a processor, between different users or programs. Because of the sharing of the hardware, unintentional information leakage channels, through the shared components, can be created. Microarchitectural attacks, such as the high-profile Spectre and Meltdown attacks or the cache covert channels that they leverage, have demonstrated major vulnerabilities of modern computer architectures due to the microarchitectural~optimizations. Key components of processor microarchitectures are processor caches used for achieving high memory bandwidth and low latency for frequently accessed data. With frequently accessed data being brought and stored in caches, memory latency can be significantly reduced when data is fetched from the cache, as opposed to being fetched from the main memory. With limited processor chip area, however, the cache size cannot be very large. Thus, modern processors adopt a cache hierarchy with multiple levels of caches, where the cache close to processor is faster but smaller, and the cache far from processor is slower but larger. This leads to a fundamental property of modern processors: {\em the latency of accessing data in different cache levels and in main memory is different}. As a result, the timing of memory operations when fetching data from different cache levels, e.g., the timing of fetching data from closest-to-processor L1 cache vs. from main memory, can reveal secret-dependent information if attacker is able to observe the timing of these accesses and correlate them to the operation of the victim\u27s code. Further, due to limited size of the caches, memory accesses by a victim may displace attacker\u27s data from the cache, and with knowledge, or reverse-engineering, of the cache architecture, the attacker can learn some information about victim\u27s data based on the modifications to the state of the cache -- which can be observed by the timing~measurements. Caches are not only structures in the processor that can suffer from security vulnerabilities. As an essential mechanism to achieving high performance, cache-like structures are used pervasively in various processor components, such as the translation lookaside buffer (TLB) and processor frontend. Consequently, the vulnerabilities due to timing differences of accessing data in caches or cache-like structures affect many components of the~processor. The main goal of this dissertation is the {\em design of high performance and secure computer architectures}. Since the sophisticated hardware components such as caches, TLBs, value predictors, and processor frontend are critical to ensure high performance, realizing this goal requires developing fundamental techniques to guarantee security in the presence of timing differences of different processor operations. Furthermore, effective defence mechanisms can be only developed after developing a formal and systematic understanding of all the possible attacks that timing side-channels can lead to. To realize the research goals, the main main contributions of this dissertation~are: \begin{itemize}[noitemsep] \item Design and evaluation of a novel three-step cache timing model to understand theoretical vulnerabilities in caches \item Development of a benchmark suite that can test if processor caches or secure cache designs are vulnerable to certain theoretical vulnerabilities. \item Development of a timing vulnerability model to test TLBs and design of hardware defenses for the TLBs to address newly found vulnerabilities. \item Analysis of value predictor attacks and design of defenses for value predictors. \item Evaluation of vulnerabilities in processor frontends based on timing differences in the operation of the frontends. \item Development of a design-time security verification framework for secure processor architectures, using information flow tracking methods. \end{itemize} \newpage This dissertation combines the theoretical modeling and practical benchmarking analysis to help evaluate susceptibility of different architectures and microarchitectures to timing attacks on caches, TLBs, value predictors and processor frontend. Although cache timing side-channel attacks have been studied for more than a decade, there is no evidence that the previously-known attacks exhaustively cover all possible attacks. One of the initial research directions covered by this dissertation was to develop a model for cache timing attacks, which can help lead towards discovering all possible cache timing attacks. The proposed three-step cache timing vulnerability model provides a means to enumerate all possible interactions between the victim and attacker who are sharing a cache-like structure, producing the complete set of theoretical timing vulnerabilities. This dissertation also covers new theoretical cache timing attacks that are unknown prior to being found by the model. To make the advances in security not only theoretical, this dissertation also covers design of a benchmarking suite that runs on commodity processors and helps evaluate their cache\u27s susceptibility to attacks, as well as can run on simulators to test potential or future cache designs. As the dissertation later demonstrates, the three-step timing vulnerability model can be naturally applied to any cache-like structures such as TLBs, and the dissertation encompasses a three-step model for TLBs, uncovering of theoretical new TLB attacks, and proposals for defenses. Building on success of analyzing caches and TLBs for new timing attacks, this dissertation then discusses follow-on research on evaluation and uncovering of new timing vulnerabilities in processor frontends. Since security analysis should be applied not just to existing processor microarchitectural features, the dissertation further analyzes possible future features such as value predictors. Although not currently in use, value predictors are actively being researched and proposed for addition into future microarchitectures. This dissertation shows, however, that they are vulnerable to attacks. Lastly, based on findings of the security issues with existing and proposed processor features, this dissertation explores how to better design secure processors from ground up, and presents a design-time security verification framework for secure processor architectures, using information flow tracking methods
    corecore