3,472 research outputs found

    A Detailed Analysis of Contemporary ARM and x86 Architectures

    Get PDF
    RISC vs. CISC wars raged in the 1980s when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Further, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important, and we seek to answer this question through a detailed measurement based study on real hardware running real applications. We analyze measurements on the ARM Cortex-A8 and Cortex-A9 and Intel Atom and Sandybridge i7 microprocessors over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors? performance and energy efficiency. We find that ARM and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant

    Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks

    Full text link
    Predicting the number of clock cycles a processor takes to execute a block of assembly instructions in steady state (the throughput) is important for both compiler designers and performance engineers. Building an analytical model to do so is especially complicated in modern x86-64 Complex Instruction Set Computer (CISC) machines with sophisticated processor microarchitectures in that it is tedious, error prone, and must be performed from scratch for each processor generation. In this paper we present Ithemal, the first tool which learns to predict the throughput of a set of instructions. Ithemal uses a hierarchical LSTM--based approach to predict throughput based on the opcodes and operands of instructions in a basic block. We show that Ithemal is more accurate than state-of-the-art hand-written tools currently used in compiler backends and static machine code analyzers. In particular, our model has less than half the error of state-of-the-art analytical models (LLVM's llvm-mca and Intel's IACA). Ithemal is also able to predict these throughput values just as fast as the aforementioned tools, and is easily ported across a variety of processor microarchitectures with minimal developer effort.Comment: Published at 36th International Conference on Machine Learning (ICML) 201

    Cross-layer system reliability assessment framework for hardware faults

    Get PDF
    System reliability estimation during early design phases facilitates informed decisions for the integration of effective protection mechanisms against different classes of hardware faults. When not all system abstraction layers (technology, circuit, microarchitecture, software) are factored in such an estimation model, the delivered reliability reports must be excessively pessimistic and thus lead to unacceptably expensive, over-designed systems. We propose a scalable, cross-layer methodology and supporting suite of tools for accurate but fast estimations of computing systems reliability. The backbone of the methodology is a component-based Bayesian model, which effectively calculates system reliability based on the masking probabilities of individual hardware and software components considering their complex interactions. Our detailed experimental evaluation for different technologies, microarchitectures, and benchmarks demonstrates that the proposed model delivers very accurate reliability estimations (FIT rates) compared to statistically significant but slow fault injection campaigns at the microarchitecture level.Peer ReviewedPostprint (author's final draft

    Processing Posting Lists Using OpenCL

    Get PDF
    One of the main requirements of internet search engines is the ability to retrieve relevant results with faster response times. Yioop is an open source search engine designed and developed in PHP by Dr. Chris Pollett. The goal of this project is to explore the possibilities of enhancing the performance of Yioop by substituting resource-intensive existing PHP functions with C based native PHP extensions and the parallel data processing technology OpenCL. OpenCL leverages the Graphical Processing Unit (GPU) of a computer system for performance improvements. Some of the critical functions in search engines are resource-intensive in terms of processing power, memory, and I/O usage. The processing times vary based on the complexity and magnitude of data involved. This project involves different phases such as identifying critical resource intensive functions, initially replacing such methods with PHP Extensions, and eventually experimenting with OpenCL code. We also ran performance tests to measure the reduction in processing times. From our results, we concluded that PHP Extensions and OpenCL processing resulted in performance improvements

    Simulation-based Fault Injection with QEMU for Speeding-up Dependability Analysis of Embedded Software

    Get PDF
    Simulation-based fault injection (SFI) represents a valuable solu- tion for early analysis of software dependability and fault tolerance properties before the physical prototype of the target platform is available. Some SFI approaches base the fault injection strategy on cycle-accurate models imple- mented by means of Hardware Description Languages (HDLs). However, cycle- accurate simulation has revealed to be too time-consuming when the objective is to emulate the effect of soft errors on complex microprocessors. To overcome this issue, SFI solutions based on virtual prototypes of the target platform has started to be proposed. However, current approaches still present some draw- backs, like, for example, they work only for specific CPU architectures, or they require code instrumentation, or they have a different target (i.e., design errors instead of dependability analysis). To address these disadvantages, this paper presents an efficient fault injection approach based on QEMU, one of the most efficient and popular instruction-accurate emulator for several microprocessor architectures. As main goal, the proposed approach represents a non intrusive technique for simulating hardware faults affecting CPU behaviours. Perma- nent and transient/intermittent hardware fault models have been abstracted without losing quality for software dependability analysis. The approach mini- mizes the impact of the fault injection procedure in the emulator performance by preserving the original dynamic binary translation mechanism of QEMU. Experimental results for both x86 and ARM processors proving the efficiency and effectiveness of the proposed approach are presented
    corecore