
    Software-Based Self-Test of Set-Associative Cache Memories

    Embedded microprocessor cache memories suffer from limited observability and controllability, which creates problems during in-system tests. This paper presents a procedure to transform traditional march tests into software-based self-test programs for set-associative cache memories with LRU replacement. Among the different cache blocks in a microprocessor, testing instruction caches represents a major challenge due to limitations in two areas: 1) test patterns, which must be composed of valid instruction opcodes, and 2) test result observability, since faults can only be observed through the effects of executed instructions. For these reasons, the proposed methodology concentrates on the implementation of test programs for instruction caches. The main contribution of this work lies in the possibility of applying state-of-the-art memory test algorithms to embedded cache memories without introducing any hardware or performance overheads, while guaranteeing the detection of typical faults arising in nanometer CMOS technologies.
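
    As a point of reference for the kind of march algorithm such test programs implement, the sketch below runs the classical MATS+ march test from software over a buffer sized to cover every entry of the cache under test. The cache geometry, the data patterns and the simple linear address order are illustrative assumptions; an actual instruction-cache test program would have to use valid instruction sequences as patterns, steer placement through the LRU policy, and observe results through executed instructions, as the abstract notes.

        /* Minimal sketch of the classical MATS+ march test applied from software
         * to a buffer covering every entry of the cache under test. The geometry
         * (CACHE_SETS, CACHE_WAYS, LINE_WORDS) and the linear stride are assumed. */
        #include <stdint.h>
        #include <stdio.h>

        #define CACHE_SETS  128          /* assumed cache geometry */
        #define CACHE_WAYS  4
        #define LINE_WORDS  8
        #define N (CACHE_SETS * CACHE_WAYS * LINE_WORDS)

        static volatile uint32_t buf[N];

        static int march_mats_plus(void)
        {
            int errors = 0;

            /* Element 1, any order: (w0) write the background pattern. */
            for (int i = 0; i < N; i++)
                buf[i] = 0x00000000u;

            /* Element 2, ascending: (r0, w1) verify and write the inverse pattern. */
            for (int i = 0; i < N; i++) {
                if (buf[i] != 0x00000000u)
                    errors++;
                buf[i] = 0xFFFFFFFFu;
            }

            /* Element 3, descending: (r1, w0) verify and restore the background. */
            for (int i = N - 1; i >= 0; i--) {
                if (buf[i] != 0xFFFFFFFFu)
                    errors++;
                buf[i] = 0x00000000u;
            }
            return errors;
        }

        int main(void)
        {
            printf("MATS+ detected %d errors\n", march_mats_plus());
            return 0;
        }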

    Proximity coherence for chip-multiprocessors

    Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available in modern fabrication processes; however, the parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the most efficient solution for chip-multiprocessors, placing limits on the performance of these complex systems. In an era of increasingly power-limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence. The first step in the design process is to analyse the communication behaviour of parallel benchmark suites such as Parsec and SPLASH-2. This thesis presents work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86 machine. The results reveal considerable locality of shared data accesses between threads with consecutive operating-system-assigned thread IDs. This pattern, although of little consequence in a multi-node system, corresponds to strong physical locality of shared data between adjacent cores on a chip-multiprocessor platform. Traditional cache coherence protocols, although often used in chip-multiprocessor designs, were developed in the context of older multi-node systems. Redesigning coherence protocols to exploit new patterns, such as the physical locality of shared data, makes it possible to improve the efficiency of communication in chip-multiprocessors. This thesis explores such a design, Proximity Coherence: a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure. Funded by an EPSRC DTA research scholarship.
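
    A minimal, self-contained sketch of the forwarding decision this scheme implies is shown below: on an L1 load miss, the four mesh neighbours are probed over assumed dedicated links, and only if none of them holds the line does the request fall back to the directory. The mesh size, the toy per-core tag store and the neighbour-only probe set are assumptions for illustration, not the thesis's actual protocol.

        /* Toy model of the Proximity Coherence idea: probe physically adjacent
         * cores on an (assumed) mesh before indirecting through the directory. */
        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define MESH_DIM 4                     /* assumed 4x4 mesh of cores */
        #define CORES (MESH_DIM * MESH_DIM)
        #define L1_LINES 4                     /* toy per-core L1: a few resident lines */

        static uint64_t l1[CORES][L1_LINES];   /* toy tag store: 0 = empty */

        static bool l1_has_line(int core, uint64_t addr)
        {
            for (int i = 0; i < L1_LINES; i++)
                if (l1[core][i] == addr)
                    return true;
            return false;
        }

        /* On an L1 load miss, probe the four mesh neighbours over dedicated links;
         * only if none holds the line do we indirect via the directory. */
        static const char *l1_load_miss(int core, uint64_t addr)
        {
            int x = core % MESH_DIM, y = core / MESH_DIM;
            const int dx[] = { -1, 1, 0, 0 }, dy[] = { 0, 0, -1, 1 };

            for (int i = 0; i < 4; i++) {
                int nx = x + dx[i], ny = y + dy[i];
                if (nx < 0 || nx >= MESH_DIM || ny < 0 || ny >= MESH_DIM)
                    continue;
                if (l1_has_line(ny * MESH_DIM + nx, addr))
                    return "filled by neighbour";
            }
            return "filled via directory";
        }

        int main(void)
        {
            l1[1][0] = 0x1000;  /* core 1, a neighbour of core 0, holds the line */
            printf("core 0 miss on 0x1000: %s\n", l1_load_miss(0, 0x1000));
            printf("core 0 miss on 0x2000: %s\n", l1_load_miss(0, 0x2000));
            return 0;
        }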

    On the design of architecture-aware algorithms for emerging applications

    This dissertation maps various kernels and applications to a spectrum of programming models and architectures and also presents architecture-aware algorithms for different systems. The kernels and applications discussed in this dissertation have widely varying computational characteristics. For example, we consider both dense numerical computations and sparse graph algorithms. This dissertation also covers emerging applications from image processing, complex network analysis, and computational biology. We map these problems to diverse multicore processors and manycore accelerators. We also use new programming models (such as Transactional Memory, MapReduce, and Intel TBB) to address the performance and productivity challenges in these problems. Our experiences highlight the importance of mapping applications to appropriate programming models and architectures. We also identify several limitations of current system software and architectures, and directions for improving them. The discussion focuses on system software and architectural support for nested irregular parallelism, Transactional Memory, and hybrid data transfer mechanisms. We believe that the complexity of parallel programming can be significantly reduced via collaborative efforts among researchers and practitioners from different domains. This dissertation contributes to these efforts by providing benchmarks and suggestions to improve system software and architectures. Ph.D. Committee Chair: Bader, David; Committee Members: Hong, Bo; Riley, George; Vuduc, Richard; Wills, Scott.

    Table Driven Adaptive Effectively Heterogeneous Multi-core Architectures

    Exploiting the flexibility and scope of multi-core architectures for performance enhancement is an approach widely used by researchers. However, with the increasingly dynamic nature of everyday computing workloads, even general multi-core architectures seem to be reaching an upper limit on deliverable performance. This has paved the way for careful consideration of heterogeneous multi-core architectures. Such architectures can be further enhanced by making the heterogeneity of the cores dynamic in nature. This work proposes techniques that change the configurations of these cores dynamically with the workload. Thus, depending on requirements and pre-programmed preferences, each core can arrange itself to be power optimized or speed optimized. In addition, the project has been designed in RTL (Verilog) to provide a realistic picture of the silicon investment. The project can be simulated using the SPEC2000 and SPEC2006 benchmarks, and is completely synthesizable using the IBM_LPE library for 65 nm (IBM65LPE). School of Electrical & Computer Engineering.
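
    The table-driven idea can be pictured with the small behavioural sketch below: each core indexes a pre-programmed table with a coarse workload class and a power/speed preference and adopts the configuration it finds there. The actual design is RTL (Verilog); the workload classes, table contents and configuration knobs (frequency, issue width) shown here are illustrative assumptions only.

        /* Behavioural sketch (not the RTL design) of table-driven core adaptation. */
        #include <stdio.h>

        typedef enum { WL_MEMORY_BOUND, WL_COMPUTE_BOUND, WL_IDLE, WL_CLASSES } workload_t;
        typedef enum { PREFER_POWER, PREFER_SPEED, PREFS } preference_t;

        typedef struct {
            int freq_mhz;       /* assumed reconfigurable knobs */
            int issue_width;
        } core_config_t;

        /* Pre-programmed configuration table: [workload class][preference]. */
        static const core_config_t config_table[WL_CLASSES][PREFS] = {
            [WL_MEMORY_BOUND]  = { { 800, 1 },  { 1200, 2 } },
            [WL_COMPUTE_BOUND] = { { 1600, 2 }, { 2400, 4 } },
            [WL_IDLE]          = { { 400, 1 },  { 400, 1 }  },
        };

        static void reconfigure_core(int core, workload_t wl, preference_t pref)
        {
            core_config_t cfg = config_table[wl][pref];
            printf("core %d -> %d MHz, %d-wide issue\n", core, cfg.freq_mhz, cfg.issue_width);
        }

        int main(void)
        {
            reconfigure_core(0, WL_COMPUTE_BOUND, PREFER_SPEED);
            reconfigure_core(1, WL_MEMORY_BOUND, PREFER_POWER);
            return 0;
        }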

    A fault-tolerant last level cache for CMPs operating at ultra-low voltage

    Voltage scaling to values near the threshold voltage is a promising technique to hold off the many-core power wall. However, as voltage decreases, some SRAM cells are unable to operate reliably and show a behavior consistent with a hard fault. Block disabling is a micro-architectural technique that allows low-voltage operation by deactivating faulty cache entries, at the expense of reducing the effective cache capacity. In the case of the last-level cache, this capacity reduction leads to an increase in off-chip memory accesses, diminishing the overall energy benefit of reducing the voltage supply. In this work, we exploit the reuse locality and the intrinsic redundancy of multi-level inclusive hierarchies to enhance the performance of block disabling with negligible cost. The proposed fault-aware last-level cache management policy maps critical blocks, those not present in private caches and with a higher probability of being reused, to active cache entries. Our evaluation shows that this fault-aware management results in up to 37.3% and 54.2% fewer misses per kilo instruction (MPKI) than block disabling for multiprogrammed and parallel workloads, respectively. This translates to performance enhancements of up to 13% and 34.6% for multiprogrammed and parallel workloads, respectively.
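
    The insertion decision at the heart of such a policy can be sketched as below: blocks judged critical (not present in any private cache and likely to be reused) are steered to fault-free ways, while blocks whose data still lives in a private cache can tolerate a disabled entry. The 16-way set, the way-selection order and the treatment of faulty entries as tag-only placeholders are illustrative assumptions, not the paper's exact mechanism.

        /* Sketch of a fault-aware way-selection policy for a low-voltage LLC set. */
        #include <stdbool.h>
        #include <stdio.h>

        #define WAYS 16

        typedef struct {
            bool faulty;        /* way disabled because its SRAM cells fail at low voltage */
            bool valid;
            unsigned long tag;
        } llc_way_t;

        /* Pick the way in which to insert an incoming block; -1 means bypass the set. */
        static int choose_way(const llc_way_t set[WAYS], bool critical)
        {
            if (critical) {
                /* Critical blocks (no private copy, likely reuse): fault-free ways only. */
                for (int w = 0; w < WAYS; w++)
                    if (!set[w].faulty && !set[w].valid)
                        return w;
                for (int w = 0; w < WAYS; w++)
                    if (!set[w].faulty)
                        return w;      /* evict a resident block; replacement policy elided */
                return -1;             /* every way faulty: bypass the LLC */
            }

            /* Non-critical blocks (a copy lives in a private cache): a faulty way is
             * acceptable as a tag-only placeholder, keeping fault-free ways for
             * critical data; the data itself can be supplied by the private copy. */
            for (int w = 0; w < WAYS; w++)
                if (set[w].faulty && !set[w].valid)
                    return w;
            for (int w = 0; w < WAYS; w++)
                if (!set[w].valid)
                    return w;
            return -1;                 /* set full of critical data: do not displace it */
        }

        int main(void)
        {
            llc_way_t set[WAYS] = { { .faulty = true }, { .faulty = true } }; /* ways 0-1 disabled */
            printf("critical block     -> way %d\n", choose_way(set, true));
            printf("non-critical block -> way %d\n", choose_way(set, false));
            return 0;
        }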

    Quality and Reliability of Elastomer Sockets

    Integrated Circuit (IC) sockets provide hundreds to thousands of electrical interconnects in enterprise servers, where quality and reliability are critical for customer applications. The evaluation of IC sockets, according to current industry practices, relies on the execution of stress loads and stress levels that are defined by standards, with no consideration of the physics of failure (PoF), the target operating environment, or contact resistance behavior over time. Similarly, monitoring of contact resistance during system operation takes no account of the PoF or environmental conditions. In this dissertation, a physics-of-failure approach was developed to model the reliability of elastomer sockets used in an enterprise server application. The temperature and relative humidity environment at the IC socket contact interface was characterized as a function of external environmental conditions and microprocessor activity. The study applied state-of-the-art health monitoring techniques to assess thermal gradients on the IC socket assembly and to establish an operating profile that could be used for the development of a PoF model. A methodology was developed for modeling and monitoring the contact resistance of electrical interconnects. The technique combines a PoF model with the Sequential Probability Ratio Test (SPRT). In the methodology, the resistance behavior is characterized as a function of temperature. The effective a-spot radius was extracted from the characterization data and modeled with a power law. A PoF model was developed to estimate the resistance of an elastomer contact based on the effective a-spot radius and the ambient temperature. The methodology was experimentally demonstrated with a temperature cycle test of the elastomer socket. During the evaluation, the difference between estimated and observed resistance values was tested with the SPRT. The technique was shown to be very accurate for modeling contact resistance and highly sensitive for the detection of resistance degradation. A qualitative reliability model was developed for the mean contact resistance of an elastomer socket, using fundamental material properties and user-defined failure criteria. To derive the model, the resistance behavior of contacts under nominal mechanical load was studied as a function of time and temperature. The elastomer contact was shown to have a very complex resistance behavior, which was modeled by multiple statistical distributions. It was shown that elastomer sockets, in spite of experiencing stress relaxation at the macroscale (elastomer), can exhibit decreases in contact resistance as a result of stress redistribution at the microscale (Ag particles), which increases Ag-Ag particle stress and the effective contact area.
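
    The modeling-and-monitoring loop described here can be illustrated with the sketch below, which combines the classical Holm constriction-resistance relation R = rho / (2a) with an assumed power-law model of the effective a-spot radius, and applies a Wald SPRT to the residual between observed and estimated resistance. All numerical values (resistivity, temperature coefficient, a-spot parameters, noise level, mean shift, error rates) and the sample data are illustrative assumptions, not the dissertation's fitted model.

        /* Sketch: Holm constriction resistance driven by a power-law a-spot model,
         * with a Wald SPRT on the residual between observed and estimated resistance. */
        #include <math.h>
        #include <stdio.h>

        static double estimate_resistance(double temp_c)
        {
            const double rho0 = 1.6e-8;        /* Ag resistivity at 20 C [ohm*m] (assumed) */
            const double alpha = 3.8e-3;       /* temperature coefficient [1/K] (assumed) */
            const double a0 = 20e-6, n = 0.1;  /* a-spot power law a = a0*(T/T0)^n (assumed) */
            const double T0 = 293.15, T = temp_c + 273.15;

            double rho = rho0 * (1.0 + alpha * (T - T0));
            double a   = a0 * pow(T / T0, n);
            return rho / (2.0 * a);            /* Holm constriction resistance */
        }

        int main(void)
        {
            /* Wald SPRT: H0 = residual ~ N(0, sigma), H1 = residual ~ N(mu1, sigma). */
            const double sigma = 0.2e-3, mu1 = 0.5e-3;   /* ohms (assumed) */
            const double A = log((1 - 0.05) / 0.05);     /* accept-H1 threshold */
            const double B = log(0.05 / (1 - 0.05));     /* accept-H0 threshold */

            double observed[] = { 4.1e-4, 4.3e-4, 9.0e-4, 9.5e-4, 1.0e-3 }; /* ohms, made up */
            double temps[]    = { 25.0,   40.0,   55.0,   70.0,   85.0   }; /* C, made up */
            double llr = 0.0;

            for (int i = 0; i < 5; i++) {
                double r = observed[i] - estimate_resistance(temps[i]);
                /* Gaussian log-likelihood ratio increment for a mean shift of mu1. */
                llr += (mu1 / (sigma * sigma)) * (r - mu1 / 2.0);
                printf("sample %d: residual %+.2e ohm, LLR %+.2f\n", i, r, llr);
                if (llr >= A) { printf("SPRT: degradation detected\n"); return 0; }
                if (llr <= B) llr = 0.0;   /* H0 accepted: restart the monitoring test */
            }
            printf("SPRT: no decision yet\n");
            return 0;
        }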

    Hardware synthesis from high-level scenario specifications

    The behaviour of many systems can be partitioned into scenarios. These facilitate engineers' understanding of the specifications, and can be composed into efficient implementations via a form of high-level synthesis. In this work, we focus on highly concurrent systems, whose scenarios are typically described using concurrency models such as partial orders, Petri nets and data-flow structures. In this thesis, we study different aspects of hardware synthesis from high-level scenario specifications. We propose new formal models to simplify the specification of concurrent systems, and algorithms for hardware synthesis and verification of the scenario-based models of such systems. We also propose solutions for mapping scenario-based systems onto silicon and evaluate their efficiency. Our experiments show that the proposed approaches improve the design of concurrent systems. The new formalisms can automatically break down complex specifications into significantly simpler scenarios, and can be used to fully model the dataflow of operations of reconfigurable event-driven systems. The proposed heuristics for mapping the scenarios of a system to a digital circuit support encoding constraints, unlike existing methods, and can cope with specifications comprising hundreds of scenarios at the cost of only 5% area overhead compared to exact algorithms. These experiments are driven by three case studies: (1) hardware synthesis of control architectures, e.g. microprocessor control units; (2) acceleration of ordinal pattern encoding, i.e. an algorithm for detecting repetitive patterns within data streams; and (3) acceleration of computational drug discovery, i.e. computation of shortest paths in large protein-interaction networks. Our findings are employed to design two prototypes, which have practical value for the considered case studies. The ordinal pattern encoding accelerator is asynchronous, highly resilient to unstable voltage supply, and designed to perform a range of computations via runtime reconfiguration. The drug discovery accelerator is synchronous, and up to three orders of magnitude faster than conventional software implementations.
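
    As a functional reference for the second case study, the sketch below encodes each window of m consecutive samples as the permutation that orders it, expressed as a factorial-number-system code in [0, m!). The order m = 3 and the sample stream are assumptions for illustration; the thesis realises this computation as an asynchronous, runtime-reconfigurable hardware accelerator rather than software.

        /* Software reference for ordinal pattern encoding of a data stream. */
        #include <stdio.h>

        #define M 3   /* assumed pattern order (window length) */

        /* Encode the ordinal pattern of w[0..M-1] as an integer in [0, M!). */
        static int ordinal_pattern(const double w[M])
        {
            int code = 0;
            for (int i = 0; i < M; i++) {
                int rank = 0;                   /* later samples smaller than w[i] */
                for (int j = i + 1; j < M; j++)
                    if (w[j] < w[i])
                        rank++;
                code = code * (M - i) + rank;   /* factorial-number-system encoding */
            }
            return code;
        }

        int main(void)
        {
            double stream[] = { 4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0 };  /* made-up data */
            int n = sizeof stream / sizeof stream[0];

            for (int i = 0; i + M <= n; i++)
                printf("window at %d -> pattern %d\n", i, ordinal_pattern(&stream[i]));
            return 0;
        }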

    Cross-Layer Approaches for an Aging-Aware Design of Nanoscale Microprocessors

    Thanks to the aggressive scaling of transistor dimensions, computers have revolutionized our lives. However, the increasing unreliability of devices fabricated in nanoscale technologies has emerged as a major threat to the future success of computers. In particular, accelerated transistor aging is of great importance, as it reduces the lifetime of digital systems. This thesis addresses this challenge by proposing new methods to model, analyze and mitigate aging at the microarchitecture level and above.