
    Adaptive runtime-assisted block prefetching on chip-multiprocessors

    Memory stalls are a significant source of performance degradation in modern processors. Data prefetching is a widely adopted and well-studied technique used to alleviate this problem. Prefetching can be performed by the hardware, or be initiated and controlled by software. Among software-controlled prefetching we find a wide variety of schemes, including runtime-directed prefetching and, more specifically, runtime-directed block prefetching. This paper proposes a hybrid prefetching mechanism that integrates a software-driven block prefetcher with existing hardware prefetching techniques. Our runtime-assisted software prefetcher brings large blocks of data on-chip with the support of a low-cost hardware engine, and synergizes with existing hardware prefetchers that manage locality at a finer granularity. The runtime system that drives the prefetch engine dynamically selects which cache to prefetch to. Our evaluation on a set of scientific benchmarks obtains a maximum speedup of 32% and 10% on average compared to a baseline with hardware prefetching only. As a result, we also achieve a reduction in energy-to-solution of up to 18% and 3% on average.
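The abstract does not give the selection policy itself; the sketch below only illustrates, under assumed block sizes, thresholds, and a hypothetical prefetch-engine interface, how a runtime system might pick a destination cache for a block prefetch.

```python
# Hypothetical sketch of a runtime-directed block prefetcher that chooses a
# destination cache level per block; names, capacities, and thresholds are
# illustrative assumptions, not taken from the paper.
from dataclasses import dataclass

@dataclass
class BlockRequest:
    base_addr: int       # first byte of the block
    size: int            # block size in bytes
    reuse_distance: int  # estimated accesses until the block is reused

L2_CAPACITY = 256 * 1024        # assumed per-core L2 capacity
L3_CAPACITY = 8 * 1024 * 1024   # assumed shared L3 capacity

def choose_destination(req: BlockRequest) -> str:
    """Pick the cache level to prefetch into based on block size and reuse."""
    if req.size <= L2_CAPACITY // 4 and req.reuse_distance < 1000:
        return "L2"   # small, soon-reused blocks go close to the core
    if req.size <= L3_CAPACITY // 4:
        return "L3"   # larger or later-reused blocks stay in the shared cache
    return "none"     # too large to prefetch profitably

def issue_block_prefetch(req: BlockRequest) -> None:
    level = choose_destination(req)
    if level != "none":
        # A real runtime would program the hardware prefetch engine here;
        # this sketch only logs the decision.
        print(f"prefetch block 0x{req.base_addr:x} ({req.size} B) -> {level}")

issue_block_prefetch(BlockRequest(0x10000, 64 * 1024, reuse_distance=500))
```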

    Soft-error resilient on-chip memory structures

    Soft errors induced by energetic particle strikes in on-chip memory structures, such as L1 data/instruction caches and register files, have become an increasing challenge in designing new-generation reliable microprocessors. Due to their transient and random nature, soft errors cannot be captured by the traditional verification and testing process, since they are unrelated to the correctness of the logic. This dissertation therefore focuses on the reliability characterization and cost-effective reliable design of on-chip memories against soft errors. Under the performance, area/size, and energy constraints of different target systems, many existing unoptimized protection schemes for cache memories may prove significantly inadequate and ineffective. This work develops new lifetime models for the data and tag arrays residing in both the data and instruction caches. These models facilitate the characterization of the cache vulnerability of the stored items at various lifetime phases. The design methodology is further exemplified by the proposed reliability schemes targeting specific vulnerable phases, and benchmarking is carried out to showcase the effectiveness of these approaches.

    Even when the data array is fully protected in on-chip caches, the tag array demands high reliability against soft errors because of its crucial importance to the correctness of cache accesses. Exploiting the address locality of memory accesses, this work proposes a Tag Replication Buffer (TRB) to protect the information integrity of the tag array in the data cache with low performance, energy, and area overheads. To provide a comprehensive evaluation of tag array reliability, this work also proposes a refined evaluation metric, detected-without-replica TVF (DOR-TVF), which combines the TVF and access-with-replica (AWR) analysis. Based on the DOR-TVF analysis, a TRB scheme with early write-back (TRB-EWB) is proposed, which achieves a zero DOR-TVF at a negligible performance overhead.

    Recent research, as well as the optimization schemes proposed in this cache vulnerability study, has focused on the design of cost-effective reliable data caches in terms of performance, energy, and area overheads based on the assumption of fixed error rates. However, for systems in operating environments that vary with time or location, those schemes will be either insufficient or over-designed for the changing error rates. This work explores the design of a self-adaptive reliable data cache that dynamically adapts its reliability schemes to the changing operating environment in order to maintain a target reliability. The experimental evaluation shows that the self-adaptive data cache achieves reliability similar to a cache protected by the most reliable scheme, while simultaneously minimizing the performance and power overheads.

    Besides the data/instruction caches, protecting the register file and its data buses is crucial to reliable computing in high-performance microprocessors. Since the register file is in the critical path of the processor pipeline, any reliable design that increases either the pressure on the register file or the register file access latency is undesirable. This work proposes to exploit narrow-width register values, which represent the majority of generated values, to make duplicates within the same register data item. A detailed architectural vulnerability factor (AVF) analysis shows that this in-register duplication (IRD) scheme significantly reduces the AVF of the register file compared to the conventional design. The experimental evaluation also shows that IRD provides superior read-with-duplicate (RWD) and error detection/recovery rates under heavy error injection compared to previous reliability schemes, while incurring only a small power overhead. By integrating the proposed reliable designs in the data/instruction caches and register files, the vulnerability of the entire microprocessor is dramatically reduced. The new lifetime model, the self-adaptive design, and the narrow-width value duplication scheme proposed in this work can also provide guidance to architects toward highly efficient reliable system design.
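As a rough illustration of the in-register duplication idea for narrow-width values, the sketch below replicates a value that fits in the low half of a 64-bit register into its high half and compares the two halves on read; the width threshold and helper names are assumptions, not the dissertation's design.

```python
# Illustrative sketch of in-register duplication (IRD) for narrow-width
# values: a value that fits in the low half of a 64-bit register is
# replicated into the high half, so a mismatch on read flags a soft error.
REG_BITS = 64
HALF_BITS = REG_BITS // 2
HALF_MASK = (1 << HALF_BITS) - 1

def write_register(value: int) -> int:
    """Return the stored 64-bit register image for a value."""
    if value <= HALF_MASK:                    # narrow-width: duplicate it
        return (value << HALF_BITS) | value
    return value                              # wide value: stored unprotected

def read_register(stored: int, was_narrow: bool) -> int:
    """Read back a value, detecting single-half corruption for narrow values."""
    if not was_narrow:
        return stored
    low, high = stored & HALF_MASK, stored >> HALF_BITS
    if low != high:
        raise RuntimeError("soft error detected in register (halves differ)")
    return low

img = write_register(0x1234)
assert read_register(img, was_narrow=True) == 0x1234
```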

    Novel hybrid extraction systems for fetal heart rate variability monitoring based on non-invasive fetal electrocardiogram

    This study focuses on the design, implementation, and subsequent verification of a new type of hybrid extraction system for non-invasive fetal electrocardiogram (NI-fECG) processing. The system designed combines the advantages of individual adaptive and non-adaptive algorithms. The pilot study reviews two innovative hybrid systems, called ICA-ANFIS-WT and ICA-RLS-WT. These combine independent component analysis (ICA) with either an adaptive neuro-fuzzy inference system (ANFIS) or a recursive least squares (RLS) algorithm, followed by a wavelet transform (WT). The study was conducted on clinical practice data (the extended ADFECGDB database and the Physionet Challenge 2013 database) from the perspective of non-invasive fetal heart rate variability monitoring, based on the overall probability of correct detection (ACC), sensitivity (SE), positive predictive value (PPV), and the harmonic mean of SE and PPV (F1). System functionality was verified against a reference obtained invasively with a scalp electrode (ADFECGDB database) or a reference obtained from annotations (Physionet Challenge 2013 database). The study showed that the ICA-RLS-WT hybrid system achieves better results than ICA-ANFIS-WT. In the experiment on the ADFECGDB database, the ICA-RLS-WT hybrid system reached ACC > 80% on 9 of 12 recordings, whereas the ICA-ANFIS-WT hybrid system reached ACC > 80% on only 6 of 12. In the experiment on the Physionet Challenge 2013 database, the ICA-RLS-WT hybrid system reached ACC > 80% on 13 of 25 recordings, whereas the ICA-ANFIS-WT hybrid system reached ACC > 80% on only 7 of 25. Both hybrid systems achieve demonstrably better results than the individual algorithms tested in previous studies.
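For reference, the detection metrics mentioned above (SE, PPV, F1) can be computed as sketched below, assuming detected fetal beats are matched to reference annotations within a fixed tolerance window; the 50 ms tolerance and the greedy matching routine are illustrative assumptions, not the study's exact evaluation code.

```python
# Minimal sketch of SE, PPV, and F1 for beat detection, matching detected
# beats to reference annotations within a tolerance window (in milliseconds).
def match_detections(reference_ms, detected_ms, tol_ms=50):
    """Greedily match detected beats to reference beats within tol_ms."""
    tp = 0
    used = set()
    for r in reference_ms:
        for i, d in enumerate(detected_ms):
            if i not in used and abs(d - r) <= tol_ms:
                tp += 1
                used.add(i)
                break
    fn = len(reference_ms) - tp   # reference beats never matched
    fp = len(detected_ms) - tp    # detections never matched
    return tp, fp, fn

def se_ppv_f1(reference_ms, detected_ms, tol_ms=50):
    tp, fp, fn = match_detections(reference_ms, detected_ms, tol_ms)
    se = tp / (tp + fn) if tp + fn else 0.0    # sensitivity
    ppv = tp / (tp + fp) if tp + fp else 0.0   # positive predictive value
    f1 = 2 * se * ppv / (se + ppv) if se + ppv else 0.0
    return se, ppv, f1

print(se_ppv_f1([400, 820, 1240, 1660], [405, 818, 1300, 1655]))
```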

    Efficient caching algorithms for memory management in computer systems

    As disk performance continues to lag behind that of memory systems and processors, fully utilizing memory to reduce disk accesses is a highly effective way to improve overall system performance. Furthermore, to serve the applications running on a computer in a distributed system, not only the local memory but also the memory on remote servers must be effectively managed to minimize I/O operations. The critical challenges in effective memory cache management include: (1) insightfully understanding and quantifying the locality inherent in memory access requests; (2) effectively utilizing the locality information in replacement algorithms; (3) intelligently placing and replacing data in the multi-level caches of a distributed system; and (4) ensuring that the overheads of the proposed schemes are acceptable.

    This dissertation provides solutions and makes unique and novel contributions in application locality quantification, general replacement algorithms, low-cost replacement policy, thrashing protection, and multi-level cache management in a distributed system. First, the dissertation proposes a new method to quantify locality strength and to accurately identify data with strong locality. It also provides a new replacement algorithm, which significantly outperforms existing algorithms. Second, considering the extremely low cost required of replacement policies in virtual memory management, the dissertation proposes a policy that meets this requirement while considerably exceeding the performance of existing policies. Third, the dissertation provides an effective scheme to protect the system from thrashing when running memory-intensive applications. Finally, the dissertation provides a multi-level block placement and replacement protocol in a distributed client-server environment, exploiting non-uniform locality strengths in the I/O access requests.

    The methodology used in this study includes careful application behavior characterization, system requirement analysis, algorithm design, trace-driven simulation, and system implementation. A main conclusion of the work is that there is still much room for innovation and significant performance improvement for the seemingly mature and stable policies that have been broadly used in current operating system design.
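As a loose illustration of locality-driven replacement, the sketch below quantifies locality by reuse distance and evicts the resident block with the weakest locality; this is an assumed toy policy for illustration, not the dissertation's algorithm.

```python
# Toy cache that tracks each block's reuse distance (virtual time since its
# previous access) and evicts the resident block with the largest, i.e.
# weakest, reuse distance when capacity is exceeded.
from collections import OrderedDict

class ReuseDistanceCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.last_access = {}          # block -> virtual time of last access
        self.resident = OrderedDict()  # block -> recorded reuse distance
        self.clock = 0

    def access(self, block):
        self.clock += 1
        if block in self.last_access:
            reuse = self.clock - self.last_access[block]  # smaller = stronger locality
        else:
            reuse = float("inf")                          # never seen before
        self.last_access[block] = self.clock
        hit = block in self.resident
        self.resident[block] = reuse
        if len(self.resident) > self.capacity:
            # Evict the resident block with the weakest locality; a brand-new
            # block may be evicted immediately, which effectively bypasses it.
            victim = max(self.resident, key=self.resident.get)
            del self.resident[victim]
        return hit

cache = ReuseDistanceCache(capacity=2)
print([cache.access(b) for b in ["a", "b", "a", "c", "a"]])
```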

    Dependable reconfigurable multi-sensor poles for security

    Wireless sensor network poles for security monitoring in harsh environments require very high dependability, as they are safety-critical [1]. An example of a multi-sensor pole is shown. A crucial attribute of these security systems, especially in harsh environments, is high robustness and guaranteed availability throughout their lifetime. Such environments can also involve deliberate damage (molest). In this paper, two approaches, developed in different projects, are applied simultaneously.

    A two-level structure for advanced space power system automation

    The tasks to be carried out during the three-year project period are: (1) performing extensive simulation using existing mathematical models to build a specific knowledge base of the operating characteristics of space power systems; (2) carrying out the necessary basic research on hierarchical control structures, real-time quantitative algorithms, and decision-theoretic procedures; (3) developing a two-level automation scheme for fault detection and diagnosis, maintenance and restoration scheduling, and load management; and (4) testing and demonstration. The outline of the proposed system structure, which served as a master plan for this project, the work accomplished, concluding remarks, and ideas for future work are also addressed.

    Energy-efficient wireless communication

    In this chapter we present an energy-efficient, highly adaptive network interface architecture and a novel data link layer protocol for wireless networks that provides Quality of Service (QoS) support for diverse traffic types. Due to the dynamic nature of wireless networks, adaptations in bandwidth scheduling and error control are necessary to achieve energy efficiency and an acceptable quality of service. In our approach we apply adaptability through all layers of the protocol stack and provide feedback to the applications. In this way the applications can adapt the data streams, and the network protocols can adapt the communication parameters.
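A minimal sketch of the kind of adaptation described above, assuming illustrative error-rate thresholds and mode names: the link layer picks an error-control mode from the observed channel error rate and feeds the expected goodput back to the application so it can adapt its data streams.

```python
# Toy adaptive error-control selection with feedback to the application.
# Thresholds, overhead fractions, and mode names are invented for the example.
def select_error_control(bit_error_rate: float) -> tuple[str, float]:
    """Return (mode, coding-overhead fraction) for the current channel."""
    if bit_error_rate < 1e-6:
        return "none", 0.00          # clean channel: no redundancy
    if bit_error_rate < 1e-4:
        return "light FEC", 0.10     # modest redundancy
    return "strong FEC + ARQ", 0.35  # noisy channel: heavy protection

def feedback_to_application(link_rate_kbps: float, ber: float) -> float:
    """Tell the application what goodput it can expect so it can adapt."""
    mode, overhead = select_error_control(ber)
    goodput = link_rate_kbps * (1.0 - overhead)
    print(f"channel BER {ber:.0e}: using {mode}, expected goodput {goodput:.0f} kb/s")
    return goodput

feedback_to_application(1000.0, 5e-5)
```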

    DeSyRe: on-Demand System Reliability

    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect- and fault-free system would heavily, even prohibitively, impact the design, manufacturing, and testing costs, as well as system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.
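The sketch below is only a toy illustration of the management idea, assuming a hypothetical fault-free controller that tracks fault-prone processing elements and reassigns work away from those diagnosed as faulty; the component and task names are invented for the example.

```python
# Toy fault manager: a presumed fault-free controller keeps the set of healthy
# fault-prone elements and maps tasks only onto those that remain usable.
class SoCManager:
    def __init__(self, num_elements: int):
        self.healthy = set(range(num_elements))   # fault-prone elements currently usable

    def report_fault(self, element: int) -> None:
        """Called by on-line test/diagnosis when an element misbehaves."""
        self.healthy.discard(element)

    def assign(self, tasks: list[str]) -> dict[str, int]:
        """Round-robin tasks over the remaining healthy elements."""
        if not self.healthy:
            raise RuntimeError("no healthy elements left")
        pool = sorted(self.healthy)
        return {t: pool[i % len(pool)] for i, t in enumerate(tasks)}

mgr = SoCManager(num_elements=4)
mgr.report_fault(2)
print(mgr.assign(["filter", "decode", "monitor"]))
```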