657 research outputs found

    Approaches to multiprocessor error recovery using an on-chip interconnect subsystem

    Get PDF
    For future multicores, a dedicated interconnect subsystem for on-chip monitors was found to be highly beneficial in terms of scalability, performance and area. In this thesis, such a monitor network (MNoC) is used for multicores to support selective error identification and recovery and maintain target chip reliability in the context of dynamic voltage and frequency scaling (DVFS). A selective shared memory multiprocessor recovery is performed using MNoC in which, when an error is detected, only the group of processors sharing an application with the affected processors are recovered. Although the use of DVFS in contemporary multicores provides significant protection from unpredictable thermal events, a potential side effect can be an increased processor exposure to soft errors. To address this issue, a flexible fault prevention and recovery mechanism has been developed to selectively enable a small amount of per-core dual modular redundancy (DMR) in response to increased vulnerability, as measured by the processor architectural vulnerability factor (AVF). Our new algorithm for DMR deployment aims to provide a stable effective soft error rate (SER) by using DMR in response to DVFS caused by thermal events. The algorithm is implemented in real-time on the multicore using MNoC and controller which evaluates thermal information and multicore performance statistics in addition to error information. DVFS experiments with a multicore simulator using standard benchmarks show an average 6% improvement in overall power consumption and a stable SER by using selective DMR versus continuous DMR deployment

    Automatic specification of reliability models for fault-tolerant computers

    Get PDF
    The calculation of reliability measures using Markov models is required for life-critical processor-memory-switch structures that have standby redundancy or that are subject to transient or intermittent faults or repair. The task of specifying these models is tedious and prone to human error because of the large number of states and transitions required in any reasonable system. Therefore, model specification is a major analysis bottleneck, and model verification is a major validation problem. The general unfamiliarity of computer architects with Markov modeling techniques further increases the necessity of automating the model specification. Automation requires a general system description language (SDL). For practicality, this SDL should also provide a high level of abstraction and be easy to learn and use. The first attempt to define and implement an SDL with those characteristics is presented. A program named Automated Reliability Modeling (ARM) was constructed as a research vehicle. The ARM program uses a graphical interface as its SDL, and it outputs a Markov reliability model specification formulated for direct use by programs that generate and evaluate the model

    GRAPE-6: The massively-parallel special-purpose computer for astrophysical particle simulation

    Full text link
    In this paper, we describe the architecture and performance of the GRAPE-6 system, a massively-parallel special-purpose computer for astrophysical NN-body simulations. GRAPE-6 is the successor of GRAPE-4, which was completed in 1995 and achieved the theoretical peak speed of 1.08 Tflops. As was the case with GRAPE-4, the primary application of GRAPE-6 is simulation of collisional systems, though it can be used for collisionless systems. The main differences between GRAPE-4 and GRAPE-6 are (a) The processor chip of GRAPE-6 integrates 6 force-calculation pipelines, compared to one pipeline of GRAPE-4 (which needed 3 clock cycles to calculate one interaction), (b) the clock speed is increased from 32 to 90 MHz, and (c) the total number of processor chips is increased from 1728 to 2048. These improvements resulted in the peak speed of 64 Tflops. We also discuss the design of the successor of GRAPE-6.Comment: Accepted for publication in PASJ, scheduled to appear in Vol. 55, No.

    Dynamic Lifetime Reliability and Energy Management for Network-on-Chip based Chip Multiprocessors

    Get PDF
    In this dissertation, we study dynamic reliability management (DRM) and dynamic energy management (DEM) techniques for network-on-chip (NoC) based chip multiprocessors (CMPs). In the first part, the proposed DRM algorithm takes both the computational and the communication components of the CMP into consideration and combines thread migration and dynamic voltage and frequency scaling (DVFS) as the two primary techniques to change the CMP operation. The goal is to increase the lifetime reliability of the overall system to the desired target with minimal performance degradation. The simulation results on a variety of benchmarks on 16 and 64 core NoC based CMP architectures demonstrate that lifetime reliability can be improved by 100% for an average performance penalty of 7.7% and 8.7% for the two CMP architectures. In the second part of this dissertation, we first propose novel algorithms that employ Kalman filtering and long short term memory (LSTM) for workload prediction. These predictions are then used as the basis on which voltage/frequency (V/F) pairs are selected for each core by an effective dynamic voltage and frequency scaling algorithm whose objective is to reduce energy consumption but without degrading performance beyond the user set threshold. Secondly, we investigate the use of deep neural network (DNN) models for energy optimization under performance constraints in CMPs. The proposed algorithm is implemented in three phases. The first phase collects the training data by employing Kalman filtering for workload prediction and an efficient heuristic algorithm based on DVFS. The second phase represents the training process of the DNN model and in the last phase, the DNN model is used to directly identify V/F pairs that can achieve lower energy consumption without performance degradation beyond the acceptable threshold set by the user. Simulation results on 16 and 64 core NoC based architectures demonstrate that the proposed approach can achieve up to 55% energy reduction for 10% performance degradation constraints. Simulation experiments compare the proposed algorithm against existing approaches based on reinforcement learning and Kalman filtering and show that the proposed DNN technique provides average improvements in energy-delay-product (EDP) of 6.3% and 6% for the 16 core architecture and of 7.4% and 5.5% for the 64 core architecture

    Dynamic Thermal Management for Microprocessors

    Get PDF
    In deep submicron era, thermal hot spots and large temperature gradients significantly impact system reliability, performance, cost and leakage power. Dynamic thermal management techniques are designed to tackle the problems and control the chip temperature as well as power consumption. They refer to those techniques which enable the chip to autonomously modify the task execution and power dissipation characteristics so that lower-cost cooling solutions could be adopted while still guaranteeing safe temperature regulation. As long as the temperature is regulated, the system reliability can be improved, leakage power can be reduced and cooling system lifetime can be extended significantly. Multimedia applications are expected to form the largest portion of workload in general purpose PC and portable devices. The ever-increasing computation intensity of multimedia applications elevates the processor temperature and consequently impairs the reliability and performance of the system. In this thesis, we propose to perform dynamic thermal management using reinforcement learning algorithm for multimedia applications. The presented learning model does not need any prior knowledge of the workload information or the system thermal and power characteristics. It learns the temperature change and workload switching patterns by observing the temperature sensor and event counters on the processor, and finds the management policy that provides good performance-thermal tradeoff during the runtime. As the system complexity increases, it is more and more difficult to perform thermal management in a centralized manner because of state explosion and the overhead of monitoring the entire chip. In this thesis, we present a framework for distributed thermal management in many-core systems where balanced thermal profile can be achieved by proactive task migration among neighboring cores. The framework has a low cost agent residing in each core that observes the local workload and temperature and communicates with its nearest neighbor for task migration and exchange. By choosing only those migration requests that will result in balanced workload without generating thermal emergency, the presented framework maintains workload balance across the system and avoids unnecessary migration. Experimental results show that, our distributed management policy achieves almost the same performance as a global management policy when the tasks are initially randomly distributed. Compared with existing proactive task migration technique, our approach generates less hotspot, less migration overhead with negligible performance overhead. Temperature affects the leakage power and cooling power. In this thesis, we address the impact of task allocation on a processor\u27s leakage power and cooling fan power. Although the leakage power is determined by the average die temperature and the fan power is determined by the peak temperature, our analysis shows that the overall power can be minimized if a task allocation with minimum peak temperature is adopted together with an intelligent fan speed adjustment technique that finds the optimal tradeoff between fan power and leakage power. We further present a multi-agent distributed task migration technique that searches for the best task allocation during runtime. By choosing only those migration requests that will result chip maximum temperature reduction, the presented framework achieves large fan power savings as well as overall power reduction

    Design and development of a mechanism for low-latency real time audio processing on Linux

    Get PDF
    This work is focused on developing a mechanism to support audio processing on Linux operating system using soft real-time techniques, primarily resource based scheduling with feedback (adaptive reservations)

    Improving the Efficiency of Energy Harvesting Embedded System

    Get PDF
    In the past decade, mobile embedded systems, such as cell phones and tablets have infiltrated and dramatically transformed our life. The computation power, storage capacity and data communication speed of mobile devices have increases tremendously, and they have been used for more critical applications with intensive computation/communication. As a result, the battery lifetime becomes increasingly important and tends to be one of the key considerations for the consumers. Researches have been carried out to improve the efficiency of the lithium ion battery, which is a specific member in the more general Electrical Energy Storage (EES) family and is widely used in mobile systems, as well as the efficiency of other electrical energy storage systems such as supercapacitor, lead acid battery, and nickel–hydrogen battery etc. Previous studies show that hybrid electrical energy storage (HEES), which is a mixture of different EES technologies, gives the best performance. On the other hand, the Energy Harvesting (EH) technique has the potential to solve the problem once and for all by providing green and semi-permanent supply of energy to the embedded systems. However, the harvesting power must submit to the uncertainty of the environment and the variation of the weather. A stable and consistent power supply cannot always be guaranteed. The limited lifetime of the EES system and the unstableness of the EH system can be overcome by combining these two together to an energy harvesting embedded system and making them work cooperatively. In an energy harvesting embedded systems, if the harvested power is sufficient for the workload, extra power can be stored in the EES element; if the harvested power is short, the energy stored in the EES bank can be used to support the load demand. How much energy can be stored in the charging phase and how long the EES bank lifetime will be are affected by many factors including the efficiency of the energy harvesting module, the input/output voltage of the DC-DC converters, the status of the EES elements, and the characteristics of the workload. In this thesis, when the harvesting energy is abundant, our goal is to store as much surplus energy as possible in the EES bank under the variation of the harvesting power and the workload power. We investigate the impact of workload scheduling and Dynamic Voltage and Frequency Scaling (DVFS) of the embedded system on the energy efficiency of the EES bank in the charging phase. We propose a fast heuristic algorithm to minimize the energy overhead on the DC-DC converter while satisfying the timing constraints of the embedded workload and maximizing the energy stored in the HEES system. The proposed algorithm improves the efficiency of charging and discharging in an energy harvesting embedded system. On the other hand, when the harvesting rate is low, workload power consumption is supplied by the EES bank. In this case, we try to minimize the energy consumption on the embedded system to extend its EES bank life. In this thesis, we consider the scenario when workload has uncertainties and is running on a heterogeneous multi-core system. The workload variation is represented by the selection of conditional branches which activate or deactivate a set of instructions belonging to a task. We employ both task scheduling and DVFS techniques for energy optimization. Our scheduling algorithm considers the statistical information of the workload to minimize the mean power consumption of the application while satisfying a hard deadline constraint. The proposed DVFS algorithm has pseudo linear complexity and achieves comparable energy reduction as the solutions found by mathematical programming. Due to its capability of slack reclaiming, our DVFS technique is less sensitive to small change in hardware or workload and works more robustly than other techniques without slack reclaiming

    Design of a diversity enforcement module for safety critical processing systems

    Get PDF
    Safety-critical systems must adhere to specific functional safety standards describing the development process for those systems. One key requirement is the ability to avoid a single fault from causing a system failure, or in other words, avoiding Common Cause Failures (CCFs). Redundancy is a usual solution against CCFs. However, some specific CCFs may affect redundant components identically (e.g., voltage droops, clock interferences), hence potentially leading to identical errors that may go unnoticed and cause a failure. Diversity is often deployed along with redundancy to avoid also those CCFs. In the particular case of computing elements (e.g., cores), this is usually realized with some form of lockstep execution where two identical cores execute the same software, but with some time shift among them (aka staggering). Therefore, both cores have different state at any point in time and faults affecting both cores lead to different errors, which can be detected by comparing the outputs. Unfortunately, existing solutions have some non-negligible costs: (i) hardware-only solutions hide half of the cores making them non-user visible, hence halving platform performance even for non-critical tasks. Conversely, (ii) software-only solutions are much more flexible but impose the use of a third core to run the lockstep monitor, and require large staggering which has significant impact in performance for short programs. This thesis devises a new solution aiming at combining the advantages of existing solutions. Our proposal, a hardware diversity-enforcement module (referred to as SafeDE), is an efficient hardware realization of the software monitor. Therefore, it does not hide any core to the end user, it does not require a third core for monitoring purposes, and allows operating with tiny staggering (e.g., few tens of cycles instead of hundreds of thousands as required for the software-only solution). We implement and integrate SafeDE in a space multicore prototype in an FPGA and validate that it effectively achieves its requirements with negligible hardware costs. Moreover, this work has already led to the publication of two peer-reviewed articles in especialized conferences and journals

    High performance computing of explicit schemes for electrofusion jointing process based on message-passing paradigm

    Get PDF
    The research focused on heterogeneous cluster workstations comprising of a number of CPUs in single and shared architecture platform. The problem statements under consideration involved one dimensional parabolic equations. The thermal process of electrofusion jointing was also discussed. Numerical schemes of explicit type such as AGE, Brian, and Charlies Methods were employed. The parallelization of these methods were based on the domain decomposition technique. Some parallel performance measurement for these methods were also addressed. Temperature profile of the one dimensional radial model of the electrofusion process were also given
    • …
    corecore