18 research outputs found

    Reliability-Aware Design for Nanometer-Scale Devices, January 2008

    Continuous transistor scaling, enabled by improvements in CMOS devices and manufacturing technologies, is increasing processor power densities and temperatures, which makes it harder to maintain manufacturing yield and to keep devices reliable over their expected lifetimes at the latest nanometer-scale dimensions. New system and processor microarchitectures therefore require reliability-aware design methods and exploration tools that can meet these challenges without significantly increasing manufacturing cost, reducing system performance, or imposing large area overheads due to redundancy. In this paper we survey the latest approaches to reliability modeling and variability-tolerant design for the latest technology nodes, and advocate the need for reliability-aware design in forthcoming consumer electronics. Moreover, we illustrate with a case study of an embedded processor that effective reliability-aware design can be achieved in nanometer-scale devices through integral design approaches that cover the modeling and exploration of reliability effects, together with hardware-software architectural techniques that provide reliability-enhanced solutions at both the microarchitectural and system levels

    Leakage and temperature aware server control for improving energy efficiency in data centers

    Reducing the energy consumed by computation and cooling in servers is a major challenge given today's data center energy costs. To ensure energy-efficient operation of servers in data centers, the relationship among computational power, temperature, leakage, and cooling power needs to be analyzed. By means of an innovative setup that enables monitoring and controlling the computing and cooling power consumption separately on a commercial enterprise server, this paper studies temperature-leakage-energy tradeoffs, obtaining an empirical model for the leakage component. Using this model, we design a controller that continuously seeks and settles at the optimal fan speed to minimize the energy consumption for a given workload. We run a customized dynamic load-synthesis tool to stress the system. Our proposed cooling controller achieves up to 9% energy savings and a 30 W reduction in peak power in comparison to the default cooling control scheme
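
    The trade-off this abstract describes can be illustrated with a minimal sketch. It is not the paper's controller or its empirical leakage model: every coefficient and function name below (`fan_power`, `cpu_temperature`, `leakage_power`, `best_fan_speed`) is a hypothetical placeholder. The sketch assumes that fan power grows with the cube of fan speed, that thermal resistance falls as airflow rises, and that leakage grows exponentially with temperature. It then picks the fan speed that minimizes fan plus leakage power by exhaustive search.

```python
import math


def fan_power(rpm: float) -> float:
    """Fan power in watts; cubic in speed (fan affinity law, assumed coefficient)."""
    return 40.0 * (rpm / 4000.0) ** 3


def cpu_temperature(rpm: float, dynamic_power: float) -> float:
    """Steady-state temperature in Celsius under an assumed thermal model."""
    ambient = 25.0                      # assumed inlet temperature
    r_th = 0.1 + 600.0 / rpm            # assumed thermal resistance (K/W), lower with more airflow
    return ambient + r_th * dynamic_power


def leakage_power(temp_c: float) -> float:
    """Leakage power in watts; exponential in temperature (assumed coefficients)."""
    return 5.0 * math.exp(0.03 * (temp_c - 25.0))


def total_power(rpm: float, dynamic_power: float) -> float:
    """Fan power plus temperature-dependent leakage for a given workload."""
    return fan_power(rpm) + leakage_power(cpu_temperature(rpm, dynamic_power))


def best_fan_speed(dynamic_power: float, speeds=range(1800, 4201, 100)) -> int:
    """Exhaustively search the allowed fan speeds for the minimum total power."""
    return min(speeds, key=lambda rpm: total_power(rpm, dynamic_power))


if __name__ == "__main__":
    for load in (100.0, 200.0):         # hypothetical dynamic power levels in watts
        rpm = best_fan_speed(load)
        print(f"load={load:.0f} W -> fan={rpm} RPM, fan+leakage={total_power(rpm, load):.1f} W")
```

    Under these assumed coefficients, a heavier workload shifts the minimum toward a higher fan speed, because the leakage saved by cooling outweighs the extra fan power. A real controller would replace the placeholder models with measured ones.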

    A cyber-physical approach to combined HW-SW monitoring for improving energy efficiency in data centers

    High-Performance Computing, Cloud computing and next-generation applications such as e-Health or Smart Cities have dramatically increased the computational demand on Data Centers. The huge energy consumption, increasing levels of CO2 and the economic costs of these facilities represent a challenge for industry and researchers alike. Recent research trends propose the use of holistic optimization techniques to jointly minimize Data Center computational and cooling costs from a multilevel perspective. This paper presents an analysis of the parameters needed to integrate the Data Center in a holistic optimization framework and leverages Cyber-Physical systems to gather workload, server and environmental data via software techniques and by deploying a non-intrusive Wireless Sensor Network (WSN). This solution tackles data sampling, retrieval and storage from a reconfigurable perspective, reducing the amount of data generated for optimization by 68% without information loss, doubling the lifetime of the WSN nodes and allowing runtime energy minimization techniques in a real scenario

    Self-organizing maps versus growing neural Gas in detecting anomalies in data centers

    Reliability is one of the key performance factors in data centres. The out-of-scale energy costs of these facilities lead data centre operators to increase the ambient temperature of the data room to decrease cooling costs. However, increasing ambient temperature reduces the safety margins and can result in a higher number of anomalous events. Anomalies in the data centre need to be detected as soon as possible to optimize cooling efficiency and mitigate their harmful effects on servers. This article proposes the use of clustering-based outlier detection techniques coupled with a trust and reputation system engine to detect anomalies in data centres. We show how self-organizing maps or growing neural gas can be applied to detect cooling and workload anomalies, respectively, in a real data centre scenario with very good detection and isolation rates, in a way that is robust to the malfunction of the sensors that gather server and environmental information

    Thermal-Aware Data Flow Analysis

    This paper suggests that the thermal state of a processor can be approximated using data flow analysis. The results of this analysis can be used to evaluate the efficacy of thermal-aware compilation strategies, or as input to thermal-aware optimizations that occur in the early stages of back-end compilation. We propose different ways in which knowledge of thermal behavior can be exploited in the different compilation phases. Copyright 2009 ACM

    IP-XACT for Smart Systems Design: Extensions for the Integration of Functional and Extra-Functional Models

    Smart systems are miniaturized devices integrating computation, communication, sensing and actuation. As such, their design cannot focus solely on functional behavior, but must also take into account different extra-functional concerns, such as power consumption or reliability. Any smart system can thus be modeled through a number of views, each focusing on a specific concern. Such views may exchange information, so they must be simulated simultaneously to reproduce the mutual influence of the corresponding concerns. This paper shows how the IP-XACT standard, with some necessary extensions, can effectively support this simultaneous simulation. The extended IP-XACT descriptions make it possible to model extra-functional properties in a homogeneous format, defined by analysing the requirements and characteristics of three main concerns, i.e., power, temperature and reliability. The IP-XACT descriptions are then used to automatically generate a skeleton of the simulation infrastructure in SystemC. The skeleton can be easily populated with models available in the literature, thus achieving simultaneous simulation of multiple concerns

    Self-Organizing maps for detecting abnormal thermal behavior in data centers

    The increasing success of Cloud Computing applications and online services has contributed to the unsustainability of data center facilities in terms of energy consumption. Higher resource demand has increased the electricity required by computation and cooling resources, leading to power shortages and outages, especially in urban infrastructures. Current energy reduction strategies for Cloud facilities usually disregard the data center topology, the contribution of cooling consumption and the scalability of optimization strategies. Our work tackles the energy challenge by proposing a temperature-aware VM allocation policy based on a Trust-and-Reputation System (TRS). A TRS meets the requirements of inherently distributed environments such as data centers, and allows the implementation of autonomous and scalable VM allocation techniques. For this purpose, we model the relationships between the different computational entities, synthesizing this information in a single metric. This metric, called reputation, is used to optimize the allocation of VMs in order to reduce energy consumption. We validate our approach with a state-of-the-art Cloud simulator using real Cloud traces. Our results show a considerable reduction in energy consumption, reaching up to 46.16% savings in computing power and 17.38% savings in cooling, without QoS degradation while keeping servers below thermal redlining. Moreover, our results show the limitations of the PUE ratio as a metric for energy efficiency. To the best of our knowledge, this paper is the first approach to combine Trust-and-Reputation systems with Cloud Computing VM allocation

    HTPCP: GNSS-R multi-channel correlation waveforms post-process solution for GOLD-RTR Instrument

    Global navigation satellite system reflectometry (GNSS-R) is a new remote sensing application of satellite navigation. Essentially, it receives and processes microwave signals reflected from various surfaces to extract useful information about those surfaces. The GPS open-loop differential real-time receiver (GOLD-RTR) instrument was designed by the ICE (IEEC-CSIC) to gather global positioning satellite system signals after they have been reflected from suitable surfaces (e.g. sea, ice and ground). This paper addresses the design of real-time post-processing for the multichannel cross-correlation waveforms. The work implements a real-time single correlation integration (SCI) algorithm on a proposed novel platform, named the Heterogeneous Transmission and Parallel Computing Platform (HTPCP). The numerical results show that system throughput can reach about 1.669 MB/s. Compared with the state-of-the-art serial software solution, the processing time of the SCI algorithm improves by about 19%. The coherent integration time improves by a factor of 8.17 compared with conventional Symmetric Multiprocessing (SMP), and the parallel computing speed of HTPCP outperforms SMP

    Runtime data center temperature prediction using Grammatical Evolution techniques

    Depto. de Arquitectura de Computadores y Automática; Fac. de Informática; Ministerio de Economía y Competitividad (MINECO)

    Dynamic thermal management in 3D multicore architectures

    Technology scaling has caused feature sizes to shrink continuously, whereas interconnects, unlike transistors, have not followed the same trend. Designing 3D stacked architectures is a recently proposed approach to overcome the power consumption and delay problems associated with interconnects by reducing the length of the wires that cross the chip. However, 3D integration introduces serious thermal challenges due to the high power density that results from placing computational units on top of each other. In this work, we first investigate how existing thermal management, power management and job scheduling policies affect the thermal behavior of 3D chips. We then propose a dynamic thermally-aware job scheduling technique for 3D systems that reduces thermal problems at very low performance cost. Our technique can also be integrated with power management policies to reduce energy consumption while avoiding thermal hot spots and large temperature variations