459 research outputs found

    Power-aware caches for GPGPUs

    In this thesis, we propose two optimization techniques to reduce power consumption in L1 caches (data, texture and constant), shared memory and the L2 cache. The first optimization technique targets static power. Evaluation of GPGPU applications shows that once a cache block is accessed by a thread, it takes several hundred clock cycles until the same block is accessed again. This long inter-access interval can be used to put cache cells into drowsy mode and reduce static power. While drowsy cells reduce static power, they increase access time, because the voltage of a cache cell in drowsy mode must be raised before the block can be accessed. To mitigate the performance impact of drowsy cells, we propose a novel technique called coarse-grained drowsy mode. In coarse-grained drowsy mode, we partition each cache into regions of consecutive cache blocks and wake up a whole region upon a cache access. Due to the temporal and spatial locality of cache accesses, this method dramatically reduces the performance impact caused by drowsy cells. The second optimization technique relies on branch divergence in GPGPUs. The execution model in GPGPUs is Single Instruction Multiple Thread (SIMT), which means processing cores execute the same instruction with different data for GPGPU threads. The SIMT execution model may result in divergence of threads when a control instruction is executed. GPGPUs execute branch instructions in two phases. In the first phase, threads in the taken path are active and the rest are idle. In the second phase, threads in the not-taken path are executed and the rest are idle. Contemporary GPGPUs access all portions of cache blocks, although some threads are idle due to branch divergence. We propose accessing only the portions of cache blocks corresponding to active threads. By disabling unnecessary sections of cache blocks, we are able to reduce the dynamic power of caches.
Our results show that on average, the two optimization techniques together reduce power of caches by up to 98% and 15% for static and dynamic power, respectively.
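
The coarse-grained drowsy idea in this abstract can be illustrated with a small sketch. It is not the thesis's implementation: the class name, the `decay_window` policy (a region falls back to drowsy mode after that many idle cycles), and the one-cycle `wake_penalty` are all assumptions made for illustration. The sketch only counts how many wake-up penalties an access trace incurs for a given region size.

```python
class CoarseGrainedDrowsyCache:
    """Toy model: counts wake-up penalties only (all parameters are hypothetical)."""

    def __init__(self, num_blocks, region_size, decay_window, wake_penalty=1):
        self.region_size = region_size      # consecutive blocks per region
        self.decay_window = decay_window    # assumed: idle cycles before a region turns drowsy
        self.wake_penalty = wake_penalty    # assumed: extra cycles to wake a drowsy region
        num_regions = (num_blocks + region_size - 1) // region_size
        # Cycle of the last access per region; None means never accessed (drowsy).
        self.last_access = [None] * num_regions

    def access(self, block, cycle):
        """Access `block` at `cycle`; return the extra cycles spent waking its region."""
        region = block // self.region_size
        last = self.last_access[region]
        awake = last is not None and cycle - last <= self.decay_window
        self.last_access[region] = cycle
        return 0 if awake else self.wake_penalty


def total_penalty(region_size, trace):
    """Sum wake-up penalties over a trace of (cycle, block) pairs."""
    cache = CoarseGrainedDrowsyCache(num_blocks=64, region_size=region_size,
                                     decay_window=100)
    return sum(cache.access(block, cycle) for cycle, block in trace)


# Eight consecutive blocks touched 10 cycles apart (spatial locality).
trace = [(10 * i, i) for i in range(8)]
print(total_penalty(1, trace))  # per-block drowsy control: 8 wake-ups
print(total_penalty(8, trace))  # one 8-block region: 1 wake-up
```

With per-block control every first touch pays the wake-up penalty, whereas waking a whole region lets neighbouring blocks be accessed at full speed, which is the effect the abstract attributes to spatial and temporal locality.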

    Microarchitectural techniques to reduce energy consumption in the memory hierarchy

    This thesis states that dynamic profiling of the memory reference stream can improve energy and performance in the memory hierarchy. The research presented in this thesis provides multiple instances of using lightweight hardware structures to profile the memory reference stream. The objective of this research is to develop microarchitectural techniques to reduce energy consumption at different levels of the memory hierarchy. Several simple and implementable techniques were developed as a part of this research. One of the techniques identifies and eliminates redundant refresh operations in DRAM and reduces DRAM refresh power. Another reduces leakage energy in L2 and higher-level caches for multiprocessor systems. The emphasis of this research has been to develop several techniques for obtaining energy savings in caches using a simple hardware structure called the counting Bloom filter (CBF). CBFs have been used to predict L2 cache misses and obtain energy savings by not accessing the L2 cache on a predicted miss. A simple extension of this technique allows CBFs to do way-estimation of set-associative caches to reduce energy in cache lookups. Another technique uses CBFs to track addresses in a virtual cache and reduce false synonym lookups. Finally, this thesis presents a technique to reduce dynamic power consumption in level-one caches using significance compression. The significant energy and performance improvements demonstrated by the techniques presented in this thesis suggest that this work will be of great value for designing memory hierarchies of future computing platforms. Ph.D. Committee Chair: Lee, Hsien-Hsin S.; Committee Member: Chatterjee, Abhijit; Committee Member: Mukhopadhyay, Saibal; Committee Member: Pande, Santosh; Committee Member: Yalamanchili, Sudhaka
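
As a rough software illustration of the CBF-based miss prediction mentioned in this abstract, the sketch below keeps one counter array that is incremented on a cache fill and decremented on an eviction; if any counter for an address is zero, the block cannot be resident, so the lookup can be skipped. The class and method names, the table size, and the use of SHA-256 as a hash are assumptions for illustration only; a hardware CBF would use much simpler index functions, and the thesis's actual design is not described here.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative counting Bloom filter tracking blocks resident in a cache."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, block_addr):
        # Derive num_hashes counter indexes from the block address (hash choice is an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{block_addr}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def insert(self, block_addr):
        """Call when the block is filled into the cache."""
        for idx in self._indexes(block_addr):
            self.counters[idx] += 1

    def remove(self, block_addr):
        """Call when the block is evicted; only valid for blocks inserted earlier."""
        for idx in self._indexes(block_addr):
            self.counters[idx] -= 1

    def predicts_miss(self, block_addr):
        """True means the block is definitely absent, so the cache lookup can be skipped."""
        return any(self.counters[idx] == 0 for idx in self._indexes(block_addr))


cbf = CountingBloomFilter()
cbf.insert(0x1000)
print(cbf.predicts_miss(0x1000))  # False: the block may be present, so access the cache
cbf.remove(0x1000)
print(cbf.predicts_miss(0x1000))  # True: safe to skip the lookup
```

A predicted miss is always correct in this scheme, while a "may be present" answer can still turn out to be a miss, so the filter saves energy without affecting correctness.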

    Human Health Engineering Volume II

    In this Special Issue on “Human Health Engineering Volume II”, we invited submissions exploring recent contributions to the field of human health engineering, i.e., technology for monitoring the physical or mental health status of individuals in a variety of applications. Contributions could focus on sensors, wearable hardware, algorithms, or integrated monitoring systems. We organized the different papers according to their contributions to the main parts of the monitoring and control engineering scheme applied to human health applications, namely papers focusing on measuring/sensing physiological variables, papers highlighting health-monitoring applications, and examples of control and process management applications for human health. In comparison to biomedical engineering, we envision that the field of human health engineering will also cover applications for healthy humans (e.g., sports, sleep, and stress), and thus not only contribute to the development of technology for curing patients or supporting chronically ill people, but also to more general disease prevention and optimization of human well-being.

    Evolution of diaphragmatic function in children under mechanical ventilation

    Introduction: Diaphragmatic dysfunction is highly prevalent in adult critical care and is associated with worse outcomes. There is at present no recognized method to assess diaphragmatic function in children under mechanical ventilation (MV) and no study describing its evolution over time in this population. Methods: In this work, we assessed the contractile function of the diaphragm in children under invasive MV in the pediatric intensive care unit (PICU) and in the operating room (OR). This was done by simultaneously recording airway pressure at the endotracheal tube (Paw) and electrical activity of the diaphragm (EAdi) over consecutive spontaneous breaths during brief airway occlusion maneuvers. In order to account for central respiratory drive, a neuro-mechanical efficiency ratio (NME, Paw/EAdi) was first computed and then validated using variability analysis. Diaphragmatic function was then compared between the two populations, and its evolution over time in the PICU group was described. Results: Median NME was the most reliable measure of diaphragmatic function, with a coefficient of variation of 23.7% and 21.1% in the PICU and OR groups, respectively. NME in the PICU group after 21 hours of MV (1.80 cmH2O/μV, IQR 1.25–2.39) was significantly lower than in the OR group (3.65 cmH2O/μV, IQR 3.45–4.24, p = 0.015). In the PICU group, NME did not decrease significantly over time under MV (correlation coefficient -0.011, p = 0.133). Conclusion: Diaphragmatic function can be measured at the bedside of children under MV using brief airway occlusions. Diaphragm efficiency was significantly higher in healthy controls than in a cohort of critically ill children, but it was stable over time under MV in this group with preserved respiratory drive. In the future, the relative contributions of critical illness and mechanical ventilation to diaphragmatic function should be better characterized before evaluating potential interventions aimed at protecting the diaphragm.
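
The NME ratio and its variability, as defined in this abstract (NME = Paw/EAdi), can be sketched as follows. The function names are hypothetical, the inputs are assumed to be per-breath magnitudes of the airway pressure deflection (cmH2O) and peak EAdi (μV), and the numbers in the example are made up for illustration and are not study data. How the study aggregated breaths and occlusions is not stated in the abstract, so this is only one plausible reading.

```python
from statistics import mean, median, stdev


def nme_ratios(paw_cmh2o, eadi_uv):
    """Per-breath neuro-mechanical efficiency: pressure magnitude divided by EAdi."""
    if len(paw_cmh2o) != len(eadi_uv):
        raise ValueError("Paw and EAdi must have one value per breath")
    return [p / e for p, e in zip(paw_cmh2o, eadi_uv)]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


# Made-up values for three occluded breaths (not data from the study).
paw = [4.0, 6.0, 5.0]    # cmH2O
eadi = [2.0, 2.0, 2.0]   # μV
ratios = nme_ratios(paw, eadi)
print(median(ratios))                    # 2.5 cmH2O/μV
print(coefficient_of_variation(ratios))  # about 20 (percent)
```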

    Brain attack : a new approach to stroke and transient ischaemic attack

    Energy Efficiency and Performance in Multiprocessors Systems on Chip

    As process technology shrinks, the transistor count on CPUs has increased. The breakdown of Dennard scaling has led to diminishing returns in performance per unit of power, a trend that promises to impact future CPU designs. This breakdown is due in part to the increase in static power driven by transistor leakage. Energy and power budgets are now constrained, so energy consumption has to be justified by an increase in performance. Simultaneously, architects have shifted to chip multiprocessor (CMP) designs with a large shared last-level cache (LLC) to mitigate the cost of long-latency off-chip memory accesses. A primary reason for that shift is the power efficiency of CMPs. Additionally, technology scaling has allowed the integration of platform components on the chip, a design referred to as a multiprocessor system on chip (MpSoC). This integration improves system performance because the communication latency between the components is reduced. Memory subsystems are essential to CPU performance. Larger caches give the CPU faster access to a larger data set. Consequently, the size of last-level caches has increased, making them a significant source of leakage power dissipation. We propose a technique to facilitate power gating a partition of the LLC by migrating blocks with high temporal locality to a live partition, thus reducing the performance impact. Given the high latency of memory subsystems, prefetching improves CPU performance by speculating on future memory accesses and requesting the data ahead of demand. In the context of CMPs running multiple concurrent processes, prefetching accuracy is critical to prevent cache pollution. Furthermore, given the current energy-constrained environment, there is a need for lightweight prefetchers with high accuracy. To this end, we present BFetch, a lightweight and accurate prefetcher driven by control flow predictions and effective address speculation. MpSoCs have mostly been used in mobile devices. The energy constraint is more pronounced in MpSoC-based, battery-powered mobile devices. The need to address energy consumption in MpSoCs is further accentuated by the proliferation of mobile devices. This dissertation presents a framework to optimize energy in MpSoCs. The proposed framework minimizes energy consumption while meeting performance and power budget constraints. We first apply this framework to the CPU and then extend it to accommodate the GPU.

    Detection of Driver Drowsiness and Distraction Using Computer Vision and Machine Learning Approaches

    Drowsiness and distracted driving are leading factors in most car crashes and near-crashes. This research study explores and investigates the application of both conventional computer vision and deep learning approaches to the detection of drowsiness and distraction in drivers. In the first part of this MPhil research study, conventional computer vision approaches were studied to develop a robust drowsiness and distraction detection system based on yawning detection, head pose detection and eye blinking detection. These algorithms were implemented using existing human-crafted features. Experiments were performed for detection and classification with small image datasets to evaluate and measure the performance of the system. It was observed that the use of human-crafted features together with a robust classifier such as an SVM gives better performance than previous approaches. Though the results were satisfactory, there are many drawbacks and challenges associated with conventional computer vision approaches, such as the definition and extraction of human-crafted features, which make these conventional algorithms subjective in nature and less adaptive in practice. In contrast, deep learning approaches automate the feature selection process and can be trained to learn the most discriminative features without any human input. In the second half of this research study, the use of deep learning approaches for the detection of distracted driving was investigated. It was observed that the advantages of the applied methodology for distraction detection include the contribution of CNN enhancement to better pattern recognition accuracy and the ability to learn features from various regions of a human body simultaneously. The performance of four convolutional deep net architectures (AlexNet, ResNet, MobileNet and NASNet) was compared, triplet training was investigated, and the impact of combining a support vector classifier (SVC) with a trained deep net was explored. The images used in our experiments with the deep nets are from the State Farm Distracted Driver Detection dataset hosted on Kaggle, each of which captures the entire body of a driver. The best results were obtained with NASNet trained using triplet loss and combined with an SVC. It was observed that one of the advantages of deep learning approaches is their ability to learn discriminative features from various regions of a human body simultaneously. This ability has enabled deep learning approaches to reach human-level accuracy.

    Speculative Techniques for Memory Hierarchy Management

    The “Memory Wall” [1] is the gap in performance between the processor and the main memory. Over the last 30 years, computer architects have added multiple levels of cache to fill this gap. Cache levels that are closer to the processors are smaller and faster, while the levels that are farther from the processors are bigger and slower. However, the processors are still exposed to the latency of DRAM on misses. Therefore, speculative memory management techniques such as prefetching are used in modern microprocessors to bridge this gap in performance. First, we propose Synchronization-aware Hardware Prefetching for Chip Multiprocessors, a novel hardware data prefetching scheme designed for shared-memory, multi-threaded workloads. This is the first work we are aware of to characterize the causes of poor prefetching performance in shared-memory multi-threaded applications. These are the inability to prefetch beyond synchronization points and the tendency to prefetch shared data before it has been written. We present SB-Fetch, a low-complexity, low-overhead prefetcher design that addresses both issues. Second, we propose a new prefetching algorithm, Set-Level Adaptive Prefetching for Compressed Caches (SLAP-CC), which seeks to address this problem by varying the prefetching aggressiveness based on how much effective capacity is available in each set. The contributions of this work are to characterize the increase in, and per-set variability of, cache efficiency that typical cache compression schemes create, and to propose a new prefetching scheme, SLAP-CC, designed to leverage this variability in cache efficiency. Third, we propose a new scheduling mechanism that predicts hard-to-prefetch loads at issue time and preemptively schedules them for execution as soon as they are ready, allowing the cache hierarchy to start the miss-handling mechanism sooner. Such a scheduling mechanism reduces the miss penalty on the instructions that depend on a hard-to-prefetch load.