611 research outputs found

    Using Intelligent Prefetching to Reduce the Energy Consumption of a Large-scale Storage System

    Get PDF
    Many high performance large-scale storage systems will experience significant workload increases as their user base and content availability grow over time. The U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) center hosts one such system that has recently undergone a period of rapid growth as its user population grew nearly 400% in just about three years. When administrators of these massive storage systems face the challenge of meeting the demands of an ever increasing number of requests, the easiest solution is to integrate more advanced hardware to existing systems. However, additional investment in hardware may significantly increase the system cost as well as daily power consumption. In this paper, we present evidence that well-selected software level optimization is capable of achieving comparable levels of performance without the cost and power consumption overhead caused by physically expanding the system. Specifically, we develop intelligent prefetching algorithms that are suitable for the unique workloads and user behaviors of the world\u27s largest satellite images distribution system managed by USGS EROS. Our experimental results, derived from real-world traces with over five million requests sent by users around the globe, show that the EROS hybrid storage system could maintain the same performance with over 30% of energy savings by utilizing our proposed prefetching algorithms, compared to the alternative solution of doubling the size of the current FTP server farm

    When Backpressure Meets Predictive Scheduling

    Full text link
    Motivated by the increasing popularity of learning and predicting human user behavior in communication and computing systems, in this paper, we investigate the fundamental benefit of predictive scheduling, i.e., predicting and pre-serving arrivals, in controlled queueing systems. Based on a lookahead window prediction model, we first establish a novel equivalence between the predictive queueing system with a \emph{fully-efficient} scheduling scheme and an equivalent queueing system without prediction. This connection allows us to analytically demonstrate that predictive scheduling necessarily improves system delay performance and can drive it to zero with increasing prediction power. We then propose the \textsf{Predictive Backpressure (PBP)} algorithm for achieving optimal utility performance in such predictive systems. \textsf{PBP} efficiently incorporates prediction into stochastic system control and avoids the great complication due to the exponential state space growth in the prediction window size. We show that \textsf{PBP} can achieve a utility performance that is within O(ϵ)O(\epsilon) of the optimal, for any ϵ>0\epsilon>0, while guaranteeing that the system delay distribution is a \emph{shifted-to-the-left} version of that under the original Backpressure algorithm. Hence, the average packet delay under \textsf{PBP} is strictly better than that under Backpressure, and vanishes with increasing prediction window size. This implies that the resulting utility-delay tradeoff with predictive scheduling beats the known optimal [O(ϵ),O(log(1/ϵ))][O(\epsilon), O(\log(1/\epsilon))] tradeoff for systems without prediction

    A low-power, high-performance speech recognition accelerator

    Get PDF
    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes,creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at high energy cost, not being affordable for the tiny power-budgeted mobile devices. Hardware acceleration reduces energy-consumption of ASR systems, while delivering high-performance. In this paper, we present an accelerator for largevocabulary, speaker-independent, continuous speech-recognition. It focuses on the Viterbi search algorithm representing the main bottleneck in an ASR system. The proposed design consists of innovative techniques to improve the memory subsystem, since memory is the main bottleneck for performance and power in these accelerators' design. It includes a prefetching scheme tailored to the needs of ASR systems that hides main memory latency for a large fraction of the memory accesses, negligibly impacting area. Additionally, we introduce a novel bandwidth-saving technique that removes off-chip memory accesses by 20 percent. Finally, we present a power saving technique that significantly reduces the leakage power of the accelerators scratchpad memories, providing between 8.5 and 29.2 percent reduction in entire power dissipation. Overall, the proposed design outperforms implementations running on the CPU by orders of magnitude, and achieves speedups between 1.7x and 5.9x for different speech decoders over a highly optimized CUDA implementation running on Geforce-GTX-980 GPU, while reducing the energy by 123-454x.Peer ReviewedPostprint (author's final draft

    Space-Efficient Predictive Block Management

    Get PDF
    With growing disk and storage capacities, the amount of required metadata for tracking all blocks in a system becomes a daunting task by itself. In previous work, we have demonstrated a system software effort in the area of predictive data grouping for reducing power and latency on hard disks. The structures used, very similar to prior efforts in prefetching and prefetch caching, track access successor information at the block level, keeping a fixed number of immediate successors per block. While providing powerful predictive expansion capabilities and being more space efficient in the amount of required metadata than many previous strategies, there remains a growing concern of how much data is actually required. In this paper, we present a novel method of storing equivalent information, SESH, a Space Efficient Storage of Heredity. This method utilizes the high amount of block-level predictability observed in a number of workload trace sets to reduce the overall metadata storage by up to 99% without any loss of information. As a result, we are able to provide a predictive tool that is adaptive, accurate, and robust in the face of workload noise, for a tiny fraction of the metadata cost previously anticipated; in some cases, reducing the required size from 12 gigabytes to less than 150 megabytes

    EFFECTIVE GROUPING FOR ENERGY AND PERFORMANCE: CONSTRUCTION OF ADAPTIVE, SUSTAINABLE, AND MAINTAINABLE DATA STORAGE

    Get PDF
    The performance gap between processors and storage systems has been increasingly critical overthe years. Yet the performance disparity remains, and further, storage energy consumption israpidly becoming a new critical problem. While smarter caching and predictive techniques domuch to alleviate this disparity, the problem persists, and data storage remains a growing contributorto latency and energy consumption.Attempts have been made at data layout maintenance, or intelligent physical placement ofdata, yet in practice, basic heuristics remain predominant. Problems that early studies soughtto solve via layout strategies were proven to be NP-Hard, and data layout maintenance todayremains more art than science. With unknown potential and a domain inherently full of uncertainty,layout maintenance persists as an area largely untapped by modern systems. But uncertainty inworkloads does not imply randomness; access patterns have exhibited repeatable, stable behavior.Predictive information can be gathered, analyzed, and exploited to improve data layouts. Ourgoal is a dynamic, robust, sustainable predictive engine, aimed at improving existing layouts byreplicating data at the storage device level.We present a comprehensive discussion of the design and construction of such a predictive engine,including workload evaluation, where we present and evaluate classical workloads as well asour own highly detailed traces collected over an extended period. We demonstrate significant gainsthrough an initial static grouping mechanism, and compare against an optimal grouping method ofour own construction, and further show significant improvement over competing techniques. We also explore and illustrate the challenges faced when moving from static to dynamic (i.e. online)grouping, and provide motivation and solutions for addressing these challenges. These challengesinclude metadata storage, appropriate predictive collocation, online performance, and physicalplacement. We reduced the metadata needed by several orders of magnitude, reducing the requiredvolume from more than 14% of total storage down to less than 12%. We also demonstrate how ourcollocation strategies outperform competing techniques. Finally, we present our complete modeland evaluate a prototype implementation against real hardware. This model was demonstrated tobe capable of reducing device-level accesses by up to 65%
    corecore