    SQUASH: Simple QoS-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators

    Modern SoCs integrate multiple CPU cores and Hardware Accelerators (HWAs) that share the same main memory system, causing interference among memory requests from different agents. The result of this interference, if not controlled well, is missed deadlines for HWAs and low CPU performance. State-of-the-art mechanisms designed for CPU-GPU systems strive to meet a target frame rate for GPUs by prioritizing the GPU close to the time when it has to complete a frame. We observe two major problems when such an approach is adapted to a heterogeneous CPU-HWA system. First, HWAs miss deadlines because they are prioritized only close to their deadlines. Second, such an approach does not consider the diverse memory access characteristics of different applications running on CPUs and HWAs, leading to low performance for latency-sensitive CPU applications and deadline misses for some HWAs, including GPUs. In this paper, we propose a Simple Quality of service Aware memory Scheduler for Heterogeneous systems (SQUASH), which overcomes these problems using three key ideas, with the goal of meeting the deadlines of HWAs while providing high CPU performance. First, SQUASH prioritizes an HWA whenever it is not on track to meet its deadline during a deadline period. Second, SQUASH prioritizes HWAs over memory-intensive CPU applications, based on the observation that the performance of memory-intensive applications is not sensitive to memory latency. Third, SQUASH treats short-deadline HWAs differently, as they are more likely to miss their deadlines, and schedules their requests based on worst-case memory access time estimates. Extensive evaluations across a wide variety of workloads and systems show that SQUASH achieves significantly better CPU performance than the best previous scheduler while always meeting the deadlines for all HWAs, including GPUs, thereby largely improving frame rates.
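
    The three ideas lend themselves to a compact sketch. Below is a minimal, hypothetical Python rendering of SQUASH-style priority assignment, inferred from the abstract alone: the `Hwa`/`CpuApp` types, the progress test, and the exact priority ordering are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of SQUASH-style priority assignment (not the paper's code).
from dataclasses import dataclass

@dataclass
class Hwa:
    requests_done: int       # memory requests served this deadline period
    requests_needed: int     # requests that must finish by the deadline
    period_elapsed: float    # fraction of the deadline period elapsed (0..1)
    short_deadline: bool     # short-deadline HWAs get worst-case treatment

@dataclass
class CpuApp:
    memory_intensive: bool   # classified by measured memory intensity

def on_track(hwa: Hwa) -> bool:
    """Idea 1: an HWA is on track if its served-request count keeps
    pace with the elapsed fraction of its deadline period."""
    expected = hwa.period_elapsed * hwa.requests_needed
    return hwa.requests_done >= expected

def priority(agent) -> int:
    """Larger value = served first at the memory controller. The ordering
    is inferred from the abstract's three ideas, not taken from the paper."""
    if isinstance(agent, Hwa):
        if agent.short_deadline:
            return 4   # Idea 3: schedule by worst-case access-time estimates
        # Idea 1: boost an HWA that has fallen behind; an on-track HWA
        # still outranks memory-intensive CPU applications (Idea 2).
        return 3 if not on_track(agent) else 1
    return 2 if not agent.memory_intensive else 0  # Idea 2
```

    In this sketch, latency-sensitive CPU applications outrank on-track HWAs, while short-deadline HWAs bypass the progress test entirely.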

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Datacenters provide cost-effective and flexible access to the scalable compute and storage resources required by today's cloud computing needs. A typical datacenter is made up of thousands of servers connected with a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and to maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements, including user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, various topologies proposed for them, their traffic properties, and the general challenges and objectives of traffic control in datacenters. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters, not to survey all existing solutions (which is virtually impossible given the massive body of existing research). We hope to give readers a wide range of options and factors to consider when evaluating traffic control mechanisms. We discuss various aspects of datacenter traffic control, including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of the paper, we briefly review inter-datacenter networks, which connect geographically dispersed datacenters, have been receiving increasing attention recently, and pose interesting and novel research problems.
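
    As a toy illustration of the traffic-class mix described above, the sketch below dequeues packets by strict priority; the class names and priority values are invented for illustration and do not come from the paper.

```python
# Toy strict-priority dequeue over the traffic classes named in the abstract.
import heapq

PRIORITY = {"interactive": 0, "deadline": 1, "long_running": 2}  # 0 = highest

class ClassifiedQueue:
    def __init__(self):
        self._heap = []   # (priority, seq, packet); seq keeps FIFO within a class
        self._seq = 0

    def enqueue(self, packet, traffic_class):
        heapq.heappush(self._heap, (PRIORITY[traffic_class], self._seq, packet))
        self._seq += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

q = ClassifiedQueue()
q.enqueue("rpc-reply", "interactive")
q.enqueue("backup-chunk", "long_running")
q.enqueue("batch-result", "deadline")
print(q.dequeue())  # "rpc-reply": interactive traffic leaves first
```

    Strict priority like this can starve the lowest class; real datacenter schedulers temper it with weighting or deadlines, one of the trade-offs the survey discusses.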

    Scheduling of Early Quantum Tasks

    An Early Quantum Task (EQT) is a Quantum EDF task that has shrunk its first period into one quantum time slot. Its purpose is to be executed as soon as possible without causing deadline overflow of other tasks. We derive the conditions under which an EQT can be admitted and can have an immediate start. The advantage of scheduling EQTs is shown by its use in a buffered multimedia server: the EQT is associated with a multimedia stream and uses its first invocation to fill the buffer, so that a client can start receiving data immediately.
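
    The paper derives the precise admission conditions; as a rough stand-in, the sketch below applies the classic EDF processor-demand test after inserting one quantum of immediate work for the EQT. The function and parameter names are assumptions for illustration.

```python
# Simplified admission test for an Early Quantum Task (EQT): the EQT claims
# the very next quantum slot, so admission is safe if the cumulative demand
# up to every existing deadline still fits in the time remaining.
# The paper's derived conditions are more precise; this is illustrative.
def can_admit_eqt(active_tasks, now, quantum):
    """active_tasks: list of (remaining_cost, absolute_deadline),
    costs and times expressed in quantum units."""
    demand = quantum   # the EQT's single quantum of work, due immediately
    for remaining, deadline in sorted(active_tasks, key=lambda t: t[1]):
        demand += remaining
        if demand > deadline - now:   # some deadline would be overrun
            return False
    return True

tasks = [(2, 5), (2, 6)]              # (remaining quanta, absolute deadline)
print(can_admit_eqt(tasks, now=0, quantum=1))  # True: 1+2 <= 5 and 1+2+2 <= 6
```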

    Centralized vs distributed communication scheme on switched ethernet for embedded military applications

    The current military communication network is a generation old and no longer meets the emerging requirements imposed by future embedded military applications. A new interconnection system is therefore needed to overcome these limitations. To this end, two new communication networks based upon Full Duplex Switched Ethernet are presented herein. The first uses a distributed communication scheme in which equipment can emit data simultaneously, which clearly improves the system's throughput and flexibility. However, migrating all existing applications into a compliant form could be an expensive step. To avoid this process, the second proposal keeps the current centralized communication scheme. Our objective is to assess and compare the real-time guarantees that each proposal can offer. The paper includes a functional description of each proposed communication network and a military avionic application that highlights each proposal's ability to support the required time-constrained communications.
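
    To make the notion of real-time guarantees concrete, here is a naive worst-case bound for one frame crossing a full-duplex switch output port under FIFO queuing. The link rate, switch latency, and frame sizes are illustrative assumptions; the paper's actual timing analysis is more involved.

```python
# Naive worst-case traversal bound for one frame through a full-duplex
# switched Ethernet output port with FIFO queuing. Illustrative only:
# real analyses (e.g., network calculus) account for arrival curves.
LINK_RATE_BPS = 100e6      # assumed 100 Mbit/s full-duplex link
SWITCH_LATENCY_S = 10e-6   # assumed fixed switching latency

def tx_time(frame_bytes):
    return frame_bytes * 8 / LINK_RATE_BPS

def worst_case_delay(own_frame_bytes, competing_frames_bytes):
    """Worst case: our frame arrives just after every competing frame has
    queued ahead of it, so it waits for all of them plus its own transmission."""
    queuing = sum(tx_time(b) for b in competing_frames_bytes)
    return SWITCH_LATENCY_S + queuing + tx_time(own_frame_bytes)

# A 1500-byte frame contending with three 1500-byte frames: ~0.490 ms bound.
print(f"{worst_case_delay(1500, [1500] * 3) * 1e3:.3f} ms")
```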

    Minimizing buffer requirements in video-on-demand servers

    23rd Euromicro Conference EUROMICRO 97: 'New Frontiers of Information Technology', Budapest, Hungary, 1-4 Sept 1997.
    Memory management is a key issue when designing cost-effective video-on-demand servers. State-of-the-art techniques, like double buffering, allocate buffers on a per-stream basis and require huge amounts of memory. We propose a buffering policy, namely Single Pair of Buffers, that dramatically reduces server memory requirements by reserving a pair of buffers per storage device. By considering disk and network interaction in detail, we have also identified the particular conditions under which this policy can be successfully applied to engineer video-on-demand servers. Reduction factors of two orders of magnitude compared to the double buffering approach can be obtained. Current disk and network parameters make this technique feasible.
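
    The claimed savings follow from simple arithmetic: double buffering scales with the number of streams, while Single Pair of Buffers scales with the number of storage devices. The sketch below uses illustrative parameters, not figures from the paper.

```python
# Memory footprint: double buffering (two buffers per stream) vs.
# Single Pair of Buffers (two buffers per storage device).
# All parameters are illustrative assumptions.
BUFFER_MB = 1      # assumed per-buffer size
streams = 1000     # concurrent streams served
devices = 10       # storage devices in the server

double_buffering_mb = 2 * BUFFER_MB * streams   # 2000 MB
single_pair_mb = 2 * BUFFER_MB * devices        # 20 MB

print(double_buffering_mb / single_pair_mb)     # 100x: two orders of magnitude
```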

    TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

    Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user query goes to all or many of the nodes in the cluster, so the overall time budget is dictated by the tail of the replies' latency distribution; replies see latency variations in both the network and compute. Previous work proposes to achieve load-proportional energy by slowing down the computation at lower datacenter loads based directly on response times (i.e., at lower loads, the proposal exploits the average slack in the time budget provisioned for the peak load). In contrast, we propose TimeTrader to reduce energy by exploiting the latency slack in the sub-critical replies which arrive before the deadline (e.g., 80% of replies are 3-4x faster than the tail). This slack is present at all loads and subsumes the previous work's load-related slack. While the previous work shifts the leaves' response time distribution to consume the slack at lower loads, TimeTrader reshapes the distribution at all loads by slowing down individual sub-critical nodes without increasing missed deadlines. TimeTrader exploits slack in both the network and compute budgets. Further, TimeTrader leverages Earliest Deadline First scheduling to largely decouple critical requests from the queuing delays of sub-critical requests, which can then be slowed down without hurting critical requests. A combination of real-system measurements and at-scale simulations shows that, without adding to missed deadlines, TimeTrader saves 15-19% and 41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512 nodes, whereas previous work saves 0% and 31-37%.
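
    A minimal sketch of the slack-reclaiming idea: given a query budget and a reply's observed progress, a sub-critical node can be slowed so it still finishes by the deadline. The function name, linear slowdown model, and cap are assumptions, not the paper's mechanism.

```python
# Sketch of slack-based slowdown for sub-critical replies (illustrative;
# TimeTrader's real mechanism combines EDF queuing with per-node speed control).
def slowdown_factor(budget_ms, elapsed_ms, remaining_work_ms, guard_ms=1.0):
    """How much a node can stretch its remaining compute and still meet
    the query deadline, keeping a small guard band for variation."""
    slack = budget_ms - elapsed_ms - remaining_work_ms - guard_ms
    if slack <= 0:
        return 1.0   # critical reply: run at full speed
    # Stretch the remaining work to consume the slack, capped (e.g., at 2x).
    return min((remaining_work_ms + slack) / remaining_work_ms, 2.0)

# A reply far ahead of the tail has ample slack, so it runs at half speed.
print(slowdown_factor(budget_ms=50, elapsed_ms=5, remaining_work_ms=10))  # 2.0
```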