39 research outputs found

    Data Cache Prefetching Using a Global History Buffer

    Full text link

    Topology-aware Quality-of-Service Support in Highly Integrated Chip Multiprocessors

    Get PDF
    Current design complexity trends, poor wire scalability, and power limitations argue in favor of highly modular onchip systems. Today’s state-of-the-art CMPs already feature up to a hundred discrete cores. With increasing levels of integration, CMPs with hundreds of cores, cache tiles, and specialized accelerators are anticipated in the near future. Meanwhile, server consolidation and cloud computing paradigms have emerged as profit vehicles for exploiting abundant resources of chip-multiprocessors. As multiple, potentially malevolent, users begin to share virtualized resources of a single chip, CMP-level quality-of-service (QOS) support becomes necessary to provide performance isolation, service guarantees, and security. This work takes a topology-aware approach to on-chip QOS. We propose to segregate shared resources, such as memory controllers and accelerators, into dedicated islands (shared regions) of the chip with full hardware QOS support. We rely on a richly connected Multidrop Express Channel (MECS) topology to connect individual nodes to shared regions, foregoing QOS support in much of the substrate and eliminating its respective overheads. We evaluate several topologies for the QOSenabled shared regions, focusing on the interaction between network-on-chip (NOC) and QOS metrics. We explore a new topology called Destination Partitioned Subnets (DPS), which uses a light-weight dedicated network for each destination node. On synthetic workloads, DPS nearly matches or outperforms other topologies with comparable bisection bandwidth in terms of performance, area overhead, energyefficiency, fairness, and preemption resilience.

    Volume I. Introduction to DUNE

    Get PDF
    The preponderance of matter over antimatter in the early universe, the dynamics of the supernovae that produced the heavy elements necessary for life, and whether protons eventually decay—these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our universe, its current state, and its eventual fate. The Deep Underground Neutrino Experiment (DUNE) is an international world-class experiment dedicated to addressing these questions as it searches for leptonic charge-parity symmetry violation, stands ready to capture supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model. The DUNE far detector technical design report (TDR) describes the DUNE physics program and the technical designs of the single- and dual-phase DUNE liquid argon TPC far detector modules. This TDR is intended to justify the technical choices for the far detector that flow down from the high-level physics goals through requirements at all levels of the Project. Volume I contains an executive summary that introduces the DUNE science program, the far detector and the strategy for its modular designs, and the organization and management of the Project. The remainder of Volume I provides more detail on the science program that drives the choice of detector technologies and on the technologies themselves. It also introduces the designs for the DUNE near detector and the DUNE computing model, for which DUNE is planning design reports. Volume II of this TDR describes DUNE\u27s physics program in detail. Volume III describes the technical coordination required for the far detector design, construction, installation, and integration, and its organizational structure. Volume IV describes the single-phase far detector technology. A planned Volume V will describe the dual-phase technology

    Deep Underground Neutrino Experiment (DUNE), far detector technical design report, volume III: DUNE far detector technical coordination

    Get PDF
    The preponderance of matter over antimatter in the early universe, the dynamics of the supernovae that produced the heavy elements necessary for life, and whether protons eventually decay—these mysteries at the forefront of particle physics and astrophysics are key to understanding the early evolution of our universe, its current state, and its eventual fate. The Deep Underground Neutrino Experiment (DUNE) is an international world-class experiment dedicated to addressing these questions as it searches for leptonic charge-parity symmetry violation, stands ready to capture supernova neutrino bursts, and seeks to observe nucleon decay as a signature of a grand unified theory underlying the standard model. The DUNE far detector technical design report (TDR) describes the DUNE physics program and the technical designs of the single- and dual-phase DUNE liquid argon TPC far detector modules. Volume III of this TDR describes how the activities required to design, construct, fabricate, install, and commission the DUNE far detector modules are organized and managed. This volume details the organizational structures that will carry out and/or oversee the planned far detector activities safely, successfully, on time, and on budget. It presents overviews of the facilities, supporting infrastructure, and detectors for context, and it outlines the project-related functions and methodologies used by the DUNE technical coordination organization, focusing on the areas of integration engineering, technical reviews, quality assurance and control, and safety oversight. Because of its more advanced stage of development, functional examples presented in this volume focus primarily on the single-phase (SP) detector module

    Highly-parallelized simulation of a pixelated LArTPC on a GPU

    Get PDF
    The rapid development of general-purpose computing on graphics processing units (GPGPU) is allowing the implementation of highly-parallelized Monte Carlo simulation chains for particle physics experiments. This technique is particularly suitable for the simulation of a pixelated charge readout for time projection chambers, given the large number of channels that this technology employs. Here we present the first implementation of a full microphysical simulator of a liquid argon time projection chamber (LArTPC) equipped with light readout and pixelated charge readout, developed for the DUNE Near Detector. The software is implemented with an end-to-end set of GPU-optimized algorithms. The algorithms have been written in Python and translated into CUDA kernels using Numba, a just-in-time compiler for a subset of Python and NumPy instructions. The GPU implementation achieves a speed up of four orders of magnitude compared with the equivalent CPU version. The simulation of the current induced on 10^3 pixels takes around 1 ms on the GPU, compared with approximately 10 s on the CPU. The results of the simulation are compared against data from a pixel-readout LArTPC prototype

    A Fair Thread-Aware Memory Scheduling Algorithm for Chip Multiprocessor

    No full text

    Exploring Data Prefetching Mechanisms for Last Level Cache in Chip Multi-Processors

    No full text

    Low-Cost Adaptive Data Prefetching

    No full text

    Software-hardware cooperative DRAM bank partitioning for chip multiprocessors

    No full text
    Abstract. DRAM row buffer conflicts can increase the memory access latency significantly for single-threaded applications. In a chip multiprocessor system, multiple applications competing for DRAM will suffer additional row buffer conflicts due to interthread interference. This paper presents a new hardware and software cooperative DRAM bank partitioning method that combines page coloring and XOR cache mapping to evaluate the benefit potential of reducing interthread interference. Using SPECfp2000 as our benchmarks, our simulation results show that our scheme can boost the performance of the most benchmark combinations tested, with the speedups of up to 13%, 14 % and 8.06 % observed for two cores (with 16 banks), two cores (with 32 banks) and four cores (wit
    corecore