    AVX Overhead Mitigation: OS Support for Power-Limited Systems

    In recent years, processors have evolved to the point where they are increasingly limited by their power dissipation. For this reason, one of the main goals of the frequency selection strategy of current processors is to maximize performance within the given power limits. On these processors, the choice of instructions therefore often affects the operating frequency, since different instructions require different amounts of energy. An example can be found in current Intel server processors with support for AVX2 and AVX-512 instructions. These processors reduce the frequency of CPU cores that execute power-intensive code containing such SIMD instructions. The resulting frequency reduction lowers the throughput of this SIMD code, but it also affects other, less power-intensive code that is, for example, executed concurrently on another hardware thread of the same physical CPU core. This work focuses on the cost that arises in such situations from executing simple, less power-intensive code at a suboptimal CPU frequency. According to prior work, this cost - called remote AVX overhead - slows down some workloads by up to 30% and poses a major challenge for the use of AVX2 and AVX-512. As we show, however, remote AVX overhead can largely be avoided through techniques implemented in the operating system. In this work, we conduct an extensive analysis of the causes of remote AVX overhead and describe a profiler that is able to quantify remote AVX overhead with high accuracy. We also show that the frequency selection strategy of existing processors offers little potential for performance improvements in most cases - instead, substantial improvements are possible through operating system modifications. We describe two scheduler modifications that counteract the impact of remote AVX overhead in different situations. First, we show how restricting AVX-512 usage to a few processor cores and migrating tasks to suitable cores greatly reduces remote AVX overhead when parts of the executed software use AVX-512. Second, we show how prioritizing tasks that are slowed down by remote AVX overhead improves performance isolation between tasks in situations where the system partially executes either AVX2 or AVX-512 code. Our work demonstrates that the operating system needs to be much more involved in the selection of processor frequencies in current and future power-limited systems. In addition, we outline a number of possible improvements to existing processor architectures that would allow an even more effective reduction of the impact of effects such as remote AVX overhead.
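
    The core-specialization idea can be illustrated with a small user-space sketch: threads known to execute AVX-512 code are pinned to a dedicated core set, while all other threads are kept away from those cores, so scalar code never shares a core with frequency-reducing SIMD code. The core numbers and the avx512_heavy flag are assumptions for illustration only; the thesis implements this policy inside the kernel scheduler, not in user space.

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>

        /* Hypothetical partition: cores 0-5 for scalar/AVX2 code, cores 6-7 for AVX-512 code. */
        static void pin_to_cores(const int *cores, int n)
        {
            cpu_set_t set;
            CPU_ZERO(&set);
            for (int i = 0; i < n; i++)
                CPU_SET(cores[i], &set);
            /* pid 0 = apply to the calling thread */
            if (sched_setaffinity(0, sizeof(set), &set) != 0)
                perror("sched_setaffinity");
        }

        int main(void)
        {
            int scalar_cores[] = {0, 1, 2, 3, 4, 5};
            int avx512_cores[] = {6, 7};
            int avx512_heavy   = 0;   /* would be set for threads known to run AVX-512 kernels */

            if (avx512_heavy)
                pin_to_cores(avx512_cores, 2);
            else
                pin_to_cores(scalar_cores, 6);

            /* ... run the workload; scalar threads never share a core with AVX-512 code ... */
            return 0;
        }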

    Transition Radiation Spectra of Electrons from 1 to 10 GeV/c in Regular and Irregular Radiators

    We present measurements of the spectral distribution of transition radiation generated by electrons of momentum 1 to 10 GeV/c in different radiator types. We investigate periodic foil radiators and irregular foam and fiber materials. The transition radiation photons are detected by prototypes of the drift chambers to be used in the Transition Radiation Detector (TRD) of the ALICE experiment at CERN, which are filled with a Xe/CO2 (15%) mixture. The measurements are compared to simulations in order to enhance the quantitative understanding of transition radiation production, in particular the momentum dependence of the transition radiation yield. (Comment: 18 pages, 15 figures; submitted to Nucl. Instr. Meth. Phys. Res.)
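
    As a reminder of why the yield is expected to grow with momentum (a standard single-interface result from transition radiation theory, quoted here for context rather than taken from the paper): the total energy radiated by a highly relativistic charge crossing one interface between vacuum and a medium with plasma frequency \omega_p is approximately

        W \simeq \frac{\alpha\,\hbar\,\omega_p\,\gamma}{3}, \qquad \gamma = \frac{E}{m_e c^2},

    so the yield rises roughly linearly with the Lorentz factor until interference and saturation effects in a multi-foil radiator limit it; this is the momentum dependence the measurements above probe.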

    Reducing Response Time with Preheated Caches

    CPU performance is increasingly limited by thermal dissipation, and aggressive power management will soon be beneficial for performance. In particular, temporarily idle parts of the chip (including the caches) should be power-gated to reduce leakage power. Current CPUs already lose their cache state whenever the CPU is idle for extended periods of time, which causes a performance loss when execution resumes, due to the large number of cache misses while the working set is fetched back from external memory. In a server system, the first network request after such an idle period suffers from increased response time. We present a technique to reduce this overhead by preheating the caches before the network request arrives at the server: our design predicts the working set of the server application by analyzing the cache contents after similar requests have been processed. As soon as an estimate of the working set is available, a predictable network architecture starts to announce future incoming network packets to the server, which then loads the predicted working set into the cache. Our experiments show that, if this preheating step is complete when the network packet arrives, the response time overhead is reduced by an average of 80%.
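
    A minimal user-space sketch of the warm-up step, assuming the predicted working set is available as a list of address ranges and that an early packet announcement triggers the preheating (the names and data layout below are hypothetical; the paper relies on OS support and a predictable network architecture rather than this loop):

        #include <stddef.h>
        #include <stdint.h>

        /* Hypothetical description of one predicted working-set region. */
        struct ws_region {
            const char *base;
            size_t      len;
        };

        /* Touch every cache line of the predicted working set so it is resident
         * before the real request arrives. 64 bytes is the usual x86 line size. */
        static void preheat(const struct ws_region *regions, size_t n)
        {
            volatile uint8_t sink = 0;
            for (size_t r = 0; r < n; r++)
                for (size_t off = 0; off < regions[r].len; off += 64)
                    sink ^= (uint8_t)regions[r].base[off];   /* the load pulls the line into the cache */
            (void)sink;
        }

    In the actual design, this step would be started as soon as the incoming packet is announced, so that it completes before request processing begins.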

    Small animal positron emission tomography with multi-wire proportional counters

    This thesis treats the simulation, design, construction, testing and performance evaluation of multi-wire proportional counter (MWPC) based detectors for positron emission tomography. Simulations were used for the evaluation and quantification of the commercial quadHIDAC small animal PET camera. In addition, the MSPET device, a new, optimised, multi-wire proportional counter based detector for high-resolution PET, was developed, constructed and tested.

    LoGV: Low-overhead GPGPU Virtualization

    Over the last few years, running high performance computing applications in the cloud has become feasible. At the same time, GPGPUs are delivering unprecedented performance for HPC applications. Cloud providers thus face the challenge of integrating GPGPUs into their virtualized platforms, which has proven difficult for current virtualization stacks. In this paper, we present LoGV, an approach to virtualize GPGPUs by leveraging protection mechanisms already present in modern hardware. LoGV enables sharing of GPGPUs between VMs as well as VM migration without modifying the host driver or the guest’s CUDA runtime. LoGV allocates resources securely in the hypervisor, which then grants applications direct access to these resources, relying on GPGPU hardware features to guarantee mutual protection between applications. Experiments with our prototype show an overhead of less than 4% compared to native execution.

    Model-based comparison of organ at risk protection between VMAT and robustly optimised IMPT plans

    The comparison between intensity-modulated proton therapy (IMPT) and volumetric modulated arc therapy (VMAT) plans, based on models of normal tissue complication probability (NTCP), can support the choice of radiation modality. IMPT irradiation plans for 50 patients with head and neck tumours originally treated with photon therapy were robustly optimised against density and setup uncertainties. The dose distribution was calculated with a Monte Carlo (MC) algorithm. The comparison of the plans was based on dose-volume parameters in organs at risk (OARs) and NTCP calculations for xerostomia, sticky saliva, dysphagia and tube feeding using Langendijk's model-based approach. While the dose distribution in the target volumes is similar, the IMPT plans show better protection of OARs. It is therefore not the conformity of the high-dose region that constitutes the advantage of protons, but the reduction of the mid-to-low dose levels compared to photons. This work investigates to what extent the advantages of proton radiation are beneficial for the patient's post-therapeutic quality of life (QoL). As a result, approximately one third of the patients examined benefit significantly from proton therapy with regard to possible late side effects. Clinical data is needed to confirm the model-based calculations.
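
    The model-based selection step can be illustrated with a generic multivariable logistic NTCP model of the kind used in this approach: the complication probability is computed from plan-specific dose parameters for both modalities, and a patient is considered to benefit if the NTCP reduction exceeds an agreed threshold. The coefficients, predictor and threshold below are placeholder values for illustration, not the published model parameters.

        #include <math.h>
        #include <stdio.h>

        /* Generic logistic NTCP model: NTCP = 1 / (1 + exp(-S)), S = b0 + b1*x1 + ... */
        static double ntcp(double s) { return 1.0 / (1.0 + exp(-s)); }

        int main(void)
        {
            /* Hypothetical xerostomia model: intercept + coefficient for mean parotid dose (Gy). */
            double b0 = -2.2, b1 = 0.10;
            double mean_parotid_vmat = 26.0;   /* Gy, photon plan */
            double mean_parotid_impt = 16.0;   /* Gy, proton plan */

            double ntcp_vmat = ntcp(b0 + b1 * mean_parotid_vmat);
            double ntcp_impt = ntcp(b0 + b1 * mean_parotid_impt);

            /* Model-based selection: the patient benefits if the NTCP reduction
             * exceeds a clinically agreed threshold (e.g. 10 percentage points). */
            printf("delta NTCP = %.1f percentage points\n",
                   100.0 * (ntcp_vmat - ntcp_impt));
            return 0;
        }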

    Alignment of the ALICE Inner Tracking System with cosmic-ray tracks

    (37 pages, 15 figures; revised version accepted by JINST.) ALICE (A Large Ion Collider Experiment) is the LHC (Large Hadron Collider) experiment devoted to investigating the strongly interacting matter created in nucleus-nucleus collisions at LHC energies. The ALICE Inner Tracking System (ITS) consists of six cylindrical layers of silicon detectors built with three different technologies; in the outward direction: two layers of pixel detectors, two layers of drift detectors, and two layers of strip detectors. The number of parameters to be determined in the spatial alignment of the 2198 sensor modules of the ITS is about 13,000. The target alignment precision is well below 10 microns in some cases (pixels). The sources of alignment information include survey measurements as well as tracks reconstructed from cosmic rays and from proton-proton collisions. The main track-based alignment method uses the Millepede global approach; an iterative local method was developed and used as well. We present the results obtained for the ITS alignment using about 10^5 charged tracks from cosmic rays collected during summer 2008, with the ALICE solenoidal magnet switched off.
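
    The iterative local method mentioned above can be sketched in miniature: for each module, average the track-to-hit residuals of the hits it recorded, apply that average as a correction, and repeat until the corrections become small. This toy handles a single translational degree of freedom per module with made-up data structures; the real procedure fits several rigid-body parameters per module, and the Millepede approach instead solves all module and track parameters in one global fit.

        #include <stddef.h>

        /* One hit: the module it lies on and its track-minus-hit residual (microns). */
        struct hit {
            int    module;
            double residual_um;
        };

        /* Iterative local alignment of one translational parameter per module:
         * repeatedly shift each module by the mean of its remaining residuals. */
        static void align(const struct hit *hits, size_t nhits,
                          double *shift_um, size_t nmod, int iterations)
        {
            for (int it = 0; it < iterations; it++) {
                for (size_t m = 0; m < nmod; m++) {
                    double sum = 0.0;
                    size_t cnt = 0;
                    for (size_t h = 0; h < nhits; h++) {
                        if ((size_t)hits[h].module != m)
                            continue;
                        sum += hits[h].residual_um - shift_um[m];
                        cnt++;
                    }
                    if (cnt > 0)
                        shift_um[m] += sum / (double)cnt;   /* local correction for module m */
                }
            }
        }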

    GPrioSwap : Towards a Swapping Policy for GPUs

    Over the last few years, Graphics Processing Units (GPUs) have become popular in computing and have found their way into a number of cloud platforms. However, integrating a GPU into a cloud environment requires the cloud provider to efficiently virtualize the GPU. While several research projects have addressed this challenge in the past, few of these projects attempt to properly enable sharing of GPU memory between multiple clients: to date, GPUswap is the only project that enables sharing of GPU memory without inducing unnecessary application overhead, while maintaining both fairness and high utilization of GPU memory. However, GPUswap includes only a rudimentary swapping policy and therefore induces a rather large application overhead. In this paper, we work towards a practicable swapping policy for GPUs. To that end, we analyze the behavior of various GPU applications to determine their memory access patterns. Based on our insights about these patterns, we derive a swapping policy that includes a developer-assigned priority for each GPU buffer in its swapping decisions. Experiments with our prototype implementation show that a swapping policy based on buffer priorities can significantly reduce the swapping overhead.
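
    The core of a priority-based swapping policy can be sketched in a few lines: when GPU memory is exhausted, pick the resident buffer with the lowest developer-assigned priority (breaking ties, for example, by size) and swap it out to system memory. The structures and the tie-breaking rule are assumptions for illustration, not the GPUswap/GPrioSwap implementation.

        #include <stddef.h>

        /* One GPU buffer as seen by the swapping policy (toy bookkeeping). */
        struct gpu_buffer {
            int    priority;    /* developer-assigned: higher = keep on the GPU longer */
            size_t size;        /* bytes */
            int    resident;    /* currently in GPU memory? */
        };

        /* Choose the victim buffer: lowest priority first; among equal priorities,
         * prefer the largest buffer so one eviction frees as much memory as possible. */
        static struct gpu_buffer *pick_victim(struct gpu_buffer *bufs, size_t n)
        {
            struct gpu_buffer *victim = NULL;
            for (size_t i = 0; i < n; i++) {
                if (!bufs[i].resident)
                    continue;
                if (victim == NULL ||
                    bufs[i].priority < victim->priority ||
                    (bufs[i].priority == victim->priority && bufs[i].size > victim->size))
                    victim = &bufs[i];
            }
            return victim;   /* NULL if nothing is resident */
        }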