1,753 research outputs found

    Octopus - an energy-efficient architecture for wireless multimedia systems

    Get PDF
    Multimedia computing and mobile computing are two trends that will lead to a new application domain in the near future. However, the technological challenges to establishing this paradigm of computing are non-trivial. Personal mobile computing offers a vision of the future with a much richer and more exciting set of architecture research challenges than extrapolations of the current desktop architectures. In particular, these devices will have limited battery resources, will handle diverse data types, and will operate in environments that are insecure, dynamic and which vary significantly in time and location. The approach we made to achieve such a system is to use autonomous, adaptable modules, interconnected by a switch rather than by a bus, and to offload as much as work as possible from the CPU to programmable modules that is placed in the data streams. A reconfigurable internal communication network switch called Octopus exploits locality of reference and eliminates wasteful data copies

    Improving Compute & Data Efficiency of Flexible Architectures

    Get PDF

    Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs

    Get PDF
    © 2023 Copyright held by the owner/author(s). This document is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/ This document is the Accepted version of a Published Work that appeared in final form in 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Viena, Austria, October 2023. To access the final edited and published work see https://doi.org/10.1109/PACT58117.2023.00019Literature is plentiful in works exploiting cache locality for GPUs. A majority of them explore replacement or bypassing policies. In this paper, however, we surpass this exploration by fabricating a formal proof for a no-overhead quasi-optimal caching technique for caching textures in graphics workloads. Textures make up a significant part of main memory traffic in mobile GPUs, which contributes to the total GPU energy consumption. Since texture accesses use a shared L2 cache, improving the L2 texture caching efficiency would decrease main memory traffic, thus improving energy efficiency, which is crucial for mobile GPUs. Our proposal reaches quasi-optimality by exploiting the frame-to-frame reuse of textures in graphics. We do this by traversing frames in a boustrophedonic1 manner w.r.t. the frame-to-frame tile order. We first approximate the texture access trace to a circular trace and then forge a formal proof for our proposal being optimal for such traces. We also complement the proof with empirical data that demonstrates the quasi-optimality of our no-cost proposal

    Whirlpool: Improving Dynamic Cache Management with Static Data Classification

    Get PDF
    Cache hierarchies are increasingly non-uniform and difficult to manage. Several techniques, such as scratchpads or reuse hints, use static information about how programs access data to manage the memory hierarchy. Static techniques are effective on regular programs, but because they set fixed policies, they are vulnerable to changes in program behavior or available cache space. Instead, most systems rely on dynamic caching policies that adapt to observed program behavior. Unfortunately, dynamic policies spend significant resources trying to learn how programs use memory, and yet they often perform worse than a static policy. We present Whirlpool, a novel approach that combines static information with dynamic policies to reap the benefits of each. Whirlpool statically classifies data into pools based on how the program uses memory. Whirlpool then uses dynamic policies to tune the cache to each pool. Hence, rather than setting policies statically, Whirlpool uses static analysis to guide dynamic policies. We present both an API that lets programmers specify pools manually and a profiling tool that discovers pools automatically in unmodified binaries. We evaluate Whirlpool on a state-of-the-art NUCA cache. Whirlpool significantly outperforms prior approaches: on sequential programs, Whirlpool improves performance by up to 38% and reduces data movement energy by up to 53%; on parallel programs, Whirlpool improves performance by up to 67% and reduces data movement energy by up to 2.6x.National Science Foundation (U.S.) (grant CCF-1318384)National Science Foundation (U.S.) (CAREER-1452994)Samsung (Firm) (GRO award

    Improving Molecular Force Fields Across Configurational Space by Combining Supervised and Unsupervised Machine Learning

    Get PDF
    The training set of atomic configurations is key to the performance of any Machine Learning Force Field (MLFF) and, as such, the training set selection determines the applicability of the MLFF model for predictive molecular simulations. However, most atomistic reference datasets are inhomogeneously distributed across configurational space (CS), thus choosing the training set randomly or according to the probability distribution of the data leads to models whose accuracy is mainly defined by the most common close-to-equilibrium configurations in the reference data. In this work, we combine unsupervised and supervised ML methods to bypass the inherent bias of the data for common configurations, effectively widening the applicability range of MLFF to the fullest capabilities of the dataset. To achieve this goal, we first cluster the CS into subregions similar in terms of geometry and energetics. We iteratively test a given MLFF performance on each subregion and fill the training set of the model with the representatives of the most inaccurate parts of the CS. The proposed approach has been applied to a set of small organic molecules and alanine tetrapeptide, demonstrating an up to two-fold decrease in the root mean squared errors for force predictions of these molecules. This result holds for both kernel-based methods (sGDML and GAP/SOAP models) and deep neural networks (SchNet model). For the latter, the developed approach simultaneously improves both energy and forces, bypassing the compromise to be made when employing mixed energy/force loss functions
    • …
    corecore