Octopus - an energy-efficient architecture for wireless multimedia systems
Multimedia computing and mobile computing are two trends that will lead to a new application domain in the near future. However, the technological challenges to establishing this paradigm of computing are non-trivial. Personal mobile computing offers a vision of the future with a much richer and more exciting set of architecture research challenges than extrapolations of current desktop architectures. In particular, these devices will have limited battery resources, will handle diverse data types, and will operate in environments that are insecure, dynamic, and vary significantly in time and location. Our approach to achieving such a system is to use autonomous, adaptable modules interconnected by a switch rather than by a bus, and to offload as much work as possible from the CPU to programmable modules placed in the data streams. A reconfigurable internal communication network switch called Octopus exploits locality of reference and eliminates wasteful data copies.
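To make the switch-versus-bus argument concrete, here is a toy cycle-count model. It is a sketch under stated assumptions, not Octopus itself: every transfer is assumed to take one cycle, each module is assumed to have a single port, and the module names are hypothetical. A shared bus serializes all transfers, while a crossbar-style switch can run transfers between disjoint module pairs in the same cycle.

```python
def bus_cycles(transfers):
    """Shared bus: one transfer per cycle, so all transfers are serialized."""
    return len(transfers)


def switch_cycles(transfers):
    """Crossbar-style switch: several transfers may proceed in the same cycle,
    provided no module appears in more than one of them (one port per module)."""
    pending = list(transfers)
    cycles = 0
    while pending:
        busy = set()      # modules whose port is already used this cycle
        deferred = []     # transfers that conflict and must wait a cycle
        for src, dst in pending:
            if src in busy or dst in busy:
                deferred.append((src, dst))
            else:
                busy.update((src, dst))
        pending = deferred
        cycles += 1
    return cycles


# Hypothetical module names; each tuple is (source, destination).
transfers = [
    ("network", "decoder"),
    ("camera", "encoder"),
    ("decoder", "display"),
    ("encoder", "network"),
]
print(bus_cycles(transfers), switch_cycles(transfers))  # 4 2
```

The first two transfers touch disjoint modules and share a cycle, as do the last two, so the switch finishes in half the bus time. Real interconnects add arbitration, bandwidth, and latency details that this model ignores.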
Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs
© 2023 Copyright held by the owner/author(s). This document is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
This document is the Accepted version of a Published Work that appeared in final form in the 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Vienna, Austria, October 2023. To access the final edited and published work, see https://doi.org/10.1109/PACT58117.2023.00019
The literature contains many works that exploit cache locality in GPUs, most of which explore replacement or bypassing policies. In this paper, however, we go beyond this line of work by constructing a formal proof for a no-overhead, quasi-optimal technique for caching textures in graphics workloads. Textures account for a significant share of main memory traffic in mobile GPUs, and this traffic contributes to total GPU energy consumption. Since texture accesses use a shared L2 cache, improving L2 texture caching efficiency would decrease main memory traffic and thus improve energy efficiency, which is crucial for mobile GPUs. Our proposal reaches quasi-optimality by exploiting the frame-to-frame reuse of textures in graphics. We do this by traversing frames in a boustrophedonic manner with respect to the tile order, alternating the direction of the tile traversal from one frame to the next. We first approximate the texture access trace as a circular trace and then formally prove that our proposal is optimal for such traces. We also complement the proof with empirical data that demonstrate the quasi-optimality of our no-cost proposal.
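The intuition behind alternating the tile order can be checked with a small simulation. This is a minimal sketch, assuming a fully associative LRU cache and a toy trace in which tile `i` reads only texture block `i`; the helper names are hypothetical, and the paper's actual L2 organization and proof are not reproduced here. When every frame visits the tiles in the same order, the trace is circular and larger than the cache, the classic worst case for LRU. Reversing the order on every other frame makes each new frame start with the blocks the previous frame left in the cache.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses of a fully associative LRU cache over an access trace."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict the least recently used block
            cache[block] = None
    return misses


def frame_trace(num_tiles, num_frames, boustrophedonic):
    """Toy trace: tile i reads only texture block i, and every frame visits all tiles."""
    trace = []
    for frame in range(num_frames):
        order = list(range(num_tiles))
        if boustrophedonic and frame % 2 == 1:
            order.reverse()                 # odd frames walk the tiles backwards
        trace.extend(order)
    return trace


same_order = lru_misses(frame_trace(8, 4, boustrophedonic=False), capacity=4)
alternating = lru_misses(frame_trace(8, 4, boustrophedonic=True), capacity=4)
print(same_order, alternating)  # 32 20
```

With 8 tiles and room for 4 blocks, the same-order traversal misses on every access, while the alternating traversal hits on the first 4 tiles of every frame after the first.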
Whirlpool: Improving Dynamic Cache Management with Static Data Classification
Cache hierarchies are increasingly non-uniform and difficult to manage. Several techniques, such as scratchpads or reuse hints, use static information about how programs access data to manage the memory hierarchy. Static techniques are effective on regular programs, but because they set fixed policies, they are vulnerable to changes in program behavior or available cache space. Instead, most systems rely on dynamic caching policies that adapt to observed program behavior. Unfortunately, dynamic policies spend significant resources trying to learn how programs use memory, and yet they often perform worse than a static policy. We present Whirlpool, a novel approach that combines static information with dynamic policies to reap the benefits of each. Whirlpool statically classifies data into pools based on how the program uses memory. Whirlpool then uses dynamic policies to tune the cache to each pool. Hence, rather than setting policies statically, Whirlpool uses static analysis to guide dynamic policies. We present both an API that lets programmers specify pools manually and a profiling tool that discovers pools automatically in unmodified binaries.
We evaluate Whirlpool on a state-of-the-art NUCA cache. Whirlpool significantly outperforms prior approaches: on sequential programs, Whirlpool improves performance by up to 38% and reduces data movement energy by up to 53%; on parallel programs, Whirlpool improves performance by up to 67% and reduces data movement energy by up to 2.6x.
National Science Foundation (U.S.) (grant CCF-1318384); National Science Foundation (U.S.) (CAREER-1452994); Samsung (Firm) (GRO award
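One way a dynamic policy can "tune the cache to each pool" is to split capacity among pools according to their miss curves. The sketch below uses greedy marginal-utility partitioning, a common technique that is not necessarily what Whirlpool implements; the pool names and miss curves are hypothetical, and the curves are assumed to be known (in practice they would be measured at run time). It shows why classifying data into pools helps: a streaming pool with no reuse gets no capacity, leaving more for pools that benefit from it.

```python
def allocate_capacity(miss_curves, total_units):
    """Greedy marginal-utility partitioning of cache capacity among pools.

    miss_curves[pool][k] is the (assumed known) number of misses the pool
    incurs when given k units of capacity.
    """
    alloc = {pool: 0 for pool in miss_curves}
    for _ in range(total_units):
        best_pool, best_gain = None, 0
        for pool, curve in miss_curves.items():
            k = alloc[pool]
            if k + 1 < len(curve):
                gain = curve[k] - curve[k + 1]  # misses saved by one more unit
                if gain > best_gain:
                    best_pool, best_gain = pool, gain
        if best_pool is None:
            break  # no pool benefits from additional capacity
        alloc[best_pool] += 1
    return alloc


# Hypothetical pools and miss curves (misses for 0..4 capacity units).
curves = {
    "streaming": [100, 100, 100, 100, 100],  # no reuse: extra capacity is wasted
    "hash_table": [100, 60, 30, 10, 5],
    "tree": [80, 50, 40, 35, 33],
}
print(allocate_capacity(curves, total_units=4))
# {'streaming': 0, 'hash_table': 3, 'tree': 1}
```

Greedy allocation is optimal only when every miss curve is convex; with non-convex curves (for example, a working set that fits only once a threshold is crossed), a policy needs lookahead over several units at a time.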
Improving Molecular Force Fields Across Configurational Space by Combining Supervised and Unsupervised Machine Learning
The training set of atomic configurations is key to the performance of any Machine Learning Force Field (MLFF), and, as such, training set selection determines the applicability of the MLFF model for predictive molecular simulations. However, most atomistic reference datasets are inhomogeneously distributed across configurational space (CS); thus, choosing the training set randomly or according to the probability distribution of the data leads to models whose accuracy is mainly defined by the most common close-to-equilibrium configurations in the reference data. In this work, we combine unsupervised and supervised ML methods to bypass the inherent bias of the data toward common configurations, effectively widening the applicability range of the MLFF to the full capabilities of the dataset. To achieve this goal, we first cluster the CS into subregions that are similar in terms of geometry and energetics. We then iteratively test the performance of a given MLFF on each subregion and fill the training set of the model with representatives of the most inaccurate parts of the CS. The proposed approach has been applied to a set of small organic molecules and alanine tetrapeptide, demonstrating up to a two-fold decrease in the root mean squared errors of force predictions for these molecules. This result holds for both kernel-based methods (sGDML and GAP/SOAP models) and deep neural networks (SchNet model). For the latter, the developed approach simultaneously improves both energy and force predictions, bypassing the compromise that must be made when employing mixed energy/force loss functions.
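The iterative selection loop described above can be sketched independently of any particular force-field library. This is a minimal sketch under stated assumptions, not the authors' code: clustering is assumed to have been done already, `train` and `error` are hypothetical callables standing in for MLFF training and a force-error metric, the training set is seeded with one configuration per cluster (an assumption), and the "representatives" added from the worst cluster are simply its next unused configurations.

```python
from statistics import mean


def select_training_set(clusters, train, error, n_iter, per_iter=1):
    """Iteratively grow a training set from the clusters where the model is worst.

    clusters: dict mapping cluster id -> list of configurations (assumed to come
              from a prior clustering step over configurational space).
    train:    callable(list of configurations) -> model
    error:    callable(model, configuration) -> float, e.g. a force error
    """
    pools = {c: list(members) for c, members in clusters.items()}
    # Assumption: seed the training set with one configuration per cluster.
    training = [members.pop(0) for members in pools.values() if members]
    for _ in range(n_iter):
        model = train(training)
        # Mean error of the current model on each cluster's unused configurations.
        scores = {c: mean(error(model, x) for x in m) for c, m in pools.items() if m}
        if not scores:
            break  # every configuration is already in the training set
        worst = max(scores, key=scores.get)
        training.extend(pools[worst][:per_iter])
        del pools[worst][:per_iter]
    return training


# Toy stand-in: configurations are numbers, the "model" memorises its training
# points, and the error is the distance to the nearest training point.
clusters = {"near_equilibrium": [0.0, 0.1, 0.2, 0.3], "distorted": [5.0, 6.0, 7.0, 8.0]}
picked = select_training_set(
    clusters,
    train=lambda configs: list(configs),
    error=lambda model, x: min(abs(x - p) for p in model),
    n_iter=3,
)
print(picked)  # [0.0, 5.0, 6.0, 7.0, 8.0]
```

In the toy run, every added configuration comes from the sparse, high-error "distorted" cluster rather than from the well-covered near-equilibrium one, which is the bias correction the abstract describes.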