Search CORE

102 research outputs found

Speculative Techniques for Memory Hierarchy Management

Author: Albarakat Laith
Publication venue
Publication date: 20/05/2021
Field of study

The “Memory Wall” [1], is the gap in performance between the processor and the main memory. Over the last 30 years computer architects have added multiple levels of cache to fill this gap, cache levels that are closer to the processors are smaller and faster. On the other hand, the levels that are far from the processors are bigger and slower. However the processors are still exposed to the latency of DRAM on misses. Therefore, speculative memory management techniques such as prefetching are used in modern microprocessors to bridge this gap in performance. First, we propose Synchronization-aware Hardware Prefetching for Chip Multiprocessors, a novel hardware data prefetching scheme designed for prefetching shared-memory, multi- threaded workloads. This is the first work we are aware of to characterize the causes of poor prefetching performance in shared- memory multi-threaded applications. These are the inability to prefetch beyond synchronization points and tendency to prefetch shared data before it has been written. SB-Fetch, a low-complexity, low-overhead prefetcher design that addresses both issues. Second, we propose a new prefetching algorithm, Set-Level Adaptive Prefetching for Com- pressed Caches (SLAP-CC), which seeks to address this problem by varying the prefetching aggressiveness based on how much effective capacity is available in each set. The ontribu- tions of this work is characterize the increase and per-set variability of cache efficiency which typical cache compression schemes create, and propose a new prefetching scheme, SLAP-CC, designed to leverage this cache efficiency variability. Third, we propose a new a scheduling mechanism that predicts the hard- to-prefetch loads at issue time and preemptively schedule them for execution as soon as they are ready, to allow the cache hierarchy to start the mishandling mechanism sooner. Such scheduling mechanism reduces the miss penalty on the dependent instructions after a hard-to-prefetch loads

Texas A&M Repository

Adaptive Microarchitectural Optimizations to Improve Performance and Security of Multi-Core Architectures

Author: Holtryd Nadja
Publication venue
Publication date: 01/01/2023
Field of study

With the current technological barriers, microarchitectural optimizations are increasingly important to ensure performance scalability of computing systems. The shift to multi-core architectures increases the demands on the memory system, and amplifies the role of microarchitectural optimizations in performance improvement. In a multi-core system, microarchitectural resources are usually shared, such as the cache, to maximize utilization but sharing can also lead to contention and lower performance. This can be mitigated through partitioning of shared caches.However, microarchitectural optimizations which were assumed to be fundamentally secure for a long time, can be used in side-channel attacks to exploit secrets, as cryptographic keys. Timing-based side-channels exploit predictable timing variations due to the interaction with microarchitectural optimizations during program execution. Going forward, there is a strong need to be able to leverage microarchitectural optimizations for performance without compromising security. This thesis contributes with three adaptive microarchitectural resource management optimizations to improve security and/or\ua0performance\ua0of multi-core architectures\ua0and a systematization-of-knowledge of timing-based side-channel attacks.\ua0We observe that to achieve high-performance cache partitioning in a multi-core system\ua0three requirements need to be met: i) fine-granularity of partitions, ii) locality-aware placement and iii) frequent changes. These requirements lead to\ua0high overheads for current centralized partitioning solutions, especially as the number of cores in the\ua0system increases. To address this problem, we present an adaptive and scalable cache partitioning solution (DELTA) using a distributed and asynchronous allocation algorithm. The\ua0allocations occur through core-to-core challenges, where applications with larger performance benefit will gain cache capacity. The\ua0solution is implementable in hardware, due to low computational complexity, and can scale to large core counts.According to our analysis, better performance can be achieved by coordination of multiple optimizations for different resources, e.g., off-chip bandwidth and cache, but is challenging due to the increased number of possible allocations which need to be evaluated.\ua0Based on these observations, we present a solution (CBP) for coordinated management of the optimizations: cache partitioning, bandwidth partitioning and prefetching.\ua0Efficient allocations, considering the inter-resource interactions and trade-offs, are achieved using local resource managers to limit the solution space.The continuously growing number of\ua0side-channel attacks leveraging\ua0microarchitectural optimizations prompts us to review attacks and defenses to understand the vulnerabilities of different microarchitectural optimizations. We identify the four root causes of timing-based side-channel attacks: determinism, sharing, access violation\ua0and information flow.\ua0Our key insight is that eliminating any of the exploited root causes, in any of the attack steps, is enough to provide protection.\ua0Based on our framework, we present a systematization of the attacks and defenses on a wide range of microarchitectural optimizations, which highlights their key similarities.\ua0Shared caches are an attractive attack surface for side-channel attacks, while defenses need to be efficient since the cache is crucial for performance.\ua0To address this issue, we present an adaptive and scalable cache partitioning solution (SCALE) for protection against cache side-channel attacks. The solution leverages randomness,\ua0and provides quantifiable and information theoretic security guarantees using differential privacy. The solution closes the performance gap to a state-of-the-art non-secure allocation policy for a mix of secure and non-secure applications

Chalmers Research

Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction

Author: Balachandran Shankar
Bera Rahul
Kanellopoulos Konstantinos
Mutlu Onur
Novo David
Olgun Ataberk
Sadrosadati Mohammad
Publication venue
Publication date: 30/09/2022
Field of study

Long-latency load requests continue to limit the performance of high-performance processors. To increase the latency tolerance of a processor, architects have primarily relied on two key techniques: sophisticated data prefetchers and large on-chip caches. In this work, we show that: 1) even a sophisticated state-of-the-art prefetcher can only predict half of the off-chip load requests on average across a wide range of workloads, and 2) due to the increasing size and complexity of on-chip caches, a large fraction of the latency of an off-chip load request is spent accessing the on-chip cache hierarchy. The goal of this work is to accelerate off-chip load requests by removing the on-chip cache access latency from their critical path. To this end, we propose a new technique called Hermes, whose key idea is to: 1) accurately predict which load requests might go off-chip, and 2) speculatively fetch the data required by the predicted off-chip loads directly from the main memory, while also concurrently accessing the cache hierarchy for such loads. To enable Hermes, we develop a new lightweight, perceptron-based off-chip load prediction technique that learns to identify off-chip load requests using multiple program features (e.g., sequence of program counters). For every load request, the predictor observes a set of program features to predict whether or not the load would go off-chip. If the load is predicted to go off-chip, Hermes issues a speculative request directly to the memory controller once the load's physical address is generated. If the prediction is correct, the load eventually misses the cache hierarchy and waits for the ongoing speculative request to finish, thus hiding the on-chip cache hierarchy access latency from the critical path of the off-chip load. Our evaluation shows that Hermes significantly improves performance of a state-of-the-art baseline. We open-source Hermes.Comment: To appear in 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 202

arXiv.org e-Print Archive

A Survey of Anticipatory Mobile Networking: Context-Based Classification, Prediction Methodologies, and Optimization Techniques

Author: Bui Nicola
CESANA MATTEO
Hosseini S. Amir
Liao Qi
MALANCHINI ILARIA
Widmer Joerg
Publication venue
Publication date: 01/01/2017
Field of study

A growing trend for information technology is to not just react to changes, but anticipate them as much as possible. This paradigm made modern solutions, such as recommendation systems, a ubiquitous presence in today's digital transactions. Anticipatory networking extends the idea to communication technologies by studying patterns and periodicity in human behavior and network dynamics to optimize network performance. This survey collects and analyzes recent papers leveraging context information to forecast the evolution of network conditions and, in turn, to improve network performance. In particular, we identify the main prediction and optimization tools adopted in this body of work and link them with objectives and constraints of the typical applications and scenarios. Finally, we consider open challenges and research directions to make anticipatory networking part of next generation networks

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Eugene: Towards deep intelligence as a service

Author: ABDELZAHER Tarek
HAO Yifan
HU Shaohan
JAYARAJAH Kasthuri
LIU Dongxin
LIU Shengzhong
MISRA Archan
PIAO Ailing
SHAO Huajie
WEERAKOON Dulanga
YAO Shuochao
ZHAO Yiran
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2019
Field of study

National Research Foundation (NRF) Singapore under its International Research Centres in Singapore Funding Initiativ

Crossref

Institutional Knowledge at Singapore Management University

Prefetching techniques for client server object-oriented database systems

Author: Knafla Nils
Publication venue: The University of Edinburgh
Publication date: 01/01/1999
Field of study

The performance of many object-oriented database applications suffers from the page fetch latency which is determined by the expense of disk access. In this work we suggest several prefetching techniques to avoid, or at least to reduce, page fetch latency. In practice no prediction technique is perfect and no prefetching technique can entirely eliminate delay due to page fetch latency. Therefore we are interested in the trade-off between the level of accuracy required for obtaining good results in terms of elapsed time reduction and the processing overhead needed to achieve this level of accuracy. If prefetching accuracy is high then the total elapsed time of an application can be reduced significantly otherwise if the prefetching accuracy is low, many incorrect pages are prefetched and the extra load on the client, network, server and disks decreases the whole system performance. Access pattern of object-oriented databases are often complex and usually hard to predict accurately. The ..

CiteSeerX

Edinburgh Research Archive

Reliable Software for Unreliable Hardware - A Cross-Layer Approach

Author: Rehman Semeen
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2015
Field of study

A novel cross-layer reliability analysis, modeling, and optimization approach is proposed in this thesis that leverages multiple layers in the system design abstraction (i.e. hardware, compiler, system software, and application program) to exploit the available reliability enhancing potential at each system layer and to exchange this information across multiple system layers

KITopen

The 11th Conference of PhD Students in Computer Science

Author
Publication venue
Publication date: 01/01/2018
Field of study

University of Szeged

Prediction-based techniques for the optimization of mobile networks

Author: Bui Nicola
Publication venue
Publication date: 01/01/2017
Field of study

Mención Internacional en el título de doctorMobile cellular networks are complex system whose behavior is characterized by the superposition of several random phenomena, most of which, related to human activities, such as mobility, communications and network usage. However, when observed in their totality, the many individual components merge into more deterministic patterns and trends start to be identifiable and predictable. In this thesis we analyze a recent branch of network optimization that is commonly referred to as anticipatory networking and that entails the combination of prediction solutions and network optimization schemes. The main intuition behind anticipatory networking is that knowing in advance what is going on in the network can help understanding potentially severe problems and mitigate their impact by applying solution when they are still in their initial states. Conversely, network forecast might also indicate a future improvement in the overall network condition (i.e. load reduction or better signal quality reported from users). In such a case, resources can be assigned more sparingly requiring users to rely on buffered information while waiting for the better condition when it will be more convenient to grant more resources. In the beginning of this thesis we will survey the current anticipatory networking panorama and the many prediction and optimization solutions proposed so far. In the main body of the work, we will propose our novel solutions to the problem, the tools and methodologies we designed to evaluate them and to perform a real world evaluation of our schemes. By the end of this work it will be clear that not only is anticipatory networking a very promising theoretical framework, but also that it is feasible and it can deliver substantial benefit to current and next generation mobile networks. In fact, with both our theoretical and practical results we show evidences that more than one third of the resources can be saved and even larger gain can be achieved for data rate enhancements.Programa Oficial de Doctorado en Ingeniería TelemáticaPresidente: Albert Banchs Roca.- Presidente: Pablo Serrano Yañez-Mingot.- Secretario: Jorge Ortín Gracia.- Vocal: Guevara Noubi

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo