Static locality analysis for cache management
Most memory references in numerical codes correspond to array references whose indices are affine functions of surrounding loop indices. These array references follow a regular, predictable memory pattern that can be analysed at compile time. This analysis can provide valuable information, such as the locality exhibited by the program, which can be used to implement more intelligent caching strategies. In this paper we propose a static locality analysis oriented to the management of data caches. We show that previous proposals on locality analysis are not appropriate when programs have a high conflict miss ratio. This paper examines those proposals by introducing a compile-time interference analysis that significantly improves their performance. We first show how this analysis can be used to characterize the dynamic locality properties of numerical codes. This evaluation shows, for instance, that a large percentage of references exhibit any type of locality. This motivates the use of a dual data cache, which has a module specialized to exploit temporal locality, and a selective cache, respectively. Then, the performance provided by these two cache organizations is evaluated. In both organizations, the static locality analysis is responsible for tagging each memory instruction according to the particular type(s) of locality that it exhibits. Peer Reviewed. Postprint (published version)
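As a rough illustration of what tagging an affine array reference by locality type can look like, the sketch below classifies a reference by how its subscripts depend on the innermost loop index. It is a simplified textbook-style reuse test, not the analysis proposed in the paper: the function name `classify_reference`, the row-major layout, and the 8-byte element and 64-byte line sizes are all assumptions, and interference (conflict misses) is ignored entirely.

```python
def classify_reference(inner_coeffs, elem_size=8, line_size=64):
    """Classify the self-reuse of one array reference in the innermost loop.

    inner_coeffs[d] is the coefficient of the innermost loop index in the
    affine subscript of dimension d. Row-major layout is assumed, so the
    last dimension is the contiguous one. (Hypothetical helper.)
    """
    # The address does not change across innermost iterations: the same
    # element is reused, which is temporal locality.
    if all(c == 0 for c in inner_coeffs):
        return "temporal"
    *outer, last = inner_coeffs
    # Only the contiguous dimension moves, and the stride is smaller than a
    # cache line: consecutive iterations touch the same line (spatial).
    if all(c == 0 for c in outer) and abs(last) * elem_size < line_size:
        return "spatial"
    return "none"


# References of C[i][j] += A[i][k] * B[k][j] with k as the innermost loop.
tags = {
    "C[i][j]": classify_reference((0, 0)),
    "A[i][k]": classify_reference((0, 1)),
    "B[k][j]": classify_reference((1, 0)),
}
```

Under these assumptions `C[i][j]` is tagged temporal, `A[i][k]` spatial, and `B[k][j]` gets no tag, since it walks down a column in a row-major array.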
Living on the Edge: The Role of Proactive Caching in 5G Wireless Networks
This article explores one of the key enablers of beyond 4G wireless
networks leveraging small cell network deployments, namely proactive caching.
Endowed with predictive capabilities and harnessing recent developments in
storage, context-awareness and social networks, peak traffic demands can be
substantially reduced by proactively serving predictable user demands, via
caching at base stations and users' devices. In order to show the effectiveness
of proactive caching, we examine two case studies which exploit the spatial and
social structure of the network, where proactive caching plays a crucial role.
Firstly, in order to alleviate backhaul congestion, we propose a mechanism
whereby files are proactively cached during off-peak demands based on file
popularity and correlations among user and file patterns. Secondly,
leveraging social networks and device-to-device (D2D) communications, we
propose a procedure that exploits the social structure of the network by
predicting the set of influential users to (proactively) cache strategic
contents and disseminate them to their social ties via D2D communications.
Exploiting this proactive caching paradigm, numerical results show that
important gains can be obtained for each case study, with backhaul savings and
a higher ratio of satisfied users of up to and , respectively.
Higher gains can be further obtained by increasing the storage capability at
the network edge. Comment: accepted for publication in IEEE Communications Magazine
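The first case study, caching files during off-peak hours according to popularity, can be sketched in a few lines. This is a minimal illustration only: it ranks files by raw request counts and ignores the user and file correlations the article exploits. The names `choose_cache` and `backhaul_savings`, and the sample request traces, are hypothetical.

```python
from collections import Counter


def choose_cache(past_requests, capacity):
    """Pick the `capacity` most requested files to store at the base station."""
    counts = Counter(past_requests)
    return {name for name, _ in counts.most_common(capacity)}


def backhaul_savings(cache, peak_requests):
    """Fraction of peak-hour requests served locally instead of over backhaul."""
    if not peak_requests:
        return 0.0
    hits = sum(1 for name in peak_requests if name in cache)
    return hits / len(peak_requests)


# Made-up traces: observed demand, then the demand during the next peak.
observed = ["a", "a", "a", "b", "b", "c"]
cache = choose_cache(observed, capacity=2)
savings = backhaul_savings(cache, ["a", "b", "c", "a"])
```

With a cache of two files, three of the four peak requests in this toy trace are served from the edge. Raising `capacity` can only keep or increase that fraction, which matches the abstract's remark about storage at the network edge.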
Near-optimal loop tiling by means of cache miss equations and genetic algorithms
The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transformations such as loop tiling, a code transformation aimed at reducing capacity misses. This paper presents a novel systematic approach to perform near-optimal loop tiling based on an accurate data locality analysis (cache miss equations) and a powerful technique to search the solution space that is based on a genetic algorithm. The results show that this approach can remove practically all capacity misses for all considered benchmarks. The reduction of replacement misses results in a decrease of the miss ratio that can be as significant as a factor of 7 for the matrix multiply kernel. Peer Reviewed. Postprint (published version)
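For readers unfamiliar with the transformation, the sketch below shows loop tiling applied to the matrix multiply kernel mentioned in the abstract. It illustrates only the tiled loop structure: the tile size is a hand-picked parameter here, whereas the paper selects it with cache miss equations and a genetic algorithm, neither of which is reproduced. The function name `matmul_tiled` is an assumption.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply matrices a (n x m) and b (m x p) using tiled loops.

    The three outer loops walk over tile origins; the three inner loops
    stay inside one tile, so the working set per tile is bounded by
    roughly 3 * tile * tile elements.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # min() handles edge tiles when a dimension is not a
                # multiple of the tile size.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c


product = matmul_tiled([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
```

Tiling changes only the order in which the multiply-adds are performed, so the result matches the untiled kernel. In pure Python the cache benefit is not observable; the same loop structure in a compiled language is where the reduction in capacity misses would show up.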
Propagation and Decay of Injected One-Off Delays on Clusters: A Case Study
Analytic, first-principles performance modeling of distributed-memory
applications is difficult due to a wide spectrum of random disturbances caused
by the application and the system. These disturbances (commonly called "noise")
destroy the assumptions of regularity that one usually employs when
constructing simple analytic models. Despite numerous efforts to quantify,
categorize, and reduce such effects, a comprehensive quantitative understanding
of their performance impact is not available, especially for long delays that
have global consequences for the parallel application. In this work, we
investigate various traces collected from synthetic benchmarks that mimic real
applications on simulated and real message-passing systems in order to pinpoint
the mechanisms behind delay propagation. We analyze the dependence of the
propagation speed of idle waves emanating from injected delays with respect to
the execution and communication properties of the application, study how such
delays decay under increased noise levels, and how they interact with each
other. We also show how fine-grained noise can make a system immune against the
adverse effects of propagating idle waves. Our results contribute to a better
understanding of the collective phenomena that manifest themselves in
distributed-memory parallel applications. Comment: 10 pages, 9 figures; title changed