Boosting Multi-Core Reachability Performance with Shared Hash Tables
This paper focuses on data structures for multi-core reachability, which is a
key component in model checking algorithms and other verification methods. A
cornerstone of an efficient solution is the storage of visited states. In
related work, static partitioning of the state space was combined with
thread-local storage and resulted in reasonable speedups, but left open whether
improvements are possible. In this paper, we present a scaling solution for
shared state storage which is based on a lockless hash table implementation.
The solution is specifically designed for the cache architecture of modern
CPUs. Because model checking algorithms impose loose requirements on the hash
table operations, their design can be streamlined substantially compared to
related work on lockless hash tables. Still, an implementation of the hash
table presented here has dozens of sensitive performance parameters (bucket
size, cache line size, data layout, probing sequence, etc.). We analyzed their
impact and compared the resulting speedups with related tools. Our
implementation outperforms two state-of-the-art multi-core model checkers (SPIN
and DiVinE) by a substantial margin, while placing fewer constraints on the
load balancing and search algorithms. Comment: preliminary report
On the Performance Prediction of BLAS-based Tensor Contractions
Tensor operations are surging as the computational building blocks for a
variety of scientific simulations and the development of high-performance
kernels for such operations is known to be a challenging task. While for
operations on one- and two-dimensional tensors there exist standardized
interfaces and highly-optimized libraries (BLAS), for higher dimensional
tensors neither standards nor highly-tuned implementations exist yet. In this
paper, we consider contractions between two tensors of arbitrary dimensionality
and take on the challenge of generating high-performance implementations by
resorting to sequences of BLAS kernels. The approach consists in breaking the
contraction down into operations that only involve matrices or vectors. Since
in general there are many alternative ways of decomposing a contraction, we are
able to methodically derive a large family of algorithms. The main contribution
of this paper is a systematic methodology to accurately identify the fastest
algorithms in the bunch, without executing them. The goal is instead
accomplished with the help of a set of cache-aware micro-benchmarks for the
underlying BLAS kernels. The predictions we construct from such benchmarks
allow us to reliably single out the best-performing algorithms in a tiny
fraction of the time taken by the direct execution of the algorithms. Comment: Submitted to PMBS1
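To make the decomposition idea in this abstract concrete, here is a minimal sketch, not the authors' code. It assumes a small example contraction, C[a][b][c] = sum over i of A[a][i] * B[i][b][c], and uses a hypothetical pure-Python `gemm` helper in place of a real BLAS kernel. It shows two of the many ways to express the same contraction with matrix products only: one product per slice of B, or a single product after flattening two indices.

```python
# Illustrative sketch only: the contraction, the function names, and the
# naive gemm() helper are assumptions, not taken from the paper.

def gemm(X, Y):
    """Naive matrix product standing in for a BLAS GEMM call."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[r][k] * Y[k][c] for k in range(inner)) for c in range(cols)]
            for r in range(rows)]

def contract_sliced(A, B):
    """Variant 1: one matrix product per slice B[:, :, c]."""
    n_a, n_i = len(A), len(B)
    n_b, n_c = len(B[0]), len(B[0][0])
    C = [[[0] * n_c for _ in range(n_b)] for _ in range(n_a)]
    for c in range(n_c):
        # Extract the n_i x n_b matrix obtained by fixing index c.
        B_slice = [[B[i][b][c] for b in range(n_b)] for i in range(n_i)]
        M = gemm(A, B_slice)  # n_a x n_b result for this slice
        for a in range(n_a):
            for b in range(n_b):
                C[a][b][c] = M[a][b]
    return C

def contract_flattened(A, B):
    """Variant 2: a single matrix product after merging (b, c) into one index."""
    n_a, n_i = len(A), len(B)
    n_b, n_c = len(B[0]), len(B[0][0])
    B_flat = [[B[i][b][c] for b in range(n_b) for c in range(n_c)]
              for i in range(n_i)]
    M = gemm(A, B_flat)  # n_a x (n_b * n_c)
    # Split the merged column index back into (b, c).
    return [[[M[a][b * n_c + c] for c in range(n_c)] for b in range(n_b)]
            for a in range(n_a)]

A = [[1, 2], [3, 4]]
B = [[[1, 0], [2, 1]], [[0, 1], [1, 3]]]
C_sliced = contract_sliced(A, B)
C_flat = contract_flattened(A, B)
```

Both variants compute the same tensor but issue different sequences of kernel calls, which is why their run times can differ and why predicting the faster one without executing it is of interest.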
End-to-end resource management for federated delivery of multimedia services
Recently, the Internet has become a popular platform for the delivery of multimedia content. Currently, multimedia services are either offered by Over-the-top (OTT) providers or by access ISPs over a managed IP network. As OTT providers offer their content across the best-effort Internet, they cannot offer any Quality of Service (QoS) guarantees to their users. On the other hand, users of managed multimedia services are limited to the relatively small selection of content offered by their own ISP. This article presents a framework that combines the advantages of both existing approaches, by dynamically setting up federations between the stakeholders involved in the content delivery process. Specifically, the framework provides an automated mechanism to set up end-to-end federations for QoS-aware delivery of multimedia content across the Internet. QoS contracts are automatically negotiated between the content provider, its customers, and the intermediary network domains. Additionally, a federated resource reservation algorithm is presented, which allows the framework to identify the optimal set of stakeholders and resources to include within a federation. Its goal is to minimize delivery costs for the content provider, while satisfying customer QoS requirements. Moreover, the presented framework allows intermediary storage sites to be included in these federations, supporting on-the-fly deployment of content caches along the delivery paths. The algorithm was thoroughly evaluated in order to validate our approach and assess the merits of including intermediary storage sites. The results clearly show the benefits of our method, with delivery cost reductions of up to 80% in the evaluated scenario.