23,415 research outputs found

    Boosting Multi-Core Reachability Performance with Shared Hash Tables

    Get PDF
    This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms.Comment: preliminary repor

    On the Performance Prediction of BLAS-based Tensor Contractions

    Full text link
    Tensor operations are surging as the computational building blocks for a variety of scientific simulations and the development of high-performance kernels for such operations is known to be a challenging task. While for operations on one- and two-dimensional tensors there exist standardized interfaces and highly-optimized libraries (BLAS), for higher dimensional tensors neither standards nor highly-tuned implementations exist yet. In this paper, we consider contractions between two tensors of arbitrary dimensionality and take on the challenge of generating high-performance implementations by resorting to sequences of BLAS kernels. The approach consists in breaking the contraction down into operations that only involve matrices or vectors. Since in general there are many alternative ways of decomposing a contraction, we are able to methodically derive a large family of algorithms. The main contribution of this paper is a systematic methodology to accurately identify the fastest algorithms in the bunch, without executing them. The goal is instead accomplished with the help of a set of cache-aware micro-benchmarks for the underlying BLAS kernels. The predictions we construct from such benchmarks allow us to reliably single out the best-performing algorithms in a tiny fraction of the time taken by the direct execution of the algorithms.Comment: Submitted to PMBS1

    End-to-end resource management for federated delivery of multimedia services

    Get PDF
    Recently, the Internet has become a popular platform for the delivery of multimedia content. Currently, multimedia services are either offered by Over-the-top (OTT) providers or by access ISPs over a managed IP network. As OTT providers offer their content across the best-effort Internet, they cannot offer any Quality of Service (QoS) guarantees to their users. On the other hand, users of managed multimedia services are limited to the relatively small selection of content offered by their own ISP. This article presents a framework that combines the advantages of both existing approaches, by dynamically setting up federations between the stakeholders involved in the content delivery process. Specifically, the framework provides an automated mechanism to set up end-to-end federations for QoS-aware delivery of multimedia content across the Internet. QoS contracts are automatically negotiated between the content provider, its customers, and the intermediary network domains. Additionally, a federated resource reservation algorithm is presented, which allows the framework to identify the optimal set of stakeholders and resources to include within a federation. Its goal is to minimize delivery costs for the content provider, while satisfying customer QoS requirements. Moreover, the presented framework allows intermediary storage sites to be included in these federations, supporting on-the-fly deployment of content caches along the delivery paths. The algorithm was thoroughly evaluated in order to validate our approach and assess the merits of including intermediary storage sites. The results clearly show the benefits of our method, with delivery cost reductions of up to 80 % in the evaluated scenario
    • …
    corecore