
    Analytical/ML Mixed Approach for Concurrency Regulation in Software Transactional Memory

    In this article we combine analytical and Machine Learning (ML) techniques to build a performance model for dynamically tuning the level of concurrency of applications based on Software Transactional Memory (STM). Our mixed approach reduces the training time required by pure machine learning methods and avoids the approximation errors that typically affect pure analytical approaches. It therefore allows very fast construction of highly reliable performance models, which can be promptly and effectively exploited to optimize actual application runs. We also present a real implementation of a concurrency regulation architecture based on the mixed modeling approach, which has been integrated with the open source TinySTM package, together with experimental data from runs of applications taken from the STAMP benchmark suite that demonstrate the effectiveness of our proposal. © 2014 IEEE
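The idea of model-driven concurrency regulation can be illustrated with a minimal sketch: given some performance model that predicts throughput for each candidate thread count, the regulator picks the count with the highest prediction. The names `best_concurrency` and `toy_throughput`, and the toy contention formula itself, are assumptions made for illustration only; they are not the model or the architecture from the article.

```python
from typing import Callable


def toy_throughput(threads: int, conflict_prob: float) -> float:
    """Hypothetical stand-in for a performance model (NOT the article's model).

    Assumes each transaction commits only if none of the other
    (threads - 1) concurrent transactions conflicts with it.
    """
    return threads * (1.0 - conflict_prob) ** (threads - 1)


def best_concurrency(model: Callable[[int], float], max_threads: int) -> int:
    """Return the thread count in 1..max_threads with the highest predicted throughput."""
    return max(range(1, max_threads + 1), key=model)


if __name__ == "__main__":
    # With an assumed 15% pairwise conflict probability, adding threads
    # beyond a certain point lowers the predicted throughput.
    chosen = best_concurrency(lambda k: toy_throughput(k, 0.15), max_threads=16)
    print("chosen concurrency level:", chosen)
```

In a real regulator the model would be re-evaluated as the workload changes, so that the number of active threads follows the current level of contention.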

    DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

    We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance across hardware architectures. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference OpenMP-based algorithm and find speedups of up to 7X (CPU). Comment: LDAV 2018, October 201
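For readers unfamiliar with data-parallel primitives, the sketch below shows serial Python stand-ins for three common ones (map, reduce, exclusive scan) and how they compose into a stream compaction. The function names are hypothetical and the code only illustrates the programming style; it is not the paper's algorithm, and a real DPP library would run each primitive in parallel on the CPU or GPU.

```python
from functools import reduce
from itertools import accumulate


def dpp_map(f, xs):
    """Apply f independently to every element."""
    return [f(x) for x in xs]


def dpp_reduce(op, xs, init):
    """Combine all elements into one value with an associative operator."""
    return reduce(op, xs, init)


def dpp_exclusive_scan(xs):
    """Prefix sums where output[i] is the sum of xs[0..i-1]."""
    return list(accumulate(xs, initial=0))[:-1]


def dpp_compact(xs, keep):
    """Keep the elements satisfying `keep`, built only from the primitives above."""
    flags = dpp_map(lambda x: 1 if keep(x) else 0, xs)
    offsets = dpp_exclusive_scan(flags)          # output slot for each kept element
    out = [None] * dpp_reduce(lambda a, b: a + b, flags, 0)
    for x, flag, off in zip(xs, flags, offsets):  # a scatter step; each write is independent
        if flag:
            out[off] = x
    return out


if __name__ == "__main__":
    print(dpp_exclusive_scan([1, 0, 1, 1]))
    print(dpp_compact([5, 2, 8, 1], lambda x: x > 3))
```

Because each primitive has a well-defined parallel implementation, an algorithm expressed only in terms of them can be moved between architectures without being rewritten.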

    Multi-threaded Output in CMS using ROOT

    CMS has worked aggressively to make use of multi-core architectures, routinely running 4- to 8-core production jobs in 2017. The primary impediment to efficiently scaling beyond 8 cores has been our ROOT-based output module, which has been necessarily single-threaded. In this paper we explore the changes made to the CMS framework and our ROOT output module to overcome the previous scaling limits, using two new ROOT features: the TBufferMerger asynchronous file merger and Implicit Multi-Threading. We examine the architecture of the new parallel output module, the specific accommodations and modifications that were made to ensure compatibility with the CMS framework scheduler, and the performance characteristics of the new output module. Comment: Submitted to CHEP 2018 - 23rd International Conference on Computing in High Energy and Nuclear Physics; 6 pages, 4 figures, uses webofc clas