37,424 research outputs found

    A Fast Causal Profiler for Task Parallel Programs

    Full text link
    This paper proposes TASKPROF, a profiler that identifies parallelism bottlenecks in task parallel programs. It leverages the structure of a task parallel execution to perform fine-grained attribution of work to various parts of the program. TASKPROF's use of hardware performance counters to perform fine-grained measurements minimizes perturbation. TASKPROF's profile execution runs in parallel using multi-cores. TASKPROF's causal profile enables users to estimate improvements in parallelism when a region of code is optimized even when concrete optimizations are not yet known. We have used TASKPROF to isolate parallelism bottlenecks in twenty three applications that use the Intel Threading Building Blocks library. We have designed parallelization techniques in five applications to in- crease parallelism by an order of magnitude using TASKPROF. Our user study indicates that developers are able to isolate performance bottlenecks with ease using TASKPROF.Comment: 11 page

    Coz: Finding Code that Counts with Causal Profiling

    Full text link
    Improving performance is a central concern for software developers. To locate optimization opportunities, developers rely on software profilers. However, these profilers only report where programs spent their time: optimizing that code may have no impact on performance. Past profilers thus both waste developer time and make it difficult for them to uncover significant optimization opportunities. This paper introduces causal profiling. Unlike past profiling approaches, causal profiling indicates exactly where programmers should focus their optimization efforts, and quantifies their potential impact. Causal profiling works by running performance experiments during program execution. Each experiment calculates the impact of any potential optimization by virtually speeding up code: inserting pauses that slow down all other code running concurrently. The key insight is that this slowdown has the same relative effect as running that line faster, thus "virtually" speeding it up. We present Coz, a causal profiler, which we evaluate on a range of highly-tuned applications: Memcached, SQLite, and the PARSEC benchmark suite. Coz identifies previously unknown optimization opportunities that are both significant and targeted. Guided by Coz, we improve the performance of Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as much as 68%; in most cases, these optimizations involve modifying under 10 lines of code.Comment: Published at SOSP 2015 (Best Paper Award

    Task scheduling techniques for asymmetric multi-core systems

    Get PDF
    As performance and energy efficiency have become the main challenges for next-generation high-performance computing, asymmetric multi-core architectures can provide solutions to tackle these issues. Parallel programming models need to be able to suit the needs of such systems and keep on increasing the application’s portability and efficiency. This paper proposes two task scheduling approaches that target asymmetric systems. These dynamic scheduling policies reduce total execution time either by detecting the longest or the critical path of the dynamic task dependency graph of the application, or by finding the earliest executor of a task. They use dynamic scheduling and information discoverable during execution, fact that makes them implementable and functional without the need of off-line profiling. In our evaluation we compare these scheduling approaches with two existing state-of the art heterogeneous schedulers and we track their improvement over a FIFO baseline scheduler. We show that the heterogeneous schedulers improve the baseline by up to 1.45 in a real 8-core asymmetric system and up to 2.1 in a simulated 32-core asymmetric chip.This work has been supported by the Spanish Government (SEV2015-0493), by the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), by the RoMoL ERC Advanced Grant (GA 321253) and the European HiPEAC Network of Excellence. The Mont-Blanc project receives funding from the EU’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no 610402 and from the EU’s H2020 Framework Programme (H2020/2014-2020) under grant agreement no 671697. M. Moretó has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship number JCI-2012-15047. M. Casas is supported by the Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the 7th R&D Framework Programme of the European Union (Contract 2013 BP B 00243).Peer ReviewedPostprint (author's final draft

    RPPM : Rapid Performance Prediction of Multithreaded workloads on multicore processors

    Get PDF
    Analytical performance modeling is a useful complement to detailed cycle-level simulation to quickly explore the design space in an early design stage. Mechanistic analytical modeling is particularly interesting as it provides deep insight and does not require expensive offline profiling as empirical modeling. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors. This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance

    Time for change: a new training programme for morpho-molecular pathologists?

    Get PDF
    The evolution of cellular pathology as a specialty has always been driven by technological developments and the clinical relevance of incorporating novel investigations into diagnostic practice. In recent years, the molecular characterisation of cancer has become of crucial relevance in patient treatment both for predictive testing and subclassification of certain tumours. Much of this has become possible due to the availability of next-generation sequencing technologies and the whole-genome sequencing of tumours is now being rolled out into clinical practice in England via the 100 000 Genome Project. The effective integration of cellular pathology reporting and genomic characterisation is crucial to ensure the morphological and genomic data are interpreted in the relevant context, though despite this, in many UK centres molecular testing is entirely detached from cellular pathology departments. The CM-Path initiative recognises there is a genomics knowledge and skills gap within cellular pathology that needs to be bridged through an upskilling of the current workforce and a redesign of pathology training. Bridging this gap will allow the development of an integrated 'morphomolecular pathology' specialty, which can maintain the relevance of cellular pathology at the centre of cancer patient management and allow the pathology community to continue to be a major influence in cancer discovery as well as playing a driving role in the delivery of precision medicine approaches. Here, several alternative models of pathology training, designed to address this challenge, are presented and appraised

    HeteroCore GPU to exploit TLP-resource diversity

    Get PDF
    • …
    corecore