
    Automatic Throughput and Critical Path Analysis of x86 and ARM Assembly Kernels

    Useful models of loop kernel runtimes on out-of-order architectures require an analysis of the in-core performance behavior of instructions and their dependencies. While an instruction throughput prediction sets a lower bound on the kernel runtime, the critical path defines an upper bound. Such predictions are an essential part of analytic (i.e., white-box) performance models like the Roofline and Execution-Cache-Memory (ECM) models. They enable a better understanding of the performance-relevant interactions between hardware architecture and loop code. The Open Source Architecture Code Analyzer (OSACA) is a static analysis tool for predicting the execution time of sequential loops. It previously supported only x86 (Intel and AMD) architectures and simple, optimistic full-throughput execution. We have heavily extended OSACA to support ARM instructions and critical path prediction, including the detection of loop-carried dependencies, which turns it into a versatile cross-architecture modeling tool. We show runtime predictions for code on Intel Cascade Lake, AMD Zen, and Marvell ThunderX2 micro-architectures based on machine models from available documentation and semi-automatic benchmarking. The predictions are compared with actual measurements.
    Comment: 6 pages, 3 figures

    Enhanced Machine Learning Techniques for Early HARQ Feedback Prediction in 5G

    We investigate Early Hybrid Automatic Repeat reQuest (E-HARQ) feedback schemes enhanced by machine learning techniques as a path towards ultra-reliable and low-latency communication (URLLC). To this end, we propose machine learning methods to predict the outcome of the decoding process ahead of the end of the transmission. We discuss different input features and classification algorithms ranging from traditional methods to newly developed supervised autoencoders. These methods are evaluated based on their prospects of complying with the URLLC requirements of effective block error rates below $10^{-5}$ at small latency overheads. We provide realistic performance estimates in a system model incorporating scheduling effects to demonstrate the feasibility of E-HARQ across different signal-to-noise ratios, subcode lengths, channel conditions and system loads, and show the benefit over regular HARQ and existing E-HARQ schemes without machine learning.
    Comment: 14 pages, 15 figures; accepted version

    Mechanical properties and thermal behaviour of two-stage concrete containing palm oil fuel ash

    Two-stage concrete (TSC) is a special type of concrete made by placing coarse aggregate in a formwork and injecting a grout, either by pump or under gravity, to fill the voids. Over the decades, the application of supplementary cementing materials in conventional concrete has become widespread, and this trend is expected to continue in TSC as well. Palm oil fuel ash (POFA) is an ash that has been recognized as a good pozzolanic material. This paper presents experimental results on the performance of POFA in developing the physical and mechanical properties of two-stage concrete. Four concrete mixes, namely TSC with 100% OPC as a control and TSC with 10, 20 and 30% POFA, were cast, and the temperature growth due to heat of hydration and heat transfer in the mixes was recorded. It was found that POFA significantly reduced the temperature rise in two-stage aggregate concrete and delayed the transfer of heat to the mass of concrete. The compressive and tensile strengths, however, increased with the replacement of up to 20% POFA. The results obtained and the observations made in this study suggest that the substitution of OPC by POFA is beneficial, particularly for prepacked mass concrete where thermal cracking due to extreme heat rise is of great importance.

    Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

    Many problems in sequential decision making and stochastic control have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous state spaces.
    Comment: 86 pages, 15 figures