9 research outputs found

    DeFiNES: Enabling Fast Exploration of the Depth-first Scheduling Space for DNN Accelerators through Analytical Modeling

    DNN workloads can be scheduled onto DNN accelerators in many different ways: from layer-by-layer scheduling to cross-layer depth-first scheduling (a.k.a. layer fusion, or cascaded execution). This results in a very broad scheduling space, with each schedule leading to varying hardware (HW) costs in terms of energy and latency. To rapidly explore this vast space for a wide variety of hardware architectures, analytical cost models are crucial to estimate scheduling effects on the HW level. However, state-of-the-art cost models lack support for exploring the complete depth-first scheduling space, for instance focusing only on activations while ignoring weights, or modeling only DRAM accesses while overlooking on-chip data movements. These limitations prevent researchers from systematically and accurately understanding the depth-first scheduling space. After formalizing this design space, this work proposes a unified modeling framework, DeFiNES, for layer-by-layer and depth-first scheduling to fill in the gaps. DeFiNES enables analytically estimating the hardware cost for possible schedules in terms of both energy and latency, while considering data access at every memory level. This is done for each schedule and HW architecture under study by optimally choosing the active part of the memory hierarchy per unique combination of operand, layer, and feature map tile. The hardware costs are estimated, taking into account both data computation and data copy phases. The analytical cost model is validated against measured data from a taped-out depth-first DNN accelerator, DepFiN, showing good modeling accuracy at the end-to-end neural network level. A comparison with generalized state-of-the-art demonstrates up to 10× better solutions found with DeFiNES. Comment: Accepted by HPCA 202

    SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators

    To meet the growing need for computational power for DNNs, multiple specialized hardware architectures have been proposed. Each DNN layer should be mapped onto the hardware with the most efficient schedule; however, SotA schedulers struggle to consistently provide optimum schedules in a reasonable time across all DNN-HW combinations. This paper proposes SALSA, a fast dual-engine scheduler to generate optimal execution schedules for both even and uneven mapping. We introduce a new strategy, combining exhaustive search with simulated annealing, to address the dynamic nature of the loop-ordering design space size across layers. SALSA is extensively benchmarked against two SotA schedulers, LOMA [1] and Timeloop [2], on 5 different DNNs. On average, SALSA finds schedules with 11.9% and 7.6% lower energy while speeding up the search by 1.7× and 24× compared to LOMA and Timeloop, respectively.

    Midrapidity antiproton-to-proton ratio in pp collisions at $\sqrt{s}$ = 0.9 and 7 TeV measured by the ALICE experiment

    The ratio of the yields of antiprotons to protons in pp collisions has been measured by the ALICE experiment at $\sqrt{s}$ = 0.9 and 7 TeV during the initial running periods of the Large Hadron Collider. The measurement covers the transverse momentum interval $0.45 < p_t < 1.05$ GeV/c and rapidity $|y| < 0.5$. The ratio is measured to be $R_{|y|<0.5}$ = 0.957 ± 0.006 (stat) ± 0.0014 (syst) at 0.9 TeV and $R_{|y|<0.5}$ = 0.991 ± 0.005 (stat) ± 0.014 (syst) at 7 TeV, and it is independent of both rapidity and transverse momentum. The results are consistent with the conventional model of baryon-number transport and set stringent limits on any additional contributions to baryon-number transfer over very large rapidity intervals in pp collisions.

    Centrality dependence of the charged-particle multiplicity density at mid-rapidity in Pb-Pb collisions at $\sqrt{s_{NN}}$ = 2.76 TeV

    The centrality dependence of the charged-particle multiplicity density at mid-rapidity in Pb-Pb collisions at $\sqrt{s_{NN}}$ = 2.76 TeV is presented. The charged-particle density normalized per participating nucleon pair increases by about a factor of 2 from peripheral (70-80%) to central (0-5%) collisions. The centrality dependence is found to be similar to that observed at lower collision energies. The data are compared with models based on different mechanisms for particle production in nuclear collisions.