29 research outputs found

    Mix-GEMM: An efficient HW-SW architecture for mixed-precision quantized deep neural networks inference on edge devices

    Deep Neural Network (DNN) inference based on quantized narrow-precision integer data represents a promising research direction toward efficient deep learning computations on edge and mobile devices. On one side, recent progress in Quantization-Aware Training (QAT) frameworks aimed at improving the accuracy of extremely quantized DNNs allows achieving results close to Floating-Point 32 (FP32), and provides high flexibility in the selection of data sizes. Unfortunately, current Central Processing Unit (CPU) architectures and Instruction Set Architectures (ISAs) targeting resource-constrained devices present limitations on the range of data sizes supported to compute DNN kernels. This paper presents Mix-GEMM, a hardware-software co-designed architecture capable of efficiently computing quantized DNN convolutional kernels based on byte and sub-byte data sizes. Mix-GEMM accelerates General Matrix Multiplication (GEMM), the core kernel of DNNs, supporting all data size combinations from 8- to 2-bit, including mixed-precision computations, and featuring performance that scales as the computational data sizes decrease. Our experimental evaluation, performed on representative quantized Convolutional Neural Networks (CNNs), shows that a RISC-V based edge System-on-Chip (SoC) integrating Mix-GEMM achieves up to 1.3 TOPS/W in energy efficiency, and up to 13.6 GOPS in throughput, gaining from 5.3× to 15.1× in performance over the OpenBLAS GEMM frameworks running on a commercial RISC-V based edge processor. By performing synthesis and Place and Route (PnR) of the enhanced SoC in Global Foundries 22nm FDX technology, we show that Mix-GEMM only accounts for 1% of the overall area consumption. This research was supported by the ERDF Operational Program of Catalonia 2014-2020, with a grant from the Spanish State Research Agency [PID2019-107255GB] and with DRAC project [001-P-001723], by the grant [PID2019-107255G-C21] funded by MCIN/AEI/ 10.13039/501100011033, by the Generalitat de Catalunya [2017-SGR-1328], and by Lenovo-BSC Contract-Framework (2020). The Spanish Ministry of Economy, Industry and Competitiveness has partially supported M. Doblas through an FPU fellowship [FPU20-04076] and M. Moreto through a Ramon y Cajal fellowship [RYC-2016-21104]. Peer Reviewed. Postprint (author's final draft)
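    As a rough illustration of what a mixed-precision integer GEMM on sub-byte data computes, the sketch below packs unsigned operands of different bit widths into bytes and multiplies them with wide accumulation. It is a functional reference only, not the Mix-GEMM hardware or its software interface: the function names `pack`, `unpack` and `mixed_gemm` are hypothetical, the operands are assumed unsigned and row-major, and only widths that divide 8 (2, 4 and 8 bits) are handled for simplicity.

```python
def pack(values, bits):
    """Pack unsigned integers of width `bits` into bytes, lowest field first.

    Assumption for this sketch: `bits` is 2, 4 or 8 so fields never straddle bytes.
    """
    if bits not in (2, 4, 8):
        raise ValueError("this sketch only handles 2-, 4- and 8-bit fields")
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    packed = []
    for start in range(0, len(values), per_byte):
        byte = 0
        for slot, value in enumerate(values[start:start + per_byte]):
            if not 0 <= value <= mask:
                raise ValueError(f"{value} does not fit in {bits} bits")
            byte |= value << (slot * bits)
        packed.append(byte)
    return packed


def unpack(packed, bits, count):
    """Recover `count` unsigned fields of width `bits` from packed bytes."""
    per_byte = 8 // bits
    mask = (1 << bits) - 1
    return [
        (packed[i // per_byte] >> ((i % per_byte) * bits)) & mask
        for i in range(count)
    ]


def mixed_gemm(a_packed, a_bits, b_packed, b_bits, m, k, n):
    """Compute C = A x B where A (m x k) and B (k x n) use different bit widths.

    Python integers stand in for the wide accumulators a real kernel would use.
    """
    a = unpack(a_packed, a_bits, m * k)
    b = unpack(b_packed, b_bits, k * n)
    return [
        [sum(a[i * k + p] * b[p * n + j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


# Usage: a 2-bit matrix A multiplied by a 4-bit matrix B.
a_packed = pack([1, 2, 3, 0], bits=2)   # A = [[1, 2], [3, 0]]
b_packed = pack([5, 6, 7, 8], bits=4)   # B = [[5, 6], [7, 8]]
result = mixed_gemm(a_packed, 2, b_packed, 4, m=2, k=2, n=2)
```

    Narrower operands occupy fewer bytes for the same number of elements, which is the property that lets throughput grow as the data size shrinks.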

    AHRB: A High-Performance Time-Composable AMBA AHB Bus

    Hard real-time systems are moving toward complex systems comprising chips with different IP components connected with standard buses. AMBA is one of the most used bus interfaces and has already been included in processors in the real-time domain. However, AMBA was not designed to provide time-composable Worst Case Execution Time (WCET) estimates, which are desirable to reduce timing validation and verification costs. This paper analyzes and extends the AMBA Advanced High-performance Bus (AHB) specification to enable time-composable WCET estimates by design. Concretely, (1) we analyze in detail the AMBA AHB in the context of hard real-time systems, proving that it fails to provide time composability; (2) we define a restricted subset of AMBA AHB features, named restricted AHB (resAHB), that allows deriving time-composable, yet not tight, WCET estimates; and (3) we define an extension of resAHB, named Advanced High-performance Real-time Bus (AHRB), that includes the timing constraints in the specification. This allows deriving time-composable and tight WCET estimates. Our results show that AHRB can provide 3.5x tighter estimates than resAHB on average for EEMBC benchmarks.

    A dual-criticality memory controller (DCmc): Proposal and evaluation of a space case study

    Multicore Dual-Criticality systems comprise two types of applications, each with a different criticality level. In the space domain these types are referred to as payload and control applications, which have high-performance and real-time requirements respectively. In order to control the interaction (contention) among payload and control applications in the access to the main memory, reaching the goals of high bandwidth for the former and guaranteed timing bounds for the latter, we propose a Dual-Criticality memory controller (DCmc). DCmc virtually divides memory banks into real-time and high-performance banks, deploying a different request scheduler policy to each bank type, which facilitates achieving both goals. Our evaluation with a multicore cycle-accurate simulator and a real space case study shows that DCmc enables deriving tight WCET estimates, regardless of the co-running payload applications, hence effectively isolating the effect of contention in the access to memory. DCmc also enables payload applications to exploit memory locality, which is needed for high performance.

    Fibers are not (P)Threads: The Case for Loose Coupling of Asynchronous Programming Models and MPI Through Continuations

    Asynchronous programming models (APM) are gaining more and more traction, allowing applications to expose the available concurrency to a runtime system tasked with coordinating the execution. While MPI has long provided support for multi-threaded communication and non-blocking operations, it falls short of adequately supporting APMs, as correctly and efficiently handling MPI communication in different models is still a challenge. Meanwhile, new low-level implementations of light-weight, cooperatively scheduled execution contexts (fibers, aka user-level threads (ULT)) are meant to serve as a basis for higher-level APMs, and their integration in MPI implementations has been proposed as a replacement for traditional POSIX thread support to alleviate these challenges. In this paper, we first establish a taxonomy in an attempt to clearly distinguish different concepts in the parallel software stack. We argue that the proposed tight integration of fiber implementations with MPI is neither warranted nor beneficial and instead is detrimental to the goal of MPI being a portable communication abstraction. We propose MPI Continuations as an extension to the MPI standard to provide callback-based notifications on completed operations, leading to a clear separation of concerns by providing a loose coupling mechanism between MPI and APMs. We show that this interface is flexible and interacts well with different APMs, namely OpenMP detached tasks, OmpSs-2, and Argobots. Comment: 12 pages, 7 figures. Published in proceedings of EuroMPI/USA '20, September 21-24, 2020, Austin, TX, US

    BSC Post-processed Seasonal Climate Forecast for vineyard management

    <p>The Climate Services Team at the Barcelona Supercomputing Center has deployed a climate service for vineyard management in the context of the vitiGEOSS project. This dataset results from post-processing (i.e. downscaling, calibrating and assessing) the seasonal climate prediction system SEAS5 (ECMWF).</p> <p>Probabilistic predictions output several solutions (ensemble members) to account for forecast uncertainty. The forecast information is conveyed as probabilities, in this case as the probabilities of occurrence of three categories or terciles (below normal, normal and above normal). The categories are defined based on the terciles of the model climatology distribution over a period in the past. Additional information regarding the probability of occurrence of extremes is also provided, considered as the probability of not reaching the 10th percentile or surpassing the 90th percentile of the model climatology distribution. The skill scores provide information on the forecast quality (fair Ranked Probability Skill Score for the tercile categories and fair Brier Skill Score for the probabilities of extremes). A positive skill score indicates that the prediction is good (better than using average past conditions) in the long term. In contrast, a negative skill score indicates that the prediction does not beat the climatological forecast.</p> <ul> <li> <p>Prediction system: European Centre for Medium-Range Weather Forecasts (ECMWF) SEAS5, post-processed by BSC.</p> </li> <li> <p>Issue frequency: Monthly (~15th of each month)</p> </li> <li> <p>Lead times: months 1 to 3 (e.g. for a forecast initialised in June, the forecasts are monthly averages for July, August and September). The initialization date is indicated in the name of each file (e.g. 20210701). 
</p> </li> <li> <p>Variables: mean, minimum and maximum 2 m temperature, accumulated precipitation, incoming solar radiation</p> </li> <li> <p>Ensemble size: 51 members</p> </li> <li> <p>Postprocessing: Downscaling from the original (1° x 1°) resolution to 0.1° x 0.1° for three domains and monthly calibration with variance inflation.</p> </li> <li> <p>Spatial coverage of the domains: </p> </li> <ul> <li> <p>Catalonia region is indicated by ‘cat’ and covers latitudes [40 N, 44 N] and longitudes [1 W, 4 E]. The latitude indices range [1:41], and the longitude indices range [1:51].</p> </li> <li> <p>Douro region is indicated by ‘douro’ and covers latitudes [40 N, 43 N] and longitudes [9 W, 6 W]. The latitude indices range [1:31], and the longitude indices range [1:31].</p> </li> <li> <p>Campania region is indicated by ‘campania’ and covers latitudes [39 N, 43 N] and longitudes [13 E, 17.3 E]. The latitude indices range [1:41], and the longitude indices range [1:44].</p> </li> </ul> </ul> <p>For each prediction, there are several files containing the seasonal variable values, probabilities, definition of the categories and skill scores.</p> <ul> <li> <p>Forecast probabilities</p> </li> </ul> <p>E.g. t2_campania_prob_20210701.ncml</p> <p>The file name contains the name of the variable, domain, the label ‘prob’ and the initialization date of the forecasts (1st of the month).</p> <p>It contains the forecast probabilities (in %) of each tercile category, below normal (prob_bn), normal (prob_n) and above normal (prob_an), together with the probability of the lower extreme (prob_bp10) and the probability of the upper extreme (prob_ap90). The latitude, longitude, and lead time (months 1 to 3) can be selected.</p> <ul> <li> <p>Forecast ensemble members</p> </li> </ul> <p>  E.g. 
t2_campania_20210701.ncml</p> <p>The file name contains the name of the variable, domain and initialization date of the forecasts (1st of the month).</p> <p>It contains the 51 absolute values of the forecast variables in their corresponding units (see Table 2). The latitude, longitude, and lead time (months 1 to 3) can be selected.</p> <ul> <li> <p>Category limits</p> </li> </ul> <p>E.g. t2_campania-percentiles_month07.ncml</p> <p>The file name contains the name of the variable, domain, the label ‘percentiles’ and the month for which the category limits apply.</p> <p>It contains the limits of the predicted categories (below normal, normal and above normal). These categories are defined with respect to a period in the past. The 33rd and 66th percentiles (p33 and p66) divide the model climatological distribution into 3 equiprobable categories. The 33rd percentile is the boundary between the below normal and normal categories, and the 66th percentile is the boundary between the normal and above normal categories. The 10th and 90th percentiles, which define the thresholds for the lower and upper extreme conditions, are also provided (p10 and p90). It should be noted that the definition of the categories is specific to each location (latitude and longitude), initialization month and lead time (valid month).</p> <ul> <li> <p>Skill scores</p> </li> </ul> <p>E.g. t2_campania-skill_month07.ncml</p> <p>The file name contains the name of the variable, domain, the label ‘skill’ and the month for which the skill scores apply.</p> <p>It contains the measures of forecast quality: the fair Ranked Probability Skill Score for terciles (rpss) and the fair Brier Skill Score for lower and upper extremes (bsp10 and bsp90). It should be noted that the skill level is specific to each location (latitude and longitude), initialization month and lead time (valid month).</p>
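    The relationship between the ensemble members, the category limits and the forecast probabilities can be sketched as follows. This is a minimal illustration that assumes the probabilities are simple member counts at one grid point and lead time; the dataset description does not state the exact counting rule or how ties with a threshold are treated, so the function `category_probabilities` and its tie handling (values equal to a limit count as normal) are assumptions. The output keys reuse the names quoted in the description.

```python
def category_probabilities(members, p10, p33, p66, p90):
    """Return percentage probabilities for terciles and extremes.

    `members` holds the ensemble values for one location and lead time;
    p10, p33, p66 and p90 are the category limits from the model climatology.
    Counting members per category is an assumption of this sketch.
    """
    n = len(members)
    if n == 0:
        raise ValueError("at least one ensemble member is required")

    def percent(count):
        return 100.0 * count / n

    below = sum(1 for value in members if value < p33)
    above = sum(1 for value in members if value > p66)
    return {
        "prob_bn": percent(below),
        "prob_n": percent(n - below - above),
        "prob_an": percent(above),
        "prob_bp10": percent(sum(1 for value in members if value < p10)),
        "prob_ap90": percent(sum(1 for value in members if value > p90)),
    }


# Usage with 51 synthetic member values (not real forecast data).
synthetic_members = list(range(1, 52))
probabilities = category_probabilities(
    synthetic_members, p10=5.5, p33=17.5, p66=34.5, p90=46.5
)
```

    By construction the three tercile probabilities sum to 100%, while the two extreme probabilities are reported separately.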

    BSC Post-processed Sub-seasonal Climate Forecast for vineyard management

    <p>The Climate Services Team at the Barcelona Supercomputing Center has deployed a climate service for vineyard management in the context of the vitiGEOSS project. This dataset results from post-processing (i.e. downscaling, calibrating and assessing) the sub-seasonal climate prediction system NCEP-CFSv2.</p> <p>Probabilistic predictions output several solutions (ensemble members) to account for forecast uncertainty. The forecast information is conveyed as probabilities, in this case as the probabilities of occurrence of three categories or terciles (below normal, normal and above normal). The categories are defined based on the terciles of the model climatology distribution over a period in the past. Additional information regarding the probability of occurrence of extremes is also provided, considered as the probability of not reaching the 10th percentile or surpassing the 90th percentile of the model climatology distribution. The skill scores provide information on the forecast quality (fair Ranked Probability Skill Score for the tercile categories and fair Brier Skill Score for the probabilities of extremes). A positive skill score indicates that the prediction is good (better than using average past conditions) in the long term, while a negative skill score indicates that the prediction does not beat the climatological forecast.</p> <ul> <li> <p>Prediction system: National Centers for Environmental Prediction (NCEP) CFSv2, post-processed by BSC (lagged ensemble creation, downscaling and calibration).</p> </li> <li> <p>Issue frequency: Weekly (initialization every Thursday, post-processed prediction every Friday).</p> </li> <li> <p>Lead times: weeks 1 to 4 (e.g. for a forecast issued on Friday 4th November, forecasts will be weekly averages starting the following Monday-Thursday and the 4 following weeks; week 1 will be 8th-15th November). The initialization date is indicated in the name of each file (e.g. 20211104). 
</p> </li> <li> <p>Variables: mean, minimum and maximum 2 m temperature, accumulated precipitation, and incoming solar radiation.</p> </li> <li> <p>Ensemble size: 48 members</p> </li> <li> <p>Postprocessing: Creation of a lagged ensemble of 48 members, downscaling from the original (1° x 1°) resolution to 0.1° x 0.1° for the three domains and weekly calibration with variance inflation.</p> </li> <li> <p>Spatial coverage of the domains: </p> </li> <ul> <li> <p>Catalonia region is indicated by ‘cat’ and covers latitudes [40 N, 44 N] and longitudes [1 W, 4 E]. The latitude indices range [1:41], and the longitude indices range [1:51].</p> </li> <li> <p>Douro region is indicated by ‘douro’ and covers latitudes [40 N, 43 N] and longitudes [9 W, 6 W]. The latitude indices range [1:31], and the longitude indices range [1:31].</p> </li> <li> <p>Campania region is indicated by ‘campania’ and covers latitudes [39 N, 43 N] and longitudes [13 E, 17.3 E]. The latitude indices range [1:41], and the longitude indices range [1:44].</p> </li> </ul> </ul> <p>The specific latitude and longitude indices to extract the predictions corresponding to each vitiGEOSS site are indicated in Table 2.</p> <ul> <li> <p>Forecast probabilities</p> </li> </ul> <p>E.g. t2_campania_prob_20211104.ncml</p> <p>The file name contains the name of the variable, domain, the label ‘prob’ and the initialization date of the forecasts (always a Thursday).</p> <p>It contains the forecast probabilities (in %) of each tercile category, below normal (prob_bn), normal (prob_n) and above normal (prob_an), together with the probability of the lower extreme (prob_bp10) and the probability of the upper extreme (prob_ap90). The latitude, longitude and lead time (weeks 1 to 4) can be selected.</p> <ul> <li> <p>Forecast ensemble members</p> </li> </ul> <p>  E.g. 
t2_campania_20211104.ncml</p> <p>The file name contains the name of the variable, domain and initialization date of the forecasts (always a Thursday).</p> <p>It contains the 48 absolute values of the forecast variables in their corresponding units (see Table 2). The latitude, longitude and lead time (weeks 1 to 4) can be selected.</p> <ul> <li> <p>Category limits</p> </li> </ul> <p>E.g. t2_campania_percentiles_week44.ncml</p> <p>The file name contains the name of the variable, domain, the label ‘percentiles’ and the week of the year for which the category limits apply.</p> <p>It contains the limits of the predicted categories (below normal, normal and above normal). These categories are defined with respect to a period in the past. The 33rd and 66th percentiles (p33 and p66) divide the model climatological distribution into 3 equiprobable categories. The 33rd percentile is the boundary between the below-normal and normal categories, and the 66th percentile is the boundary between the normal and above-normal categories. The 10th and 90th percentiles, which define the thresholds for the lower and upper extreme conditions, are also provided (p10 and p90). It should be noted that the definition of the categories is specific to each location (latitude and longitude), initialization and lead time (valid week).</p> <ul> <li> <p>Skill scores</p> </li> </ul> <p>E.g. t2_campania_skill_week44.ncml</p> <p>The file name contains the name of the variable, domain, the label ‘skill’ and the week of the year for which the skill scores apply.</p> <p>It contains the measures of forecast quality: the fair Ranked Probability Skill Score for terciles (rpss) and the fair Brier Skill Score for lower and upper extremes (bsp10 and bsp90). It should be noted that the skill level is specific to each location (latitude and longitude), initialization and lead time (valid week).</p>
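    The naming convention for the forecast files can be checked programmatically. The sketch below parses the two forecast file-name patterns quoted in the description (with and without the ‘prob’ label) and tests whether the initialization date falls on a Thursday, as stated for the sub-seasonal product. The function `parse_forecast_name` is hypothetical, and the pattern for variable names is an assumption, since only `t2` appears in the examples.

```python
import re
from datetime import datetime

# Pattern built from the quoted examples; the variable-name part is an assumption.
FORECAST_NAME = re.compile(
    r"^(?P<variable>[A-Za-z0-9]+)"
    r"_(?P<domain>cat|douro|campania)"
    r"(?P<prob>_prob)?"
    r"_(?P<date>\d{8})\.ncml$"
)


def parse_forecast_name(file_name):
    """Split a forecast file name into variable, domain, kind and init date."""
    match = FORECAST_NAME.match(file_name)
    if match is None:
        raise ValueError(f"not a forecast file name: {file_name!r}")
    init_date = datetime.strptime(match.group("date"), "%Y%m%d").date()
    return {
        "variable": match.group("variable"),
        "domain": match.group("domain"),
        "kind": "prob" if match.group("prob") else "members",
        "init_date": init_date,
        # date.weekday() numbers Monday as 0, so Thursday is 3.
        "is_thursday": init_date.weekday() == 3,
    }


# Usage with the two file names quoted in the description.
prob_file = parse_forecast_name("t2_campania_prob_20211104.ncml")
members_file = parse_forecast_name("t2_campania_20211104.ncml")
```

    The percentile and skill files follow a different pattern (week of the year instead of a date), so this sketch rejects them with a `ValueError` rather than guessing their layout.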