
    High level synthesis of RDF queries for graph analytics

    In this paper we present a set of techniques that enable the synthesis of efficient custom accelerators for memory-intensive, irregular applications. To address the challenges of irregular applications (large memory footprint, unpredictable fine-grained data accesses, and high synchronization intensity) and exploit their opportunities (thread-level parallelism, memory-level parallelism), we propose a novel accelerator design that employs an adaptive and Distributed Controller (DC) architecture and a Memory Interface Controller (MIC) that supports concurrent and atomic memory operations on a multi-ported/multi-banked shared memory. Among the multitude of algorithms that may benefit from our solution, we focus on the acceleration of graph analytics applications and, in particular, on the synthesis of SPARQL queries on Resource Description Framework (RDF) databases. We achieve this objective by incorporating the synthesis techniques into Bambu, an open-source high-level synthesis tool, and interfacing it with GEMS, the Graph database Engine for Multithreaded Systems. The GEMS front end generates optimized C implementations of the input queries, modeled as graph pattern matching algorithms, which are then automatically synthesized by Bambu. We validate our approach by synthesizing several SPARQL queries from the Lehigh University Benchmark (LUBM).
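To make the phrase "queries modeled as graph pattern matching algorithms" concrete, here is a minimal Python sketch of a two-pattern query evaluated over a toy triple list. It is only an illustration under stated assumptions: the triples, the names `build_index` and `match_students_in_course`, and the query shape (loosely in the spirit of a LUBM-style "graduate students taking a given course" query) are hypothetical, and this is not the C code that GEMS generates.

```python
from collections import defaultdict

# Toy RDF graph as (subject, predicate, object) triples; all data is made up.
TRIPLES = [
    ("alice", "type", "GraduateStudent"),
    ("bob", "type", "GraduateStudent"),
    ("carol", "type", "Professor"),
    ("alice", "takesCourse", "course0"),
    ("bob", "takesCourse", "course1"),
    ("carol", "teacherOf", "course0"),
]


def build_index(triples):
    """Index objects by (predicate, subject) for fast edge lookups."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(p, s)].add(o)
    return index


def match_students_in_course(triples, course):
    """Match the pattern: ?x type GraduateStudent . ?x takesCourse <course>."""
    index = build_index(triples)
    results = []
    for s, p, o in triples:
        # The first triple pattern binds ?x.
        if p == "type" and o == "GraduateStudent":
            # The second triple pattern checks an edge leaving the bound ?x.
            if course in index[("takesCourse", s)]:
                results.append(s)
    return sorted(results)


print(match_students_in_course(TRIPLES, "course0"))  # ['alice']
```

Each candidate binding of `?x` triggers an independent, data-dependent lookup, which is the kind of fine-grained, irregular memory access the abstract describes.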

    Impact of Different Operational Definitions of Sarcopenia on Prevalence in a Population-Based Sample: The Salus in Apulia Study

    Background: In 2010, the European Working Group on Sarcopenia in Older People (EWGSOP1) issued its first operational definition to diagnose sarcopenia. This was updated in 2019 with a revised sequence of muscle mass and muscle strength (EWGSOP2). The aim of the study was to investigate the impact of these different operational definitions on sarcopenia prevalence in a representative population-based sample. Methods: For each algorithm, the prevalence of sarcopenia-related categories was calculated and related to sociodemographic and lifestyle variables, anthropometric parameters, and laboratory biomarkers. The present analysis used data from the Salus in Apulia Study (Italy, 740 subjects, mean age 75.5 ± 5.9 years, 54% women). Results: The application of the EWGSOP1 adapted algorithm resulted in 85% [95% confidence intervals (CI): 82–88%] non-sarcopenic subjects, 10% (95% CI: 8–12%) pre-sarcopenic subjects, and 5% (95% CI: 3–7%) sarcopenic/severe sarcopenic subjects. The sarcopenia-related categories were inversely related to weight and body mass index (BMI), particularly in overweight/obese subjects, and these categories showed favorable metabolic biomarkers. The EWGSOP2 algorithm yielded 73% (95% CI: 69–76%) non-sarcopenic subjects, 24% (95% CI: 21–27%) probably sarcopenic subjects, and 4% (95% CI: 2–5%) sarcopenic subjects. Conclusions: The present study identified BMI as a potential confounder of the prevalence estimates of sarcopenia-related categories in population-based settings with different EWGSOP operational definitions

    An automated flow for the High Level Synthesis of coarse grained parallel applications

    High Level Synthesis (HLS) provides a way to significantly enhance the productivity of embedded system designers by enabling the automatic or semiautomatic generation of hardware accelerators starting from high-level descriptions in (usually software) programming languages. Typical HLS approaches build a centralized Finite State Machine (FSM) to control the generated datapath, performing the operations according to a predetermined, static schedule. However, FSM-based approaches are only able to extract parallelism within a single execution flow. In the presence of coarse-grained parallelism, in the form of concurrent function calls or parallel control structures, they either serialize all the operations or build excessively complex controllers that aim to execute as many operations as possible in a single control step (i.e., they try to extract as much instruction-level parallelism as possible). The resulting controllers occupy an excessive amount of area or lead to very low operating frequencies. In this paper we propose a methodology for the HLS of accelerators supporting parallel execution and dynamic scheduling. The approach exploits an adaptive distributed controller, composed of a set of communicating elements associated with each operation. This controller design supports multiple concurrent execution flows, thus increasing parallelism exploitation beyond instruction-level parallelism. The approach also supports variable-latency operations, such as memory accesses and speculative operations. We apply our methodology to a set of typical HLS benchmarks and demonstrate valuable speedups with limited area overheads with respect to conventional FSM-based flows.

    Scheduling independent liveness analysis for register binding in high level synthesis

    Classical techniques for register allocation and binding require the definition of the program execution order, since a partial ordering relation between operations must be induced to perform liveness analysis through data-flow equations. In High Level Synthesis (HLS) flows this is commonly obtained through the scheduling task. However, for some HLS approaches such a relation can be difficult to compute, or not statically computable at all, and adopting conventional register binding techniques, even when feasible, cannot guarantee maximum performance. To overcome these issues we introduce a novel scheduling-independent liveness analysis methodology, suitable for dynamic scheduling architectures. This liveness analysis is exploited in register binding using standard graph coloring techniques and, unlike other approaches, it avoids the insertion of structural dependencies, which are otherwise introduced to prevent run-time resource conflicts in dynamic scheduling environments. The absence of additional dependencies avoids performance degradation and makes parallelism exploitation independent of the register binding task, while on average not impacting area, as shown by the experimental results.
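As a rough illustration of the "standard graph coloring" step mentioned in this abstract, the Python sketch below binds values to registers with a greedy coloring of a conflict graph. It assumes the conflict graph is already given; how conflicts are derived without a schedule is the paper's contribution and is not reproduced here. The function name `bind_registers`, the example graph, and the highest-degree-first ordering heuristic are all assumptions made for illustration.

```python
def bind_registers(conflicts):
    """Greedy graph coloring.

    `conflicts` maps each value to the set of values that may be live
    at the same time and therefore must not share a register.
    """
    binding = {}
    # Color high-degree values first (a common heuristic, not taken from the paper).
    for value in sorted(conflicts, key=lambda v: (-len(conflicts[v]), v)):
        used = {binding[n] for n in conflicts[value] if n in binding}
        reg = 0
        while reg in used:
            reg += 1  # pick the lowest register not used by a conflicting value
        binding[value] = reg
    return binding


# Hypothetical conflict graph: a, b, c are mutually live; d only overlaps c.
CONFLICTS = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

binding = bind_registers(CONFLICTS)
print(binding)  # {'c': 0, 'a': 1, 'b': 2, 'd': 1}
```

In this example `d` reuses the register of `a` because the two values never conflict, so four values fit in three registers.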