101 research outputs found

    Static Graphs for Coding Productivity in OpenACC

    The main contribution of this work is to increase the coding productivity for GPU programming by using the concept of Static Graphs. To do so, we have combined the new CUDA Graph API with the OpenACC programming model. We use as a test case a well-known and widely used problem in HPC and AI: Particle Swarm Optimization. We complement the OpenACC functionality with the use of CUDA Graph, achieving accelerations of more than one order of magnitude and performance very close to that of a reference, optimized CUDA code. Finally, we propose a new specification to incorporate the concept of Static Graphs into the OpenACC specification. This project has received funding from the EPEEC project from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No. 801051. Peer Reviewed. Postprint (author's final draft).

    Towards enhancing coding productivity for GPU programming using static graphs

    The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. GPU capabilities have been increasing significantly in terms of performance and memory capacity. However, there are still some problems in terms of scalability and limitations on the amount of work that a GPU can perform at a time. To minimize the overhead associated with launching GPU kernels, as well as to maximize the use of GPU capacity, we have combined the new CUDA Graph API with the CUDA programming model (including CUDA math libraries) and the OpenACC programming model. We use as test cases two different, well-known and widely used problems in HPC and AI: the Conjugate Gradient method and Particle Swarm Optimization. In the first test case (Conjugate Gradient), we focus on the integration of Static Graphs with CUDA. In this case, we are able to significantly outperform the NVIDIA reference code, reaching an acceleration of up to 11× thanks to a better implementation that benefits from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with the use of CUDA Graph, again achieving accelerations of up to one order of magnitude, with average speedups ranging from 2× to 4×, and performance very close to that of a reference, optimized CUDA code. Our main target is to achieve a higher coding productivity model for GPU programming by using Static Graphs, which provide, in a very transparent way, better exploitation of the GPU capacity. Combining Static Graphs with two of the most important current GPU programming models (CUDA and OpenACC) considerably reduces the execution time compared with using CUDA or OpenACC alone, achieving accelerations of more than one order of magnitude. Finally, we propose an interface to incorporate the concept of Static Graphs into the OpenACC Specifications. This research was funded by the EPEEC project from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No. 801051. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan, accessed on 13 April 2022). Peer Reviewed. Postprint (published version).

    Computer-Assisted Coding: Post ICD-10 Implementation

    Computer-assisted coding (CAC) has been around since the 1950s and is projected to reach $4.75 billion by 2022. However, it was not on hospitals’ priority lists until 2014, ahead of the implementation of ICD-10 in 2015. Computer-assisted coding is software that helps streamline the coding workflow, reduce backlogs by increasing productivity, and help coders navigate longer, more complex charts more quickly. The technology is a type of artificial intelligence. The idea of computer-assisted coding came to the forefront with the implementation of electronic health records (EHRs) and the demands of more restrictive reimbursement from payers. Accuracy, consistency, and, most assuredly, productivity have been of great importance to all organizations. With the rise of advanced technologies, computer-assisted coding has improved in performance. However, the question remains whether it has lived up to the hype that preceded the implementation of ICD-10: increased productivity, accuracy, and consistency, improved clinical documentation, etc. This study used a questionnaire to survey the Tennessee Health Information Management (THIMA) community members on the effectiveness of computer-assisted coding five years after the implementation of ICD-10. The results of the survey show that there are organizations that are still not using CAC. Overall, respondents feel that CAC is not a must-have technology for coding efficiently; with CAC, the overall coding process is satisfactory but still needs improvement.

    HEP-Frame: A software engineered framework to aid the development and efficient multicore execution of scientific code

    This communication presents an evolutionary software prototype of a user-centered Highly Efficient Pipelined Framework, HEP-Frame, to aid the development of sustainable parallel scientific code with a flexible pipeline structure. HEP-Frame is the result of a tight collaboration between computational scientists and software engineers: it aims to improve scientists' coding productivity, ensuring an efficient parallel execution on a wide set of multicore systems, with both HPC and HTC techniques. The current prototype complies with the requirements of an actual scientific code, includes desirable sustainability features and supports at compile time additional plugin interfaces for other scientific fields. The porting and development productivity was assessed and preliminary efficiency results are promising. This work was supported by FCT (Fundação para a Ciência e Tecnologia) within Project Scope (UID/CEC/00319/2013), by LIP (Laboratório de Instrumentação e Física Experimental de Partículas) and by Project Search-ON2 (NORTE-07-0162-FEDER-000086), co-funded by the North Portugal Regional Operational Programme (ON.2 - O Novo Norte), under the National Strategic Reference Framework, through the European Regional Development Fund.

    Automation on the generation of genome scale metabolic models

    Background: Nowadays, the reconstruction of genome scale metabolic models is a non-automated, interactive process based on decision making. This lengthy process usually requires a full year of one person's work to satisfactorily collect, analyze and validate the list of all metabolic reactions present in a specific organism. To write this list, one has to go manually through a huge amount of genomic, metabolomic and physiological information. Currently, there is no optimal algorithm that allows one to go through all this information automatically and generate the models taking into account the probabilistic criteria of unicity and completeness that a biologist would consider. Results: This work presents the automation of a methodology for the reconstruction of genome scale metabolic models for any organism. The methodology is the automated version of the steps carried out manually for the reconstruction of the genome scale metabolic model of a photosynthetic organism, Synechocystis sp. PCC6803. The steps for the reconstruction are implemented in a computational platform (COPABI) that generates the models from the probabilistic algorithms that have been developed. Conclusions: To validate the robustness of the developed algorithm, the metabolic models of several organisms generated by the platform have been studied together with published models that have been manually curated. Network properties of the models, like connectivity and average shortest mean path, have been compared and analyzed. Comment: 24 pages, 2 figures, 2 tables.

    SPar: A DSL for High-Level and Productive Stream Parallelism

    This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools processes SPar code (C++ annotated code using the SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by SPar annotations while targeting shared memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. We also show the results of experiments assessing the feasibility of the entire approach as well as SPar's performance and expressiveness.

    A real-time interpolator for parametric curves

    Driven by the ever-increasing need for high-speed, high-accuracy machining of freeform surfaces, interpolators for parametric curves have become highly desirable, as they can eliminate the feedrate and acceleration fluctuations caused by the discontinuity in the first derivatives along a linear tool path. Interpolation for parametric curves is essentially an optimization problem, and it is extremely difficult to obtain the time-optimal solution. This paper presents a novel real-time interpolator for parametric curves (RTIPC), which provides a near time-optimal solution. It limits the machine dynamics (axial velocities, axial accelerations and jerk) and contour error through feedrate lookahead and acceleration lookahead operations, while keeping the feedrate as high as possible with minimum fluctuation. The lookahead length is dynamically adjusted to minimize the computation load, and the numerical integration error is considered during the lookahead calculation. Two typical parametric curves are selected for both numerical simulation and experimental validation; a cubic phase plate freeform surface is also machined. The numerical simulation is performed using the software that implements the proposed RTIPC (open access information is in the Acknowledgment section); the results demonstrate the effectiveness of the RTIPC. The real-time performance of the RTIPC is tested on the in-house developed controller, which shows satisfactory efficiency. Finally, machining trials are carried out in comparison with the industrial standard linear interpolator and the state-of-the-art Position-Velocity-Time (PVT) interpolator; the results show the significant advantages of the RTIPC in coding, productivity and motion smoothness.

    Particle-In-Cell Simulation using Asynchronous Tasking

    Recently, task-based programming models have emerged as a prominent alternative among shared-memory parallel programming paradigms. Inherently asynchronous, these models provide native support for dynamic load balancing and incorporate data flow concepts to selectively synchronize the tasks. However, tasking models are yet to be widely adopted by the HPC community, and their effective advantages when applied to non-trivial, real-world HPC applications are still not well understood. In this paper, we study the parallelization of a production electromagnetic particle-in-cell (EM-PIC) code for kinetic plasma simulations, exploring different strategies using asynchronous task-based models. Our fully asynchronous implementation not only significantly outperforms a conventional, synchronous approach but also achieves near perfect scaling for 48 cores. Comment: To be published at the 27th European Conference on Parallel and Distributed Computing (Euro-Par 2021).