Static Graphs for Coding Productivity in OpenACC
The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. To do so, we have combined the new CUDA Graph API with the OpenACC programming model. We use as a test case a well-known and widely used problem in HPC and AI: Particle Swarm Optimization. We complement the OpenACC functionality with the use of CUDA Graph, achieving accelerations of more than one order of magnitude and a performance very close to a reference, optimized CUDA code. Finally, we propose a new specification to incorporate the concept of Static Graphs into the OpenACC specification. This project has received funding from the EPEEC project from the European Union's Horizon 2020 Research and Innovation program under grant agreement No. 801051. Peer reviewed. Postprint (author's final draft).
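As a rough sketch of the kind of combination this abstract describes, the fragment below captures the kernels that OpenACC launches on an async queue into a CUDA graph via stream capture and then replays the instantiated graph. The queue number, array names and loop body are illustrative, and the interface actually proposed for the OpenACC specification may look different.

    #include <openacc.h>
    #include <cuda_runtime.h>

    // Illustrative sketch only (not the paper's proposed interface): capture the
    // kernels OpenACC launches on async queue 1 into a CUDA graph, then replay it.
    // Arrays x and y are assumed device-resident (e.g. inside an enclosing
    // "#pragma acc data" region).
    void run_with_graph(float *x, float *y, int n, int iters) {
        // On NVIDIA implementations, an OpenACC async queue maps to a CUDA stream.
        cudaStream_t s = (cudaStream_t) acc_get_cuda_stream(1);

        cudaGraph_t graph;
        cudaGraphExec_t exec;

        cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
        #pragma acc parallel loop async(1) present(x[0:n], y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] += 2.0f * x[i];                       /* hypothetical kernel body */
        cudaStreamEndCapture(s, &graph);

        /* CUDA 11-style signature; CUDA 12 replaces the last three arguments with a flags value. */
        cudaGraphInstantiate(&exec, graph, NULL, NULL, 0);

        for (int it = 0; it < iters; ++it)
            cudaGraphLaunch(exec, s);                  /* replay instead of relaunching kernels */
        cudaStreamSynchronize(s);

        cudaGraphExecDestroy(exec);
        cudaGraphDestroy(graph);
    }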
Towards enhancing coding productivity for GPU programming using static graphs
The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. GPU capabilities have been increasing significantly in terms of performance and memory capacity. However, there are still some problems in terms of scalability and limitations on the amount of work that a GPU can perform at a time. To minimize the overhead associated with the launch of GPU kernels, as well as to maximize the use of GPU capacity, we have combined the new CUDA Graph API with the CUDA programming model (including CUDA math libraries) and the OpenACC programming model. We use as test cases two different, well-known and widely used problems in HPC and AI: the Conjugate Gradient method and Particle Swarm Optimization. In the first test case (Conjugate Gradient) we focus on the integration of Static Graphs with CUDA. In this case, we are able to significantly outperform the NVIDIA reference code, reaching an acceleration of up to 11× thanks to a better implementation, which can benefit from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with the use of CUDA Graph, again achieving accelerations of up to one order of magnitude, with average speedups ranging from 2× to 4×, and performance very close to a reference, optimized CUDA code. Our main target is to achieve a higher coding productivity model for GPU programming by using Static Graphs, which provides, in a very transparent way, a better exploitation of the GPU capacity. The combination of Static Graphs with two of the most important current GPU programming models (CUDA and OpenACC) considerably reduces the execution time with respect to using CUDA and OpenACC alone, achieving accelerations of more than one order of magnitude. Finally, we propose an interface to incorporate the concept of Static Graphs into the OpenACC Specifications. This research was funded by the EPEEC project from the European Union's Horizon 2020 Research and Innovation program under grant agreement No. 801051. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan, accessed on 13 April 2022). Peer reviewed. Postprint (published version).
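The launch-overhead argument in the CUDA-only (Conjugate Gradient) case can be illustrated with a small, self-contained sketch that is not the paper's CG code: the kernel sequence forming one solver iteration is captured once, instantiated, and then relaunched with a single call per iteration.

    #include <cuda_runtime.h>

    __global__ void axpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] += a * x[i];
    }

    // Illustration only: the two axpy launches stand in for the kernels of one
    // solver iteration (SpMV, dot products, vector updates in a real CG).
    void iterate_with_graph(int n, float *d_x, float *d_y, float *d_z, int iters) {
        cudaStream_t s;
        cudaStreamCreate(&s);

        cudaStreamBeginCapture(s, cudaStreamCaptureModeGlobal);
        axpy<<<(n + 255) / 256, 256, 0, s>>>(n,  2.0f, d_x, d_y);
        axpy<<<(n + 255) / 256, 256, 0, s>>>(n, -1.0f, d_y, d_z);
        cudaGraph_t graph;
        cudaStreamEndCapture(s, &graph);

        cudaGraphExec_t exec;
        cudaGraphInstantiate(&exec, graph, NULL, NULL, 0);   // CUDA 11-style signature

        for (int it = 0; it < iters; ++it)
            cudaGraphLaunch(exec, s);        // one API call replays the whole sequence
        cudaStreamSynchronize(s);

        cudaGraphExecDestroy(exec);
        cudaGraphDestroy(graph);
        cudaStreamDestroy(s);
    }

With many short kernels per iteration, the per-launch CPU overhead can dominate; replaying a pre-instantiated graph pays that cost once, which is the mechanism the abstract points to when it talks about minimizing kernel-launch overhead.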
Computer-Assisted Coding: Post ICD-10 Implementation
Computer-assisted coding (CAC) has been around since the 1950s and is projected to reach $4.75 billion by 2022. However, it was not on hospitals' priority lists until 2014, just before the implementation of ICD-10 in 2015. Computer-assisted coding is software technology that helps streamline the coding workflow, reduce backlogs by increasing productivity, and help coders navigate longer, more complex charts more quickly. The technology is a type of artificial intelligence. The idea of computer-assisted coding moved to the forefront with the implementation of electronic health records (EHRs) and the demands of more restrictive reimbursement from payers. Accuracy, consistency, and, most assuredly, productivity have been of great importance to all organizations. With the rise of advanced technologies, computer-assisted coding has advanced in its performance. However, the question remains whether it has lived up to the hype that preceded the implementation of ICD-10: increased productivity, accuracy and consistency, improved clinical documentation, and so on. This study used a questionnaire to survey members of the Tennessee Health Information Management Association (THIMA) community on the effectiveness of computer-assisted coding five years after the implementation of ICD-10. The survey results show that some organizations are still not using CAC. Overall, respondents feel that CAC is not a must-have technology for coding efficiently, but that with CAC the overall coding process is satisfactory, though it still needs improvement.
HEP-Frame: A software engineered framework to aid the development and efficient multicore execution of scientific code
This communication presents an evolutionary software prototype of a user-centered Highly Efficient Pipelined Framework, HEP-Frame, to aid the development of sustainable parallel scientific code with a flexible pipeline structure. HEP-Frame is the result of a tight collaboration between computational scientists and software engineers: it aims to improve scientists' coding productivity, ensuring an efficient parallel execution on a wide set of multicore systems with both HPC and HTC techniques. The current prototype complies with the requirements of an actual scientific code, includes desirable sustainability features, and supports at compile time additional plugin interfaces for other scientific fields. Porting and development productivity were assessed, and preliminary efficiency results are promising. This work was supported by FCT (Fundação para a Ciência e Tecnologia) within Project Scope (UID/CEC/00319/2013), by LIP (Laboratório de Instrumentação e Física Experimental de Partículas) and by Project Search-ON2 (NORTE-07-0162-FEDER-000086), co-funded by the North Portugal Regional Operational Programme (ON.2 - O Novo Norte), under the National Strategic Reference Framework, through the European Regional Development Fund.
Automation on the generation of genome scale metabolic models
Background: Today, the reconstruction of genome-scale metabolic models is a non-automated, interactive process based on decision making. This lengthy process usually requires a full year of one person's work to satisfactorily collect, analyze and validate the list of all metabolic reactions present in a specific organism. To compile this list, one has to manually go through a huge amount of genomic, metabolomic and physiological information. Currently, there is no algorithm that automatically goes through all this information and generates the models while taking into account the probabilistic criteria of uniqueness and completeness that a biologist would consider. Results: This work presents the automation of a methodology for the reconstruction of genome-scale metabolic models for any organism. The methodology is the automated version of the steps previously carried out manually to reconstruct the genome-scale metabolic model of a photosynthetic organism, Synechocystis sp. PCC6803. The reconstruction steps are implemented in a computational platform (COPABI) that generates the models from the probabilistic algorithms that have been developed. Conclusions: To validate the robustness of the developed algorithms, the metabolic models of several organisms generated by the platform have been studied together with published, manually curated models. Network properties of the models, such as connectivity and average shortest path length, have been compared and analyzed. Comment: 24 pages, 2 figures, 2 tables.
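As a generic illustration of the network properties mentioned in the conclusions (this is not COPABI code), the sketch below computes the average shortest path length of an unweighted metabolite graph stored as an adjacency list; connectivity can be read off the same BFS by counting reachable nodes.

    #include <queue>
    #include <vector>

    // Average shortest path length over all reachable ordered pairs of an
    // unweighted, undirected metabolite graph (generic illustration, not COPABI).
    double averageShortestPath(const std::vector<std::vector<int>> &adj) {
        const int n = static_cast<int>(adj.size());
        long long pairs = 0, total = 0;
        for (int src = 0; src < n; ++src) {
            std::vector<int> dist(n, -1);
            std::queue<int> q;
            dist[src] = 0;
            q.push(src);
            while (!q.empty()) {                 // breadth-first search from src
                int u = q.front(); q.pop();
                for (int v : adj[u])
                    if (dist[v] < 0) { dist[v] = dist[u] + 1; q.push(v); }
            }
            for (int v = 0; v < n; ++v)
                if (v != src && dist[v] > 0) { total += dist[v]; ++pairs; }
        }
        return pairs ? static_cast<double>(total) / pairs : 0.0;
    }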
SPar: A DSL for High-Level and Productive Stream Parallelism
This paper introduces SPar, an internal C++ Domain-Specific Language (DSL) that supports the development of classic stream parallel applications. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. A set of tools processes SPar code (C++ code annotated with SPar attributes) to generate FastFlow C++ code that exploits the stream parallelism denoted by the SPar annotations while targeting shared-memory multi-core architectures. We outline the main SPar features along with the main implementation techniques and tools. We also show the results of experiments assessing the feasibility of the entire approach as well as SPar's performance and expressiveness.
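For readers unfamiliar with the DSL, the fragment below sketches what SPar-style annotated code looks like, using the attribute names reported in the SPar literature (ToStream, Stage, Input, Output, Replicate); the exact syntax accepted by the SPar compiler may differ.

    #include <cctype>
    #include <iostream>
    #include <string>

    // Hedged sketch of SPar-style annotations: a stream region with a replicated
    // processing stage and a final ordered output stage.
    void process_stream(std::istream &in, std::ostream &out) {
        std::string line;
        [[spar::ToStream, spar::Input(line)]]
        while (std::getline(in, line)) {
            [[spar::Stage, spar::Input(line), spar::Output(line), spar::Replicate(4)]]
            {
                for (char &c : line)                       // per-item work
                    c = std::toupper(static_cast<unsigned char>(c));
            }
            [[spar::Stage, spar::Input(line)]]
            {
                out << line << '\n';                       // sequential output stage
            }
        }
    }

The SPar tool chain would translate such annotations into FastFlow pipeline (and farm) code, while an ordinary C++ compiler simply ignores the unknown attributes.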
A real-time interpolator for parametric curves
Driven by the ever-increasing need for high-speed, high-accuracy machining of freeform surfaces, interpolators for parametric curves have become highly desirable, as they can eliminate the feedrate and acceleration fluctuations caused by the discontinuity in the first derivatives along a linear tool path. Interpolation of parametric curves is essentially an optimization problem, and it is extremely difficult to obtain the time-optimal solution. This paper presents a novel real-time interpolator for parametric curves (RTIPC), which provides a near time-optimal solution. It limits the machine dynamics (axial velocities, axial accelerations and jerk) and the contour error through feedrate lookahead and acceleration lookahead operations; meanwhile, the feedrate is maintained as high as possible with minimal fluctuation. The lookahead length is dynamically adjusted to minimize the computational load, and the numerical integration error is taken into account during the lookahead calculation. Two typical parametric curves are selected for both numerical simulation and experimental validation, and a cubic phase plate freeform surface is also machined. The numerical simulation is performed using the software (open-access information is in the Acknowledgment section) that implements the proposed RTIPC, and the results demonstrate its effectiveness. The real-time performance of the RTIPC is tested on an in-house developed controller, which shows satisfactory efficiency. Finally, machining trials are carried out in comparison with the industry-standard linear interpolator and the state-of-the-art Position-Velocity-Time (PVT) interpolator; the results show the significant advantages of the RTIPC in coding, productivity and motion smoothness.
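For context on how such interpolators step along the curve, the relations below are the standard ones from the parametric-interpolation literature, not formulas quoted from this paper: a truncated Taylor expansion that advances the curve parameter at a commanded feedrate, and the chord-error bound that caps the feedrate in high-curvature regions.

    % Second-order Taylor update of the curve parameter u for a curve C(u),
    % sampling period T_s and commanded feedrate v_k (derivatives evaluated at u_k):
    u_{k+1} = u_k + \frac{v_k T_s}{\lVert C'(u_k) \rVert}
                  - \frac{v_k^2 T_s^2 \, \bigl( C'(u_k) \cdot C''(u_k) \bigr)}
                         {2 \, \lVert C'(u_k) \rVert^{4}}

    % Chord-error constraint: with local curvature radius \rho_k and tolerance \delta,
    % the feedrate is limited so the chordal deviation per sampling period stays within \delta:
    v_k \le \frac{2}{T_s} \sqrt{2 \rho_k \delta - \delta^2}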
Particle-In-Cell Simulation using Asynchronous Tasking
Recently, task-based programming models have emerged as a prominent
alternative among shared-memory parallel programming paradigms. Inherently
asynchronous, these models provide native support for dynamic load balancing
and incorporate data flow concepts to selectively synchronize the tasks.
However, tasking models are yet to be widely adopted by the HPC community and
their practical advantages for non-trivial, real-world HPC applications are
still not well understood. In this paper, we study the
parallelization of a production electromagnetic particle-in-cell (EM-PIC) code
for kinetic plasma simulations exploring different strategies using
asynchronous task-based models. Our fully asynchronous implementation not only
significantly outperforms a conventional, synchronous approach but also
achieves near-perfect scaling on 48 cores. Comment: To be published in the
27th European Conference on Parallel and Distributed Computing (Euro-Par 2021).
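As a generic illustration of the data-flow style of synchronization the abstract refers to (not the paper's actual code, which targets a production EM-PIC application), the sketch below expresses one tile-decomposed PIC step as OpenMP tasks whose depend clauses order the push, deposition and field-update phases per tile without global barriers.

    #include <omp.h>
    #include <cstddef>
    #include <vector>

    struct Tile { /* particles, local fields and current buffers (omitted) */ };

    void advance_tile(Tile &) { /* push particles using the tile's local fields */ }
    void deposit_tile(Tile &) { /* accumulate current from the pushed particles */ }
    void update_field(Tile &) { /* advance the tile's EM field from the current */ }

    // Hypothetical tile-level PIC step: per-tile task dependences replace the
    // global barriers a conventional fork-join implementation would need.
    void pic_step(std::vector<Tile> &tiles) {
        Tile *t = tiles.data();
        const std::size_t ntiles = tiles.size();
        #pragma omp parallel
        #pragma omp single
        {
            for (std::size_t i = 0; i < ntiles; ++i) {
                #pragma omp task depend(inout: t[i])   // particle push for tile i
                advance_tile(t[i]);
                #pragma omp task depend(inout: t[i])   // deposition waits on the push
                deposit_tile(t[i]);
                #pragma omp task depend(inout: t[i])   // field update waits on deposition
                update_field(t[i]);
            }
            // A real PIC step would add neighbour-tile dependences for guard-cell
            // exchange; here, independent tiles simply run concurrently.
            #pragma omp taskwait
        }
    }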