55 research outputs found

    OP2-Clang : a source-to-source translator using Clang/LLVM LibTooling

    Domain Specific Languages or Active Library frameworks have recently emerged as an important method for gaining performance portability, where an application can be efficiently executed on a wide range of HPC architectures without significant manual modifications. Embedded DSLs such as OP2 provide an API embedded in general-purpose languages such as C/C++/Fortran. They rely on source-to-source translation and code refactorization to translate the higher-level API calls to platform-specific parallel implementations. OP2 targets the solution of unstructured-mesh computations, where it can generate a variety of parallel implementations for execution on architectures such as CPUs, GPUs, distributed memory clusters and heterogeneous processors, making use of a wide range of platform-specific optimizations. Compiler tool-chains supporting source-to-source translation of code written in mainstream languages currently lack the capabilities to carry out such wide-ranging code transformations. Clang/LLVM’s Tooling library (LibTooling) has long been touted as having such capabilities, but its use has only been demonstrated in simple source refactoring tasks. In this paper we introduce OP2-Clang, a source-to-source translator based on LibTooling, for OP2’s C/C++ API, capable of generating target parallel code based on SIMD, OpenMP, CUDA and their combinations with MPI. OP2-Clang is designed to significantly reduce maintenance, in particular by being easy to extend to generate new parallelizations and optimizations for hardware platforms. In this research, we (1) demonstrate its capabilities, including the use of LibTooling’s AST matchers together with a simple strategy that uses parallelization templates or skeletons to significantly reduce the complexity of generating radically different and transformed target code, and (2) chart the challenges and solutions in generating optimized parallelizations for OpenMP, SIMD and CUDA.
Results indicate that OP2-Clang produces near-identical parallel code to that of OP2’s current source-to-source translator. We believe that the lessons learnt in OP2-Clang can be readily applied to developing other similar source-to-source translators, particularly for DSLs.

    Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation

    SBLI (Shock-wave/Boundary-layer Interaction) is a large-scale Computational Fluid Dynamics (CFD) application, developed over 20 years at the University of Southampton and extensively used within the UK Turbulence Consortium. It is capable of performing Direct Numerical Simulations (DNS) or Large Eddy Simulations (LES) of shock-wave/boundary-layer interaction problems over highly detailed multi-block structured mesh geometries. SBLI presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging massively parallel hardware platforms. In this paper we present research in achieving this goal through the OPS embedded domain-specific language. OPS targets the domain of multi-block structured mesh applications. It provides an API embedded in C/C++ and Fortran and makes use of automatic code generation and compilation to produce executables capable of running on a range of parallel hardware systems. The core functionality of SBLI is captured using a new framework called OpenSBLI, which enables a developer to declare the partial differential equations using Einstein notation and then automatically carry out discretization and generation of OPS (C/C++) API code. OPS is then used to automatically generate a wide range of parallel implementations. Using this multi-layered abstraction approach, we demonstrate how new opportunities for further optimizations, such as fine-tuning the computation intensity and reducing data movement, can be gained and applied automatically. Performance results demonstrate there is no performance loss due to the high-level development strategy with OPS and OpenSBLI, with performance matching or exceeding the hand-tuned original code on all CPU nodes tested. The data movement optimizations provide over 3× speedups on CPU nodes, while GPUs provide 5× speedups over the best performing CPU node.
The OPS generated parallel code also demonstrates excellent scalability on nearly 100K cores on a Cray XC30 (ARCHER at EPCC) and on over 4K GPUs on a Cray XK7 (Titan at ORNL).

    Analysis of Severity and Anatomical Distribution of Diabetic Foot Ulcers; A Single Unit Experience

    Diabetes is the commonest cause of foot ulceration in developing countries, leading to severe morbidity and mortality. The main aims of this study were to assess the anatomical distribution of diabetic foot lesions, categorize them according to Wagner wound grading, and find any association between smoking pack-years and the severity of the foot lesions. A further aim was to assess the relationship between bony deformities and the anatomical distribution of the ulcers. This was a cross-sectional descriptive study conducted in a casualty surgical unit in a tertiary care teaching hospital for a period of 4 months. 91 diabetic patients with a diabetes-related foot lesion were enrolled after simple randomization. A pretested, interviewer-administered questionnaire was used to gather data. A variety of soft tissue and bony changes of the diabetic foot were assessed. Lesions were classified according to the Wagner classification. Data was analysed using Epidata software. Of the 91 participants, 55 (61%) were males and 36 (39%) females. Mean age was 60.12 ± 10.19 years. Median diabetes duration was 10 years (interquartile range = 4.25 – 16.75). Wagner grades 1, 2, 3, 4 and 5 were 17.7%, 40.65%, 28.8%, 13.3% and 0% respectively. The commonest ulcer location was the margins of the foot (31.87%). There was no statistically significant association between the pack-years of cigarette smoking in males and severity of foot lesions (Spearman’s rank correlation coefficient = -0.037, p = 0.82). Patients with claw and hammer toe deformities had their ulcers located in fingertips and toes (p = 0.0185). But there was no statistically significant association between flat foot deformity and ulcer distribution on any particular anatomical area of the foot (p = 0.0511). In conclusion, there is a statistically significant association between toe deformities and ulcer occurrence in fingertips. No significant correlation between severity of smoking and severity of foot lesions among males is present.
KEYWORDS: Diabetic ulcers, Diabetic foot lesions, Wagner classification, Ulcer distribution

    Prognostic model to predict postoperative acute kidney injury in patients undergoing major gastrointestinal surgery based on a national prospective observational cohort study.

    Background: Acute illness, existing co-morbidities and surgical stress response can all contribute to postoperative acute kidney injury (AKI) in patients undergoing major gastrointestinal surgery. The aim of this study was prospectively to develop a pragmatic prognostic model to stratify patients according to risk of developing AKI after major gastrointestinal surgery. Methods: This prospective multicentre cohort study included consecutive adults undergoing elective or emergency gastrointestinal resection, liver resection or stoma reversal in 2-week blocks over a continuous 3-month period. The primary outcome was the rate of AKI within 7 days of surgery. Bootstrap stability was used to select clinically plausible risk factors into the model. Internal model validation was carried out by bootstrap validation. Results: A total of 4544 patients were included across 173 centres in the UK and Ireland. The overall rate of AKI was 14·2 per cent (646 of 4544) and the 30-day mortality rate was 1·8 per cent (84 of 4544). Stage 1 AKI was significantly associated with 30-day mortality (unadjusted odds ratio 7·61, 95 per cent c.i. 4·49 to 12·90; P < 0·001), with increasing odds of death with each AKI stage. Six variables were selected for inclusion in the prognostic model: age, sex, ASA grade, preoperative estimated glomerular filtration rate, planned open surgery and preoperative use of either an angiotensin-converting enzyme inhibitor or an angiotensin receptor blocker. Internal validation demonstrated good model discrimination (c-statistic 0·65). Discussion: Following major gastrointestinal surgery, AKI occurred in one in seven patients. This preoperative prognostic model identified patients at high risk of postoperative AKI. Validation in an independent data set is required to ensure generalizability

    Exploring the role of individual level and firm level dynamic capabilities in SMEs’ internationalization

    This paper presents a multi-level model that examines the impact of dynamic capabilities on the internationalization of SMEs while taking into account the interactions among them. The purpose of the research is to understand the applicability of dynamic capabilities at the individual and the firm level to the SME internationalization process in a developing-country context and to assess to what extent a firm’s asset position and individual-level dynamic capabilities influence the generation of firm-level dynamic capabilities in SMEs. First, the dynamic capabilities theory was theoretically linked to the internationalization phenomenon. The relationships among firm-level dynamic capabilities, individual-level dynamic capabilities (owner-specific dynamic capabilities), and internationalization were identified. The research framework and hypotheses were developed and empirically tested with 197 SMEs. The findings established that owner-specific dynamic capabilities have a positive influence on both firm dynamic capabilities and internationalization, and firm dynamic capabilities positively influence internationalization. It was also found that the market asset position, measured as perceptual environmental dynamism, positively influenced firm dynamic capabilities, but the structural and reputational asset positions of SMEs did not influence the generation of firm dynamic capabilities. Moreover, firm dynamic capabilities had a mediation effect in the relationship between owner-specific dynamic capabilities and internationalization. Theoretically, this confirms the relevance of dynamic capability theory to internationalization and the possibility of integrating existing internationalization theories. Entrepreneurs, SME managers, and policy-makers could gain valuable insights from this outcome on how entrepreneur and firm capabilities lead to better international prospects.

    Acceleration of a Full-scale Industrial CFD Application with OP2

    Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls Royce plc., capable of performing complex simulations over highly detailed unstructured mesh geometries. Hydra presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging platforms. We present research in achieving this goal through the OP2 domain-specific high-level framework, demonstrating the viability of such a high-level programming approach. OP2 targets the domain of unstructured mesh problems and enables execution on a range of back-end hardware platforms. We chart the conversion of Hydra to OP2, and map out the key difficulties encountered in the process. Specifically, we show how different parallel implementations can be achieved with an active library framework, even for a highly complicated industrial application, and how different optimizations targeting contrasting parallel architectures can be applied to the whole application, seamlessly, reducing developer effort and increasing code longevity. Performance results demonstrate that not only can the same runtime performance as that of the hand-tuned original code be achieved, but it can also be significantly improved on conventional processor systems and many-core systems. Our results provide evidence of how high-level frameworks such as OP2 enable portability across a wide range of contrasting platforms and their significant utility in achieving near-optimal performance without the intervention of the application programmer.

    Designing OP2 for GPU Architectures

    OP2 is an “active” library framework for the solution of unstructured mesh applications. It aims to decouple the specification of a scientific application from its parallel implementation to achieve code longevity and near-optimal performance through re-targeting the back-end to different multi-core/many-core hardware. This paper presents the design of the current OP2 library for generating efficient code targeting contemporary GPU platforms. In this we focus on some of the software architecture design choices and low-level optimizations to maximize performance on NVIDIA’s Fermi architecture GPUs. The performance impact of these design choices is quantified on two NVIDIA GPUs (GTX560Ti, Tesla C2070) using the end-to-end performance of an industrial representative CFD application developed using the OP2 API. Results show that for each system, a number of key configuration parameters need to be set carefully in order to gain good performance. Utilizing a recently developed auto-tuning framework, we explore the effect of these parameters, their limitations and insights into optimizations for improved performance. Keywords: Performance, GPU, CUDA, Unstructured mesh applications, auto-tuning

    Post-operative critical care and outcomes of limb replantation: Experience in a developing country

    Replantation is the treatment of choice for traumatic amputation. Its success rates vary, reaching 80% in the world's best centres. This study analyses management practices of replantation in a regional centre in a developing country. Out of six replantations, four were successful. The median warm ischaemia time of the severed limb was 4.5 h (range 1–13.5) and the median duration of general anaesthesia required for initial surgery was 6.25 h (range 4.7–8.0). All patients needed intensive care following replantation for a median of 7 days (range 5–15). Pulse oximetry values were observed to be the same in the graft and the patient in successful cases. Two grafts failed.