
    Beyond 16GB : out-of-core stencil computations

    Stencil computations are a key class of applications, widely used in the scientific computing community, and a class that has particularly benefited from performance improvements on architectures with high memory bandwidth. Unfortunately, such architectures come with a limited amount of fast memory, which limits the size of the problems that can be solved efficiently. In this paper, we address this challenge by applying the well-known cache-blocking tiling technique to large-scale stencil codes implemented using the OPS domain-specific language, such as CloverLeaf 2D, CloverLeaf 3D, and OpenSBLI. We introduce a number of techniques and optimisations to help manage data resident in fast memory and minimise data movement. Evaluating our work on Intel's Knights Landing platform as well as NVIDIA P100 GPUs, we demonstrate that it is possible to solve problems 3 times larger than the on-chip memory size with at most 15% loss in efficiency.

    OP2-Clang : a source-to-source translator using Clang/LLVM LibTooling

    Domain Specific Languages or Active Library frameworks have recently emerged as an important method for gaining performance portability, where an application can be efficiently executed on a wide range of HPC architectures without significant manual modifications. Embedded DSLs such as OP2 provide an API embedded in general-purpose languages such as C/C++/Fortran. They rely on source-to-source translation and code refactorization to translate the higher-level API calls to platform-specific parallel implementations. OP2 targets the solution of unstructured-mesh computations, where it can generate a variety of parallel implementations for execution on architectures such as CPUs, GPUs, distributed-memory clusters and heterogeneous processors, making use of a wide range of platform-specific optimizations. Compiler tool-chains supporting source-to-source translation of code written in mainstream languages currently lack the capabilities to carry out such wide-ranging code transformations. Clang/LLVM’s Tooling library (LibTooling) has long been touted as having such capabilities, but its use has only been demonstrated in simple source refactoring tasks. In this paper we introduce OP2-Clang, a source-to-source translator based on LibTooling, for OP2’s C/C++ API, capable of generating target parallel code based on SIMD, OpenMP, CUDA and their combinations with MPI. OP2-Clang is designed to significantly reduce maintenance effort, in particular by making it easy to extend with new parallelizations and optimizations for hardware platforms. In this research, we (1) demonstrate its capabilities, including the use of LibTooling’s AST matchers together with a simple strategy that uses parallelization templates or skeletons to significantly reduce the complexity of generating radically different and transformed target code, and (2) chart the challenges and solutions in generating optimized parallelizations for OpenMP, SIMD and CUDA. Results indicate that OP2-Clang produces near-identical parallel code to that of OP2’s current source-to-source translator. We believe that the lessons learnt in OP2-Clang can be readily applied to developing other similar source-to-source translators, particularly for DSLs.

    Tannic acid is not mutagenic in germ cells but weakly genotoxic in somatic cells of Drosophila melanogaster

    Tannic acid (TA) was tested for genotoxic activity in three different assays (1-3) in Drosophila melanogaster by feeding larvae or adult flies. TA induced neither sex-linked recessive lethals (1) nor sex-chromosome loss, mosaicism or non-disjunction (2) in male germ cells. In the wing somatic mutation and recombination test (SMART) (3), TA was found to be toxic for larvae of the high bioactivation cross and produced a weak positive response. These results suggest that this compound, when administered orally to larvae or adults of D. melanogaster, is neither mutagenic nor clastogenic in male germ cells, but is weakly genotoxic in somatic cells of the wing imaginal disc.

    Acceleration of a full-scale industrial CFD application with OP2

    Hydra is a full-scale industrial CFD application used for the design of turbomachinery at Rolls Royce plc., capable of performing complex simulations over highly detailed unstructured mesh geometries. Hydra presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging platforms. We present research in achieving this goal through the OP2 domain-specific high-level framework, demonstrating the viability of such a high-level programming approach. OP2 targets the domain of unstructured mesh problems and enables execution on a range of back-end hardware platforms. We chart the conversion of Hydra to OP2 and map out the key difficulties encountered in the process. Specifically, we show how different parallel implementations can be achieved with an active library framework, even for a highly complicated industrial application, and how different optimizations targeting contrasting parallel architectures can be applied to the whole application seamlessly, reducing developer effort and increasing code longevity. Performance results demonstrate not only that the same runtime performance as the hand-tuned original code can be achieved, but also that it can be significantly improved on conventional processor systems and many-core systems. Our results provide evidence of how high-level frameworks such as OP2 enable portability across a wide range of contrasting platforms, and of their significant utility in achieving high performance without the intervention of the application programmer.

    Improved Network Performance via Antagonism: From Synthetic Rescues to Multi-drug Combinations

    Recent research shows that a faulty or sub-optimally operating metabolic network can often be rescued by the targeted removal of enzyme-coding genes, the exact opposite of what traditional gene therapy would suggest. Predictions go as far as to assert that certain gene knockouts can restore the growth of otherwise nonviable gene-deficient cells. Many questions follow from this discovery: What are the underlying mechanisms? How generalizable is this effect? What are the potential applications? Here, I will approach these questions from the perspective of compensatory perturbations on networks. Relations will be drawn between such synthetic rescues and naturally occurring cascades of reaction inactivation, as well as their analogues in physical and other biological networks. In particular, I will discuss how rescue interactions can lead to the rational design of antagonistic drug combinations that select against resistance, and how they can illuminate medical research on cancer, antibiotics, and metabolic diseases. Comment: Online Open "Problems and Paradigms" article

    Enrichment and aggregation of topological motifs are independent organizational principles of integrated interaction networks

    Topological network motifs represent functional relationships within and between regulatory and protein-protein interaction networks. Enriched motifs often aggregate into self-contained units forming functional modules. Theoretical models for network evolution by duplication-divergence mechanisms and for network topology by hierarchical scale-free networks have suggested a one-to-one relation between network motif enrichment and aggregation, but this relation has never been tested quantitatively in real biological interaction networks. Here we introduce a novel method for assessing the statistical significance of network motif aggregation and for identifying clusters of overlapping network motifs. Using an integrated network of transcriptional, posttranslational and protein-protein interactions in yeast, we show that network motif aggregation reflects a local modularity property which is independent of network motif enrichment. In particular, our method identified novel functional network themes for a set of motifs that are not enriched yet aggregate significantly, challenging the conventional view that network motif enrichment is the most basic organizational principle of complex networks. Comment: 12 pages, 5 figures