
    A Component Approach to Collaborative Scientific Software Development: Tools and Techniques Utilized by the Quantum Chemistry Science Application Partnership

    Cutting-edge scientific computing software is complex, increasingly involving the coupling of multiple packages to combine advanced algorithms or simulations at multiple physical scales. Component-based software engineering (CBSE) has been advanced as a technique for managing this complexity, and complex component applications have been created in the quantum chemistry domain, as well as several other simulation areas, using the component model advocated by the Common Component Architecture (CCA) Forum. While programming models do indeed enable sound software engineering practices, the selection of programming model is just one building block in a comprehensive approach to large-scale collaborative development which must also address interface and data standardization, and language and package interoperability. We provide an overview of the development approach utilized within the Quantum Chemistry Science Application Partnership, identifying design challenges, describing the techniques which we have adopted to address these challenges, and highlighting the advantages which the CCA approach offers for collaborative development.
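    To make the component idea concrete, the following is a minimal Python sketch of the provides/uses port pattern that CCA-style components are built around; the class and method names (Services, add_provides_port, IntegralEvaluator, SCFDriver) are illustrative stand-ins, not the real CCA API.

```python
# Hypothetical sketch of the provides/uses port pattern behind CCA-style
# components; names are illustrative, not the actual CCA interfaces.

class Services:
    """Framework-side registry that wires components together via named ports."""
    def __init__(self):
        self._ports = {}

    def add_provides_port(self, name, impl):
        self._ports[name] = impl          # a component publishes a capability

    def get_port(self, name):
        return self._ports[name]          # another component consumes it


class IntegralEvaluator:
    """Quantum-chemistry-flavoured component that provides an 'Integrals' port."""
    def set_services(self, svc):
        svc.add_provides_port("Integrals", self)

    def compute(self, basis):
        return f"two-electron integrals for {basis}"


class SCFDriver:
    """Component that uses the 'Integrals' port without knowing who provides it."""
    def set_services(self, svc):
        self._svc = svc

    def run(self, basis):
        integrals = self._svc.get_port("Integrals").compute(basis)
        return f"SCF iterations using {integrals}"


svc = Services()
evaluator, driver = IntegralEvaluator(), SCFDriver()
evaluator.set_services(svc)
driver.set_services(svc)
print(driver.run("cc-pVDZ"))
```

    The point of the pattern is that components agree only on the port interface, so implementations in different languages or packages can be swapped behind it.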

    A Path to Operating System and Runtime Support for Extreme Scale Tools

    In this project, we cast distributed resource access as operations on files in a global name space and developed a common, scalable solution for group operations on distributed processes and files. The resulting solution enables tool and middleware developers to quickly create new scalable software or easily improve the scalability of existing software. The cornerstone of the project was the design of a new programming idiom called group file operations that eliminates iterative behavior when a single process must apply the same set of file operations to a group of related files. To demonstrate our novel and scalable ideas for group file operations and global name space composition, we developed a group file system called TBON-FS that leverages a tree-based overlay network (TBON), specifically MRNet, for logarithmic communication and distributed data aggregation. We also developed proc++, a new synthetic file system co-designed for use in scalable group file operations. Over the course of the project, we evaluated the utility and performance of group file operations, global name space composition, TBON-FS, and proc++ in three case studies. The first study focused on the ease of using group file operations and TBON-FS to quickly develop several new scalable tools for distributed system administration and monitoring. The second study evaluated the integration of group file operations and TBON-FS within the Ganglia Distributed Monitoring System to improve its scalability for clusters. The final study involved the integration of group file operations, TBON-FS, and proc++ within TotalView, the widely used parallel debugger. For this project, the work of the Oak Ridge National Laboratory (ORNL) team occurred primarily in two directions: bringing the MRNet tree-based overlay network (TBON) implementation to the Cray XT platform, and investigating techniques for predicting the performance of MRNet topologies on such systems. Rogue Wave Software (RWS), formerly TotalView Technologies Inc., worked with the University of Wisconsin (UW) team on the design and prototyping of a scalable version of the TotalView debugger, assisting UW with the "proc++" design effort and with the strategy for integrating proc++ into TotalView.
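    The following Python sketch contrasts the conventional iterative idiom with the group file operation idiom described above; the GroupFile class and its aggregation semantics are hypothetical stand-ins for the TBON-FS interface, not its actual API.

```python
# Hypothetical illustration of the group file operation idiom; GroupFile is an
# illustrative stand-in, not the real TBON-FS API.
import glob

# Conventional, iterative idiom: the tool touches every file one by one,
# so cost grows linearly with the number of processes being inspected.
def read_status_iterative(paths):
    results = {}
    for path in paths:
        with open(path) as f:           # one open/read/close sequence per file
            results[path] = f.read()
    return results

# Group idiom: one logical handle stands for the whole set of related files,
# and a single read is applied to all members. A group file system would push
# this work into a tree-based overlay (e.g. MRNet) and aggregate in-network.
class GroupFile:
    def __init__(self, paths):
        self.paths = list(paths)

    def read_all(self):
        # Here the loop runs at the client purely for illustration; the real
        # system distributes and aggregates it across the overlay network.
        return {p: open(p).read() for p in self.paths}

status = GroupFile(glob.glob("/proc/*/status"))   # proc-like global name space
summary = status.read_all()
```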

    Exascale Computing Deployment Challenges

    As Exascale computing proliferates, we see an accelerating shift towards clusters with thousands of nodes and thousands of cores per node, often on the back of commodity graphics processing units. This paper argues that this drives a once-in-a-generation shift of computation, and that fundamentals of computer science therefore need to be re-examined. Exploiting the full power of Exascale computation will require attention to the fundamentals of programme design and specification, programming language design, systems and software engineering, analytic, performance and cost models, fundamental algorithmic design, and to increasing replacement of human bandwidth by computational analysis. As part of this, we argue that Exascale computing will require a significant degree of co-design and close attention to the economics underlying the challenges ahead.

    A language and development environment for parallel particle methods

    We present the Parallel Particle-Mesh Environment (PPME), a domain-specific language (DSL) and development environment for numerical simulations using particles and hybrid particle-mesh methods. PPME is the successor of the Parallel Particle-Mesh Language (PPML), a Fortran-based DSL that provides high-level abstractions for the development of distributed-memory particle-mesh simulations. On top of PPML, PPME provides a complete development environment for particle-based simulations using state-of-the-art language engineering and compiler construction techniques. Relying on a novel domain metamodel and formal type system for particle methods, it enables advanced static code correctness checks at the level of particle abstractions, complementing the low-level analysis of the compiler. Furthermore, PPME adopts Herbie for improving the accuracy of floating-point expressions and supports a convenient high-level mathematical notation for equations and differential operators. For demonstration purposes, we discuss an example from Discrete Element Methods (DEM) using the classic Silbert model to simulate granular flows.
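    For readers unfamiliar with particle methods, the following plain NumPy sketch shows the kind of particle time loop that a DSL such as PPME abstracts away; the contact force is a simplified linear spring, not the full Silbert granular-flow model, and all parameters are illustrative.

```python
# Plain NumPy sketch of a DEM-style particle loop; simplified linear-spring
# contact force, illustrative parameters (not the Silbert model or PPME code).
import numpy as np

n, dim = 64, 2
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 1.0, (n, dim))      # particle positions
vel = np.zeros((n, dim))                   # particle velocities
radius, stiffness, dt = 0.02, 1.0e3, 1.0e-4

def contact_forces(pos):
    forces = np.zeros_like(pos)
    for i in range(n):                     # naive O(n^2) neighbour search;
        for j in range(i + 1, n):          # real codes use cell lists / Verlet lists
            rij = pos[j] - pos[i]
            dist = np.linalg.norm(rij)
            overlap = 2 * radius - dist
            if overlap > 0:                # particles in contact
                f = stiffness * overlap * rij / dist
                forces[i] -= f
                forces[j] += f
    return forces

for step in range(100):                    # explicit time integration
    vel += dt * contact_forces(pos)
    pos += dt * vel
```

    A DSL like PPME lets the user state the interaction law and differential operators at this mathematical level while the compiler handles neighbour lists, domain decomposition, and distributed-memory communication.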

    Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY

    Convergence between high-performance computing (HPC) and big data analytics (BDA) is currently an established research area that has spawned new opportunities for unifying the platform layer and data abstractions in these ecosystems. This work presents an architectural model that enables the interoperability of established BDA and HPC execution models, reflecting the key design features that interest both the HPC and BDA communities, and including an abstract data collection and operational model that generates a unified interface for hybrid applications. This architecture can be implemented in different ways depending on the process- and data-centric platforms of choice and the mechanisms put in place to effectively meet the requirements of the architecture. The Spark-DIY platform is introduced in the paper as a prototype implementation of the architecture proposed. It preserves the interfaces and execution environment of the popular BDA platform Apache Spark, making it compatible with any Spark-based application and tool, while providing efficient communication and kernel execution via DIY, a powerful communication pattern library built on top of MPI. Later, Spark-DIY is analyzed in terms of performance by building a representative use case from the hydrogeology domain, EnKF-HGS. This application is a clear example of how current HPC simulations are evolving toward hybrid HPC-BDA applications, integrating HPC simulations within a BDA environment. This work was supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Grant TIN2016-79637-P (Toward Unification of HPC and Big Data Paradigms), in part by the Spanish Ministry of Education under Grant FPU15/00422 (Training Program for Academic and Teaching Staff), in part by the Advanced Scientific Computing Research program, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357, and in part by the DOE under Agreement DE-DC000122495, Program Manager Laura Biven.
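    To illustrate what "preserving the Spark interfaces" means in practice, here is an ordinary PySpark job; according to the abstract, code written in this style would run unmodified on Spark-DIY, with communication and kernel execution handled underneath by DIY/MPI. The ensemble data and reduction are toy stand-ins, and Spark-DIY deployment specifics are not shown.

```python
# A standard PySpark job, unaware of the backend; per the abstract, Spark-DIY
# keeps these interfaces intact. The workload below is an illustrative toy.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ensemble-average").getOrCreate()
sc = spark.sparkContext

# Toy stand-in for an ensemble of simulation states (e.g. EnKF members):
# each record is (member_id, state_value).
members = sc.parallelize([(i, float(i) ** 0.5) for i in range(1000)])

# Data-parallel reduction expressed in the usual Spark style.
total = members.map(lambda kv: kv[1]).reduce(lambda a, b: a + b)
print("ensemble mean:", total / members.count())

spark.stop()
```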

    Proceedings of the 3rd Workshop on Domain-Specific Language Design and Implementation (DSLDI 2015)

    The goal of the DSLDI workshop is to bring together researchers and practitioners interested in sharing ideas on how DSLs should be designed, implemented, supported by tools, and applied in realistic application contexts. We are interested both in discovering how already-known domains such as graph processing or machine learning can best be supported by DSLs, and in exploring new domains that could be targeted by DSLs. More generally, we are interested in building a community that can drive forward the development of modern DSLs. These informal post-proceedings contain the submitted talk abstracts to the 3rd DSLDI workshop (DSLDI'15), and a summary of the panel discussion on Language Composition.

    Parallel programming systems for scalable scientific computing

    High-performance computing (HPC) systems are more powerful than ever before. However, this rise in performance brings with it greater complexity, presenting significant challenges for researchers who wish to use these systems for their scientific work. This dissertation explores the development of scalable programming solutions for scientific computing. These solutions aim to be effective across a diverse range of computing platforms, from personal desktops to advanced supercomputers. To better understand HPC systems, this dissertation begins with a literature review on exascale supercomputers, massive systems capable of performing 10¹⁸ floating-point operations per second. This review combines both manual and data-driven analyses, revealing that while traditional challenges of exascale computing have largely been addressed, issues like software complexity and data volume remain. Additionally, the dissertation introduces the open-source software tool (called LitStudy) developed for this research. Next, this dissertation introduces two novel programming systems. The first system (called Rocket) is designed to scale all-versus-all algorithms to massive datasets. It features a multi-level software-based cache, a divide-and-conquer approach, hierarchical work-stealing, and asynchronous processing to maximize data reuse, exploit data locality, dynamically balance workloads, and optimize resource utilization. The second system (called Lightning) aims to scale existing single-GPU kernel functions across multiple GPUs, even on different nodes, with minimal code adjustments. Results across eight benchmarks on up to 32 GPUs show excellent scalability. The dissertation concludes by proposing a set of design principles for developing parallel programming systems for scalable scientific computing. These principles, based on lessons from this PhD research, represent significant steps forward in enabling researchers to efficiently utilize HPC systems.
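    As a rough illustration of the data-reuse idea behind an all-versus-all system like Rocket, the sketch below blocks the pairwise computation into tiles so that each tile of inputs is loaded once and reused against many others; it is a generic blocked all-pairs kernel, not Rocket's implementation (no software cache, work stealing, or multi-node scheduling is shown).

```python
# Generic blocked all-versus-all sketch illustrating tiling for data reuse;
# illustrative only, not Rocket's actual implementation.
import numpy as np

def all_pairs_blocked(items, kernel, tile=256):
    n = len(items)
    out = np.zeros((n, n))
    for i0 in range(0, n, tile):          # each tile of rows is loaded once...
        rows = items[i0:i0 + tile]
        for j0 in range(i0, n, tile):     # ...and reused against every column tile
            cols = items[j0:j0 + tile]
            block = kernel(rows, cols)
            out[i0:i0 + tile, j0:j0 + tile] = block
            out[j0:j0 + tile, i0:i0 + tile] = block.T   # exploit symmetry
    return out

# Example kernel: pairwise squared Euclidean distances between feature vectors.
def sq_distances(a, b):
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

data = np.random.default_rng(1).normal(size=(1024, 8))
distances = all_pairs_blocked(data, sq_distances)
```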

    On Autonomic HPC Clouds

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015. The long tail of science using HPC facilities is nowadays looking to instantly available HPC Clouds as a viable alternative to the long waiting queues of supercomputing centers. While the name HPC Cloud suggests a Cloud service, the current HPC-as-a-Service is mainly an offer of bare metal, better described as cluster-on-demand. The elasticity and virtualization benefits of the Clouds are not exploited by HPC-as-a-Service. In this paper we discuss how the HPC Cloud offer can be improved from a particular point of view, that of automation. After a reminder of the characteristics of the Autonomic Cloud, we project the requirements and expectations onto what we name Autonomic HPC Clouds. Finally, we point towards the expected results of the latest research and development activities related to the topics that were identified. The work related to Autonomic HPC Clouds is supported by the European Commission under grant agreement H2020-6643946 (CloudLightning). The CloudLightning project proposal was prepared by eight partner institutions, three of them earlier partners in the COST Action IC1305 NESUS, benefiting from its inputs for the proposal. The section related to Autonomic Clouds is supported by the Romanian UEFISCDI under grant agreement PN-II-ID-PCE-2011-3-0260 (AMICAS).
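    Autonomic management is commonly described in terms of the MAPE-K loop (Monitor, Analyze, Plan, Execute over shared Knowledge); the minimal Python sketch below applies that loop to elastic scaling of an HPC cloud partition. The Cluster class, thresholds, and scaling policy are illustrative assumptions, not taken from the paper.

```python
# Minimal MAPE-K style autonomic loop for elastic scaling of an HPC cloud
# partition; Cluster, thresholds, and policy are illustrative assumptions.
import random, time

class Cluster:
    def __init__(self, nodes=4):
        self.nodes = nodes
    def utilization(self):                 # Monitor: stand-in for real telemetry
        return random.uniform(0.2, 1.0)
    def scale_to(self, nodes):             # Execute: stand-in for a provisioning API
        self.nodes = nodes

def autonomic_step(cluster, knowledge, low=0.3, high=0.85):
    load = cluster.utilization()                       # Monitor
    knowledge.append(load)                             # Knowledge
    recent = knowledge[-5:]
    avg = sum(recent) / len(recent)                    # Analyze: smooth recent load
    if avg > high:                                     # Plan: grow under pressure
        target = cluster.nodes + 1
    elif avg < low and cluster.nodes > 1:              # Plan: shrink when idle
        target = cluster.nodes - 1
    else:
        target = cluster.nodes
    cluster.scale_to(target)                           # Execute

cluster, knowledge = Cluster(), []
for _ in range(10):
    autonomic_step(cluster, knowledge)
    time.sleep(0.01)
print("final node count:", cluster.nodes)
```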