
    Towards Test Driven Development for Computational Science with pFUnit

Developers working in Computational Science & Engineering (CSE)/High Performance Computing (HPC) must contend with constant change due to advances in computing technology and science. Test Driven Development (TDD) is a methodology that mitigates software development risks due to change at the cost of adding comprehensive and continuous testing to the development process. Testing frameworks tailored for CSE/HPC, like pFUnit, can lower the barriers to such testing, yet CSE software faces unique constraints foreign to the broader software engineering community. Effective testing of numerical software requires a comprehensive suite of oracles, i.e., use cases with known answers, as well as robust estimates for the unavoidable numerical errors associated with implementation in finite-precision arithmetic. At first glance these concerns often seem exceedingly challenging or even insurmountable for real-world scientific applications. However, we argue that this common perception is incorrect and driven by (1) a conflation between model validation and software verification and (2) the general tendency in the scientific community to develop relatively coarse-grained, large procedures that compound numerous algorithmic steps. We believe TDD can be applied routinely to numerical software if developers pursue fine-grained implementations that permit testing, neatly side-stepping concerns about needing nontrivial oracles as well as the accumulation of errors. We present an example of a successful, complex legacy CSE/HPC code whose development process shares some aspects with TDD, which we contrast with current and potential capabilities. A mix of our proposed methodology and framework support should enable everyday use of TDD by CSE-expert developers.
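To make the fine-grained approach concrete, here is a minimal sketch of such a test in Python rather than Fortran/pFUnit (an illustrative stand-in, not the paper's code): a single algorithmic step, the composite trapezoidal rule, is tested against trivial oracles with explicit finite-precision tolerances. The function name and tolerance values are assumptions for illustration.

```python
import unittest

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule: one small, independently testable step."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

class TestTrapezoid(unittest.TestCase):
    def test_linear_is_exact(self):
        # Oracle: the rule is exact for linear integrands, so only
        # rounding error remains and a tight tolerance is justified.
        self.assertAlmostEqual(trapezoid(lambda x: 2.0 * x, 0.0, 1.0, 16), 1.0, places=14)

    def test_quadratic_truncation_error(self):
        # Oracle: the known O(h^2) truncation error bounds the mismatch.
        approx = trapezoid(lambda x: x * x, 0.0, 1.0, 100)
        self.assertLess(abs(approx - 1.0 / 3.0), 1e-4)

if __name__ == "__main__":
    unittest.main()
```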

    Software Testing and Verification in Climate Model Development

Over the past 30 years most climate models have grown from relatively simple representations of a few atmospheric processes to complex multi-disciplinary systems. Computer infrastructure over that period has gone from punch-card mainframes to modern parallel clusters. Model implementations have become complex, brittle, and increasingly difficult to extend and maintain. Existing verification processes for model implementations rely almost exclusively upon some combination of detailed analysis of output from full climate simulations and system-level regression tests. In addition to being quite costly in terms of developer time and computing resources, these testing methodologies are limited in terms of the types of defects that can be detected, isolated, and diagnosed. Mitigating these weaknesses of coarse-grained testing with finer-grained "unit" tests has been perceived as cumbersome and counter-productive. In the commercial software sector, recent advances in tools and methodology have led to a renaissance for systematic fine-grained testing. We discuss the availability of analogous tools for scientific software and examine benefits that similar testing methodologies could bring to climate modeling software. We describe the unique challenges faced when testing complex numerical algorithms and suggest techniques to minimize and/or eliminate the difficulties.
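As an illustration of the finer-grained "unit" tests discussed above, the following Python sketch tests a single, hypothetical column-mixing kernel (the kernel, names, and tolerance are our own, not from any climate model) for a property that must hold regardless of the physics: conservation of the column total.

```python
import numpy as np

def mix_column(q, weight=0.25):
    """Hypothetical diffusive mixing step written in flux form, so the
    column total is conserved (zero flux through the column ends)."""
    flux = weight * (q[1:] - q[:-1])  # flux between adjacent cells
    out = q.copy()
    out[:-1] += flux   # gain from the cell above
    out[1:] -= flux    # matching loss from the cell below
    return out

def test_mixing_conserves_column_total():
    q = np.random.default_rng(42).random(50)
    # Conservation holds analytically in flux form; only floating-point
    # rounding remains, so the tolerance can sit near machine epsilon.
    assert abs(mix_column(q).sum() - q.sum()) < 1e-12
```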

    Test Driven Development of Scientific Models

Test-Driven Development (TDD) is a software development process that promises many advantages for developer productivity and has become widely accepted among professional software engineers. As the name suggests, TDD practitioners alternate between writing short automated tests and producing code that passes those tests. Although this overly simplified description will undoubtedly sound prohibitively burdensome to many uninitiated developers, the advent of powerful unit-testing frameworks greatly reduces the effort required to produce and routinely execute suites of tests. By their own testimony, many developers find TDD to be addictive after only a few days of exposure and find it unthinkable to return to previous practices. Of course, scientific/technical software differs from other software categories in a number of important respects, but I nonetheless believe that TDD is quite applicable to the development of such software and has the potential to significantly improve programmer productivity and code quality within the scientific community. After a detailed introduction to TDD, I will present the experience within the Software Systems Support Office (SSSO) in applying the technique to various scientific applications. This discussion will emphasize the various direct and indirect benefits as well as some of the difficulties and limitations of the methodology. I will conclude with a brief description of pFUnit, a unit testing framework I co-developed to support test-driven development of parallel Fortran applications.
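The alternation described above (write a short failing test, then just enough code to pass it, then refactor under the test's protection) can be sketched in a few lines of Python; the function name and behavior here are hypothetical, chosen only to show the rhythm.

```python
import pytest

# Step 1 (red): the test is written first and fails until the code exists.
def test_relative_humidity_is_fraction_of_saturation():
    assert relative_humidity(vapor=0.5, saturation=2.0) == pytest.approx(0.25)

# Step 2 (green): the minimal implementation that makes the test pass.
# Step 3 (refactor/extend): a further test drove the input-validation guard.
def relative_humidity(vapor, saturation):
    if saturation <= 0.0:
        raise ValueError("saturation must be positive")
    return vapor / saturation

def test_zero_saturation_is_rejected():
    with pytest.raises(ValueError):
        relative_humidity(vapor=0.5, saturation=0.0)
```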

    POSITION PAPER - pFLogger: The Parallel Fortran Logging Framework for HPC Applications

In the context of high performance computing (HPC), software investments in support of text-based diagnostics, which monitor a running application, are typically limited compared to those for other types of IO. Examples of such diagnostics include reiteration of configuration parameters, progress indicators, simple metrics (e.g., mass conservation, convergence of solvers, etc.), and timers. To some degree, this difference in priority is justifiable, as other forms of output are the primary products of a scientific model and, due to their large data volume, much more likely to be a significant performance concern. In contrast, text-based diagnostic content is generally not shared beyond the individual or group running an application and is most often used to troubleshoot when something goes wrong. We suggest that a more systematic approach enabled by a logging facility (or 'logger') similar to those routinely used by many communities would provide significant value to complex scientific applications. In the context of high-performance computing, an appropriate logger would provide specialized support for distributed and shared-memory parallelism and have low performance overhead. In this paper, we present our prototype implementation of pFLogger, a parallel Fortran-based logging framework, and assess its suitability for use in a complex scientific application.
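The kind of facility described is analogous to the logging modules routinely used in other communities; the Python sketch below illustrates severity-filtered, per-process diagnostics, not pFLogger's actual API. Reading the rank from a launcher environment variable is an assumption for illustration; a real MPI application would query the communicator directly.

```python
import logging
import os

# Rank from a common MPI launcher variable; defaults to 0 when run serially.
# (Illustrative assumption -- a real code would call MPI_Comm_rank.)
rank = int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))

logging.basicConfig(
    level=logging.INFO,
    format=f"%(asctime)s [rank {rank}] %(levelname)s %(name)s: %(message)s",
)
log = logging.getLogger("solver")

log.info("config: nx=%d, ny=%d", 512, 256)              # reiterate configuration
log.debug("iteration detail suppressed at INFO level")  # filtered by severity
if rank == 0:
    log.info("mass conservation residual: %.3e", 1.2e-13)  # simple metric
```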

    Test Driven Development of Scientific Models

Test-Driven Development (TDD), a software development process that promises many advantages for developer productivity and software reliability, has become widely accepted among professional software engineers. As the name suggests, TDD practitioners alternate between writing short automated tests and producing code that passes those tests. Although this overly simplified description will undoubtedly sound prohibitively burdensome to many uninitiated developers, the advent of powerful unit-testing frameworks greatly reduces the effort required to produce and routinely execute suites of tests. By their own testimony, many developers find TDD to be addictive after only a few days of exposure and find it unthinkable to return to previous practices. After a brief overview of the TDD process and my experience in applying the methodology for development activities at Goddard, I will delve more deeply into some of the challenges that are posed by numerical and scientific software, as well as tools and implementation approaches that should address those challenges.
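One of the numerical-software challenges alluded to above is deciding when two floating-point results "agree". A common implementation approach is a mixed absolute/relative tolerance, sketched here in Python; the helper name and tolerance values are illustrative assumptions.

```python
import math

def agrees(actual, expected, rel_tol=1e-12, abs_tol=1e-15):
    """Mixed tolerance: relative for large magnitudes, absolute near zero."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)

assert agrees(1.0 + 1e-16, 1.0)      # rounding-level difference is accepted
assert agrees(0.0, 5e-16)            # near-zero values handled by abs_tol
assert not agrees(1.0, 1.0 + 1e-6)   # a genuine defect is still detected
```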

    Leveraging Data Intensive Computing to Support Automated Event Services

A large portion of Earth Science investigations is phenomenon- or event-based, such as the studies of Rossby waves, mesoscale convective systems, and tropical cyclones. However, except for a few high-impact phenomena, e.g. tropical cyclones, comprehensive records are absent for the occurrences or events of these phenomena. Phenomenon-based studies therefore often focus on a few prominent cases while the lesser ones are overlooked. Without an automated means to gather the events, comprehensive investigation of a phenomenon is at least time-consuming if not impossible. An Earth Science event (ES event) is defined here as an episode of an Earth Science phenomenon. A cumulus cloud, a thunderstorm shower, a rogue wave, a tornado, an earthquake, a tsunami, a hurricane, or an El Niño is each an episode of a named ES phenomenon, and, from the small and insignificant to the large and potent, all are examples of ES events. An ES event has a finite duration and an associated geolocation as a function of time; it is therefore an entity in four-dimensional (4D) spatiotemporal space. The interests of Earth scientists typically rivet on Earth Science phenomena with potential to cause massive economic disruption or loss of life, but broader scientific curiosity also drives the study of phenomena that pose no immediate danger. We generally gain understanding of a given phenomenon by observing and studying individual events, usually beginning by identifying the occurrences of these events. Once representative events are identified or found, we must locate associated observed or simulated data prior to commencing analysis and concerted studies of the phenomenon. Knowledge concerning the phenomenon can accumulate only after analysis has started. However, except for a few high-impact phenomena, such as tropical cyclones and tornadoes, finding events and locating associated data currently may take a prohibitive amount of time and effort on the part of an individual investigator. And even for these high-impact phenomena, the availability of comprehensive records is still only a recent development. A major reason for the lack of comprehensive records for the majority of the ES phenomena is the perception that they do not pose immediate and/or severe threat to life and property and are thus not consistently tracked, monitored, and catalogued. Many phenomena even lack commonly accepted criteria for definitions. However, the lack of comprehensive records is also due to the increasingly prohibitive volume of observations and model data that must be examined. The NASA Earth Observing System Data and Information System (EOSDIS) alone archives several petabytes (PB) of satellite remote sensing data, and the volume steadily increases. All of these factors contribute to the difficulty of methodically identifying events corresponding to a given phenomenon and significantly impede systematic investigations. In the following we present a couple of motivating scenarios, demonstrating the issues faced by Earth scientists studying ES phenomena.
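The 4D definition above lends itself to a simple data structure. The sketch below represents an ES event as a time-indexed track of geolocations; the type and field names are our own illustration, not from the paper.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TrackPoint:
    time: datetime
    lat: float  # degrees north
    lon: float  # degrees east

@dataclass
class ESEvent:
    """An episode of an Earth Science phenomenon: a finite-duration
    entity in 4D spatiotemporal space (geolocation as a function of time)."""
    phenomenon: str          # e.g., "mesoscale convective system"
    track: list[TrackPoint]  # chronologically ordered

    @property
    def duration(self):
        return self.track[-1].time - self.track[0].time

event = ESEvent("mesoscale convective system", [
    TrackPoint(datetime(2010, 6, 1, 0), 35.0, -97.0),
    TrackPoint(datetime(2010, 6, 1, 6), 35.5, -95.5),
])
print(event.duration)  # 6:00:00
```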

    Habitable Climate Scenarios for Proxima Centauri b With a Dynamic Ocean

The nearby exoplanet Proxima Centauri b will be a prime future target for characterization, despite questions about its retention of water. Climate models with static oceans suggest that an Earth-like Proxima b could harbor a small dayside region of surface liquid water at fairly warm temperatures despite its weak instellation. We present the first 3-dimensional climate simulations of Proxima b with a dynamic ocean. We find that an ocean-covered Proxima b could have a much broader area of surface liquid water but at much colder temperatures than previously suggested, due to ocean heat transport and depression of the freezing point by salinity. Elevated greenhouse gas concentrations do not necessarily produce more open ocean area because of possible dynamic regime transitions. For an evolutionary path leading to a highly saline present ocean, Proxima b could conceivably be an inhabited, mostly open ocean planet dominated by halophilic life. For an ocean planet in 3:2 spin-orbit resonance, a permanent tropical waterbelt exists for moderate eccentricity. Simulations of Proxima Centauri b may also be a model for the habitability of planets receiving similar instellation from slightly cooler or warmer stars, e.g., in the TRAPPIST-1, LHS 1140, GJ 273, and GJ 3293 systems.

    Using SpF to Achieve Petascale for Legacy Pseudospectral Applications

Pseudospectral (PS) methods possess a number of characteristics (e.g., efficiency, accuracy, natural boundary conditions) that are extremely desirable for dynamo models. Unfortunately, dynamo models based upon PS methods face a number of daunting challenges, which include exposing additional parallelism, leveraging hardware accelerators, exploiting hybrid parallelism, and improving the scalability of global memory transposes. Although these issues are a concern for most models, solutions for PS methods tend to require far more pervasive changes to underlying data and control structures. Further, improvements in performance in one model are difficult to transfer to other models, resulting in significant duplication of effort across the research community. We have developed an extensible software framework for pseudospectral methods called SpF that is intended to enable extreme scalability and optimal performance. High-level abstractions provided by SpF unburden applications of the responsibility of managing domain decomposition and load balance while reducing the changes in code required to adapt to new computing architectures. The key design concept in SpF is that each phase of the numerical calculation is partitioned into disjoint numerical kernels that can be performed entirely in-processor. The granularity of domain decomposition provided by SpF is only constrained by the data-locality requirements of these kernels. SpF builds on top of optimized vendor libraries for common numerical operations such as transforms, matrix solvers, etc., but can also be configured to use open source alternatives for portability. SpF includes several alternative schemes for global data redistribution and is expected to serve as an ideal testbed for further research into optimal approaches for different network architectures. In this presentation, we will describe our experience in porting legacy pseudospectral models, MoSST and DYNAMO, to use SpF, as well as present preliminary performance results provided by the improved scalability.
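The "disjoint in-processor kernels plus global redistribution" design can be illustrated with a serial NumPy analogue: each FFT phase operates along an axis that is local to the current array layout, and a transpose stands in for the global memory redistribution between phases. This is our own sketch of the general pencil-decomposition pattern, not SpF's API.

```python
import numpy as np

def spectral_laplacian_2d(u):
    """Pseudospectral Laplacian on the unit square, phase by phase.
    In a parallel code each FFT is a node-local kernel and each
    transpose becomes a global redistribution (e.g., MPI all-to-all)."""
    ny, nx = u.shape
    kx = 2j * np.pi * np.fft.fftfreq(nx, d=1.0 / nx)  # wavenumbers along x
    ky = 2j * np.pi * np.fft.fftfreq(ny, d=1.0 / ny)  # wavenumbers along y

    work = np.fft.fft(u, axis=1)        # kernel 1: FFT along the local x-axis
    work = np.fft.fft(work.T, axis=1)   # "redistribute", then FFT along y
    work *= (kx**2)[:, None] + (ky**2)[None, :]  # in-processor spectral operator
    work = np.fft.ifft(work, axis=1)    # inverse transforms, phase by phase
    return np.fft.ifft(work.T, axis=1).real

# Check against an analytic oracle: lap(sin 2*pi*x) = -4*pi^2 sin 2*pi*x.
x = np.linspace(0.0, 1.0, 64, endpoint=False)
u = np.sin(2 * np.pi * x)[None, :].repeat(64, axis=0)
assert np.allclose(spectral_laplacian_2d(u), -4 * np.pi**2 * u)
```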

    Preliminary Evaluation of MapReduce for High-Performance Climate Data Analysis

MapReduce is an approach to high-performance analytics that may be useful for data-intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. We are particularly interested in the potential of MapReduce to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we are prototyping a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. Our initial focus has been on averaging operations over arbitrary spatial and temporal extents within Modern-Era Retrospective Analysis for Research and Applications (MERRA) data. Preliminary results suggest this approach can improve efficiencies within data-intensive analytic workflows.
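A minimal sketch of such a canonical averaging operation in the MapReduce style (pure Python, no Hadoop; the record layout and names are our own illustration): the map phase emits partial sums keyed by grid cell, and the reduce phase combines them into a temporal mean per cell.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (grid_cell, (sum, count)) pairs from time-stamped records.
    Each record: (time, lat_index, lon_index, temperature)."""
    for _, i, j, temp in records:
        yield (i, j), (temp, 1)

def reduce_phase(pairs):
    """Combine partial sums per key into a temporal mean per grid cell."""
    acc = defaultdict(lambda: [0.0, 0])
    for key, (s, c) in pairs:
        acc[key][0] += s
        acc[key][1] += c
    return {key: s / c for key, (s, c) in acc.items()}

records = [(0, 0, 0, 280.0), (1, 0, 0, 282.0), (0, 0, 1, 290.0)]
print(reduce_phase(map_phase(records)))  # {(0, 0): 281.0, (0, 1): 290.0}
```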

    A Roundtable for Victoria M. Grieve, Little Cold Warriors: American Childhood in the 1950s

Dr. Thomas Field introduces a roundtable discussion of Victoria M. Grieve's Little Cold Warriors: American Childhood in the 1950s, providing a synopsis of reviewer critiques before the reviewers expand on their views and the author responds.