7 research outputs found

    Enabling GPU Accelerated Computing in the SUNDIALS Time Integration Library

    As part of the Exascale Computing Project (ECP), a recent focus of development efforts for the SUite of Nonlinear and DIfferential/ALgebraic equation Solvers (SUNDIALS) has been to enable GPU-accelerated time integration in scientific applications at extreme scales. This effort has resulted in several new GPU-enabled implementations of core SUNDIALS data structures, support for programming paradigms that are aware of heterogeneous architectures, and the introduction of utilities that provide new points of flexibility. In this paper, we discuss our considerations, both internal and external, when designing these new features and present the features themselves. We also present performance results for several of the features on the Summit supercomputer and on early access hardware for the Frontier supercomputer. These results demonstrate negligible performance overhead from the additional infrastructure and significant speedups when using both NVIDIA and AMD GPUs.

    A2Cloud: Practical Application-to-Cloud Matching To Empower Scientific Computing on a Budget

    Primarily undergraduate universities and small businesses have long been at a disadvantage when it comes to scientific computing resources. On-premise computing clusters have a high barrier to entry and are often not justifiable for these users. Hours on the supercomputing resources of big research labs are hard to come by and maintain. Modern cloud computing provides an attainable alternative; however, due to the large number of cloud solutions, it is a challenge to select the most effective one. For that reason, we present a model that matches scientific applications to cloud instances for high application performance. Our model constructs two vectors: the application vector, which characterizes a program, and the probabilistic cloud vector, which characterizes a cloud instance. The application vector components are application-specific constants such as the number of floating-point operations, memory usage, and disk usage. The cloud vector components are independent random variables that correspond to the application vector components, such as floating-point operations per second, memory bandwidth, and disk bandwidth. The model then takes the inner product of the two vectors to produce an Application-to-Cloud (A2Cloud) score, which quantifies the Application-to-Cloud match. We encapsulate the A2Cloud model in a user-friendly A2Cloud framework that takes a test application as input and outputs cloud instance recommendations. We demonstrate the model and framework by conducting 162 application executions across nine cloud instances. Our tests yield an average A2Cloud matching rate of 6 for every 9 application-instance pairs, with a mean absolute difference of ±1.08 ranks.
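The inner-product scoring described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample values, and the choice to express each cloud component as a per-unit cost (seconds per operation or per byte, so that a lower score means a better match) are all assumptions made for illustration. Each cloud random variable is summarized here by the mean of a few hypothetical measurements.

```python
from statistics import mean

def a2cloud_score(app_vector, cloud_samples):
    """Inner product of the application constants with the mean of each
    cloud random variable (hypothetical summary of the random variables)."""
    cloud_means = [mean(samples) for samples in cloud_samples]
    return sum(a * c for a, c in zip(app_vector, cloud_means))

def rank_instances(app_vector, instances):
    """Return instance names ordered from lowest (best) to highest score."""
    scores = {name: a2cloud_score(app_vector, s) for name, s in instances.items()}
    return sorted(scores, key=scores.get)

# Hypothetical application vector: FLOPs, bytes of memory traffic, bytes of disk traffic.
app = [2.0e9, 4.0e8, 1.0e8]

# Hypothetical per-unit cost samples for two made-up instances
# (seconds per FLOP, seconds per memory byte, seconds per disk byte).
instances = {
    "instance_a": [[1.0e-9, 1.0e-9], [2.0e-9, 2.0e-9], [1.0e-8, 1.0e-8]],
    "instance_b": [[5.0e-10, 5.0e-10], [2.0e-9, 2.0e-9], [1.5e-8, 1.5e-8]],
}

ranking = rank_instances(app, instances)
```

With these made-up numbers, `instance_b` ranks ahead of `instance_a` because its lower per-FLOP cost outweighs its slower disk for this compute-heavy application vector.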

    Recent findings on bacterial endocytobiosis in Blattaria during the early stages of embryonic development

    Physiological characteristics of the kiwifruit have led to widespread adoption of controlled pollination, a process in which humans manually apply pollen to flowers. Several controlled pollination methods exist; however, all of them are either extremely labor intensive or use the expensive pollen mixture inefficiently. Thus, we propose an autonomous robotic system that can detect flowers needing pollination in real time and then deliver the pollen to the flowers in a controllable and repeatable manner. Additionally, the system produces valuable data about the orchard, which we capture and store for external processing. To achieve flower recognition, we make use of state-of-the-art computer vision techniques. Specifically, we re-train the Inception-v3 Convolutional Neural Network on more than 1000 images from kiwifruit orchards to achieve a validation accuracy of 76.2% and a testing accuracy of 92.5%. The flower detection system triggers eight different solenoids, which control pollen emission through eight diffusers. Using computational fluid dynamics, we design diffusers that accurately project pollen onto the orchard canopy and thus minimize waste. In an effort to make our system user-friendly, all of the electronics onboard the system are connected to a local area network, allowing them to be configured from an iOS application running on an Apple iPad. Our preliminary testing in an artificial environment provides reasons for optimism, as our system successfully identifies flowers in printed images of a real orchard canopy. With help from our industry partner, Antles Pollen, the system will soon be tested in a real kiwifruit orchard.

    Enabling New Flexibility in the SUNDIALS Suite of Nonlinear and Differential/Algebraic Equation Solvers

    In recent years, the SUite of Nonlinear and DIfferential/ALgebraic equation Solvers (SUNDIALS) has been redesigned to better enable the use of application-specific and third-party algebraic solvers and data structures. Throughout this work, we have adhered to specific guiding principles that minimized the impact on current users while providing maximum flexibility for later evolution of solvers and data structures. The redesign was done through the creation of new classes for linear and nonlinear solvers, enhancements to the vector class, and the creation of modern Fortran interfaces that leverage interoperability features of the Fortran 2003 standard. The vast majority of this work has been performed "behind the scenes," with minimal changes to the user interface and no reduction in solver capabilities or performance. However, these changes now allow advanced users to create highly customized solvers that exploit their problem structure, enabling SUNDIALS use on extreme-scale, heterogeneous computational architectures.

    A2Cloud-RF: A Random Forest Based Statistical Framework to Guide Resource Selection for High-performance Scientific Computing on the Cloud

    This article proposes a random-forest-based A2Cloud framework to match scientific applications with Cloud providers and their instances for high performance. The framework leverages four engines for this task: the PERF engine, the Cloud trace engine, the A2Cloud-ext engine, and the random forest classifier (RFC) engine. The PERF engine profiles the application to obtain performance characteristics, including the number of single-precision (SP) floating-point operations (FLOPs), double-precision (DP) FLOPs, x87 operations, memory accesses, and disk accesses. The Cloud trace engine obtains the corresponding performance characteristics of the selected Cloud instances, including SP floating-point operations per second (FLOPS), DP FLOPS, x87 operations per second, memory bandwidth, and disk bandwidth. The A2Cloud-ext engine uses the application and Cloud instance characteristics to generate objective scores that represent the application-to-Cloud match. The RFC engine uses these objective scores to generate two types of random forests to assist users with rapid analysis: application-specific random forests (ARF) and application-class based random forests. The ARF consider only the input application's characteristics to generate a random forest and provide numerical ratings to the selected Cloud instances. To generate the application-class based random forests, the RFC engine downloads the application profiles and scores of previously tested applications that perform similarly to the input application. Using these data, the RFC engine creates a random forest for instance recommendation. We exhaustively test this framework using eight real-world applications across 12 instances from different Cloud providers. Our tests show significant statistical agreement between the instance ratings given by the framework and the ratings obtained via actual Cloud executions.