    GT4Py: High Performance Stencils for Weather and Climate Applications using Python

    All major weather and climate applications are currently developed using languages such as Fortran or C++. This is typical in the domain of high performance computing (HPC), where efficient execution is an important concern. Unfortunately, this approach leads to implementations that intermix optimizations for specific hardware architectures with the high-level numerical methods that are typical for the domain. The resulting code is verbose, difficult to extend and maintain, and difficult to port to different hardware architectures. Here, we propose a different strategy based on GT4Py (GridTools for Python). GT4Py is a Python framework for writing weather and climate applications that includes a high-level embedded domain-specific language (DSL) for writing stencil computations. The toolchain integrated in GT4Py enables automatic code generation to obtain the performance of state-of-the-art C++ and CUDA implementations. The separation of concerns between the mathematical definitions and the actual implementations allows for performance portability of the computations across a wide range of computing architectures, while being embedded in Python gives easy access to the tools of the Python ecosystem, which enhances the productivity of scientists and facilitates integration into complex workflows. Here, the initial release of GT4Py is described, providing an overview of the current state of the framework and performance results showing how GT4Py can outperform pure Python implementations by orders of magnitude. Comment: 12 pages
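
    For readers unfamiliar with the term, a stencil computation updates each grid point from a fixed pattern of its neighbours. The sketch below is a plain-Python five-point Laplacian of the kind the abstract uses as a slow baseline. It does not use GT4Py or its DSL; the function name `laplacian` and the list-of-lists grid layout are assumptions made only for illustration.

```python
def laplacian(field):
    """Apply a five-point Laplacian stencil to the interior of a 2D grid.

    `field` is a list of equally long rows of floats (an assumed layout).
    Boundary points are left at 0.0 because they lack a full neighbourhood.
    """
    ni, nj = len(field), len(field[0])
    out = [[0.0] * nj for _ in range(ni)]
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            # Same access pattern at every point: four neighbours and the centre.
            out[i][j] = (field[i - 1][j] + field[i + 1][j]
                         + field[i][j - 1] + field[i][j + 1]
                         - 4.0 * field[i][j])
    return out


# Usage: a quadratic field i*i + j*j has a constant discrete Laplacian.
grid = [[float(i * i + j * j) for j in range(6)] for i in range(5)]
result = laplacian(grid)
```

    The numerical definition is only the expression inside the loop. The loop order, memory layout, and parallelisation are exactly the hardware-specific concerns that the abstract says should be kept separate from it.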

    A Framework for Megascale Agent Based Model Simulations on Graphics Processing Units

    Agent-based modeling is a technique for modeling dynamic systems from the bottom up. Individual elements of the system are represented computationally as agents. The system-level behaviors emerge from the micro-level interactions of the agents. Contemporary state-of-the-art agent-based modeling toolkits are essentially discrete-event simulators designed to execute serially on the Central Processing Unit (CPU). They simulate Agent-Based Models (ABMs) by executing agent actions one at a time. In addition to imposing an unnatural execution order, these toolkits have limited scalability. In this article, we investigate data-parallel computer architectures such as Graphics Processing Units (GPUs) to simulate large-scale ABMs. We have developed a series of efficient, data-parallel algorithms for handling environment updates, various agent interactions, agent death and replication, and gathering statistics. We present three fundamental innovations that provide unprecedented scalability. The first is a novel stochastic memory allocator which enables parallel agent replication in O(1) average time. The second is a technique for resolving precedence constraints for agent actions in parallel. The third is a method that uses specialized graphics hardware to gather and process statistical measures. These techniques have been implemented on a modern-day GPU, resulting in a substantial performance increase. We believe that our system is the first completely GPU-based agent simulation framework. Although GPUs are the focus of our current implementations, our techniques can easily be adapted to other data-parallel architectures. We have benchmarked our framework against contemporary toolkits using two popular ABMs, namely SugarScape and StupidModel. Keywords: GPGPU, Agent Based Modeling, Data Parallel Algorithms, Stochastic Simulations
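
    The abstract does not spell out how the stochastic memory allocator works. One plausible reading is random probing: each replicating agent tries random slots until it finds a free one. The serial Python sketch below illustrates that idea only. It is not the authors' GPU algorithm, and the names `allocate_slot`, `replicate`, and `max_probes` are hypothetical. On real parallel hardware the claim step would also need an atomic operation to resolve two agents picking the same slot.

```python
import random


def allocate_slot(occupied, rng, max_probes=64):
    """Claim a free slot by random probing; return its index or None.

    If a fraction f of the slots is free, each probe succeeds with
    probability f, so the expected number of probes is 1/f, which is
    independent of the pool size.
    """
    n = len(occupied)
    for _ in range(max_probes):
        slot = rng.randrange(n)
        if not occupied[slot]:
            occupied[slot] = True  # serial stand-in for an atomic claim
            return slot
    return None  # pool too full; the caller must handle the failure


def replicate(occupied, num_parents, rng):
    """Give each replicating parent one child slot, where one can be found."""
    children = []
    for _ in range(num_parents):
        slot = allocate_slot(occupied, rng)
        if slot is not None:
            children.append(slot)
    return children


# Usage: 16 slots, the first 4 hold live agents, and each agent replicates once.
pool = [True] * 4 + [False] * 12
children = replicate(pool, 4, random.Random(0))
```

    The constant expected cost holds only while the pool keeps a bounded fraction of free slots, which is why this sketch caps the number of probes instead of looping indefinitely.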

    Graduate Catalog Center for Computer and Information Sciences

    Empowering parallel computing with field programmable gate arrays

    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array (FPGA), a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumann-like architectures opens the way to eliminating the instruction overhead and to optimizing execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis (HLS) support as well as better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements.

    Computing and Information Science

    Cornell University Courses of Study Vol. 102 2010/201

    SNAP, Crackle, WebWindows!

    We elaborate the SNAP---Scalable (ATM) Network and (PC) Platforms---view of computing in the year 2000. The World Wide Web will continue its rapid evolution, and in the future, applications will not be written for Windows NT/95 or UNIX, but rather for WebWindows, with interfaces defined by the standards of Web servers and clients. This universal environment will support WebTop productivity tools, such as WebWord, WebLotus123, and WebNotes, built in a modular, dynamic fashion and undermining the business model of large software companies. We define a layered WebWindows software architecture in which applications are built on top of multi-use services. We discuss examples including business enterprise systems (IntraNets), health care, financial services and education. HPCC is implicit throughout this discussion, for there is no larger parallel system than the World Wide metacomputer. We suggest building the MPP programming environment in terms of pervasive, sustainable WebWindows technologies. In particular, WebFlow will naturally support dataflow, integrating data- and compute-intensive applications on distributed heterogeneous systems.

    Heracles: A Tool for Fast RTL-Based Design Space Exploration of Multicore Processors

    This paper presents Heracles, an open-source, functional, parameterized, synthesizable multicore system toolkit. Such a multi/many-core design platform is a powerful and versatile research and teaching tool for architectural exploration and hardware-software co-design. The Heracles toolkit comprises the soft hardware (HDL) modules, application compiler, and graphical user interface. It is designed with a high degree of modularity to support fast exploration of future multicore processors of different topologies, routing schemes, processing elements (cores), and memory system organizations. It is a component-based framework with parameterized interfaces and strong emphasis on module reusability. The compiler toolchain is used to map C or C++ based applications onto the processing units. The GUI allows the user to quickly configure and launch a system instance for easy factorial development and evaluation. Hardware modules are implemented in synthesizable Verilog and are FPGA platform independent. The Heracles tool is freely available under the open-source MIT license at: http://projects.csail.mit.edu/heracle

    From a Competition for Self-Driving Miniature Cars to a Standardized Experimental Platform: Concept, Models, Architecture, and Evaluation

    Context: Competitions for self-driving cars have facilitated development and research in the domain of autonomous vehicles towards potential solutions for future mobility. Objective: Miniature vehicles can bridge the gap between simulation-based evaluations of algorithms relying on simplified models and time-consuming vehicle tests on real-scale proving grounds. Method: This article combines findings from a systematic literature review, an in-depth analysis of results and technical concepts from contestants in a competition for self-driving miniature cars, and experiences of participating in the 2013 competition for self-driving cars. Results: A simulation-based development platform for real-scale vehicles has been adapted to support the development of a self-driving miniature car. Furthermore, a standardized platform was designed and realized to enable research and experiments in the context of future mobility solutions. Conclusion: A clear separation between algorithm conceptualization and validation in a model-based simulation environment enabled efficient and riskless experiments and validation. The design of a reusable, low-cost, and energy-efficient hardware architecture utilizing a standardized software/hardware interface enables experiments which would otherwise require resources like a large real-scale test track. Comment: 17 pages, 19 figures, 2 tables