    Parallel Computers and Complex Systems

    We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.

    On the benefits of tasking with OpenMP

    Tasking promises a model to program parallel applications that provides intuitive semantics. In the case of tasks with dependences, it also promises better load balancing by removing global synchronizations (barriers), and potential for improved locality. Still, the adoption of tasking in production HPC codes has been slow. Despite OpenMP supporting tasks, most codes rely on worksharing-loop constructs alongside MPI primitives. This paper provides insights on the benefits of tasking over the worksharing-loop model by reporting on the experience of taskifying an adaptive mesh refinement proxy application: miniAMR. The performance evaluation shows the taskified implementation being 15–30% faster than the loop-parallel one for certain thread counts across four systems, three architectures and four compilers, thanks to better load balancing and system utilization. Dynamic scheduling of loops narrows the gap but still falls short of tasking due to serial sections between loops. Locality improvements are incidental due to the lack of locality-aware scheduling. Overall, the introduction of asynchrony with tasking lives up to its promises, provided that programmers parallelize beyond individual loops and across application phases.
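    The contrast can be sketched in a few lines of C++ with OpenMP. The two phase functions below are hypothetical stand-ins for miniAMR's per-block phases, not the paper's code: the worksharing version pays an implicit barrier after each loop, while the task version expresses only the true per-block ordering through dependences, so a fast block can enter its second phase while slower blocks are still in their first.

```cpp
#include <omp.h>
#include <cstdio>

constexpr int NBLOCKS = 64;
static double state[NBLOCKS];

// Hypothetical stand-ins for two consecutive phases over mesh blocks.
static void compute_block(int b)  { state[b] += b; }
static void exchange_block(int b) { state[b] *= 2.0; }

int main() {
    // Worksharing-loop version: an implicit barrier ends each loop,
    // so the slowest block in phase 1 delays every block in phase 2.
    #pragma omp parallel
    {
        #pragma omp for
        for (int b = 0; b < NBLOCKS; ++b) compute_block(b);
        #pragma omp for
        for (int b = 0; b < NBLOCKS; ++b) exchange_block(b);
    }

    // Tasking version: per-block dependences replace the global
    // barriers, exposing parallelism across the two phases.
    #pragma omp parallel
    #pragma omp single
    for (int b = 0; b < NBLOCKS; ++b) {
        #pragma omp task depend(out: state[b])
        compute_block(b);
        #pragma omp task depend(in: state[b])
        exchange_block(b);
    }

    std::printf("state[0] = %g\n", state[0]);
    return 0;
}
```

    The depend clauses encode exactly the ordering the barriers over-enforce: phase 2 of block b waits only for phase 1 of block b, not for every other block.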

    Performance visualizations using XML representations

    The intermediate representation (IR) forms the information exchanged among the different passes of program compilation. The intermediate format proposed for extensibility and persistence is written in XML. In this way, the program transformations that were internal to the compiler become visible. The hierarchical structure of XML makes it a natural representation for the abstract syntax tree (AST). A compiler can parse the program source into an IR, then output it as an XML document. Separated by orthogonal namespaces, other IRs are also presented in the same XML document, gathering program information such as dependence vectors, transformation matrices, iteration space dependence graphs, and cache reuse distances. This XML document can be exchanged between the compiler and program visualizers for parallelism and locality.
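    As an illustration only (the element names and namespace prefixes below are invented, not the paper's actual schema), a short C++ sketch shows how naturally an AST serializes into nested XML elements, leaving room for other analyses under orthogonal namespaces in the same document:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Minimal AST node; "kind" would be For, Assign, Var, and so on.
struct Node {
    std::string kind;
    std::string name;
    std::vector<std::unique_ptr<Node>> kids;
};

// Each tree node becomes an XML element, each child a nested element
// (namespace declarations omitted for brevity).
void emitXML(const Node& n, int depth = 0) {
    std::string pad(2 * depth, ' ');
    std::cout << pad << "<ast:" << n.kind;
    if (!n.name.empty()) std::cout << " name=\"" << n.name << "\"";
    if (n.kids.empty()) { std::cout << "/>\n"; return; }
    std::cout << ">\n";
    for (const auto& k : n.kids) emitXML(*k, depth + 1);
    std::cout << pad << "</ast:" << n.kind << ">\n";
}

int main() {
    Node loop;
    loop.kind = "For";
    loop.name = "i";
    auto body = std::make_unique<Node>();
    body->kind = "Assign";
    body->name = "a";
    loop.kids.push_back(std::move(body));
    emitXML(loop);
    // Dependence vectors or reuse distances would sit beside the AST
    // in the same document, e.g. under a separate <dep:...> namespace.
    return 0;
}
```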

    Developing a Benchmark for Evaluating the Performance of Parallel Computers

    This paper discusses the development of a portable suite of benchmarking programs for parallel computers. Comparative measurement of the performance of parallel computing systems has been limited because of the great diversity of architectures and of processor interconnection schemes. One solution is to translate benchmark codes into a consistent and portable parallel language. This paper reports on progress in developing such a portable suite of benchmarks. An extensive introduction to parallel computing is included as an appendix, to provide a thorough understanding of the factors complicating development of the performance suite. Key to the development was the use of p4, a library of tools developed at Argonne National Laboratory. The benchmark codes were translated successfully using p4 and were run on a variety of parallel machines. Conclusions and suggestions for future work are given.
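    The paper's codes rest on p4's message-passing primitives, which are not reproduced here; the C++ sketch below shows only the general shape of a kernel such a suite contains: run a fixed, portable workload under a timer and report a rate that can be compared across machines.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    // A fixed, portable workload: daxpy, 2 floating-point ops per element.
    const std::size_t n = 1 << 24;
    std::vector<double> x(n, 1.0), y(n, 2.0);
    const double a = 3.0;

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    auto t1 = std::chrono::steady_clock::now();

    // Report a machine-comparable rate, plus a checksum so the
    // compiler cannot discard the loop.
    double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("daxpy: %.1f MFLOP/s (check %.1f)\n",
                2.0 * n / secs / 1e6, y[0]);
    return 0;
}
```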

    Dagstuhl News January - December 1999

    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic

    Computing and Information Science (CIS)

    Cornell University Courses of Study Vol. 97 2005/200

    Graphical processing unit (GPU) acceleration for numerical solution of population balance models using high resolution finite volume algorithm

    Population balance modeling is a widely used approach to describe crystallization processes. It can be extended to multivariate cases where more internal coordinates, i.e., particle properties such as multiple characteristic sizes, composition, and purity, can be used. The current study presents a highly efficient, fully discretized parallel implementation of the high resolution finite volume technique on graphical processing units (GPUs) for the solution of single- and multi-dimensional population balance models (PBMs). The proposed GPU-PBM is implemented in CUDA C++ for the GPU calculations and provides a generic Matlab interface for easy application in scientific computing. The case studies demonstrate that the code running on the GPU is between 2 and 40 times faster than the compiled C++ code and 50 to 250 times faster than the standard Matlab implementation. This significant improvement in computational time enables the application of model-based control approaches in real time, even in the case of multidimensional population balance models.
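    As a rough sketch of the kind of kernel involved, not the paper's implementation: assuming a one-dimensional population balance with constant growth rate G and a van Leer flux limiter (both choices, like the function names, are assumptions made for brevity), one explicit high-resolution finite volume step parallelizes naturally with one CUDA thread per size cell.

```cuda
#include <cuda_runtime.h>

// High-resolution flux at the right face of cell i for the growth
// equation dn/dt + G * dn/dx = 0: upwind value plus a limited
// correction (van Leer limiter). Cells outside the grid are empty.
__device__ double face_flux(const double* n, int i, int N, double G) {
    if (i < 0) return 0.0;                         // no inflow on the left
    double nm = (i > 0)     ? n[i - 1] : 0.0;
    double nc = n[i];
    double np = (i < N - 1) ? n[i + 1] : 0.0;
    double theta = (nc - nm) / (np - nc + 1e-300); // smoothness ratio
    double phi = (theta + fabs(theta)) / (1.0 + fabs(theta));
    return G * (nc + 0.5 * phi * (np - nc));
}

// One explicit time step of the scheme, one thread per size cell.
__global__ void hrfv_step(const double* n, double* n_new,
                          int N, double G, double dt, double dx) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    double fR = face_flux(n, i, N, G);
    double fL = face_flux(n, i - 1, N, G);
    n_new[i] = n[i] - (dt / dx) * (fR - fL);
}
```

    A host loop would launch this once per time step, e.g. hrfv_step<<<(N + 255) / 256, 256>>>(d_n, d_new, N, G, dt, dx), with dt restricted by the CFL condition G * dt <= dx. The per-cell independence of the update is what lets the method map so well onto a GPU.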