
    Experiments in distributed memory time warp

    Coupling the time-warp algorithm with the graph-theoretical kinetic Monte Carlo framework for distributed simulations of heterogeneous catalysts

    Despite the successful and ever-widening adoption of kinetic Monte Carlo (KMC) simulations in surface science and heterogeneous catalysis, the accessible length scales are still limited by the inherently sequential nature of the KMC framework. Simulating long-range surface phenomena, such as catalytic reconstruction and pattern formation, requires consideration of large surfaces/lattices, at the μm scale and beyond. However, handling such lattices with the sequential KMC framework is extremely challenging due to the heavy memory footprint and computational demand. The Time-Warp algorithm proposed by Jefferson [ACM Trans. Program. Lang. Syst., 1985. 7: 404-425] offers a way to enable distributed parallelization of discrete event simulations. Thus, to enable high-fidelity simulations of challenging systems in heterogeneous catalysis, we have coupled the Time-Warp algorithm with the Graph-Theoretical KMC framework [J. Chem. Phys., 134(21): 214115; J. Chem. Phys., 139(22): 224706] and implemented the approach in the general-purpose KMC code Zacros. We have further developed a “parallel-emulation” serial algorithm, which produces results identical to those obtained from the distributed runs (with the Time-Warp algorithm), thereby validating the correctness of our implementation. These advancements make Zacros the first-of-its-kind general-purpose KMC code with distributed computing capabilities, opening up opportunities for detailed meso-scale studies of heterogeneous catalysts and closer-than-ever comparisons of theory with experiments.

    On the Evaluation of RDF Distribution Algorithms Implemented over Apache Spark

    Querying very large RDF data sets efficiently requires a sophisticated distribution strategy. Several innovative solutions have recently been proposed for optimizing data distribution with predefined query workloads. This paper presents an in-depth analysis and experimental comparison of five representative and complementary distribution approaches. To obtain fair experimental results, we use Apache Spark as a common parallel computing framework and rewrite the algorithms concerned using the Spark API. Spark provides guarantees in terms of fault tolerance, high availability and scalability, which are essential in such systems. Our different implementations aim to highlight the fundamental implementation-independent characteristics of each approach in terms of data preparation, load balancing, data replication and, to some extent, query answering cost and performance. The presented measures are obtained by testing each system on one synthetic and one real-world data set over query workloads with differing characteristics and different partitioning constraints. Comment: 16 pages, 3 figures

    Parallel Discrete Event Simulation with Erlang

    Discrete Event Simulation (DES) is a widely used technique in which the state of the simulator is updated by events happening at discrete points in time (hence the name). DES is used to model and analyze many kinds of systems, including computer architectures, communication networks, street traffic, and others. Parallel and Distributed Simulation (PADS) aims at improving the efficiency of DES by partitioning the simulation model across multiple processing elements, in order to enable larger and/or more detailed studies to be carried out. Interest in PADS has increased with the widespread availability of multicore processors and affordable high performance computing clusters. However, designing parallel simulation models requires considerable expertise, with the result that PADS techniques are not as widespread as they could be. In this paper we describe ErlangTW, a parallel simulation middleware based on the Time Warp synchronization protocol. ErlangTW is entirely written in Erlang, a concurrent, functional programming language specifically targeted at building distributed systems. We argue that writing parallel simulation models in Erlang is considerably easier than using conventional programming languages. Moreover, ErlangTW allows simulation models to be executed on single-core, multicore and distributed computing architectures. We describe the design and prototype implementation of ErlangTW, and report some preliminary performance results on multicore and distributed architectures using the well-known PHOLD benchmark. Comment: Proceedings of ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC 2012) in conjunction with ICFP 2012. ISBN: 978-1-4503-1577-
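
    The abstract's definition of DES (state updated by events at discrete points in time) can be illustrated with a minimal sequential event loop. This is only a sketch: it is not ErlangTW's API and does not implement Time Warp, and the names `run_des`, `handlers`, and the `"tick"` event are hypothetical choices made for illustration.

```python
import heapq


def run_des(initial_events, handlers, end_time):
    """Run a sequential discrete event simulation (illustrative sketch).

    initial_events: iterable of (timestamp, event_name) pairs.
    handlers: dict mapping event_name to a function(state, now) that
              returns a list of new (timestamp, event_name) pairs.
    end_time: events with a timestamp beyond this value are not processed.
    """
    state = {"now": 0.0, "processed": 0}
    queue = []
    seq = 0  # tie-breaker so events with equal timestamps keep insertion order
    for ts, name in initial_events:
        heapq.heappush(queue, (ts, seq, name))
        seq += 1

    while queue:
        # Always process the pending event with the smallest timestamp.
        ts, _, name = heapq.heappop(queue)
        if ts > end_time:
            break
        state["now"] = ts  # simulated time jumps directly to the event time
        state["processed"] += 1
        # A handler may schedule further events in the simulated future.
        for new_ts, new_name in handlers[name](state, ts):
            heapq.heappush(queue, (new_ts, seq, new_name))
            seq += 1
    return state


# Hypothetical model: a "tick" event that reschedules itself every time unit.
result = run_des(
    [(0.0, "tick")],
    {"tick": lambda state, now: [(now + 1.0, "tick")]},
    end_time=3.0,
)
```

    A parallel simulator partitions the model so that several such loops run concurrently; an optimistic protocol such as Time Warp additionally has to undo work when an event arrives with a timestamp in a partition's past, which this sequential sketch never needs to do.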

    A Unified Optimization Approach for Sparse Tensor Operations on GPUs

    Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based implementations of sparse tensor operations are rare. The irregular computation patterns and sparsity structures as well as the large memory footprints of sparse tensor operations make such implementations challenging. We leverage the fact that sparse tensor operations share similar computation patterns to propose a unified tensor representation called F-COO. Combined with GPU-specific optimizations, F-COO provides highly-optimized implementations of sparse tensor computations on GPUs. The performance of the proposed unified approach is demonstrated for tensor-based kernels such as the Sparse Matricized Tensor-Times-Khatri-Rao Product (SpMTTKRP) and the Sparse Tensor-Times-Matrix Multiply (SpTTM) and is used in tensor decomposition algorithms. Compared to state-of-the-art work, we improve the performance of SpTTM and SpMTTKRP by up to 3.7 and 30.6 times, respectively, on NVIDIA Titan-X GPUs. We implement a CANDECOMP/PARAFAC (CP) decomposition and achieve up to 14.9 times speedup using the unified method over state-of-the-art libraries on NVIDIA Titan-X GPUs.
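
    To make the SpMTTKRP kernel named in the abstract concrete, the following is a minimal CPU sketch over a plain coordinate (COO) list of nonzeros for a 3-way tensor. It is an assumption-laden illustration, not the paper's F-COO format and not a GPU implementation; the function name `spmttkrp_mode0` and the sample data are hypothetical.

```python
def spmttkrp_mode0(nonzeros, num_rows, B, C):
    """Mode-0 MTTKRP for a sparse 3-way tensor stored in plain COO form.

    nonzeros: list of (i, j, k, value) entries of the tensor X.
    num_rows: size of the tensor's first mode (rows of the result).
    B, C: dense factor matrices (lists of rows) for modes 1 and 2,
          each with the same number of columns (the rank R).

    Returns M with M[i][r] = sum over nonzeros of value * B[j][r] * C[k][r].
    """
    rank = len(B[0])
    M = [[0.0] * rank for _ in range(num_rows)]
    # Only stored nonzeros contribute, so the cost scales with nnz * rank.
    for i, j, k, value in nonzeros:
        for r in range(rank):
            M[i][r] += value * B[j][r] * C[k][r]
    return M


# Hypothetical 2 x 2 x 2 tensor with three nonzeros and rank-2 factors.
X = [(0, 0, 0, 2.0), (0, 1, 1, 3.0), (1, 1, 0, 4.0)]
B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
M = spmttkrp_mode0(X, 2, B, C)
```

    The scattered writes into `M[i]` are one source of the irregular access pattern the abstract mentions: on a GPU, nonzeros sharing a row index would need atomic updates or a segmented reduction.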