291 research outputs found

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.

    Full QCD with the Lüscher local bosonic action

    We investigate Lüscher's method of including dynamical Wilson fermions in a lattice simulation of QCD with two quark flavours. We measure the accuracy of the approximation by comparing it with Hybrid Monte Carlo results for gauge plaquette and Wilson loops. We also introduce an additional global Metropolis step in the update. We show that the complexity of Lüscher's algorithm compares favourably with that of the Hybrid Monte Carlo. Comment: 21 pages Late

    Fault-Tolerant Ring Embeddings in Hypercubes -- A Reconfigurable Approach

    We investigate the problem of designing reconfigurable embedding schemes for a fixed hypercube (without redundant processors and links). The fundamental idea for these schemes is to embed a basic network on the hypercube without fully utilizing the nodes on the hypercube. The remaining nodes can be used as spares to reconfigure the embeddings in case of faults. The result of this research shows that by carefully embedding the application graphs, the topological properties of the embedding can be preserved under fault conditions, and reconfiguration can be carried out efficiently. In this dissertation, we choose the ring as the basic network of interest, and propose several schemes for the design of reconfigurable embeddings with the aim of minimizing reconfiguration cost and performance degradation. The cost is measured by the number of node-state changes or reconfiguration steps needed to carry out the reconfiguration, and the performance degradation is characterized as the dilation of the new embedding after reconfiguration. Our schemes surpass existing ones in terms of applicability and of the reconfiguration cost needed for the resulting embeddings.
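
To make the terms "ring embedding" and "dilation" in the abstract above concrete, here is a minimal sketch. It is not one of the dissertation's schemes: it uses the standard reflected Gray code to place a ring on every node of the hypercube, whereas the schemes described above deliberately leave some nodes unused as spares. The function names `gray_ring` and `dilation` are illustrative choices.

```python
def gray_ring(n):
    """Return the 2**n node labels of an n-cube in reflected Gray-code order.

    Consecutive labels differ in exactly one bit, so consecutive ring
    positions are mapped to adjacent hypercube nodes.
    """
    return [i ^ (i >> 1) for i in range(2 ** n)]


def dilation(ring):
    """Largest Hamming distance between ring neighbours, wrap-around included."""
    neighbours = zip(ring, ring[1:] + ring[:1])
    return max(bin(a ^ b).count("1") for a, b in neighbours)


ring = gray_ring(3)
print(ring)            # node labels in ring order
print(dilation(ring))  # 1 means every ring edge is a single hypercube link
```

A dilation greater than 1 after reconfiguration would mean that some ring neighbours are no longer directly linked, which is the performance degradation the abstract refers to.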

    Analysis of a parallelized nonlinear elliptic boundary value problem solver with application to reacting flows

    A parallelized finite difference code based on the Newton method for systems of nonlinear elliptic boundary value problems in two dimensions is analyzed in terms of computational complexity and parallel efficiency. An approximate cost function depending on 15 dimensionless parameters is derived for algorithms based on stripwise and boxwise decompositions of the domain and a one-to-one assignment of the strip or box subdomains to processors. The sensitivity of the cost functions to the parameters is explored in regions of parameter space corresponding to model small-order systems with inexpensive function evaluations and also a coupled system of nineteen equations with very expensive function evaluations. The algorithm was implemented on the Intel Hypercube, and some experimental results for the model problems with stripwise decompositions are presented and compared with the theory. In the context of computational combustion problems, multiprocessors of either message-passing or shared-memory type may be employed with stripwise decompositions to realize speedup of O(n), where n is mesh resolution in one direction, for reasonable n

    Using offline routing to implement a low latency 3D FFT in a multinode FPGA system

    Thesis (M.S.)--Boston University. Applications that require highly parallel computing along with low latency communication due to strong scaling, such as calculating a 3D FFT for Molecular Dynamics simulations, can be problematic for traditional high performance computing (HPC) clusters. A multinode FPGA array is a good solution for these types of problems due to the direct high speed connections and flexible internal fabric inherent in FPGAs. Offline routing uses precomputed routing information to direct packets and can avoid much of the switching and congestion communication overhead. Two architectures are explored here which show the feasibility of using offline routing techniques to reduce communication latencies in FPGA systems. The first architecture targets a single FPGA and was built for initial exploration and to show how powerful and flexible a single FPGA can be. It attained a maximum clock frequency of 102 MHz and latencies of 64 µs and 250 µs for 3D FFT calculations of 32^3 and 64^3 data points respectively. The second architecture targets an FPGA that is intended to be the model for each node in the array. The best multinode version is based on a multilevel switching architecture. It has a maximum clock frequency of 134 MHz. When scaled to a cluster, latencies project to 2.4 µs and 5.5 µs for 3D FFT calculations of 32^3 and 64^3 data points respectively. The two designs show the potential for using a single FPGA and multi-FPGA arrays for HPC applications where communication latency is critical to the application.

    Performance effects of node mapping on the IBM BlueGene/L machine

    The IBM BlueGene/L (BG/L) supercomputer is a new machine consisting of up to 65536 relatively modest compute nodes connected with three application-level networks -- a high-performance point-to-point 3D torus network, a global combining/broadcast tree network for collective operations, and a global interrupt/barrier network for extremely fast global barriers. The BG/L control system allows the user to assign MPI logical ranks to physical torus coordinates at run-time in an arbitrary manner as long as all nodes are uniquely included in the mapping. This presents the possibility of increasing application performance with very little effort. This thesis investigates the performance effects of node mapping with several benchmarks and scientific codes using a variety of existing and new mapping strategies. The benchmarks are the NAS parallel benchmarks, the Ames Laboratory Classical Molecular dynamics code (ALCMD), and the General Atomic and Molecular Electronic Structure System (GAMESS) application. The NAS benchmarks are short, easy to understand, and fairly well known. ALCMD has an interesting communication pattern that should benefit from a good mapping strategy. GAMESS is one application that is not necessarily well-suited for running on BlueGene because it requires a large amount of compute power and memory per node. However, it provides an interesting data point for performance of applications that were not designed for a particular system and the possible benefits of mapping on such applications. The mappings investigated were the stock permutations (XYZ, XZY, etc), Gray-code based mesh mappings, random maps, variations on Gray-code maps for embedding 2D meshes in the 3D torus, and three maps designed for GAMESS. Performance results are presented for node mappings on several BG/L partition sizes
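
The "stock permutations (XYZ, XZY, etc)" mentioned in the abstract above assign logical ranks to torus coordinates by axis order. The following is a rough sketch of that idea only. It assumes that the first letter of the permutation names the axis that varies fastest, and the function `rank_to_coords` is a hypothetical helper, not part of the BG/L control system or of MPI.

```python
def rank_to_coords(rank, dims, order="XYZ"):
    """Map a logical rank to (x, y, z) torus coordinates.

    dims  -- torus extent as (size_x, size_y, size_z)
    order -- axis permutation; ASSUMPTION: the first letter varies fastest
    """
    sizes = dict(zip("XYZ", dims))
    coords = {}
    for axis in order:
        # Peel off one coordinate per axis, fastest-varying axis first.
        rank, coords[axis] = divmod(rank, sizes[axis])
    return coords["X"], coords["Y"], coords["Z"]


dims = (4, 2, 2)  # a small 16-node partition, chosen for illustration
for order in ("XYZ", "ZYX"):
    print(order, [rank_to_coords(r, dims, order) for r in range(4)])
```

Any such permutation is a valid mapping in the sense of the abstract, because every node appears exactly once. What changes is which ranks end up as physical neighbours on the torus.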

    Most Permissive Semantics of Boolean Networks

    As shown in (http://dx.doi.org/10.1101/2020.03.22.998377), the usual update modes of Boolean networks (BNs), including synchronous and (generalized) asynchronous, fail to capture behaviors introduced by multivalued refinements. Thus, update modes do not allow a correct abstract reasoning on dynamics of biological systems, as they may lead to rejecting valid BN models. This technical report lists the main definitions and properties of the most permissive semantics of BNs introduced in http://dx.doi.org/10.1101/2020.03.22.998377. This semantics provides a correct abstraction of any multivalued refinement, with any update mode. It subsumes all the usual update modes, while enabling new behaviors achievable by more concrete models. Moreover, it appears that the classical dynamical analyses of reachability and attractors have a simpler computational complexity: (i) reachability can be assessed in a polynomial number of iterations. The computation of iterations is in NP in the very general case, and is linear when local functions are monotonic, or with some usual representations of functions of BNs (binary decision diagrams, Petri nets, automata networks, etc.). Thus, reachability is in P with locally-monotonic BNs, and $\text{P}^{\text{NP}}$ otherwise (instead of being PSPACE-complete with update modes); (ii) deciding whether a configuration belongs to an attractor is in coNP with locally-monotonic BNs, and $\text{coNP}^{\text{coNP}}$ otherwise (instead of PSPACE-complete with update modes). Furthermore, we demonstrate that the semantics completely captures any behavior achievable with any multilevel or ODE refinement of the BN; and the semantics is minimal with respect to this model refinement criterion: for any most permissive trajectory, there exists a multilevel refinement of the BN which can reproduce it. In brief, the most permissive semantics of BNs enables a correct abstract reasoning on dynamics of BNs, with a greater tractability than previously introduced update modes.
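
For readers unfamiliar with the "usual update modes" that the abstract above contrasts with the most permissive semantics, here is a small sketch of the synchronous and fully asynchronous modes. It does not implement the most permissive semantics itself. The two-node network `fs` and the function names are illustrative assumptions, not taken from the report.

```python
def synchronous_successor(fs, x):
    """Synchronous mode: every component is updated at once."""
    return tuple(f(x) for f in fs)


def asynchronous_successors(fs, x):
    """Fully asynchronous mode: exactly one component changes per step."""
    successors = set()
    for i, f in enumerate(fs):
        value = f(x)
        if value != x[i]:
            successors.add(x[:i] + (value,) + x[i + 1:])
    return successors


# Hypothetical two-node network in which each node copies the other.
fs = (lambda x: x[1], lambda x: x[0])

print(synchronous_successor(fs, (0, 1)))    # the two values swap
print(asynchronous_successors(fs, (0, 1)))  # one node catches up with the other
```

From the configuration (0, 1) the two modes already disagree on what is reachable in one step, which shows how strongly the choice of update mode shapes the predicted dynamics.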