6 research outputs found

    Accelerating Hybrid Monte Carlo simulations of the Hubbard model on the hexagonal lattice

    We present different methods to increase the performance of Hybrid Monte Carlo simulations of the Hubbard model in two dimensions. Our simulations concentrate on a hexagonal lattice, though they can be easily generalized to other lattices. It is found that best results can be achieved using a flexible GMRES solver for matrix inversions and the second order Omelyan integrator with Hasenbusch acceleration on different time scales for molecular dynamics. We demonstrate how an arbitrary number of Hasenbusch mass terms can be included into this geometry and find that the optimal speed depends weakly on the choice of the number of Hasenbusch masses and their values. As such, the tuning of these masses is amenable to automation and we present an algorithm for this tuning that is based on the knowledge of the dependence of solver time and forces on the Hasenbusch masses. We benchmark our algorithms to systems where direct numerical diagonalization is feasible and find excellent agreement. We also simulate systems with hexagonal lattice dimensions up to $102\times 102$ and $N_t=64$. We find that the Hasenbusch algorithm leads to a speed up of more than an order of magnitude.
    Comment: Corrected Proof in Press in Computer Physics Communications

    Reconfigurable acceleration of big data analytics

    The amount of data stored and processed in data centers is growing at an unprecedented rate. At the same time, the improvement in processing capabilities of central processing units (CPUs) has relatively stagnated over the last decade, creating an increasing demand for specialised processing. Specialised accelerators have gotten the spotlight for computationally intensive applications, such as with deep learning on graphics processing units (GPUs), though such special purpose processors tend to be optimised for the trending applications and are not efficient for each entity’s computational needs. On the other hand, reconfigurable computing has shown impressive potential in accelerating specialised tasks, including database applications. Hence, field-programmable gate arrays (FPGAs) have started evolving into an integral part in the data center. However, the heterogeneity found in today’s systems featuring FPGAs, such as through non-uniform memory accesses (NUMA), has complicated the deployment and development of database accelerators. This PhD introduces novel parallel algorithms and FPGA designs for database acceleration that take into consideration the inter-chip communication limitations. The presented designs accelerate fundamental database operators such as sorting, sort-merge join and distinct count, with notable advantages over the state-of-the-art. Additionally, some building blocks such as the parallel round-robin arbiter and the fast lightweight merge sorter (FLiMS) are shown to have a wider applicability, including in single-instruction multiple-data (SIMD) algorithms and network switches. The proposed designs operate in a streaming access pattern with a wide path in order to achieve scalability to input size and future high-bandwidth architectures. 
    Finally, a discussion on future architectures with reconfigurable instructions is provided as future work to further address the challenges appearing when accelerating big data using today’s FPGAs.
    Open Access

    Efficient deadlock avoidance for 2D mesh NoCs that use OQ or VOQ routers

    Networks-on-chip (NoCs) are currently a widely used approach for achieving scalability of multi-cores to many-cores, as well as for interconnecting other vital system-on-chip (SoC) components. Each entity in 2D mesh-based NoCs has a router responsible for forwarding packets between the dimensions as well as the entity itself, and it is essentially a 5-port switch. With respect to the routing algorithm, there are important trade-offs between routing performance and the efficiency of overcoming potential deadlocks. Common deadlock avoidance techniques, including the turn model, usually involve restrictions of certain paths a packet can take at the cost of a higher probability for network congestion. In contrast, deadlock resolution techniques, as well as some avoidance schemes, provide more path flexibility at the expense of hardware complexity, such as by incorporating (or assuming) dedicated buffers. This paper provides a deadlock avoidance algorithm for NoC routers based on output-queues (OQs) or virtual-output queues (VOQs). The proposed approach features fewer path restrictions than common techniques, and can be based on existing routing algorithms as a baseline, deadlock-free or not. This requires no modification to the queueing topology, and the required logic is minimal. Our algorithm approaches the performance of fully-adaptive algorithms, while maintaining deadlock freedom.