Accelerating Hybrid Monte Carlo simulations of the Hubbard model on the hexagonal lattice
We present different methods to increase the performance of Hybrid Monte
Carlo simulations of the Hubbard model in two dimensions. Our simulations
concentrate on a hexagonal lattice, though they can be easily generalized to other
lattices. It is found that best results can be achieved using a flexible GMRES
solver for matrix inversions and the second order Omelyan integrator with
Hasenbusch acceleration on different time scales for molecular dynamics. We
demonstrate how an arbitrary number of Hasenbusch mass terms can be included
into this geometry and find that the optimal speed depends weakly on the choice
of the number of Hasenbusch masses and their values. As such, the tuning of
these masses is amenable to automation, and we present an algorithm for this
tuning that is based on the knowledge of the dependence of solver time and
forces on the Hasenbusch masses. We benchmark our algorithms against systems where
direct numerical diagonalization is feasible and find excellent agreement. We
also simulate systems with hexagonal lattice dimensions up to
and . We find that the Hasenbusch algorithm leads to a speedup of more
than an order of magnitude.
Comment: Corrected Proof in Press in Computer Physics Communications
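The second-order Omelyan integrator named in the abstract can be illustrated with a minimal sketch. The example below applies the standard five-stage, second-order minimum-norm scheme to a toy harmonic oscillator; the `force` function, the step size, and all function names are assumptions made for illustration, and neither the Hubbard-model fermion force nor the Hasenbusch multiple-time-scale splitting is included.

```python
# Sketch of the second-order Omelyan (minimum-norm) integrator on a toy system.
# Assumption: the action is S(q) = q^2 / 2, so the force is simply -q.

LAMBDA = 0.1931833275037836  # Omelyan's tuned parameter for the 2nd-order scheme


def force(q):
    """Toy force F = -dS/dq for S = q^2 / 2 (hypothetical stand-in)."""
    return -q


def omelyan_step(q, p, eps):
    """One symmetric step: momentum, position, momentum, position, momentum."""
    p += LAMBDA * eps * force(q)
    q += 0.5 * eps * p
    p += (1.0 - 2.0 * LAMBDA) * eps * force(q)
    q += 0.5 * eps * p
    p += LAMBDA * eps * force(q)
    return q, p


def trajectory(q, p, eps, n_steps):
    """Integrate n_steps molecular-dynamics steps of size eps."""
    for _ in range(n_steps):
        q, p = omelyan_step(q, p, eps)
    return q, p


def energy(q, p):
    """Hamiltonian of the toy system: kinetic plus potential term."""
    return 0.5 * p * p + 0.5 * q * q
```

Because the step is symmetric, flipping the momentum at the end of a trajectory and integrating again returns the starting point up to rounding, which is the reversibility property Hybrid Monte Carlo relies on.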
Reconfigurable acceleration of big data analytics
The amount of data stored and processed in data centers is growing at an unprecedented rate. At the same time, the improvement in processing capabilities of central processing units (CPUs) has largely stagnated over the last decade, creating an increasing demand for specialised processing. Specialised accelerators have come into the spotlight for computationally intensive applications, such as deep learning on graphics processing units (GPUs), though such special-purpose processors tend to be optimised for the trending applications and are not efficient for each entity’s computational needs. On the other hand, reconfigurable computing has shown impressive potential in accelerating specialised tasks, including database applications. Hence, field-programmable gate arrays (FPGAs) have started evolving into an integral part of the data center. However, the heterogeneity found in today’s systems featuring FPGAs, such as through non-uniform memory accesses (NUMA), has complicated the deployment and development of database accelerators.
This PhD thesis introduces novel parallel algorithms and FPGA designs for database acceleration that take into consideration the inter-chip communication limitations. The presented designs accelerate fundamental database operators such as sorting, sort-merge join and distinct count, with notable advantages over the state-of-the-art. Additionally, some building blocks such as the parallel round-robin arbiter and the fast lightweight merge sorter (FLiMS) are shown to have a wider applicability, including in single-instruction multiple-data (SIMD) algorithms and network switches. The proposed designs operate in a streaming access pattern with a wide path in order to achieve scalability to input size and future high-bandwidth architectures. Finally, a discussion on future architectures with reconfigurable instructions is provided as future work to further address the challenges appearing when accelerating big data using today’s FPGAs.
Open Access
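For readers unfamiliar with the sort-merge join operator mentioned above, the following is a minimal software sketch of the textbook algorithm. It is not the FPGA design from the thesis; the function name and the `(key, value)` tuple layout are assumptions chosen for illustration.

```python
# Textbook sort-merge join on lists of (key, value) pairs.
# Assumption: inputs fit in memory; this is a software illustration only.

def sort_merge_join(left, right):
    """Return (key, left_value, right_value) for every pair of matching keys."""
    left = sorted(left, key=lambda kv: kv[0])
    right = sorted(right, key=lambda kv: kv[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small, advance left cursor
        elif lk > rk:
            j += 1          # right key too small, advance right cursor
        else:
            # Find the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out
```

After sorting, both inputs are read sequentially from start to end, which is the kind of streaming access pattern the abstract describes for its designs.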
Efficient deadlock avoidance for 2D mesh NoCs that use OQ or VOQ routers
Networks-on-chip (NoCs) are currently a widely used approach for achieving
scalability of multi-cores to many-cores, as well as for interconnecting other
vital system-on-chip (SoC) components. Each entity in 2D mesh-based NoCs has a
router responsible for forwarding packets between the dimensions as well as the
entity itself, and it is essentially a 5-port switch. With respect to the
routing algorithm, there are important trade-offs between routing performance
and the efficiency of overcoming potential deadlocks. Common deadlock avoidance
techniques, including the turn model, usually restrict the paths a packet can
take, at the cost of a higher probability of network congestion. In contrast,
deadlock resolution techniques, as well as some
avoidance schemes, provide more path flexibility at the expense of hardware
complexity, such as by incorporating (or assuming) dedicated buffers.
This paper provides a deadlock avoidance algorithm for NoC routers based on
output queues (OQs) or virtual output queues (VOQs). The proposed approach
features fewer path restrictions than common techniques, and can be based on
existing routing algorithms as a baseline, deadlock-free or not. This requires
no modification to the queueing topology, and the required logic is minimal.
Our algorithm approaches the performance of fully-adaptive algorithms, while
maintaining deadlock freedom.
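As a point of reference for the path restrictions discussed above, the sketch below shows XY dimension-order routing on a 2D mesh, a common deadlock-free baseline that forbids every turn from the Y dimension back into the X dimension. It is not the algorithm proposed in the paper; the coordinate convention and function names are assumptions made for illustration.

```python
# XY dimension-order routing on a 2D mesh: travel along X first, then along Y.
# Assumption: routers are addressed by integer (x, y) coordinates.

def xy_route(src, dst):
    """Return the list of router coordinates visited from src to dst."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:              # resolve the X offset first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:              # then resolve the Y offset
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def has_y_to_x_turn(path):
    """True if the path moves in X after it has already moved in Y."""
    moved_in_y = False
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if y1 != y0:
            moved_in_y = True
        elif x1 != x0 and moved_in_y:
            return True
    return False
```

Every source-destination pair has exactly one permitted path under this scheme, which illustrates the congestion cost of strict turn restrictions that the abstract contrasts with more adaptive approaches.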