75 research outputs found

### SAWdoubler: a program for counting self-avoiding walks

This article presents SAWdoubler, a package for counting the total number
Z(N) of self-avoiding walks (SAWs) on a regular lattice by the length-doubling
method, of which the basic concept has been published previously by us. We
discuss an algorithm for the creation of all SAWs of length N, efficient
storage of these SAWs in a tree data structure, and an algorithm for the
computation of correction terms to the count Z(2N) for SAWs of double length,
removing all combinations of two intersecting single-length SAWs.
We present an efficient numbering of the lattice sites that enables
exploitation of symmetry and leads to a smaller tree data structure; this
numbering is by increasing Euclidean distance from the origin of the lattice.
Furthermore, we show how the computation can be parallelised by distributing
the iterations of the main loop of the algorithm over the cores of a multicore
architecture. Experimental results on the 3D cubic lattice demonstrate that
Z(28) can be computed on a dual-core PC in only 1 hour and 40 minutes, with a
speedup of 1.56 compared to the single-core computation and with a gain by
using symmetry of a factor of 26. We present results for memory use and show
how the computation is made to fit in 4 Gbyte RAM. It is easy to extend the
SAWdoubler software to other lattices; it is publicly available under the GNU
LGPL license.Comment: 29 pages, 3 figure

### A medium-grain method for fast 2D bipartitioning of sparse matrices

We present a new hypergraph-based method, the medium-grain method, for solving the sparse matrix partitioning problem. This problem arises when distributing data for parallel sparse matrix-vector multiplication. In the medium-grain method, each matrix nonzero is assigned to either a row group or a column group, and these groups are represented by vertices of the hypergraph. For an m x n sparse matrix, the resulting hypergraph has m + n vertices and m + n hyperedges.
Furthermore, we present an iterative refinement procedure for improvement of a given partitioning, based on the medium-grain method, which can be applied as a cheap but effective postprocessing step after any partitioning method.
The medium-grain method is able to produce fully two-dimensional bipartitionings, but its computational complexity equals that of one-dimensional methods. Experimental results for a large set of sparse test matrices show that the medium-grain method with iterative refinement produces bipartitionings with lower communication volume compared to current state-of-the-art methods, and is faster at producing them

### Combinatorial Problems in High-Performance Computing: Partitioning

This extended abstract presents a survey of combinatorial problems
encountered in scientific computations on today\u27s
high-performance architectures, with sophisticated memory
hierarchies, multiple levels of cache, and multiple processors
on chip as well as off-chip.
For parallelism, the most important problem is to partition
sparse matrices, graph, or hypergraphs into nearly equal-sized
parts while trying to reduce inter-processor communication.
Common approaches to such problems involve multilevel
methods based on coarsening and uncoarsening (hyper)graphs,
matching of similar vertices, searching for good separator sets
and good splittings, dynamical adjustment of load imbalance,
and two-dimensional matrix splitting methods

### Minimizing Communication in the Multidimensional FFT

We present a parallel algorithm for the fast Fourier transform (FFT) in higher dimensions. This algorithm generalizes the cyclic-to-cyclic one-dimensional parallel algorithm to a cyclic-to-cyclic multidimensional parallel algorithm while retaining the property of needing only a single all-to-all communication step. This is under the constraint that we use at most âN processors for an FFT on an array with a total of NÂ elements, irrespective of the dimension d or the shape of the array. The only assumption we make is that N is sufficiently composite. Our algorithm starts and ends in the same data distribution. We present our multidimensional implementation FFTU which utilizes the sequential FFTW program for its local FFTs, and which can handle any dimension d. We obtain experimental results for d â€5 using MPI on up to 4096 cores of the supercomputer Snellius, comparing FFTU with the parallel FFTW program and with PFFT and heFFTe. These results show that FFTU is competitive with the state of the art and that it allows one to use a larger number of processors, while keeping communication limited to a single all-to-all operation. For arrays of size 10243Â and 645,Â FFTU achieves a speedup of a factor 149 and 176, respectively, on 4096 processors

### A geometric partitioning method for distributed tomographic reconstruction

Tomography is a powerful technique for 3D imaging of the interior of an object. With the growing sizes of typical tomographic data sets, the computational requirements for algorithms in tomography are rapidly increasing. Parallel and distributed-memory methods for tomographic reconstruction are therefore becoming increasingly common. An underexposed aspect is the effect of the data distribution on the performance of distributed-memory reconstruction algorithms. In this work, we introduce a geometric partitioning method, which takes into account the acquisition geometry and aims to minimize the necessary communication between nodes for distributed-memory forward projection and back projection operations. These operations are crucial subroutines for an important class of reconstruction methods. We show that the choice of data distribution has a significant impact on the runtime of these methods. With our novel partitioning method we reduce the communication volume drastically compared to straightforward distributions, by up to 90% for a number of cases, and furthermore we guarantee a specified load balance

### Exact k-way sparse matrix partitioning

To minimize the communication in parallel sparse matrix-vector multiplication while maintaining load balance, we need to partition the sparse matrix optimally into k disjoint parts, which is an NP-complete problem. We present an exact algorithm based on the branch and bound (BB) method which partitions a matrix for any k, and we explore exact sparse matrix partitioning beyond bipartitioning. The algorithm has been implemented in a software package General Matrix Partitioner (GMP). We also present an integer linear programming (ILP) model for the same problem, based on a hypergraph formulation. We used both methods to determine optimal 2,3,4-way partitionings for a subset of small matrices from the SuiteSparse Matrix Collection. For k=2, BB outperforms ILP, whereas for larger k, ILP is superior. We used the results found by these exact methods for k=4 to analyse the performance of recursive bipartitioning (RB) with exact bipartitioning. For 46 matrices of the 89 matrices in our test set of matrices with less than 250 nonzeros, the communication volume determined by RB was optimal. For the other matrices, RB is able to find 4-way partitionings with communication volume close to the optimal volume

### Partitioning a call graph

Splitting a large software system into smaller and more manageable units has become an important problem for many organizations. The basic structure of a software system is given by a directed graph with vertices representing the programs of the system and arcs representing calls from one program to another. Generating a good partitioning into smaller modules becomes a minimization problem for the number of programs being called by external programs. First, we formulate an equivalent integer linear programming problem with 0â1 variables. theoretically, with this approach the problem can be solved to optimality, but this becomes very costly with increasing size of the software system. Second, we formulate the problem as a hypergraph partitioning problem. This is a heuristic method using a multilevel strategy, but it turns out to be very fast and to deliver solutions that are close to optimal

### Parallel Sparse LU Decomposition on a Mesh Network of Transputers

A parallel algorithm is presented for the LU decomposition of a general sparse matrix on a distributed-memory MIMD multiprocessor with a square mesh communication network. In the algorithm, matrix elements are assigned to processors according to the grid distribution. Each processor represents the nonzero elements of its part of the matrix by a local, ordered, two-dimensional linked-list data structure. The complexity of important operations on this data structure and on several others is analysed. At each step of the algorithm, a parallel search for a set of m compatible pivot elements is performed. The Markowitz counts of the pivot elements are close to minimum, to preserve the sparsity of the matrix. The pivot elements also satisfy a threshold criterion, to ensure numerical stability. The compatibility of the m pivots enables the simultaneous elimination of m pivot rows and m pivot columns in a rank-m update of the reduced matrix. Experimental results on a network of 400 transputers are presented for a set of test matrices from the HarwellâBoeing sparse matrix collection

### Math saves the forest

Wireless sensor networks are decentralised networks consisting of sensors that can detect events and transmit data to neighbouring sensors. Ideally, this data is eventually gathered in a central base station. Wireless sensor networks have many possible applications. For example, they can be used to detect gas leaks in houses or fires in a forest.\ud
In this report, we study data gathering in wireless sensor networks with the objective of minimising the time to send event data to the base station. We focus on sensors with a limited cache and take into account both node and transmission failures. We present two cache strategies and analyse the performance of these strategies for specific networks. For the case without node failures we give the expected arrival time of event data at the base station for both a line and a 2D grid network. For the case with node failures we study the expected arrival time on two-dimensional networks through simulation, as well as the influence of the broadcast range

### Open Problems in (Hyper)Graph Decomposition

Large networks are useful in a wide range of applications. Sometimes problem
instances are composed of billions of entities. Decomposing and analyzing these
structures helps us gain new insights about our surroundings. Even if the final
application concerns a different problem (such as traversal, finding paths,
trees, and flows), decomposing large graphs is often an important subproblem
for complexity reduction or parallelization. This report is a summary of
discussions that happened at Dagstuhl seminar 23331 on "Recent Trends in Graph
Decomposition" and presents currently open problems and future directions in
the area of (hyper)graph decomposition

- âŠ