305,739 research outputs found
Efficient Large-scale Approximate Nearest Neighbor Search on the GPU
We present a new approach for efficient approximate nearest neighbor (ANN)
search in high dimensional spaces, extending the idea of Product Quantization.
We propose a two-level product and vector quantization tree that reduces the
number of vector comparisons required during tree traversal. Our approach also
includes a novel highly parallelizable re-ranking method for candidate vectors
by efficiently reusing already computed intermediate values. Due to its small
memory footprint during traversal, the method lends itself to an efficient,
parallel GPU implementation. This Product Quantization Tree (PQT) approach
significantly outperforms recent state of the art methods for high dimensional
nearest neighbor queries on standard reference datasets. Ours is the first work
that demonstrates GPU performance superior to CPU performance on high
dimensional, large scale ANN problems in time-critical real-world applications,
like loop-closing in videos
Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage
We propose a fast, parallel maximum clique algorithm for large sparse graphs
that is designed to exploit characteristics of social and information networks.
The method exhibits a roughly linear runtime scaling over real-world networks
ranging from 1000 to 100 million nodes. In a test on a social network with 1.8
billion edges, the algorithm finds the largest clique in about 20 minutes. Our
method employs a branch and bound strategy with novel and aggressive pruning
techniques. For instance, we use the core number of a vertex in combination
with a good heuristic clique finder to efficiently remove the vast majority of
the search space. In addition, we parallelize the exploration of the search
tree. During the search, processes immediately communicate changes to upper and
lower bounds on the size of maximum clique, which occasionally results in a
super-linear speedup because vertices with large search spaces can be pruned by
other processes. We apply the algorithm to two problems: to compute temporal
strong components and to compress graphs.Comment: 11 page
RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine
Phylogenetic tree reconstruction is one of the grand challenge
problems in Bioinformatics. The search for a best-scoring tree with 50
organisms, under a reasonable optimality criterion, creates a
topological search space which is as large as the number of atoms in
the universe. Computational phylogeny is challenging even for the most
powerful supercomputers. It is also an ideal candidate for
benchmarking emerging multiprocessor architectures, because it
exhibits various levels of fine and coarse-grain parallelism. In this
paper, we present the porting, optimization, and evaluation of RAxML
on the Cell Broadband Engine. RAxML is a provably efficient, hill
climbing algorithm for computing phylogenetic trees based on the
Maximum Likelihood (ML) method. The algorithm uses an embarrassingly
parallel search method, which also exhibits data-level parallelism and
control parallelism in the computation of the likelihood functions.
We present the optimization of one of the currently fastest tree
search algorithms, on a real Cell blade prototype. We also
investigate problems and present solutions pertaining to the
optimization of floating point code, control flow, communication,
scheduling, and multi-level parallelization on the Cell
Approximate algorithms for partitioning and assignment problems
The problem of optimally assigning the modules of a parallel/pipelined program over the processors of a multiple computer system under certain restrictions on the interconnection structure of the program as well as the multiple computer system was considered. For a variety of such programs it is possible to find linear time if a partition of the program exists in which the load on any processor is within a certain bound. This method, when combined with a binary search over a finite range, provides an approximate solution to the partitioning problem. The specific problems considered were: a chain structured parallel program over a chain-like computer system, multiple chain-like programs over a host-satellite system, and a tree structured parallel program over a host-satellite system. For a problem with m modules and n processors, the complexity of the algorithm is no worse than O(mnlog(W sub T/epsilon)), where W sub T is the cost of assigning all modules to one processor and epsilon the desired accuracy
Parallel algorithm for solving integer linear programs
Linear programs, or LPs, are often used in optimization problems, such as improving manufacturing efficiency of maximizing the yield from limited resources. The most common method for solving LPs is the Simplex Method, which will yield a solution, if one exists, but over the real numbers. From a purely numerical standpoint, it will be an optimal solution, but quite often we desire an optimal integer solution. A linear program in which the variables are also constrained to be integers is called an integer linear program or ILP. It is the focus of this report to present a parallel algorithm for solving ILPs. We discuss a serial algorithm using a breadth-first branch-and-bound search to check the feasible solution space, and then extend it into a parallel algorithm using a client-server model. In the parallel mode, the search may not be truly breadth-first, depending on the solution time for each node in the solution tree. Our search takes advantage of pruning, often resulting in super-linear improvements in solution time. Finally, we present results from sample ILPs, describe a few modifications to enhance the algorithm and improve solution time, and offer suggestions for future work
FPGA Hardware Acceleration of a Phylogenetic Tree Reconstruction with Maximum Parsimony Algorithm
In this paper, we present an FPGA hardware implementation for a phylogenetic tree reconstruction with a maximum parsimony algorithm. We base our approach on a particular stochastic local search algorithm that uses the Progressive Neighborhood and the Indirect Calculation of Tree Lengths method. This method is widely used for the acceleration of the phylogenetic tree reconstruction algorithm in software. In our implementation, we define a tree structure and accelerate the search by parallel and pipeline processing. We show results for eight real-world biological datasets. We compare execution times against our previous hardware approach, and TNT, the fastest available parsimony program, which is also accelerated by the Indirect Calculation of Tree Lengths method. Acceleration rates between 34 to 45 per rearrangement, and 2 to 6 for the whole search, are obtained against our previous hardware approach. Acceleration rates between 2 to 36 per rearrangement, and 18 to 112 for the whole search, are obtained against TNT
- …