305,739 research outputs found

    Efficient Large-scale Approximate Nearest Neighbor Search on the GPU

    Full text link
    We present a new approach for efficient approximate nearest neighbor (ANN) search in high dimensional spaces, extending the idea of Product Quantization. We propose a two-level product and vector quantization tree that reduces the number of vector comparisons required during tree traversal. Our approach also includes a novel highly parallelizable re-ranking method for candidate vectors by efficiently reusing already computed intermediate values. Due to its small memory footprint during traversal, the method lends itself to an efficient, parallel GPU implementation. This Product Quantization Tree (PQT) approach significantly outperforms recent state of the art methods for high dimensional nearest neighbor queries on standard reference datasets. Ours is the first work that demonstrates GPU performance superior to CPU performance on high dimensional, large scale ANN problems in time-critical real-world applications, like loop-closing in videos

    Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

    Full text link
    We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch and bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to upper and lower bounds on the size of maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: to compute temporal strong components and to compress graphs.Comment: 11 page

    RAxML-Cell: Parallel Phylogenetic Tree Inference on the Cell Broadband Engine

    Get PDF
    Phylogenetic tree reconstruction is one of the grand challenge problems in Bioinformatics. The search for a best-scoring tree with 50 organisms, under a reasonable optimality criterion, creates a topological search space which is as large as the number of atoms in the universe. Computational phylogeny is challenging even for the most powerful supercomputers. It is also an ideal candidate for benchmarking emerging multiprocessor architectures, because it exhibits various levels of fine and coarse-grain parallelism. In this paper, we present the porting, optimization, and evaluation of RAxML on the Cell Broadband Engine. RAxML is a provably efficient, hill climbing algorithm for computing phylogenetic trees based on the Maximum Likelihood (ML) method. The algorithm uses an embarrassingly parallel search method, which also exhibits data-level parallelism and control parallelism in the computation of the likelihood functions. We present the optimization of one of the currently fastest tree search algorithms, on a real Cell blade prototype. We also investigate problems and present solutions pertaining to the optimization of floating point code, control flow, communication, scheduling, and multi-level parallelization on the Cell

    Approximate algorithms for partitioning and assignment problems

    Get PDF
    The problem of optimally assigning the modules of a parallel/pipelined program over the processors of a multiple computer system under certain restrictions on the interconnection structure of the program as well as the multiple computer system was considered. For a variety of such programs it is possible to find linear time if a partition of the program exists in which the load on any processor is within a certain bound. This method, when combined with a binary search over a finite range, provides an approximate solution to the partitioning problem. The specific problems considered were: a chain structured parallel program over a chain-like computer system, multiple chain-like programs over a host-satellite system, and a tree structured parallel program over a host-satellite system. For a problem with m modules and n processors, the complexity of the algorithm is no worse than O(mnlog(W sub T/epsilon)), where W sub T is the cost of assigning all modules to one processor and epsilon the desired accuracy

    Parallel algorithm for solving integer linear programs

    Get PDF
    Linear programs, or LPs, are often used in optimization problems, such as improving manufacturing efficiency of maximizing the yield from limited resources. The most common method for solving LPs is the Simplex Method, which will yield a solution, if one exists, but over the real numbers. From a purely numerical standpoint, it will be an optimal solution, but quite often we desire an optimal integer solution. A linear program in which the variables are also constrained to be integers is called an integer linear program or ILP. It is the focus of this report to present a parallel algorithm for solving ILPs. We discuss a serial algorithm using a breadth-first branch-and-bound search to check the feasible solution space, and then extend it into a parallel algorithm using a client-server model. In the parallel mode, the search may not be truly breadth-first, depending on the solution time for each node in the solution tree. Our search takes advantage of pruning, often resulting in super-linear improvements in solution time. Finally, we present results from sample ILPs, describe a few modifications to enhance the algorithm and improve solution time, and offer suggestions for future work

    FPGA Hardware Acceleration of a Phylogenetic Tree Reconstruction with Maximum Parsimony Algorithm

    Get PDF
    In this paper, we present an FPGA hardware implementation for a phylogenetic tree reconstruction with a maximum parsimony algorithm. We base our approach on a particular stochastic local search algorithm that uses the Progressive Neighborhood and the Indirect Calculation of Tree Lengths method. This method is widely used for the acceleration of the phylogenetic tree reconstruction algorithm in software. In our implementation, we define a tree structure and accelerate the search by parallel and pipeline processing. We show results for eight real-world biological datasets. We compare execution times against our previous hardware approach, and TNT, the fastest available parsimony program, which is also accelerated by the Indirect Calculation of Tree Lengths method. Acceleration rates between 34 to 45 per rearrangement, and 2 to 6 for the whole search, are obtained against our previous hardware approach. Acceleration rates between 2 to 36 per rearrangement, and 18 to 112 for the whole search, are obtained against TNT
    • …
    corecore