22 research outputs found

    A correlation-aware data placement strategy for key-value stores

    Get PDF
    Key-value stores hold the unprecedented bulk of the data produced by applications such as social networks. Their scalability and availability requirements often outweigh sacri cing richer data and pro- cessing models, and even elementary data consistency. Moreover, existing key-value stores have only random or order based placement strategies. In this paper we exploit arbitrary data relations easily expressed by the application to foster data locality and improve the performance of com- plex queries common in social network read-intensive workloads. We present a novel data placement strategy, supporting dynamic tags, based on multidimensional locality-preserving mappings. We compare our data placement strategy with the ones used in existing key-value stores under the workload of a typical social network appli- cation and show that the proposed correlation-aware data placement strategy o ers a major improvement on the system's overall response time and network requirements

    A Domain-Decomposed Multilevel Method for Adaptively Refined Cartesian Grids with Embedded Boundaries

    Get PDF
    Preliminary verification and validation of an efficient Euler solver for adaptively refined Cartesian meshes with embedded boundaries is presented. The parallel, multilevel method makes use of a new on-the-fly parallel domain decomposition strategy based upon the use of space-filling curves, and automatically generates a sequence of coarse meshes for processing by the multigrid smoother. The coarse mesh generation algorithm produces grids which completely cover the computational domain at every level in the mesh hierarchy. A series of examples on realistically complex three-dimensional configurations demonstrate that this new coarsening algorithm reliably achieves mesh coarsening ratios in excess of 7 on adaptively refined meshes. Numerical investigations of the scheme's local truncation error demonstrate an achieved order of accuracy between 1.82 and 1.88. Convergence results for the multigrid scheme are presented for both subsonic and transonic test cases and demonstrate W-cycle multigrid convergence rates between 0.84 and 0.94. Preliminary parallel scalability tests on both simple wing and complex complete aircraft geometries shows a computational speedup of 52 on 64 processors using the run-time mesh partitioner

    A Domain Decomposition Strategy for Alignment of Multiple Biological Sequences on Multiprocessor Platforms

    Full text link
    Multiple Sequences Alignment (MSA) of biological sequences is a fundamental problem in computational biology due to its critical significance in wide ranging applications including haplotype reconstruction, sequence homology, phylogenetic analysis, and prediction of evolutionary origins. The MSA problem is considered NP-hard and known heuristics for the problem do not scale well with increasing number of sequences. On the other hand, with the advent of new breed of fast sequencing techniques it is now possible to generate thousands of sequences very quickly. For rapid sequence analysis, it is therefore desirable to develop fast MSA algorithms that scale well with the increase in the dataset size. In this paper, we present a novel domain decomposition based technique to solve the MSA problem on multiprocessing platforms. The domain decomposition based technique, in addition to yielding better quality, gives enormous advantage in terms of execution time and memory requirements. The proposed strategy allows to decrease the time complexity of any known heuristic of O(N)^x complexity by a factor of O(1/p)^x, where N is the number of sequences, x depends on the underlying heuristic approach, and p is the number of processing nodes. In particular, we propose a highly scalable algorithm, Sample-Align-D, for aligning biological sequences using Muscle system as the underlying heuristic. The proposed algorithm has been implemented on a cluster of workstations using MPI library. Experimental results for different problem sizes are analyzed in terms of quality of alignment, execution time and speed-up.Comment: 36 pages, 17 figures, Accepted manuscript in Journal of Parallel and Distributed Computing(JPDC

    Dynamic Load-Balancing with Variable Number of Processors based on Graph Repartitioning

    Get PDF
    Dynamic load balancing is an important step conditioning the performance of parallel adaptive codes whose load evolution is difficult to predict. Most of the works which answer this problem perform well, but are limited to an initially fixed number of processors which is not modified at runtime. These approaches can be very inefficient, especially in terms of resource consumption. In this paper, we present a new graph repartitioning algorithm which accepts a variable number of processors, assuming the load is already balanced. Our algorithm minimizes both data communication and data migration overheads, while maintaining the computational load balanced. This algorithm is based on a theoretical result, that constructs optimal communication matrices with both a minimum migration volume and a minimum number of communications. An experimental study which compares our work against state-of-the-art approaches is presented

    Repartitionnement d'un graphe de M vers N processeurs : application pour l'équilibrage dynamique de charge

    Get PDF
    National audienceL'équilibrage dynamique de charge est une étape cruciale qui conditionne la performance des codes adaptatifs dont l'évolution de la charge est difficilement prévisible. Néanmoins, l'ensemble des travaux dans ce domaine se limitent -- à notre connaissance -- au cas où le nombre de processeurs est fixé initialement et n'est pas remis en cause lors de l'équilibrage. Cela peut s'avérer particulièrement inefficace, notamment du point de vue de la consommation des ressources. Nous proposons dans cet article un nouvel algorithme de repartitionnement de graphe permettant de faire varier le nombre de processeurs, en supposant que la charge du graphe n'est pas modifiée. Cet algorithme optimise conjointement la coupe et la migration en s'appuyant sur un modèle de partitionnement de graphe à sommets fixes. Des résultats expérimentaux valident nos travaux en les comparant à d'autres approches

    Doctor of Philosophy

    Get PDF
    dissertationSolutions to Partial Di erential Equations (PDEs) are often computed by discretizing the domain into a collection of computational elements referred to as a mesh. This solution is an approximation with an error that decreases as the mesh spacing decreases. However, decreasing the mesh spacing also increases the computational requirements. Adaptive mesh re nement (AMR) attempts to reduce the error while limiting the increase in computational requirements by re ning the mesh locally in regions of the domain that have large error while maintaining a coarse mesh in other portions of the domain. This approach often provides a solution that is as accurate as that obtained from a much larger xed mesh simulation, thus saving on both computational time and memory. However, historically, these AMR operations often limit the overall scalability of the application. Adapting the mesh at runtime necessitates scalable regridding and load balancing algorithms. This dissertation analyzes the performance bottlenecks for a widely used regridding algorithm and presents two new algorithms which exhibit ideal scalability. In addition, a scalable space- lling curve generation algorithm for dynamic load balancing is also presented. The performance of these algorithms is analyzed by determining their theoretical complexity, deriving performance models, and comparing the observed performance to those performance models. The models are then used to predict performance on larger numbers of processors. This analysis demonstrates the necessity of these algorithms at larger numbers of processors. This dissertation also investigates methods to more accurately predict workloads based on measurements taken at runtime. While the methods used are not new, the application of these methods to the load balancing process is. These methods are shown to be highly accurate and able to predict the workload within 3% error. By improving the accuracy of these estimations, the load imbalance of the simulation can be reduced, thereby increasing the overall performance

    Emergent relational schemas for RDF

    Get PDF

    Working With Incremental Spatial Data During Parallel (GPU) Computation

    Get PDF
    Central to many complex systems, spatial actors require an awareness of their local environment to enable behaviours such as communication and navigation. Complex system simulations represent this behaviour with Fixed Radius Near Neighbours (FRNN) search. This algorithm allows actors to store data at spatial locations and then query the data structure to find all data stored within a fixed radius of the search origin. The work within this thesis answers the question: What techniques can be used for improving the performance of FRNN searches during complex system simulations on Graphics Processing Units (GPUs)? It is generally agreed that Uniform Spatial Partitioning (USP) is the most suitable data structure for providing FRNN search on GPUs. However, due to the architectural complexities of GPUs, the performance is constrained such that FRNN search remains one of the most expensive common stages between complex systems models. Existing innovations to USP highlight a need to take advantage of recent GPU advances, reducing the levels of divergence and limiting redundant memory accesses as viable routes to improve the performance of FRNN search. This thesis addresses these with three separate optimisations that can be used simultaneously. Experiments have assessed the impact of optimisations to the general case of FRNN search found within complex system simulations and demonstrated their impact in practice when applied to full complex system models. Results presented show the performance of the construction and query stages of FRNN search can be improved by over 2x and 1.3x respectively. These improvements allow complex system simulations to be executed faster, enabling increases in scale and model complexity
    corecore