
    An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System

    The growing importance of geospatial databases has made it essential to perform complex spatial queries efficiently. To achieve acceptable performance levels, database systems have been increasingly required to make use of parallelism. The spatial join is a computationally expensive operator, so its efficient implementation is desirable. The work presented in this document attempts to improve the performance of spatial join queries by distributing the data set across several nodes of a cluster and executing queries across these nodes in parallel. This document discusses a new parallel algorithm that implements the spatial join in an efficient manner. This algorithm is compared to an existing parallel spatial-join algorithm, the clone join. Both algorithms have been implemented on a Beowulf cluster and compared using real datasets. An extensive experimental analysis reveals that the proposed algorithm exhibits superior performance in both declustering time and join-query execution time.
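    As a rough illustration of the declustering idea (not the thesis's semi-dynamic R*-tree clustering method), the Python sketch below assigns objects to cluster nodes by the spatial tile containing their MBR centroid, then runs a local MBR-overlap filter join on each node. Tile size, the hash-based tile-to-node mapping, and the nested-loop join are all simplifying assumptions.

```python
# Illustrative sketch of a declustered spatial join (filter step only).
# Objects whose MBRs straddle tile borders would need replication across
# nodes to avoid missed pairs; that is omitted here for brevity.

from collections import defaultdict

def tile_of(mbr, tile_size=100.0):
    """Tile containing the centroid of mbr = (xmin, ymin, xmax, ymax)."""
    cx, cy = (mbr[0] + mbr[2]) / 2.0, (mbr[1] + mbr[3]) / 2.0
    return int(cx // tile_size), int(cy // tile_size)

def decluster(objects, num_nodes, tile_size=100.0):
    """Map whole tiles to nodes so that nearby objects share a node."""
    parts = defaultdict(list)
    for obj_id, mbr in objects:
        parts[hash(tile_of(mbr, tile_size)) % num_nodes].append((obj_id, mbr))
    return parts

def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def local_join(r_objs, s_objs):
    """Nested-loop MBR filter join; a real node would probe an R*-tree."""
    return [(r, s) for r, r_mbr in r_objs
                   for s, s_mbr in s_objs if overlaps(r_mbr, s_mbr)]

R = [(1, (0, 0, 10, 10)), (2, (200, 200, 210, 210))]
S = [(7, (5, 5, 15, 15)), (8, (205, 205, 215, 215))]
r_parts, s_parts = decluster(R, 4), decluster(S, 4)
for node in sorted(set(r_parts) | set(s_parts)):   # each node joins locally
    print(node, local_join(r_parts.get(node, []), s_parts.get(node, [])))
```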

    Out-of-Core Streamline Visualization on Large Unstructured Meshes

    It is advantageous for computational scientists to be able to perform interactive visualization on their desktop workstations. For data on large unstructured meshes, this capability is not generally available. In particular, particle tracing on unstructured grids can result in a high percentage of non-contiguous memory accesses and therefore may perform very poorly with virtual memory paging schemes. The alternative of visualizing a lower resolution of the data degrades the original high-resolution calculations. This paper presents an out-of-core approach for interactive streamline construction on large unstructured tetrahedral meshes containing millions of elements. The out-of-core algorithm uses an octree to partition and restructure the raw data into subsets stored in disk files for fast data retrieval. A memory management policy tailored to the streamline calculations ensures that during streamline construction only a very small amount of data is brought into main memory on demand. By carefully scheduling computation and data fetching, the overhead of reading data from disk is significantly reduced and good memory performance results. This out-of-core algorithm makes interactive streamline visualization of large unstructured-grid data sets possible on a single mid-range workstation with relatively low main-memory capacity (5-20 megabytes). Our test results also show that this approach is much more efficient than relying on virtual memory and the operating system's paging algorithms.
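    A hedged sketch of the on-demand access pattern described above: octree-leaf blocks live in per-block disk files and are loaded into a small cache under an LRU policy. File names, block granularity, and the LRU policy are assumptions for illustration, not the paper's exact memory management policy.

```python
# Minimal out-of-core block cache: blocks are read from disk on demand and
# the least recently used block is evicted when the cache is full.

import pickle
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity_blocks=8, path_fmt="block_{:04d}.pkl"):
        self.capacity = capacity_blocks
        self.path_fmt = path_fmt        # hypothetical per-block file layout
        self.cache = OrderedDict()      # block_id -> block data

    def get(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)    # mark most recently used
            return self.cache[block_id]
        with open(self.path_fmt.format(block_id), "rb") as f:
            block = pickle.load(f)              # on-demand disk read
        self.cache[block_id] = block
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict least recently used
        return block

# During streamline integration, each step would ask for the block holding
# the current particle position and advance the particle within it:
#   block_id = octree_lookup(position)   # hypothetical spatial lookup
#   cells = cache.get(block_id)
```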

    Load Balancing Algorithms for Parallel Spatial Join on HPC Platforms

    Geospatial datasets are growing in volume, complexity, and heterogeneity. Parallel processing is necessary for the efficient execution of geospatial computations and analytics on large-scale datasets. To exploit fine-grained parallelism on large compute clusters, partitioning skewed datasets in a load-balanced way is challenging. The workload in spatial join is data dependent and highly irregular, and wide variation in the size and density of geometries from one region of the map to another further exacerbates the load imbalance. This dissertation focuses on the spatial join operation used in Geographic Information Systems (GIS) and spatial databases, where the inputs are two layers of geospatial data and the output is a combination of the two layers according to the join predicate. This dissertation introduces a novel spatial data partitioning algorithm geared towards load-balancing parallel spatial join processing. Unlike existing partitioning techniques, the proposed algorithm divides the spatial join workload itself instead of partitioning the individual datasets separately, providing better load balancing. This workload partitioning algorithm has been evaluated on a high-performance computing system using real-world datasets. An intermediate output-sensitive duplication avoidance technique is proposed that decreases the external memory space required for storing spatial join candidates across the partitions. GPU acceleration is used to further reduce the spatial partitioning runtime. For dynamic load balancing in spatial join, a novel framework for fine-grained work stealing is presented; the framework is efficient and NUMA-aware. Performance improvements are demonstrated on shared and distributed memory architectures using threads and message passing, and experimental results show effective mitigation of data skew. The framework supports a variety of spatial join predicates and spatial overlay using partitioned and un-partitioned datasets.
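    As a toy sketch of the work-stealing technique named above (the dissertation's NUMA-aware framework is far richer), each worker below owns a deque of task batches and an idle worker steals from the opposite end of a victim's deque. The names, the trivial stand-in "work", and the simplistic termination check are all illustrative assumptions.

```python
# Toy fine-grained work stealing over skewed task batches. A real framework
# would use a proper distributed-termination protocol instead of the naive
# "saw no work anywhere" exit used here.

import random
import threading
from collections import deque

class Worker(threading.Thread):
    def __init__(self, wid, workers, results):
        super().__init__()
        self.wid = wid
        self.workers = workers      # shared list of all workers
        self.results = results      # shared output list (GIL-safe appends)
        self.tasks = deque()
        self.lock = threading.Lock()

    def run(self):
        while True:
            task = self.pop_local() or self.steal()
            if task is None:        # simplistic termination condition
                return
            self.results.append((self.wid, sum(task)))  # stand-in for join work

    def pop_local(self):
        with self.lock:
            return self.tasks.pop() if self.tasks else None

    def steal(self):
        for victim in random.sample(self.workers, len(self.workers)):
            if victim is not self:
                with victim.lock:
                    if victim.tasks:
                        return victim.tasks.popleft()   # steal oldest batch
        return None

results, workers = [], []
workers.extend(Worker(i, workers, results) for i in range(4))
workers[0].tasks.extend(list(range(n)) for n in (10, 20, 30, 40, 50))  # skew
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(results))   # batches end up spread across the workers
```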

    Efficient Parallel and Distributed Algorithms for GIS Polygon Overlay Processing

    Polygon clipping is one of the complex operations in computational geometry. It is used in Geographic Information Systems (GIS), computer graphics, and VLSI CAD. For two polygons with n and m vertices, the number of intersections can be O(nm). In this dissertation, we present the first output-sensitive CREW PRAM algorithm, which can perform polygon clipping in O(log n) time using O(n + k + k') processors, where n is the number of vertices, k is the number of intersections, and k' is the number of additional temporary vertices introduced by the partitioning of polygons. The current best algorithm, by Karinthi, Srinivas, and Almasi, does not handle self-intersecting polygons, is not output-sensitive, and must employ O(n^2) processors to achieve O(log n) time. The second parallel algorithm is an output-sensitive PRAM algorithm based on the Greiner-Hormann algorithm with O(log n) time complexity using O(n + k) processors. This is cost-optimal when compared with the time complexity of the best-known sequential plane-sweep-based algorithm for polygon clipping. For self-intersecting polygons, the time complexity is O(((n + k) log n log log n)/p) using p processors. In addition to these parallel algorithms, the other main contributions in this dissertation are (1) multi-core and many-core implementations for clipping a pair of polygons and (2) MPI-GIS and Hadoop Topology Suite for distributed polygon overlay using a cluster of nodes. Nvidia GPUs and CUDA are used for the many-core implementation. The MPI-based system achieves 44X speedup while processing about 600K polygons from two real-world GIS shapefiles, (1) USA Detailed Water Bodies and (2) USA Block Group Boundaries, within 20 seconds on a 32-node (8 cores each) IBM iDataPlex cluster interconnected by InfiniBand.
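    To make the O(nm) bound concrete: every edge of one polygon can cross every edge of the other, so even two quadrilaterals can intersect in eight points. The brute-force counter below is a plain sequential illustration of that bound, not the PRAM algorithm; its orientation-based test counts only proper crossings.

```python
# Count proper edge crossings between two simple polygons in O(nm) time.

def segments(poly):
    """Yield the edges of a closed polygon given as a list of vertices."""
    for i in range(len(poly)):
        yield poly[i], poly[(i + 1) % len(poly)]

def cross(o, a, b):
    """Signed area of the triangle (o, a, b): orientation predicate."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def intersects(p1, p2, q1, q2):
    """Proper (non-touching) segment intersection via orientation tests."""
    d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
    d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0

def count_intersections(poly_a, poly_b):
    return sum(intersects(p1, p2, q1, q2)
               for p1, p2 in segments(poly_a)
               for q1, q2 in segments(poly_b))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
diamond = [(2, -1), (5, 2), (2, 5), (-1, 2)]
print(count_intersections(square, diamond))   # 8: every diamond edge crosses twice
```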

    An effective Chinese indexing method based on partitioned signature files.

    Wong Chi Yin. Thesis (M.Phil.)--Chinese University of Hong Kong, 1998. Includes bibliographical references (leaves 107-114). Abstract also in Chinese.
    Contents:
    Chapter 1 - Introduction: introduction to Chinese IR; contributions; organization of this thesis.
    Chapter 2 - Background: indexing methods (full-text scanning, inverted files, signature files, clustering); information retrieval models (Boolean, vector space, probabilistic, logical).
    Chapter 3 - Investigation of Segmentation on the Vector Space Retrieval Model: segmentation of Chinese texts (character-based, word-based, N-gram); performance evaluation of the three segmentation approaches (experimental setup, results, discussion).
    Chapter 4 - Signature File Background: superimposed coding; false drop probability.
    Chapter 5 - Partitioned Signature File Based on Chinese Word Length: Fixed Weight Block (FWB) signature file; overview of PSFC; design considerations.
    Chapter 6 - New Hashing Techniques for Partitioned Signature Files: direct division method; random number assisted division method; frequency-based hashing method; Chinese character-based hashing method.
    Chapter 7 - Experiments and Results: evaluation of the partitioned signature file based on Chinese word length (retrieval performance, signature reduction ratio, storage requirement, discussion); evaluation of different dynamic signature generation methods (collision, retrieval performance, discussion).
    Chapter 8 - Conclusions and Future Work.
    Appendices: A - Notations of Signature Files; B - False Drop Probability; C - Experimental Results. Bibliography.
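    For readers unfamiliar with the signature-file machinery this thesis builds on, the following minimal sketch shows superimposed coding and the resulting false-drop behaviour. The signature width F and bits-per-term m are illustrative values, not the thesis's tuned parameters.

```python
# Superimposed coding: each term sets m bits in an F-bit signature; a block
# signature is the bitwise OR of its term signatures. A query term "matches"
# a block if all of its bits are set -- matches may be false drops, so hits
# must still be verified against the actual text.

import hashlib

F, M = 64, 3   # signature width in bits, bits set per term (illustrative)

def term_signature(term: str) -> int:
    sig = 0
    for i in range(M):
        h = hashlib.md5(f"{term}:{i}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(h[:4], "big") % F)
    return sig

def block_signature(terms):
    sig = 0
    for t in terms:
        sig |= term_signature(t)    # superimpose all term signatures
    return sig

def maybe_contains(block_sig: int, term: str) -> bool:
    t = term_signature(term)
    return block_sig & t == t       # all of the term's bits must be present

block = block_signature(["signature", "file", "chinese", "indexing"])
print(maybe_contains(block, "chinese"))   # True
print(maybe_contains(block, "foobar"))    # usually False; rarely a false drop
```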

    Semiannual report

    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period 1 Oct. 1994 - 31 Mar. 1995.

    Highly Parallel Processing of Relational Databases (Thesis)


    [Research activities in applied mathematics, fluid mechanics, and computer science]

    This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science during the period April 1, 1995 through September 30, 1995.

    Extension of the Overset Grid Preprocessor for Surface Conforming Meshes

    Aerodynamics engineers aspire to develop methods that represent fluid dynamics with as much fidelity as possible. With the rapid growth of computational resources, Computational Fluid Dynamics (CFD) tools can now rely on high-fidelity methods to solve these problems. This thesis focuses on the development of a method to solve the Navier-Stokes equations over complex geometries. The flow solver developed at Polytechnique Montreal, NSCODE, is the software used to perform the simulations. Two objectives are defined: develop a method to simulate complex geometries using surface-conforming meshes, and demonstrate its robustness with respect to industrial-type applications.
    A literature review is conducted to evaluate the maturation of the overset method in different research groups, notably NASA and ONERA. Also known as the Chimera method, it is selected for its capacity to handle such difficult geometries. It allows different components to be meshed individually, which ensures maximum grid quality and simplifies mesh generation, a tedious and time-consuming aspect of CFD. The overset method then assembles the different component meshes, with communication between them provided by interpolation functions. An initial version of the overset method had previously been implemented inside NSCODE, but its validation was only partial: it had been used only on geometries whose components were fully separated by fluid. For complex geometries this condition cannot always be met, and the method must be able to treat meshes that overlap on the surface. Three development axes are identified to increase the capabilities of the current implementation. First, the hole-cutting algorithm in place, while fast, lacks versatility on more complex cases; concave geometries lead to invalid grid assemblies. A replacement algorithm is developed that uses a constrained Delaunay triangulation to represent the internal geometry accurately. Second, to support meshes that overlap on the same surface, a study of the interpolation in the viscous region is performed. Focus is given to the particularities of the flow solver, mainly the cell-centred discretization and an artificial dissipation scheme requiring two neighbours, which influence the chosen methods. Two aspects are analyzed: the generation of such meshes and the proper treatment of the boundary condition. A limitation is proposed on the mesh generation to help ensure adequate grid assemblies and valid interpolation donors. Third, the computation of the aerodynamic forces and moments is addressed: a weighted panel method is introduced to avoid double integration in overlapping regions.
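    As a deliberately simple illustration of the inter-mesh communication step, the sketch below interpolates a value for a fringe cell of one mesh from donor cell centres of the overlapping mesh. Inverse-distance weights are a stand-in assumption; NSCODE's actual interpolation functions and donor-search machinery are not reproduced here.

```python
# Overset-style donor interpolation with inverse-distance weights (2D toy).

import math

def inverse_distance_weights(target, donors, eps=1e-12):
    """Normalized 1/d weights of donor points (x, y) for a target point."""
    d = [math.hypot(target[0] - x, target[1] - y) + eps for x, y in donors]
    w = [1.0 / di for di in d]
    s = sum(w)
    return [wi / s for wi in w]

def interpolate(target, donors, values):
    """Weighted combination of donor values at the target location."""
    weights = inverse_distance_weights(target, donors)
    return sum(w * v for w, v in zip(weights, values))

# A fringe cell at (0.5, 0.5) receives its value from four donor centres:
donors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [1.0, 2.0, 3.0, 4.0]
print(interpolate((0.5, 0.5), donors, values))   # 2.5 by symmetry
```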

    Spatial Database Support for Virtual Engineering

    The development, design, manufacturing, and maintenance of modern engineering products is a very expensive and complex task. Shorter product cycles and a greater diversity of models are becoming decisive competitive factors in the hard-fought automobile and aircraft markets. In order to support engineers in creating complex products under time pressure, systems are required which answer collision and similarity queries effectively and efficiently. In order to achieve industrial strength, the required specialized functionality has to be integrated into fully-fledged database systems, so that fundamental services of these systems can be fully reused, including transactions, concurrency control, and recovery. This thesis aims at the development of theoretically sound and practically realizable algorithms which effectively and efficiently detect colliding and similar complex spatial objects. After a short introductory Part I, we look in Part II at different spatial index structures and discuss their integrability into object-relational database systems. Based on this discussion, we present two generic approaches for accelerating collision queries. The first approach exploits available statistical information in order to accelerate the query process. The second approach is based on a cost-based decomposition of complex spatial objects. In a broad experimental evaluation based on real-world test data sets, we demonstrate the usefulness of the presented techniques, which allow interactive query response times even for large data sets of complex objects. In Part III of the thesis, we discuss several similarity models for spatial objects. We show by means of a new evaluation method that data-partitioning similarity models yield more meaningful results than space-partitioning similarity models. We introduce a very effective similarity model which is based on a new paradigm in similarity search, namely the use of vector-set represented objects. In order to guarantee efficient query processing, suitable filters are introduced for accelerating similarity queries on complex spatial objects. Based on clustering and the introduced similarity models, we present an industrial prototype which helps the user to navigate through massive data sets.
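    To illustrate the vector-set paradigm mentioned above: each object is represented by a small set of feature vectors, and two objects can be compared via a minimal one-to-one matching of their vectors. The sketch below brute-forces the matching for tiny sets (a real implementation would use the Hungarian algorithm, e.g. scipy.optimize.linear_sum_assignment); it illustrates the general idea only, not the thesis's exact similarity model or filters.

```python
# Minimal matching distance between two equal-size sets of feature vectors.

import math
from itertools import permutations

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimal_matching_distance(set_a, set_b):
    """Smallest total distance over all one-to-one pairings of the vectors.
    Brute force is fine for tiny sets; use the Hungarian algorithm otherwise."""
    assert len(set_a) == len(set_b)
    return min(sum(euclid(a, b) for a, b in zip(set_a, perm))
               for perm in permutations(set_b))

# Two parts described by three 2D feature vectors each:
part_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
part_b = [(0.1, 0.0), (1.0, 0.1), (0.0, 0.9)]
print(minimal_matching_distance(part_a, part_b))   # small value: similar parts
```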