57 research outputs found

    An Efficient Algorithm for Bulk-Loading xBR+-trees

    A major part of the interface to a database is made up of the queries that can be addressed to it and answered (processed) efficiently, contributing to the quality of the developed software. Efficiently processed spatial queries constitute a fundamental part of the interface to spatial databases, due to the wide range of applications that address such queries, like geographical information systems (GIS), location-based services, computer visualization, automated mapping, facilities management, etc. Another important capability of the interface to a spatial database is the creation of efficient index structures to speed up spatial query processing. The xBR+-tree is a balanced, disk-resident, quadtree-based index structure for point data, which is very efficient for processing such queries. Bulk-loading refers to the process of creating an index from scratch when the dataset to be indexed is available beforehand, instead of creating the index gradually (and more slowly) by inserting the dataset elements one by one. In this paper, we present an algorithm for bulk-loading xBR+-trees for big datasets residing on disk, using a limited amount of main memory. The resulting tree is not only built fast, but also exhibits high performance in processing a broad range of spatial queries, where one or two datasets are involved. To justify these characteristics, using real and artificial datasets of various cardinalities, first, we present an experimental comparison of this algorithm vs. a previous version of the same algorithm and STR, a popular algorithm for bulk-loading R-trees, regarding tree-creation time and the characteristics of the trees created; and second, we experimentally compare the query efficiency of bulk-loaded xBR+-trees vs. bulk-loaded R-trees, regarding I/O and execution time.
Thus, this paper contributes to the implementation of spatial database interfaces and to the efficient storage organization for big spatial data management.
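Since the experimental comparison above uses STR as the R-tree bulk-loading baseline, a minimal sketch of STR-style sort-tile-recursive packing of 2D points may help illustrate what bulk-loading looks like. This is a simplification (it builds only the leaf level, and the function and parameter names are our own), not the paper's algorithm:

```python
import math

def str_bulk_load(points, leaf_capacity):
    """Sort-Tile-Recursive packing of 2D points into leaf pages.

    Sorts points by x, slices them into vertical tiles, then sorts each
    tile by y and cuts it into leaf-sized runs. Returns the list of leaves."""
    n = len(points)
    num_leaves = math.ceil(n / leaf_capacity)
    num_slices = math.ceil(math.sqrt(num_leaves))
    slice_size = num_slices * leaf_capacity  # points per vertical slice

    pts = sorted(points, key=lambda p: p[0])  # sort by x
    leaves = []
    for i in range(0, n, slice_size):
        tile = sorted(pts[i:i + slice_size], key=lambda p: p[1])  # sort by y
        for j in range(0, len(tile), leaf_capacity):
            leaves.append(tile[j:j + leaf_capacity])
    return leaves

leaves = str_bulk_load([(x, y) for x in range(8) for y in range(8)], leaf_capacity=8)
print(len(leaves))  # 8 leaves of 8 points each
```

Because the data is packed bottom-up from a sorted order, every page is filled to capacity, which is precisely why bulk-loading beats one-by-one insertion.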

    Spatial structures for map representation (Estructuras espaciales para la representación de mapas)

    Only a few years ago, geographic information was represented on paper maps, and its manipulation was limited to a manual, non-interactive process. The rapid digitization of geographic information, together with a growing demand for manipulating and analyzing these data, has created the need for dedicated software. Geographic information comprises large volumes of spatial data, so efficient methods are needed to store it and to retrieve relevant data. Spatial data consist of spatial objects based on points, lines, surfaces, volumes, and higher-dimensional data. Examples of spatial data are cities, routes, rivers, etc. On a map, these spatial objects, combined with certain non-spatial attributes such as city names, street names, route numbering, etc., make up the geographic information. Spatial databases facilitate the efficient storage and processing of spatial and non-spatial information, ideally without favoring one over the other. One proposal for representing spatial data is to structurally separate the non-spatial data from the spatial data while appropriately maintaining the relationship between the two. This speeds up queries on the spatial data (spatial operations are performed directly on the spatial data structure) and gives us the freedom to choose a spatial data structure more appropriate than the one imposed by the non-spatial structure. Many theoretical studies are devoted to the treatment of spatial information; however, most publicly available work is based on relational databases, which are not well suited to the nature of the underlying information.
In this context, various techniques for adequately representing and managing spatial data are being studied, restricting attention to objects whose components are defined in the 2D plane. As part of this study, a prototype for the retrieval and visualization of spatial information is being developed, to serve as a basis for the design of urban-planning applications. Track: Computer graphics. Visualization. Red de Universidades con Carreras en Informática (RedUNCI)
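The proposed separation of spatial and non-spatial structures, linked by object identity, can be illustrated with a small, hypothetical sketch (the uniform grid index, cell size, and attribute table below are illustrative choices, not the prototype described above): the spatial structure answers the window query, and object ids link the results back to the non-spatial attributes.

```python
# Non-spatial data: attribute table keyed by object id.
attributes = {
    1: {"name": "Springfield", "type": "city"},
    2: {"name": "Route 9", "type": "road"},
}

CELL = 10.0  # grid cell size (illustrative)

grid = {}  # spatial data: cell -> set of object ids

def insert(oid, x, y):
    cell = (int(x // CELL), int(y // CELL))
    grid.setdefault(cell, set()).add(oid)

def window_query(xmin, ymin, xmax, ymax):
    """Return ids of objects whose grid cell intersects the query window."""
    hits = set()
    for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
            hits |= grid.get((cx, cy), set())
    return hits

insert(1, 12.0, 34.0)
insert(2, 55.0, 8.0)
ids = window_query(10, 30, 20, 40)
print([attributes[i]["name"] for i in ids])  # ['Springfield']
```

The spatial operation runs entirely on the grid; the attribute lookup happens only for the ids it returns, so neither structure constrains the design of the other.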

    Efficient Generating And Processing Of Large-Scale Unstructured Meshes

    Unstructured meshes are used in a variety of disciplines to represent simulation and experimental data. Scientists who want to increase the accuracy of simulations by increasing resolution must also increase the size of the resulting dataset. However, generating and processing extremely large unstructured meshes remains a barrier. Researchers have published many parallel Delaunay triangulation (DT) algorithms, often focusing on partitioning the initial mesh domain so that each rectangular partition can be triangulated in parallel. However, the common problems with this method are how to merge all triangulated partitions into a single domain-wide mesh, and the significant cost of communicating the sub-region borders. We devised a novel algorithm, Triangulation of Independent Partitions in Parallel (TIPP), to deal with very large DT problems without requiring inter-processor communication while still guaranteeing the Delaunay criteria. The core of the algorithm is to find a set of independent partitions such that the circumcircles of triangles in one partition do not enclose any vertex in other partitions. Such a set of independent partitions can be triangulated in parallel without affecting each other. The result of mesh generation is a large unstructured mesh, comprising vertex index and vertex coordinate files, which introduces a new challenge: locality. Partitioning unstructured meshes to improve locality is a key part of our approach. Elements that were widely scattered in the original dataset are grouped together, speeding data access. To further improve unstructured mesh partitioning, we also describe a new approach, Direct Load, which mitigates the challenges of unstructured meshes by maximizing the proportion of useful data retrieved during each read from disk, which in turn reduces the total number of read operations, boosting performance.
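The independence criterion above (no triangle's circumcircle in one partition encloses a vertex of another) can be checked with the standard incircle determinant test. This is an illustrative sketch of the criterion, not the TIPP implementation; the helper names are our own:

```python
def in_circumcircle(a, b, c, p):
    """True if point p lies strictly inside the circumcircle of triangle
    (a, b, c); uses the standard incircle determinant, assuming the
    triangle's vertices are given in counter-clockwise order."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    return det > 0

def partition_independent(triangles, outside_vertices):
    """A partition is independent if no triangle's circumcircle encloses
    any vertex outside the partition."""
    return not any(in_circumcircle(a, b, c, p)
                   for (a, b, c) in triangles
                   for p in outside_vertices)

tri = ((0, 0), (1, 0), (0, 1))  # CCW triangle; circumcircle centered at (0.5, 0.5)
print(partition_independent([tri], [(5, 5)]))      # True: far vertex is outside
print(partition_independent([tri], [(0.5, 0.5)]))  # False: vertex inside circle
```

When every partition passes this test against all other partitions' vertices, each one can be triangulated by a separate processor with no merging or border communication.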

    Large-Scale Spatial Data Management on Modern Parallel and Distributed Platforms

    The rapidly growing volume of spatial data has made it desirable to develop efficient techniques for managing large-scale spatial data. Traditional spatial data management techniques cannot meet the efficiency and scalability requirements of large-scale spatial data processing. In this dissertation, we have developed new data-parallel designs for large-scale spatial data management that can better utilize modern inexpensive commodity parallel and distributed platforms, including multi-core CPUs, many-core GPUs and computer clusters, to achieve both efficiency and scalability. After introducing background on spatial data management and modern parallel and distributed systems, we present our parallel designs for spatial indexing and spatial join query processing on both multi-core CPUs and GPUs for high efficiency, as well as their integrations with Big Data systems for better scalability. Experimental results using real-world datasets demonstrate the effectiveness and efficiency of the proposed techniques for managing large-scale spatial data.
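As a toy illustration of the data-parallel flavor of such designs (not the dissertation's CPU/GPU code), the filter step of a spatial join over minimum bounding rectangles can be partitioned across workers, each joining its share of one input against the other:

```python
from concurrent.futures import ThreadPoolExecutor

def mbr_intersects(r, s):
    """Filter step of a spatial join: do two MBRs (xmin, ymin, xmax, ymax) overlap?"""
    return not (r[2] < s[0] or s[2] < r[0] or r[3] < s[1] or s[3] < r[1])

def join_partition(part_r, all_s):
    # Each worker joins its partition of R against all of S.
    return [(r, s) for r in part_r for s in all_s if mbr_intersects(r, s)]

R = [(i, i, i + 2, i + 2) for i in range(100)]
S = [(i + 1, i + 1, i + 3, i + 3) for i in range(0, 100, 10)]
chunks = [R[i::4] for i in range(4)]  # round-robin partition of R over 4 workers
with ThreadPoolExecutor(max_workers=4) as ex:
    parts = list(ex.map(lambda c: join_partition(c, S), chunks))
pairs = [p for part in parts for p in part]
print(len(pairs))  # 49 candidate pairs
```

Real systems replace the brute-force inner loop with a spatial index and run the partitions on GPU threads or cluster nodes, but the partition-then-join shape is the same.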

    Efficient Index-based Methods for Processing Large Biological Databases.

    Over the last few decades, advances in the life sciences have generated a vast amount of biological data. To cope with the rapid increase in data volume, there is a pressing need for efficient computational methods to query large biological datasets. This thesis develops efficient and scalable querying methods for biological data. For efficient sequence database search, we developed two q-gram-index-based algorithms, miBLAST and ProbeMatch. miBLAST is designed to expedite batch identification of statistically significant sequence alignments. ProbeMatch is designed for identifying sequence alignments based on a k-mismatch model. For efficient protein structure database search, we also developed a multi-dimensional-index-based method called proCC, an automatic and efficient classification framework. All these algorithms yield substantial performance improvements over existing methods. When designing index-based methods, the right choice of indexing method is essential. In addition to developing index-based methods for biological applications, we also investigated a fundamental database problem: reexamining the state-of-the-art indexing methods by experimental evaluation. Our experimental study provides valuable insight for choosing the right indexing method and also motivates a careful consideration of index structures when designing index-based methods. In the long run, index-based methods can lead to new and more efficient algorithms for querying and mining biological datasets. The examples above, which include query processing on biological sequence and geometrical structure datasets, employ index-based methods very effectively. While the database research community has long recognized the need for index-based query processing algorithms, the bioinformatics community has been slow to adopt such algorithms.
However, since many biological datasets are growing very rapidly, database-style index-based algorithms are likely to play a crucial role in modern bioinformatics methods. The work proposed in this thesis lays the foundation for such methods.
Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61570/1/youjkim_1.pd
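The q-gram filtration idea behind index-based sequence search can be sketched as follows: an inverted index from q-grams to positions prunes the database to candidate sequences, which a real system (such as the batch-alignment setting miBLAST targets) would then verify by alignment. The value q = 3 and the toy sequences below are illustrative:

```python
from collections import defaultdict

Q = 3  # q-gram length (illustrative choice)

def build_qgram_index(sequences):
    """Map each q-gram to the (sequence id, offset) positions where it occurs."""
    index = defaultdict(list)
    for sid, seq in enumerate(sequences):
        for i in range(len(seq) - Q + 1):
            index[seq[i:i + Q]].append((sid, i))
    return index

def candidate_hits(index, query):
    """Filtration step: ids of sequences sharing at least one q-gram with
    the query; candidates would then be verified by a real alignment."""
    hits = set()
    for i in range(len(query) - Q + 1):
        for sid, _ in index.get(query[i:i + Q], ()):
            hits.add(sid)
    return hits

db = ["ACGTACGT", "TTTTCCCC", "GGGACGTA"]
idx = build_qgram_index(db)
print(sorted(candidate_hits(idx, "TACG")))  # [0, 2]
```

The payoff is that the expensive alignment runs only on the surviving candidates rather than on the whole database, which is where the index-based speedups come from.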

    On the Practice and Application of Context-Free Language Reachability

    The Context-Free Language Reachability (CFL-R) formalism relates to some of the most important computational problems facing researchers and industry practitioners. CFL-R is a generalisation of graph reachability and language recognition, such that pairs in a labelled graph are reachable if and only if there is a path between them whose labels, joined together in the order they were encountered, spell a word in a given context-free language. The formalism finds particular use as a vehicle for phrasing and reasoning about program analysis, since complex relationships within the data, logic or structure of computer programs are easily expressed and discovered in CFL-R. Unfortunately, the potential of CFL-R cannot be met by state-of-the-art solvers. Current algorithms have scalability and expressibility issues that prevent them from being used on large graph instances or complex grammars. This work outlines our efforts in understanding the practical concerns surrounding CFL-R, and applying this knowledge to improve the performance of CFL-R applications. We examine the major difficulties with solving CFL-R-based analyses at scale, via a case study of points-to analysis as a CFL-R problem. Points-to analysis is fundamentally important to many modern research and industry efforts, and is relevant to optimisation, bug-checking and security technologies. Our understanding of the scalability challenge motivates work in developing practical CFL-R techniques. We present improved evaluation algorithms and declarative optimisation techniques for CFL-R, capitalising on the simplicity of CFL-R to create fully automatic methodologies. The culmination of our work is a general-purpose, high-performance tool called Cauliflower, a solver-generator for CFL-R problems. We describe Cauliflower and evaluate its performance experimentally, showing significant improvement over alternative general techniques.
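The CFL-R definition above can be made concrete with a minimal worklist solver: given a labelled graph and a grammar in a normal form (rule bodies of one or two symbols), it derives every nonterminal-labelled edge. This is an illustrative sketch, not Cauliflower's algorithm; the balanced-parentheses grammar below is a classic example:

```python
from collections import deque

def cfl_reach(edges, productions):
    """Minimal CFL-reachability worklist solver.

    edges: set of (u, label, v) facts over a labelled graph.
    productions: (head, (b,)) for unary rules, (head, (b1, b2)) for binary.
    Returns all derivable edges, including nonterminal-labelled ones."""
    facts = set(edges)
    work = deque(facts)
    while work:
        u, x, v = work.popleft()
        derived = []
        for head, body in productions:
            if len(body) == 1 and body[0] == x:          # A -> X
                derived.append((u, head, v))
            elif len(body) == 2:
                b1, b2 = body
                if x == b1:                              # A -> X Y: extend right
                    derived += [(u, head, w) for (m, y, w) in list(facts)
                                if m == v and y == b2]
                if x == b2:                              # A -> Y X: extend left
                    derived += [(t, head, v) for (t, y, m) in list(facts)
                                if m == u and y == b1]
        for e in derived:
            if e not in facts:
                facts.add(e)
                work.append(e)
    return facts

prods = [
    ("M", ("(", ")")),   # M -> ( )
    ("T", ("M", ")")),   # T -> M )
    ("M", ("(", "T")),   # M -> ( T    (i.e. M -> ( M ) in normal form)
]
edges = {(0, "(", 1), (1, "(", 2), (2, ")", 3), (3, ")", 4)}
facts = cfl_reach(edges, prods)
print((0, "M", 4) in facts)  # True: path labels "( ( ) )" spell a balanced word
```

Production solvers improve on this quadratic edge-matching with indexed lookups and semi-naive evaluation, which is exactly the kind of engineering the scalability discussion above concerns.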

    Map algebra on raster datasets represented by compact data structures

    [Abstract]: The increase in the size of data repositories has forced the design of new computing paradigms able to process large volumes of data in a reasonable amount of time. One of them is in-memory computing, which advocates storing all the data in main memory to avoid the disk I/O bottleneck. Compression is one of the key technologies for this approach. For raster data, a compact data structure called the k²-raster has recently been proposed. It compresses raster maps while still supporting fast retrieval of a given datum or a portion of the data directly from the compressed representation. The original k²-raster work introduced several queries in which it was superior to competitors. However, to serve as the basis of an in-memory system for raster data, it must also be demonstrated efficient for more complex operations, such as the map algebra operators. In this work, we present algorithms to run a set of these operators directly on the k²-raster, without a decompression procedure.
Funding for open access charge: Universidade da Coruña/CISUG. This work was supported by the National Natural Science Foundation of China (Grant Nos. 31171944, 31640068), the Anhui Provincial Natural Science Foundation (Grant No. 2019B319), and the Earmarked Fund for Anhui Science and Technology Major Project (202003b06020016). It was also partially supported by CITIC (funded by the Xunta de Galicia through the collaboration agreement between the Department of Culture, Education, Vocational Training and Universities and the Galician universities for the reinforcement of the research centers of the Galician University System, CIGUS); by MCIN/AEI/10.13039/501100011033: PID2020-114635RB-I00 and PID2019-105221RB-C41; by MCIN/AEI/10.13039/501100011033 and "NextGenerationEU"/PRTR: TED2021-129245B-C21, PDC2021-121239-C31 and PDC2021-120917-C21; by Xunta de Galicia GRC ED431C 2021/53, partially funded by GAIN; and by IN852D 2021/3 (CO3), partially funded by the EU (ERDF), GAIN, convocatoria Conecta COVID.
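For readers unfamiliar with map algebra, the semantics of two operator classes (local and zonal) can be sketched on plain, uncompressed rasters. The paper's contribution is evaluating such operators directly on the compressed k²-raster; this sketch shows only what the operators compute, not the compressed-domain algorithms:

```python
def local_op(raster_a, raster_b, fn):
    """Local (cell-wise) operator: combine two aligned rasters cell by cell."""
    return [[fn(a, b) for a, b in zip(ra, rb)]
            for ra, rb in zip(raster_a, raster_b)]

def zonal_sum(raster, zones):
    """Zonal operator: sum the raster values falling in each zone id."""
    sums = {}
    for rrow, zrow in zip(raster, zones):
        for v, z in zip(rrow, zrow):
            sums[z] = sums.get(z, 0) + v
    return sums

A = [[1, 2], [3, 4]]
B = [[10, 20], [30, 40]]
Z = [[0, 0], [1, 1]]  # zone id per cell
print(local_op(A, B, lambda a, b: a + b))  # [[11, 22], [33, 44]]
print(zonal_sum(A, Z))                     # {0: 3, 1: 7}
```

Running these directly on a compressed structure means producing the same results while traversing the k²-raster's tree representation instead of a decompressed array.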

    An intelligent Geographic Information System for design

    Recent advances in geographic information systems (GIS) and artificial intelligence (AI) techniques have been summarised, concentrating on the theoretical aspects of their construction and use. Existing projects combining AI and GIS have also been discussed, with attention paid to the interfacing methods used and problems uncovered by the approaches. AI and GIS have been combined in this research to create an intelligent GIS for design. This has been applied to off-shore pipeline route design. The system was tested using data from a real pipeline design project. [Continues.