
29 research outputs found

Bulk Insertions into xBR+-trees

Author: G Roumelis
GR Hjaltason
L Arge
L Chen
R Choubey
S Shekhar
T Lee
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Bulk insertion refers to the process of updating an existing index by inserting a large batch of new data, treating the items of this batch as a whole and not by inserting these items one-by-one. Bulk insertion is related to bulk loading, which refers to the process of creating a non-existing index from scratch, when the dataset to be indexed is available beforehand. The xBR+-tree is a balanced, disk-resident, Quadtree-based index for point data, which is very efficient for processing spatial queries. In this paper, we present the first algorithm for bulk insertion into xBR+-trees. This algorithm incorporates extensions of techniques that we have recently developed for bulk loading xBR+-trees. Moreover, using real and artificial datasets of various cardinalities, we present an experimental comparison of this algorithm vs. inserting items one-by-one for updating xBR+-trees, regarding performance (I/O and execution time) and the characteristics of the resulting trees. We also present experimental results regarding the query-processing efficiency of xBR+-trees built by bulk insertions vs. xBR+-trees built by inserting items one-by-one.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional de la Universidad de Almería (Spain)

A Comparison of Distributed Spatial Data Management Systems for Processing Distance Join Queries

Author: A Aji
A Corral
A Eldawy
F García-García
F Li
G Roumelis
H Zhang
J Shi
M Tang
Publication date: 01/01/2017

Due to the ubiquitous use of spatial data applications and the large amounts of spatial data that these applications generate, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Two of the most studied distance join queries are the K Closest Pair Query (KCPQ) and the ε Distance Join Query (εDJQ). The KCPQ finds the K closest pairs of points from two datasets, and the εDJQ finds all the possible pairs of points from two datasets that are within a distance threshold ε of each other. Distributed cluster-based computing systems can be classified into Hadoop-based and Spark-based systems. Based on this classification, in this paper, we compare two of the most current and leading distributed spatial data management systems, namely SpatialHadoop and LocationSpark, by evaluating the performance of existing and newly proposed parallel and distributed distance join query algorithms in different situations with big real-world datasets. As a general conclusion, while SpatialHadoop is a more mature and robust system, LocationSpark is the winner with respect to the total execution time.
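
The two query definitions above can be illustrated with a brute-force, single-machine sketch. This is not the distributed SpatialHadoop or LocationSpark algorithm evaluated in the paper; it only shows what each query returns. The function names, the use of Euclidean distance, and the inclusive threshold (distance <= ε) are assumptions made for illustration.

```python
import heapq
import math
from itertools import product


def k_closest_pairs(P, Q, k):
    """KCPQ by brute force: the k pairs (one point from each dataset)
    with the smallest Euclidean distances, as (distance, p, q) tuples."""
    pairs = ((math.dist(p, q), p, q) for p, q in product(P, Q))
    return heapq.nsmallest(k, pairs)


def epsilon_distance_join(P, Q, eps):
    """eDJQ by brute force: every pair (p, q) whose distance is at most eps
    (inclusive threshold is an assumption of this sketch)."""
    return [(p, q) for p, q in product(P, Q) if math.dist(p, q) <= eps]


P = [(0, 0), (5, 5)]
Q = [(1, 0), (5, 7), (10, 10)]
print(k_closest_pairs(P, Q, 2))
print(epsilon_distance_join(P, Q, 2.0))
```

Both functions examine all |P| x |Q| pairs, which is the cost that index-based, plane-sweep and distributed algorithms aim to avoid.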

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional de la Universidad de Almería (Spain)

The K Group Nearest-Neighbor Query on Non-indexed RAM-Resident Data

Author: D Papadias
EH Jacox
FP Preparata
G Roumelis
H-K Ahn
J Li
K Hinrichs
P Rigaux
S Namnandorj
T Jiang
X Lian
Y Luo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Data sets that are used for answering a single query only once (or just a few times) before they are replaced by new data sets appear frequently in practical applications. The cost of building indexes to accelerate query processing would not be repaid for such data sets. We consider an extension of the popular (K) Nearest-Neighbor Query, called the (K) Group Nearest Neighbor Query (GNNQ). This query discovers the (K) nearest neighbor(s) to a group of query points (considering the sum of distances to all the members of the query group) and has been studied in recent years, considering data sets indexed by efficient spatial data structures. We study (K) GNNQs, considering non-indexed RAM-resident data sets, and present an existing algorithm adapted to such data sets and two Plane-Sweep algorithms that apply optimizations emerging from the geometric properties of the problem. By extensive experimentation, using real and synthetic data sets, we highlight the most efficient algorithm.
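
The GNNQ definition above (rank data points by the sum of their distances to all query points) can be sketched as an exhaustive scan over a RAM-resident, non-indexed data set. This is only a baseline illustrating the query semantics, not any of the algorithms studied in the paper; the function name and the choice of Euclidean distance are assumptions.

```python
import heapq
import math


def k_group_nearest_neighbors(data, query_group, k):
    """Return the k data points with the smallest sum of Euclidean
    distances to all members of query_group (exhaustive scan)."""

    def sum_of_distances(p):
        return sum(math.dist(p, q) for q in query_group)

    return heapq.nsmallest(k, data, key=sum_of_distances)


data = [(0, 0), (2, 2), (10, 10)]
query_group = [(1, 1), (3, 3)]
print(k_group_nearest_neighbors(data, query_group, 2))
```

The scan computes |data| x |query_group| distances; pruning strategies such as plane-sweep aim to skip data points that cannot enter the result.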

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional de la Universidad de Almería (Spain)

The deletion operation in xBR-trees

Author: Corral A.
Roumelis G.
Vassilakopoulos M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

The most important operations to consider when designing a spatial index are insertion, deletion and search. We focus on the deletion operation over the xBR-tree, a secondary-memory spatial data structure that belongs to the Quadtree family. The algorithm for handling deletions is presented, taking into account that the deletion of a leaf item may cause deletions of entries from internal nodes. The well-known merging technique is applied to retain the efficiency of the xBR-tree. © 2012 IEEE

Crossref

University of Thessaly Institutional Repository

Nearest Neighbor Algorithms using xBR-Trees

Author: Corral A.
Roumelis G.
Vassilakopoulos M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

One of the common queries in spatial databases is the (K) Nearest Neighbor Query, which discovers the (K) closest objects to a query object. Processing of spatial queries, in most cases, is accomplished by indexing spatial data by an access method. In this paper, we present algorithms for Nearest Neighbor Queries using a disk-based structure that belongs to the Quadtree family, the xBR-tree, which can be used for indexing large point datasets. We demonstrate performance results (I/O efficiency and execution time) of alternative Nearest Neighbor algorithms, using real datasets. © 2011 IEEE

University of Thessaly Institutional Repository

Database design of a geo-environmental information system

Author: Loukopoulos T.
Roumelis G.
Vassilakopoulos M.
Publication date: 01/01/2014

Environmental protection from productive investments becomes a major task for enterprises and constitutes a critical competitiveness factor. The region of Central Greece presents many serious and particular environmental problems. An Environmental Geographic Information System is under development that will maintain necessary and available information, including existing environmental legislation, specific data rules, regulations, restrictions and actions of the primary sector, existing activities of the secondary and tertiary sectors and their influences. The system will provide information about the environmental status in each location with respect to water resources, soil and atmosphere, the existence of significant pollution sources, existing surveys, studies and measurements for high-risk areas, the land use and legal status of locations and the infrastructure networks. In this paper, we present a Database Design that supports the above-mentioned objectives and information provision. More specifically, we present examples of user queries that the system should be able to answer for extraction of useful information, the basic categorization of data that will be maintained by the system, and a data model that is able to support such data maintenance, and we examine how existing indexing structures can be utilized for efficient processing of such queries. Copyright © 2014 SCITEPRESS - Science and Technology Publications

University of Thessaly Institutional Repository

Bulk-loading xBR+-trees

Author: Roumelis G., Vassilakopoulos M., Corral A., Manolopoulos Y.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Spatial indexes are important in spatial databases for efficient execution of queries involving spatial constraints. The xBR+-tree is a balanced disk-resident quadtree-based index structure for point data, which is very efficient for processing such queries. Bulk-loading refers to the process of creating an index from scratch as a whole, when the dataset to be indexed is available beforehand, instead of creating (loading) the index gradually, when the dataset items are available one-by-one. In this paper, we present an algorithm for bulk-loading xBR+-trees for big datasets residing on disk, using a limited amount of RAM. Moreover, using real and artificial datasets of various cardinalities, we present an experimental comparison of this algorithm vs. the algorithm loading items one-by-one, regarding performance (I/O and execution time) and the characteristics of the xBR+-trees created. We also present experimental results regarding the efficiency of bulk-loaded xBR+-trees vs. xBR+-trees where items are loaded one-by-one for query processing. © Springer International Publishing Switzerland 2016

University of Thessaly Institutional Repository

Plane-sweep algorithms for the K Group Nearest-Neighbor Query

Author: Corral A.
Manolopoulos Y.
Roumelis G.
Vassilakopoulos M.
Publication date: 01/01/2015

One of the most representative and studied queries in Spatial Databases is the (K) Nearest-Neighbor Query (NNQ), which discovers the (K) nearest neighbor(s) to a query point. An extension that is important for practical applications is the (K) Group Nearest Neighbor Query (GNNQ), which discovers the (K) nearest neighbor(s) to a group of query points (considering the sum of distances to all the members of the query group). This query has been studied in recent years, considering data sets indexed by efficient spatial data structures. We study (K) GNNQs, considering non-indexed data sets, since this case is frequent in practical applications. We present two (RAM-based) Plane-Sweep algorithms that apply optimizations emerging from the geometric properties of the problem. By extensive experimentation, using real and synthetic data sets, we highlight the most efficient algorithm.

CiteSeerX

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional de la Universidad de Almería (Spain)

University of Thessaly Institutional Repository

New plane-sweep algorithms for distance-based join queries in spatial databases

Author: Roumelis G., Corral A., Vassilakopoulos M., Manolopoulos Y.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Efficient and effective processing of the distance-based join query (DJQ) is of great importance in spatial databases due to the wide range of applications that may address such queries (mapping, urban planning, transportation planning, resource management, etc.). The most representative and studied DJQs are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). These spatial queries involve two spatial data sets and a distance function to measure the degree of closeness, along with a given number of pairs in the final result (K) or a distance threshold (ε). In this paper, we propose four new plane-sweep-based algorithms for KCPQs and their extensions for εDJQs in the context of spatial databases, without the use of an index for either of the two disk-resident data sets (since building and using indexes is not always in favor of processing performance). They employ a combination of plane-sweep algorithms and space partitioning techniques to join the data sets. Finally, we present results of an extensive experimental study that compares the efficiency and effectiveness of the proposed algorithms for KCPQs and εDJQs. This performance study, conducted on medium and big spatial data sets (real and synthetic), validates that the proposed plane-sweep-based algorithms are very promising in terms of both efficiency and effectiveness measures, when neither input is indexed. Moreover, the best of the new algorithms is experimentally compared to the best algorithm that is based on the R-tree (a widely accepted access method), for KCPQs and εDJQs, using the same data sets. This comparison shows that the new algorithms outperform R-tree based algorithms, in most cases. © 2016, Springer Science+Business Media New York

University of Thessaly Institutional Repository

A new plane-sweep algorithm for the k-closest-pairs query

Author: Corral A.
Manolopoulos Y.
Roumelis G.
Vassilakopoulos M.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

One of the most representative and studied Distance-Based Queries in Spatial Databases is the K-Closest-Pairs Query (KCPQ). This query involves two spatial data sets and a distance function to measure the degree of closeness, along with a given number K of elements of the result. The output is a set of pairs of objects (with one object element from each set), with the K lowest distances. In this paper, we study the problem of processing KCPQs between RAM-based point sets, using Plane-Sweep (PS) algorithms. We utilize two improvements that can be applied to a PS algorithm and propose a new algorithm that minimizes the number of distance computations, in comparison to the classic PS algorithm. By extensive experimentation, using real and synthetic data sets, we highlight the most efficient improvement and show that the new PS algorithm outperforms the classic one, in most cases. © 2014 Springer International Publishing Switzerland
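
For orientation, a plane-sweep KCPQ over two RAM-based point sets can be sketched as follows. This is a generic textbook-style sweep along the x-axis, written under the assumption of Euclidean distance; it is not the new algorithm proposed in the paper, and it does not include the paper's improvements. The function name and internal helpers are hypothetical.

```python
import heapq
import math


def plane_sweep_kcpq(P, Q, k):
    """Return the k closest (distance, p, q) pairs using an x-axis sweep."""
    P, Q = sorted(P), sorted(Q)   # sort both sets by x-coordinate
    best = []                     # max-heap of the k best pairs: (-dist, p, q)

    def delta():
        # Distance of the current k-th closest pair (infinite until k found).
        return -best[0][0] if len(best) == k else math.inf

    def scan(ref, others, start, ref_is_p):
        # Compare the reference point with points of the other set to its right.
        for idx in range(start, len(others)):
            other = others[idx]
            if other[0] - ref[0] > delta():
                break             # x-gap alone exceeds delta: stop scanning
            d = math.dist(ref, other)
            if d < delta():
                p, q = (ref, other) if ref_is_p else (other, ref)
                if len(best) == k:
                    heapq.heapreplace(best, (-d, p, q))
                else:
                    heapq.heappush(best, (-d, p, q))

    i = j = 0
    while i < len(P) and j < len(Q):
        if P[i][0] <= Q[j][0]:    # leftmost unprocessed point is the reference
            scan(P[i], Q, j, True)
            i += 1
        else:
            scan(Q[j], P, i, False)
            j += 1
    return sorted((-neg_d, p, q) for neg_d, p, q in best)


print(plane_sweep_kcpq([(0, 0), (5, 5)], [(1, 0), (5, 7), (10, 10)], 2))
```

The pruning step relies on the fact that the x-distance is a lower bound of the Euclidean distance, so once the x-gap exceeds the current k-th best distance no later point in the sorted order can improve the result.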

Crossref

University of Thessaly Institutional Repository