
    Bulk Insertions into xBR+-trees

    Bulk insertion refers to the process of updating an existing index by inserting a large batch of new data, treating the items of this batch as a whole rather than inserting them one-by-one. Bulk insertion is related to bulk loading, which refers to the process of creating a new index from scratch when the dataset to be indexed is available beforehand. The xBR+-tree is a balanced, disk-resident, Quadtree-based index for point data, which is very efficient for processing spatial queries. In this paper, we present the first algorithm for bulk insertion into xBR+-trees. This algorithm incorporates extensions of techniques that we have recently developed for bulk loading xBR+-trees. Moreover, using real and artificial datasets of various cardinalities, we present an experimental comparison of this algorithm vs. inserting items one-by-one for updating xBR+-trees, regarding performance (I/O and execution time) and the characteristics of the resulting trees. We also present experimental results regarding the query-processing efficiency of xBR+-trees built by bulk insertions vs. xBR+-trees built by inserting items one-by-one.

    An Efficient Algorithm for Bulk-Loading xBR+-trees

    A major part of the interface to a database is made up of the queries that can be addressed to this database and answered (processed) in an efficient way, contributing to the quality of the developed software. Efficiently processed spatial queries constitute a fundamental part of the interface to spatial databases due to the wide range of applications that may address such queries, like geographical information systems (GIS), location-based services, computer visualization, automated mapping, facilities management, etc. Another important capability of the interface to a spatial database is the creation of efficient index structures to speed up spatial query processing. The xBR+-tree is a balanced, disk-resident, quadtree-based index structure for point data, which is very efficient for processing such queries. Bulk-loading refers to the process of creating an index from scratch when the dataset to be indexed is available beforehand, instead of creating the index gradually (and more slowly) by inserting the dataset elements one-by-one. In this paper, we present an algorithm for bulk-loading xBR+-trees for big datasets residing on disk, using a limited amount of main memory. The resulting tree is not only built fast, but also exhibits high performance in processing a broad range of spatial queries, where one or two datasets are involved. To justify these characteristics, using real and artificial datasets of various cardinalities, we first present an experimental comparison of this algorithm vs. a previous version of the same algorithm and STR, a popular algorithm for bulk-loading R-trees, regarding tree creation time and the characteristics of the trees created. Second, we experimentally compare the query efficiency of bulk-loaded xBR+-trees vs. bulk-loaded R-trees, regarding I/O and execution time. Thus, this paper contributes to the implementation of spatial database interfaces and the efficient storage organization for big spatial data management.
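
    For orientation, the STR (Sort-Tile-Recursive) baseline mentioned in this abstract can be sketched at the leaf level as follows. This is a minimal in-memory illustration of the standard STR packing idea for 2D points, not the xBR+-tree bulk-loading algorithm of the paper; the function name `str_leaf_pack` and the `capacity` parameter are assumptions made for the example.

```python
import math


def str_leaf_pack(points, capacity):
    """Group 2D points into leaf pages of at most `capacity` points (STR leaf level).

    Hypothetical helper for illustration only: real bulk loaders work on
    disk-resident data and also build the upper tree levels.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    n = len(points)
    if n == 0:
        return []
    leaf_count = math.ceil(n / capacity)             # number of leaf pages needed
    slice_count = math.ceil(math.sqrt(leaf_count))   # number of vertical slices
    slice_size = slice_count * capacity              # points per vertical slice
    by_x = sorted(points, key=lambda p: p[0])        # step 1: sort by x
    leaves = []
    for i in range(0, n, slice_size):
        # step 2: within each vertical slice, sort by y
        vertical = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        # step 3: cut the slice into runs of `capacity` points, one per leaf
        for j in range(0, len(vertical), capacity):
            leaves.append(vertical[j:j + capacity])
    return leaves
```

    Calling `str_leaf_pack(points, 4)` on a list of `(x, y)` tuples returns a list of leaves, each a list of at most four spatially close points.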

    Efficient query processing on large spatial databases: A performance study

    Processing of spatial queries has been studied extensively in the literature. In most cases, it is accomplished by indexing spatial data using spatial access methods. Spatial indexes, such as those based on the Quadtree, are important in spatial databases for efficient execution of queries involving spatial constraints and objects. In this paper, we study a recent balanced disk-based index structure for point data, called the xBR+-tree, that belongs to the Quadtree family and hierarchically decomposes space in a regular manner. For the most common spatial queries, like Point Location, Window, Distance Range, Nearest Neighbor and Distance-based Join, the R-tree family is a very popular choice of spatial index, due to its excellent query performance. For this reason, we compare the performance of the xBR+-tree with that of the R*-tree and the R+-tree for tree building and for processing the most studied spatial queries. To perform this comparison, we utilize existing algorithms and present new ones. We demonstrate through extensive experimental performance results (I/O efficiency and execution time), based on medium and large real and synthetic datasets, that the xBR+-tree is a big winner in execution time in all cases and a winner in I/O in most cases.

    New Plane-Sweep Algorithms for Distance-Based Join Queries in Spatial Databases

    Efficient and effective processing of the distance-based join query (DJQ) is of great importance in spatial databases due to the wide range of applications that may address such queries (mapping, urban planning, transportation planning, resource management, etc.). The most representative and studied DJQs are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). These spatial queries involve two spatial data sets and a distance function to measure the degree of closeness, along with a given number of pairs in the final result (K) or a distance threshold (ε). In this paper, we propose four new plane-sweep-based algorithms for KCPQs and their extensions for εDJQs in the context of spatial databases, without the use of an index for either of the two disk-resident data sets (since building and using indexes does not always benefit processing performance). They employ a combination of plane-sweep algorithms and space partitioning techniques to join the data sets. Finally, we present the results of an extensive experimental study that compares the efficiency and effectiveness of the proposed algorithms for KCPQs and εDJQs. This performance study, conducted on medium and big spatial data sets (real and synthetic), validates that the proposed plane-sweep-based algorithms are very promising in terms of both efficiency and effectiveness measures when neither input is indexed. Moreover, the best of the new algorithms is experimentally compared to the best algorithm that is based on the R-tree (a widely accepted access method), for KCPQs and εDJQs, using the same data sets. This comparison shows that the new algorithms outperform R-tree-based algorithms in most cases.
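
    The basic plane-sweep idea behind such KCPQ algorithms can be sketched as follows. This is a simplified in-memory illustration under stated assumptions (2D points as tuples, Euclidean distance, sweep along the x axis), not one of the four disk-based algorithms proposed in the paper; the name `k_closest_pairs` is hypothetical.

```python
import heapq
import math


def k_closest_pairs(P, Q, k):
    """Return the k closest (distance, p, q) pairs with p in P and q in Q.

    Both inputs are sorted on x, and a pair is examined only if its x-gap
    does not exceed the current K-th best distance (the pruning bound).
    """
    if k <= 0:
        return []
    P = sorted(P)
    Q = sorted(Q)
    heap = []  # max-heap (negated distances) holding the best k pairs so far

    def bound():
        # K-th best distance found so far; infinite until k pairs are known
        return -heap[0][0] if len(heap) == k else math.inf

    start = 0  # first Q point that can still pair with the current or a later p
    for p in P:
        # Q points too far to the left of p can never qualify again,
        # because later p have larger x and the bound never grows.
        while start < len(Q) and p[0] - Q[start][0] > bound():
            start += 1
        for j in range(start, len(Q)):
            q = Q[j]
            if q[0] - p[0] > bound():
                break  # every later q is even farther away on the x axis
            d = math.dist(p, q)
            if d < bound():
                if len(heap) == k:
                    heapq.heapreplace(heap, (-d, p, q))
                else:
                    heapq.heappush(heap, (-d, p, q))
    return sorted((-neg_d, p, q) for neg_d, p, q in heap)
```

    An εDJQ variant of this sketch would replace the shrinking bound with the fixed threshold ε and collect every pair whose distance does not exceed it.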

    A Comparison of Distributed Spatial Data Management Systems for Processing Distance Join Queries

    Due to the ubiquitous use of spatial data applications and the large amounts of spatial data that these applications generate, the processing of large-scale distance joins in distributed systems is becoming increasingly popular. Two of the most studied distance join queries are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). The KCPQ finds the K closest pairs of points from two datasets, and the εDJQ finds all the possible pairs of points from two datasets that are within a distance threshold ε of each other. Distributed cluster-based computing systems can be classified into Hadoop-based and Spark-based systems. Based on this classification, in this paper, we compare two of the most current and leading distributed spatial data management systems, namely SpatialHadoop and LocationSpark, by evaluating the performance of existing and newly proposed parallel and distributed distance join query algorithms in different situations with big real-world datasets. As a general conclusion, while SpatialHadoop is a more mature and robust system, LocationSpark is the winner with respect to the total execution time.
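
    The two query definitions given in this abstract can be pinned down with a brute-force reference. This is only a single-machine nested-loop baseline for checking results on small inputs, assuming 2D points as tuples and Euclidean distance; it is not how SpatialHadoop or LocationSpark evaluate these queries, and the names `kcpq` and `eps_djq` are hypothetical.

```python
import heapq
import math
from itertools import product


def kcpq(P, Q, k):
    """K Closest Pairs Query: the k pairs (p, q) with the smallest distance."""
    pairs = ((math.dist(p, q), p, q) for p, q in product(P, Q))
    return heapq.nsmallest(k, pairs)


def eps_djq(P, Q, eps):
    """ε Distance Join Query: all pairs (p, q) at distance at most eps."""
    return [(p, q) for p, q in product(P, Q) if math.dist(p, q) <= eps]
```

    Both functions examine all |P| · |Q| pairs, which is the cost that partitioning and distributed evaluation aim to avoid on big datasets.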

    The K Group Nearest-Neighbor Query on Non-indexed RAM-Resident Data

    Data sets that are used for answering a single query only once (or just a few times) before they are replaced by new data sets appear frequently in practical applications. The cost of building indexes to accelerate query processing would not be repaid for such data sets. We consider an extension of the popular (K) Nearest-Neighbor Query, called the (K) Group Nearest Neighbor Query (GNNQ). This query discovers the (K) nearest neighbor(s) to a group of query points (considering the sum of distances to all the members of the query group) and has been studied in recent years, considering data sets indexed by efficient spatial data structures. We study (K) GNNQs on non-indexed RAM-resident data sets and present an existing algorithm adapted to such data sets, as well as two Plane-Sweep algorithms that apply optimizations emerging from the geometric properties of the problem. By extensive experimentation, using real and synthetic data sets, we highlight the most efficient algorithm.
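
    The query semantics described above (ranking data points by the sum of their distances to all query points) can be illustrated with a brute-force scan. This is a minimal sketch assuming 2D points as tuples and Euclidean distance; it is not one of the Plane-Sweep algorithms studied in the paper, and the name `k_group_nearest_neighbors` is hypothetical.

```python
import heapq
import math


def k_group_nearest_neighbors(data, group, k):
    """Return the k (total_distance, point) entries with the smallest
    sum of distances from the data point to every point in `group`."""

    def total_distance(p):
        return sum(math.dist(p, g) for g in group)

    # One linear scan over the non-indexed data, keeping the k best candidates
    return heapq.nsmallest(k, ((total_distance(p), p) for p in data))
```

    With `k = 1` this reduces to the plain Group Nearest Neighbor Query, which returns the single data point minimizing the total distance to the group.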

    Structuring point data and processing spatial queries

    The aim of the present thesis was to develop and study an improved version of the xBR-tree, named the xBR+-tree, in the area of spatial data structuring. This index had to be capable of organizing, querying and storing small and big spatial data. Construction methods (one-by-one insertion and bulk loading) and a deletion algorithm were developed. The xBR+-tree was compared experimentally with the xBR-tree and popular R-trees on spatial queries with one or two input data sets. We proposed two enhancements to algorithms using classic plane sweep for join queries (kCPQ, εDJQ) with two input spatial data sets stored in main memory. One new algorithm (Reverse Run Plane Sweep, RRPS) was developed in order to improve the processing of these queries on data that are either stored in main memory beforehand or partially loaded. The experimental results showed that the RRPS algorithm always reduces the distance calculations and therefore shortens the execution time. In the field of spatial query processing on data stored in main memory, existing methods were studied and new ones were proposed in order to solve the k Group Nearest Neighbor problem. Finally, new algorithms using a combination of the plane-sweep technique and space partitioning were proposed and studied for kCPQ and εDJQ, without utilizing any spatial indexing method, over data sets stored in secondary memory. The new algorithms proved efficient. The best of the new algorithms proved more efficient than the best algorithm using the R-tree spatial structure.

    The aim of the thesis, in the field of methods for structuring very large volumes of point data, was to improve the structure of the tree-based spatial index xBR-tree with a new structure (xBR+-tree). Methods were developed for constructing the new index by one-by-one insertion and by bulk insertion of data, along with a method for deleting data from xBR-trees. The results of experiments comparing the xBR+-tree with the xBR-tree and with R-trees were studied, regarding construction and the processing of spatial queries over one or two data sets. Extensions of the classic plane-sweep algorithms were proposed for join queries over two spatial data sets stored in main memory (kCPQ and εDJQ), with two proposed improvements. A new algorithm (Reverse Run Plane Sweep, RRPS) was presented to improve the processing of these queries, both with data entirely in main memory and on portions of the data sets that are selectively loaded into main memory. The experimental results led to the conclusion that the RRPS algorithm always reduces distance calculations and therefore speeds up execution. In the field of spatial join queries with data in main memory, existing algorithms were studied and new ones were proposed for solving the k group nearest neighbors problem. Finally, new algorithms were presented and studied that use a combination of the sweep technique and space partitioning to join the data for kCPQ and εDJQ queries, without the use of any index, with data stored on disk. The RRPS algorithms were shown to be more efficient on the above queries than those of the classic sweep technique. The best of the new algorithms, compared experimentally with the best algorithm that uses the R-tree spatial indexing structure, proved more efficient.

    Corporate book-tax conformity: empirical research in Greece

    The main objective of the thesis is to analyze the book-tax differences that arise in Greek enterprises, using confidential tax data from the Greek Ministry of Finance. The results provide valid information about the exact amount of taxable income (the size of taxable profits or tax losses) and identify the main sources of the difference between accounting and taxable income in Greek firms. In line with the international literature, the results show that the accounting profits (accounting losses) published in a company's annual financial statements deviate significantly from the taxable profits (tax losses) that the firm declares to the tax authorities through the annual tax return form. The main sources of book-tax differences are found to be non-deductible expenses and tax loss carryforwards. In addition, this thesis focuses on two econometric models. The first examines the relation between the quality of profits and book-tax differences in Greek companies, while the second investigates the magnitude of tax avoidance / tax evasion (also in Greek firms) and the macroeconomic fluctuations of the Greek economy. The results show that the higher the conformity level (or the lower the level of book-tax differences), the higher the quality of the company's accounting profits. Moreover, the analysis based on quantitative results combining firm-level and macroeconomic data demonstrates that (all other things being equal) during recession phases (expansion phases) firms tend to avoid but not evade taxes (not avoid but instead evade taxes).

    The main objective of the thesis is the analysis of the book-tax differences of enterprises in Greece, using confidential tax data officially provided by the Ministry of Finance. The results provide valid information for identifying the main sources of the difference between accounting and taxable income. In line with the international literature, our findings show that the accounting profit (accounting loss) in a firm's published annual financial statements deviates significantly from the firm's taxable profit (tax loss) as declared to the tax authorities through the annual tax return. The main sources of the differences between the accounting and taxable result are found to be expenses that are not recognized as deductible and tax losses carried forward from previous years. In addition, the thesis investigates two main research hypotheses. The first concerns the relation between earnings quality and book-tax differences of Greek firms, while the second concerns the magnitude of firms' tax evasion / tax avoidance and the macroeconomic fluctuations of the Greek economy. The results show that the higher the conformity level (that is, the lower the level of book-tax differences), the higher the quality of firms' accounting profits. At the same time, the analysis of the data shows a negative correlation between economic conditions, as measured by Gross Domestic Product, and firms' tax avoidance. By contrast, the relation between firms' tax evasion and economic conditions is positive. In other words, ceteris paribus, during a recession (expansion), firms tend to avoid but not evade taxes (not avoid, but evade taxes).

    The deletion operation in xBR-trees

    In designing a spatial index, the most important operations are insertion, deletion and search. We focus on the deletion operation over the xBR-tree, a secondary-memory spatial data structure that belongs to the Quadtree family. The algorithm for handling deletions is presented, taking into account that the deletion of a leaf item may cause deletions of entries from internal nodes. The well-known merging technique is applied to retain the efficiency of the xBR-tree. © 2012 IEEE