767 research outputs found

    Answering Spatial Multiple-Set Intersection Queries Using 2-3 Cuckoo Hash-Filters

    Full text link
    We show how to answer spatial multiple-set intersection queries in O(n(log w)/w + kt) expected time, where n is the total size of the t sets involved in the query, w is the number of bits in a memory word, k is the output size, and c is any fixed constant. This improves the asymptotic performance over previous solutions and is based on an interesting data structure, known as 2-3 cuckoo hash-filters. Our results apply in the word-RAM model (or practical RAM model), which allows for constant-time bit-parallel operations, such as bitwise AND, OR, NOT, and MSB (most-significant 1-bit), as exist in modern CPUs and GPUs. Our solutions apply to any multiple-set intersection queries in spatial data sets that can be reduced to one-dimensional range queries, such as spatial join queries for one-dimensional points or sets of points stored along space-filling curves, which are used in GIS applications.Comment: Full version of paper from 2017 ACM SIGSPATIAL International Conference on Advances in Geographic Information System

    New Database Architecture for Smart Query Handler of Spatial Database

    Get PDF
    AbstractA spatial database system is a database system with additional capabilities for handling spatial data. It also supports spatial data types in its implementation, providing spatial indexing and efficient algorithms for spatial join. The retrieval of data values from a spatial database involves searching through the huge repository of data, involving huge cost. Thus query optimization on spatial database takes more time as compared to RDBMS. The current state of art shows that during the execution of a query in a spatial database management system (SDBMS), the query optimizer creates all possible query evaluation plans. All plans are equivalent in term of their final output but vary in their execution cost, the amount of time to run. Once the data is retrieved the query and its plans are deleted from memory to free the space for future usage. This is repeated for the next query even if the query is already executed. This leads to increased storage overhead and execution time. In this paper, a new database architecture is proposed, which uses a buffer based query optimization technique for faster data retrieval

    Accelerating Spatio-Textual Queries with Learned Indices

    Full text link
    Efficiently computing spatio-textual queries has become increasingly important in various applications that need to quickly retrieve geolocated entities associated with textual information, such as in location-based services and social networks. To accelerate such queries, several works have proposed combining spatial and textual indices into hybrid index structures. Recently, the novel idea of replacing traditional indices with ML models has attracted a lot of attention. This includes works on learned spatial indices, where the main challenge is to address the lack of a total ordering among objects in a multidimensional space. In this work, we investigate how to extend this novel type of index design to the case of spatio-textual data. We study different design choices, based on either loose or tight coupling between the spatial and textual part, as well as a hybrid index that combines a traditional and a learned component. We also perform an experimental evaluation using several real-world datasets to assess the potential benefits of using a learned index for evaluating spatio-textual queries

    Visual Landmark Recognition from Internet Photo Collections: A Large-Scale Evaluation

    Full text link
    The task of a visual landmark recognition system is to identify photographed buildings or objects in query photos and to provide the user with relevant information on them. With their increasing coverage of the world's landmark buildings and objects, Internet photo collections are now being used as a source for building such systems in a fully automatic fashion. This process typically consists of three steps: clustering large amounts of images by the objects they depict; determining object names from user-provided tags; and building a robust, compact, and efficient recognition index. To this date, however, there is little empirical information on how well current approaches for those steps perform in a large-scale open-set mining and recognition task. Furthermore, there is little empirical information on how recognition performance varies for different types of landmark objects and where there is still potential for improvement. With this paper, we intend to fill these gaps. Using a dataset of 500k images from Paris, we analyze each component of the landmark recognition pipeline in order to answer the following questions: How many and what kinds of objects can be discovered automatically? How can we best use the resulting image clusters to recognize the object in a query? How can the object be efficiently represented in memory for recognition? How reliably can semantic information be extracted? And finally: What are the limiting factors in the resulting pipeline from query to semantics? We evaluate how different choices of methods and parameters for the individual pipeline steps affect overall system performance and examine their effects for different query categories such as buildings, paintings or sculptures

    Efficient Reorganisation of Hybrid Index Structures Supporting Multimedia Search Criteria

    Get PDF
    This thesis describes the development and setup of hybrid index structures. They are access methods for retrieval techniques in hybrid data spaces which are formed by one or more relational or normalised columns in conjunction with one non-relational or non-normalised column. Examples for these hybrid data spaces are, among others, textual data combined with geographical ones or data from enterprise content management systems. However, all non-relational data types may be stored as well as image feature vectors or comparable types. Hybrid index structures are known to function efficiently regarding retrieval operations. Unfortunately, little information is available about reorganisation operations which insert or update the row tuples. The fundamental research is mainly executed in simulation based environments. This work is written ensuing from a previous thesis that implements hybrid access structures in realistic database surroundings. During this implementation it has become obvious that retrieval works efficiently. Yet, the restructuring approaches require too much effort to be set up, e.g., in web search engine environments where several thousands of documents are inserted or modified every day. These search engines rely on relational database systems as storage backends. Hence, the setup of these access methods for hybrid data spaces is required in real world database management systems. This thesis tries to apply a systematic approach for the optimisation of the rearrangement algorithms inside realistic scenarios. Thus, a measurement and evaluation scheme is created which is repeatedly deployed to an evolving state and a model of hybrid index structures in order to optimise the regrouping algorithms to make a setup of hybrid index structures in real world information systems possible. Thus, a set of input corpora is selected which is applied to the test suite as well as an evaluation scheme. To sum up, it can be said that this thesis describes input sets, a test suite including an evaluation scheme as well as optimisation iterations on reorganisation algorithms reflecting a theoretical model framework to provide efficient reorganisations of hybrid index structures supporting multimedia search criteria

    A survey on big data indexing strategies

    Get PDF
    The operations of the Internet have led to a significant growth and accumulation of data known as Big Data.Individuals and organizations that utilize this data, had no idea, nor were they prepared for this data explosion.Hence, the available solutions cannot meet the needs of the growing heterogeneous data in terms of processing. This results in inefficient information retrieval or search query results.The design of indexing strategies that can support this need is required. A survey on various indexing strategies and how they are utilized for solving Big Data management issues can serve as a guide for choosing the strategy best suited for a problem, and can also serve as a base for the design of more efficient indexing strategies.The aim of the study is to explore the characteristics of the indexing strategies used in Big Data manageability by covering some of the weaknesses and strengths of B-tree, R-tree, to name but a few. This paper covers some popular indexing strategies used for Big Data management. It exposes the potentials of each by carefully exploring their properties in ways that are related to problem solving
    • …
    corecore