6,339 research outputs found

    SVS-JOIN : efficient spatial visual similarity join for geo-multimedia

    Get PDF
    In the big data era, massive amount of multimedia data with geo-tags has been generated and collected by smart devices equipped with mobile communications module and position sensor module. This trend has put forward higher request on large-scale geo-multimedia retrieval. Spatial similarity join is one of the significant problems in the area of spatial database. Previous works focused on spatial textual document search problem, rather than geo-multimedia retrieval. In this paper, we investigate a novel geo-multimedia retrieval paradigm named spatial visual similarity join (SVS-JOIN for short), which aims to search similar geo-image pairs in both aspects of geo-location and visual content. Firstly, the definition of SVS-JOIN is proposed and then we present the geographical similarity and visual similarity measurement. Inspired by the approach for textual similarity join, we develop an algorithm named SVS-JOIN B by combining the PPJOIN algorithm and visual similarity. Besides, an extension of it named SVS-JOIN G is developed, which utilizes spatial grid strategy to improve the search efficiency. To further speed up the search, a novel approach called SVS-JOIN Q is carefully designed, in which a quadtree and a global inverted index are employed. Comprehensive experiments are conducted on two geo-image datasets and the results demonstrate that our solution can address the SVS-JOIN problem effectively and efficiently

    An R*-Tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System

    Get PDF
    The growing importance of geospatial databases has made it essential to perform complex spatial queries efficiently. To achieve acceptable performance levels, database systems have been increasingly required to make use of parallelism. The spatial join is a computationally expensive operator. Efficient implementation of the join operator is, thus, desirable. The work presented in this document attempts to improve the performance of spatial join queries by distributing the data set across several nodes of a cluster and executing queries across these nodes in parallel. This document discusses a new parallel algorithm that implements the spatial join in an efficient manner. This algorithm is compared to an existing parallel spatial-join algorithm, the clone join. Both algorithms have been implemented on a Beowulf cluster and compared using real datasets. An extensive experimental analysis reveals that the proposed algorithm exhibits superior performance both in declustering time as well as in the execution time of the join query

    AsterixDB: A Scalable, Open Source BDMS

    Full text link
    AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements

    The Family of MapReduce and Large Scale Data Processing Systems

    Full text link
    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program such as issues on data distribution, scheduling and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several followup works after its introduction. This article provides a comprehensive survey for a family of approaches and mechanisms of large scale data processing mechanisms that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of introduced systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author

    A query processing system for very large spatial databases using a new map algebra

    Get PDF
    Dans cette thĂšse nous introduisons une approche de traitement de requĂȘtes pour des bases de donnĂ©e spatiales. Nous expliquons aussi les concepts principaux que nous avons dĂ©fini et dĂ©veloppĂ©: une algĂšbre spatiale et une approche Ă  base de graphe utilisĂ©e dans l'optimisateur. L'algĂšbre spatiale est dĂ©fini pour exprimer les requĂȘtes et les rĂšgles de transformation pendant les diffĂ©rentes Ă©tapes de l'optimisation de requĂȘtes. Nous avons essayĂ© de dĂ©finir l'algĂšbre la plus complĂšte que possible pour couvrir une grande variĂ©tĂ© d'application. L'opĂ©rateur algĂ©brique reçoit et produit seulement des carte. Les fonctions reçoivent des cartes et produisent des scalaires ou des objets. L'optimisateur reçoit la requĂȘte en expression algĂ©brique et produit un QEP (Query Evaluation Plan) efficace dans deux Ă©tapes: gĂ©nĂ©ration de QEG (Query Evaluation Graph) et gĂ©nĂ©ration de QEP. Dans premiĂšre Ă©tape un graphe (QEG) Ă©quivalent de l'expression algĂ©brique est produit. Les rĂšgles de transformation sont utilisĂ©es pour transformer le graphe a un Ă©quivalent plus efficace. Dans deuxiĂšme Ă©tape un QEP est produit de QEG passĂ© de l'Ă©tape prĂ©cĂ©dente. Le QEP est un ensemble des opĂ©rations primitives consĂ©cutives qui produit les rĂ©sultats finals (la rĂ©ponse finale de la requĂȘte soumise au base de donnĂ©e). Nous avons implĂ©mentĂ© l'optimisateur, un gĂ©nĂ©rateur de requĂȘte spatiale alĂ©atoire, et une base de donnĂ©e simulĂ©e. La base de donnĂ©e spatiale simulĂ©e est un ensemble de fonctions pour simuler des opĂ©rations spatiales primitives. Les requĂȘtes alĂ©atoires sont soumis Ă  l'optimisateur. Les QEPs gĂ©nĂ©rĂ©es sont soumis au simulateur de base de donnĂ©es spatiale. Les rĂ©sultats expĂ©rimentaux sont utilisĂ©s pour discuter les performances et les caractĂ©ristiques de l'optimisateur.Abstract: In this thesis we introduce a query processing approach for spatial databases and explain the main concepts we defined and developed: a spatial algebra and a graph based approach used in the optimizer. The spatial algebra was defined to express queries and transformation rules during different steps of the query optimization. To cover a vast variety of potential applications, we tried to define the algebra as complete as possible. The algebra looks at the spatial data as maps of spatial objects. The algebraic operators act on the maps and result in new maps. Aggregate functions can act on maps and objects and produce objects or basic values (characters, numbers, etc.). The optimizer receives the query in algebraic expression and produces one efficient QEP (Query Evaluation Plan) through two main consecutive blocks: QEG (Query Evaluation Graph) generation and QEP generation. In QEG generation we construct a graph equivalent of the algebraic expression and then apply graph transformation rules to produce one efficient QEG. In QEP generation we receive the efficient QEG and do predicate ordering and approximation and then generate the efficient QEP. The QEP is a set of consecutive phases that must be executed in the specified order. Each phase consist of one or more primitive operations. All primitive operations that are in the same phase can be executed in parallel. We implemented the optimizer, a randomly spatial query generator and a simulated spatial database. The query generator produces random queries for the purpose of testing the optimizer. The simulated spatial database is a set of functions to simulate primitive spatial operations. They return the cost of the corresponding primitive operation according to input parameters. We put randomly generated queries to the optimizer, got the generated QEPs and put them to the spatial database simulator. We used the experimental results to discuss on the optimizer characteristics and performance. The optimizer was designed for databases with a very large number of spatial objects nevertheless most of the concepts we used can be applied to all spatial information systems."--RĂ©sumĂ© abrĂ©gĂ© par UMI

    Optimal-Location-Selection Query Processing in Spatial Databases

    Get PDF
    Abstract—This paper introduces and solves a novel type of spatial queries, namely, Optimal-Location-Selection (OLS) search, which has many applications in real life. Given a data object set DA, a target object set DB, a spatial region R, and a critical distance dc in a multidimensional space, an OLS query retrieves those target objects in DB that are outside R but have maximal optimality. Here, the optimality of a target object b 2 DB located outside R is defined as the number of the data objects from DA that are inside R and meanwhile have their distances to b not exceeding dc. When there is a tie, the accumulated distance from the data objects to b serves as the tie breaker, and the one with smaller distance has the better optimality. In this paper, we present the optimality metric, formalize the OLS query, and propose several algorithms for processing OLS queries efficiently. A comprehensive experimental evaluation has been conducted using both real and synthetic data sets to demonstrate the efficiency and effectiveness of the proposed algorithms. Index Terms—Query processing, optimal-location-selection, spatial database, algorithm. Ç
    • 

    corecore