402 research outputs found

    Reverse Nearest Neighbor Heat Maps: A Tool for Influence Exploration

    Full text link
    We study the problem of constructing a reverse nearest neighbor (RNN) heat map by finding the RNN set of every point in a two-dimensional space. Based on the RNN set of a point, we obtain a quantitative influence (i.e., heat) for the point. The heat map provides a global view on the influence distribution in the space, and hence supports exploratory analyses in many applications such as marketing and resource management. To construct such a heat map, we first reduce it to a problem called Region Coloring (RC), which divides the space into disjoint regions within which all the points have the same RNN set. We then propose a novel algorithm named CREST that efficiently solves the RC problem by labeling each region with the heat value of its containing points. In CREST, we propose innovative techniques to avoid processing expensive RNN queries and greatly reduce the number of region labeling operations. We perform detailed analyses on the complexity of CREST and lower bounds of the RC problem, and prove that CREST is asymptotically optimal in the worst case. Extensive experiments with both real and synthetic data sets demonstrate that CREST outperforms alternative algorithms by several orders of magnitude.Comment: Accepted to appear in ICDE 201

    The future of urban models in the Big Data and AI era: a bibliometric analysis (2000-2019)

    Get PDF
    This article questions the effects on urban research dynamics of the Big Data and AI turn in urban management. To identify these effects, we use two complementary materials: bibliometric data and interviews. We consider two areas in urban research: one, covering the academic research dealing with transportation systems and the other, with water systems. First, we measure the evolution of AI and Big Data keywords in these two areas. Second, we measure the evolution of the share of publications published in computer science journals about urban traffic and water quality. To guide these bibliometric analyses, we rely on the content of interviews conducted with academics and higher education officials in Paris and Edinburgh at the beginning of 2018

    New Database Architecture for Smart Query Handler of Spatial Database

    Get PDF
    AbstractA spatial database system is a database system with additional capabilities for handling spatial data. It also supports spatial data types in its implementation, providing spatial indexing and efficient algorithms for spatial join. The retrieval of data values from a spatial database involves searching through the huge repository of data, involving huge cost. Thus query optimization on spatial database takes more time as compared to RDBMS. The current state of art shows that during the execution of a query in a spatial database management system (SDBMS), the query optimizer creates all possible query evaluation plans. All plans are equivalent in term of their final output but vary in their execution cost, the amount of time to run. Once the data is retrieved the query and its plans are deleted from memory to free the space for future usage. This is repeated for the next query even if the query is already executed. This leads to increased storage overhead and execution time. In this paper, a new database architecture is proposed, which uses a buffer based query optimization technique for faster data retrieval

    Enhancing SpatialHadoop with Closest Pair Queries

    Get PDF
    Given two datasets P and Q, the K Closest Pair Query (KCPQ) finds the K closest pairs of objects from P Ă—Q. It is an operation widely adopted by many spatial and GIS applications. As a combination of the K Nearest Neighbor (KNN) and the spatial join queries, KCPQ is an expensive operation. Given the increasing volume of spatial data, it is difficult to perform a KCPQ on a centralized machine efficiently. For this reason, this paper addresses the problem of computing the KCPQ on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently, and proposes a novel algorithm in SpatialHadoop to perform efficient parallel KCPQ on large-scale spatial datasets. We have evaluated the performance of the algorithm in several situations with big synthetic and real-world datasets. The experiments have demonstrated the efficiency and scalability of our proposal

    A MapReduce Algorithm for Polygon Retrieval in Geospatial Analysis

    Get PDF
    The proliferation of data acquisition devices like 3D laser scanners had led to the burst of large-scale spatial terrain data which imposes many challenges to spatial data analysis and computation. With the advent of several emerging cloud technologies, a natural and cost-effective approach to managing such large-scale data is to store and process such datasets in a publicly hosted cloud service using modern distributed computing paradigms such as MapReduce. For several key spatial data analysis and computation problems, polygon retrieval is a fundamental operation which is often computed under real-time constraints. However, existing sequential algorithms fail to meet this demand effectively given that terrain data in recent years have witnessed an unprecedented growth in both volume and rate. In this work, we present a MapReduce-based parallel polygon retrieval algorithm which aims at minimizing the IO and CPU loads of the map and reduce tasks during spatial data processing. Our proposed algorithm first hierarchically indexes the spatial terrain data using a quad-tree index, with the help of which, a significant amount of data is filtered out in the pre-processing stage based on the query object. In addition, a prefix tree based on the quad-tree index is built to query the relationship between the terrain data and query area in real time which leads to significant savings in both I/O load and CPU time. The performance of the proposed techniques is evaluated in a Hadoop cluster and the results demonstrate that the proposed techniques are scalable and lead to more than 35% reduction in execution time of the polygon retrieval operation over existing distributed algorithms

    A MapReduce-Based Big Spatial Data Framework for Solving the Problem of Covering a Polygon with Orthogonal Rectangles

    Get PDF
    The polygon covering problem is an important class of problems in the area of computational geometry. There are slightly different versions of this problem depending on the types of polygons to be addressed. In this paper, we focus on finding an answer to a question of whether an orthogonal rectangle, or spatial query window, is fully covered by a set of orthogonal rectangles which are in smaller sizes. This problem is encountered in many application domains including object recognition/extraction/trace, spatial analyses, topological analyses, and augmented reality applications. In many real-world applications, in the cases of using traditional central computation techniques, working with real world data results in a performance bottlenecks. The work presented in this paper proposes a high performance MapReduce-based big data framework to solve the polygon covering problem in the cases of using a spatial query window and data are represented as a set of orthogonal rectangles. Orthogonal rectangular polygons are represented in the form of minimum bounding boxes. The spatial query windows are also called as range queries. The proposed spatial big data framework is evaluated in terms of horizontal scalability. In addition, efficiency and speed-up performance metrics for the proposed two algorithms are measured

    Distance Range Queries in SpatialHadoop

    Get PDF
    Efficient processing of Distance Range Queries (DRQs) is of great importance in spatial databases due to the wide area of applications. This type of spatial query is characterized by a distance range over one or two datasets. The most representative and known DRQs are the ε Distance Range Query (εDRQ) and the ε Distance Range Join Query (εDRJQ). Given the increasing volume of spatial data, it is difficult to perform a DRQ on a centralized machine efficiently. Moreover, the εDRJQ is an expensive spatial operation, since it can be considered a combination of the εDR and the spatial join queries. For this reason, this paper addresses the problem of computing DRQs on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports spatial operations efficiently, and proposes new algorithms in SpatialHadoop to perform efficient parallel DRQs on large-scale spatial datasets. We have evaluated the performance of the proposed algorithms in several situations with big synthetic and real-world datasets. The experiments have demonstrated the efficiency and scalability of our proposal

    Efficient Large-scale Distance-Based Join Queries in SpatialHadoop

    Get PDF
    Efficient processing of Distance-Based Join Queries (DBJQs) in spatial databases is of paramount importance in many application domains. The most representative and known DBJQs are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). These types of join queries are characterized by a number of desired pairs (K) or a distance threshold (ε) between the components of the pairs in the final result, over two spatial datasets. Both are expensive operations, since two spatial datasets are combined with additional constraints. Given the increasing volume of spatial data originating from multiple sources and stored in distributed servers, it is not always efficient to perform DBJQs on a centralized server. For this reason, this paper addresses the problem of computing DBJQs on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports efficient processing of spatial queries in a cloud-based setting. We propose novel algorithms, based on plane-sweep, to perform efficient parallel DBJQs on large-scale spatial datasets in Spatial Hadoop. We evaluate the performance of the proposed algorithms in several situations with large real-world as well as synthetic datasets. The experiments demonstrate the efficiency and scalability of our proposed methodologies
    • …
    corecore