
    What makes spatial data big? A discussion on how to partition spatial data

    The amount of available spatial data has increased significantly in recent years, so that traditional analysis tools have become inadequate to manage it effectively. Therefore, many attempts have been made to extend existing MapReduce tools, such as Hadoop or Spark, with spatial capabilities in terms of data types and algorithms. Such extensions are mainly based on the partitioning techniques implemented for textual data, where the size is given in terms of the number of occupied bytes. However, spatial data are characterized by other features that describe their size, such as the number of vertices or the MBR size of geometries, which greatly affect the performance of operations like the spatial join during data analysis. The result is that the use of traditional partitioning techniques prevents fully exploiting the benefit of the parallel execution provided by a MapReduce environment. This paper extensively analyses the problem considering the spatial join operation as a use case, performing both a theoretical and an experimental analysis for it. Moreover, it provides a solution based on a different partitioning technique, which splits complex or extensive geometries. Finally, we validate the proposed solution by means of experiments on synthetic and real datasets.
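    A minimal sketch of the splitting idea mentioned in the abstract, not the paper's algorithm: a geometry whose MBR extent or vertex count exceeds a threshold is recursively bisected along the longer axis of its MBR, so that no single geometry dominates the cost of a partition during a spatial join. The thresholds, the bisection strategy, and the vertex bookkeeping are illustrative assumptions; a real implementation would clip the geometry with a geometry library rather than just distributing its boundary vertices.

```python
def split_geometry(mbr, vertices, max_vertices=1000, max_extent=0.5):
    """Recursively bisect the MBR of an oversized geometry.

    mbr      -- (xmin, ymin, xmax, ymax)
    vertices -- list of (x, y) boundary points of the geometry
    Returns a list of (sub_mbr, sub_vertices) pairs small enough to be
    assigned to independent partitions. This only redistributes boundary
    vertices as a cost proxy; it does not produce clipped sub-geometries.
    """
    xmin, ymin, xmax, ymax = mbr
    width, height = xmax - xmin, ymax - ymin
    small_enough = len(vertices) <= max_vertices and max(width, height) <= max_extent
    if small_enough or max(width, height) < 1e-9:
        return [(mbr, vertices)]

    # Bisect along the longer side of the MBR.
    if width >= height:
        xmid = (xmin + xmax) / 2
        halves = [((xmin, ymin, xmid, ymax), [p for p in vertices if p[0] <= xmid]),
                  ((xmid, ymin, xmax, ymax), [p for p in vertices if p[0] > xmid])]
    else:
        ymid = (ymin + ymax) / 2
        halves = [((xmin, ymin, xmax, ymid), [p for p in vertices if p[1] <= ymid]),
                  ((xmin, ymid, xmax, ymax), [p for p in vertices if p[1] > ymid])]

    result = []
    for sub_mbr, sub_vertices in halves:
        if sub_vertices:  # drop empty halves
            result.extend(split_geometry(sub_mbr, sub_vertices,
                                         max_vertices, max_extent))
    return result
```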

    A context-based approach for partitioning big data

    In recent years, the amount of available data has kept growing at a fast rate, and it is therefore crucial to be able to process it efficiently. The level of parallelism in tools such as Hadoop or Spark is determined, among other things, by the partitioning applied to the dataset. A common method is to split the data into chunks based on the number of bytes. While this approach may work well for text-based batch processing, there are a number of cases where the dataset contains structured information, such as time or spatial coordinates, and one may be interested in exploiting such structure to improve the partitioning. This can reduce processing time and increase the overall efficiency of resource usage. This paper explores an approach based on the notion of context, such as temporal or spatial information, for partitioning the data. We design a context-based multi-dimensional partitioning technique that divides an n-dimensional space into splits by considering the distribution of each contextual dimension in the dataset. We tested our approach on a dataset from a touristic scenario, and our experiments show that we are able to improve the efficiency of resource usage.
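    A minimal sketch, under assumptions not stated in the abstract, of how per-dimension split boundaries could be derived from the observed distribution of each contextual attribute (for example hour of day and latitude). Boundaries follow quantiles, so a skewed dimension still yields splits with roughly balanced record counts. Function names and the two-dimensional example are illustrative.

```python
def quantile_boundaries(values, parts):
    """Return parts-1 cut points that divide the sorted values evenly."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // parts] for i in range(1, parts)]

def assign_split(record, boundaries_per_dim):
    """Map a record (tuple of contextual values) to a multi-dimensional
    split index, one coordinate per contextual dimension."""
    index = []
    for value, cuts in zip(record, boundaries_per_dim):
        index.append(sum(1 for c in cuts if value >= c))
    return tuple(index)

# Example with two contextual dimensions: hour of day and latitude.
records = [(9, 45.43), (9, 45.44), (14, 45.40), (21, 45.45), (22, 45.41)]
boundaries = [quantile_boundaries([r[0] for r in records], 2),
              quantile_boundaries([r[1] for r in records], 2)]
splits = {r: assign_split(r, boundaries) for r in records}
```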

    A MapReduce-Based Big Spatial Data Framework for Solving the Problem of Covering a Polygon with Orthogonal Rectangles

    The polygon covering problem is an important class of problems in the area of computational geometry. There are slightly different versions of this problem depending on the types of polygons to be addressed. In this paper, we focus on answering the question of whether an orthogonal rectangle, or spatial query window, is fully covered by a set of smaller orthogonal rectangles. This problem is encountered in many application domains, including object recognition/extraction/tracing, spatial analyses, topological analyses, and augmented reality applications. In many real-world applications, using traditional centralized computation techniques on real-world data results in performance bottlenecks. The work presented in this paper proposes a high-performance MapReduce-based big data framework to solve the polygon covering problem when a spatial query window is used and the data are represented as a set of orthogonal rectangles. Orthogonal rectangular polygons are represented in the form of minimum bounding boxes. The spatial query windows are also called range queries. The proposed spatial big data framework is evaluated in terms of horizontal scalability. In addition, efficiency and speed-up performance metrics are measured for the two proposed algorithms.
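    A minimal, sequential sketch of the coverage test itself, not the paper's MapReduce framework: the uncovered part of the query window is kept as a list of rectangles, and each covering rectangle carves away its intersection, leaving at most four residual pieces. Rectangle layout and function names are assumptions.

```python
def subtract(piece, cover):
    """Return the parts of `piece` not covered by `cover`.
    Rectangles are (xmin, ymin, xmax, ymax), axis-aligned."""
    px1, py1, px2, py2 = piece
    cx1, cy1, cx2, cy2 = cover
    # No overlap: the piece survives unchanged.
    if cx1 >= px2 or cx2 <= px1 or cy1 >= py2 or cy2 <= py1:
        return [piece]
    out = []
    if cx1 > px1:                      # strip to the left of the cover
        out.append((px1, py1, cx1, py2))
    if cx2 < px2:                      # strip to the right of the cover
        out.append((cx2, py1, px2, py2))
    ix1, ix2 = max(px1, cx1), min(px2, cx2)
    if cy1 > py1:                      # strip below the cover
        out.append((ix1, py1, ix2, cy1))
    if cy2 < py2:                      # strip above the cover
        out.append((ix1, cy2, ix2, py2))
    return out

def is_covered(query, rectangles):
    """True if the union of `rectangles` covers the whole query window."""
    uncovered = [query]
    for cover in rectangles:
        uncovered = [p for piece in uncovered for p in subtract(piece, cover)]
        if not uncovered:
            return True
    return not uncovered

# Example: two overlapping halves fully cover the unit-square query window.
print(is_covered((0, 0, 1, 1), [(0, 0, 0.6, 1), (0.5, 0, 1, 1)]))  # True
```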

    Distributing Tourists Among POIs with an Adaptive Trip Recommendation System

    Traveling is part of many people's leisure activities, and an increasing fraction of the economy comes from tourism. Given a destination, information about the different attractions, or points of interest (POIs), can be found in many sources. Among these attractions, finding the ones that could be of interest to a specific user represents a challenging task. Travel recommendation systems deal with this type of problem. Most of the solutions in the literature do not take into account the impact of the suggestions on the level of crowding of POIs. This paper considers the trip planning problem focusing on balancing users among the different POIs. To this aim, we consider the effects of the previous recommendations, as well as estimates based on historical data, while devising a new recommendation. The problem is formulated as a multi-objective optimization problem, and a recommendation engine has been designed and implemented for exploring the solution space in near real-time, through a distributed version of the Simulated Annealing approach. We test our solution using a real dataset of users visiting the POIs of a touristic city, and we show that we are able to provide high-quality recommendations while keeping the attractions from becoming overcrowded.
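    A minimal, sequential sketch of the simulated-annealing search described in the abstract; the objective function, the neighbourhood move, and the cooling schedule are illustrative assumptions, not the paper's exact formulation. A candidate trip is a list of POI identifiers, and the score trades the user's preference for the selected POIs against their expected crowding.

```python
import math
import random

def score(trip, preference, expected_load, capacity, alpha=0.5):
    """Higher is better: preference for the POIs minus their crowding ratio."""
    interest = sum(preference[p] for p in trip)
    crowding = sum(expected_load[p] / capacity[p] for p in trip)
    return alpha * interest - (1 - alpha) * crowding

def anneal(pois, trip_len, preference, expected_load, capacity,
           steps=5000, t0=1.0, cooling=0.999):
    """Simulated annealing over trips; assumes len(pois) > trip_len."""
    current = random.sample(pois, trip_len)
    best = current
    best_s = score(best, preference, expected_load, capacity)
    t = t0
    for _ in range(steps):
        # Neighbour: swap one POI in the trip with one outside it.
        candidate = current[:]
        candidate[random.randrange(trip_len)] = random.choice(
            [p for p in pois if p not in current])
        s_cur = score(current, preference, expected_load, capacity)
        s_new = score(candidate, preference, expected_load, capacity)
        # Accept improvements always, worse moves with a temperature-dependent probability.
        if s_new > s_cur or random.random() < math.exp((s_new - s_cur) / t):
            current = candidate
        if s_new > best_s:
            best, best_s = candidate, s_new
        t *= cooling
    return best
```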

    CoPart: a context-based partitioning technique for big data

    The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel on independent chunks of data. The consequence is that the overall performance greatly depends on the way data are partitioned among the various computation nodes. The default partitioning technique provided by systems like Hadoop or Spark basically performs a random subdivision of the input records, without considering their nature or the correlation between them. Even if such an approach can be appropriate in the simplest case, where all the input records always have to be analyzed, it becomes a limit for sophisticated analyses, in which correlations between records can be exploited to prune unnecessary computations beforehand. In this paper we design a context-based multi-dimensional partitioning technique, called CoPart, which takes data correlation into account in order to determine how records are subdivided between splits (i.e., units of work assigned to a computation node). More specifically, it considers not only the correlation of data w.r.t. contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.
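    A minimal sketch, complementary to the boundary computation sketched above, of the pruning that a context-based layout enables: when each split is described by one interval per contextual dimension, a selection such as "records between 9:00 and 11:00 in a given latitude band" only needs the splits whose intervals overlap the predicate. The split metadata layout and names are assumptions, not CoPart's API.

```python
def overlapping_splits(split_ranges, predicate):
    """split_ranges: {split_id: [(lo, hi), ...]}, one interval per dimension.
    predicate:       [(lo, hi), ...] in the same dimension order.
    Returns the split ids that may contain matching records."""
    selected = []
    for split_id, ranges in split_ranges.items():
        if all(lo <= q_hi and hi >= q_lo
               for (lo, hi), (q_lo, q_hi) in zip(ranges, predicate)):
            selected.append(split_id)
    return selected

# Two contextual dimensions: hour of day and latitude.
splits = {
    "s0": [(0, 12), (45.40, 45.43)],
    "s1": [(0, 12), (45.43, 45.46)],
    "s2": [(12, 24), (45.40, 45.43)],
}
print(overlapping_splits(splits, [(9, 11), (45.41, 45.42)]))  # ['s0']
```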

    Big Data Computing for Geospatial Applications

    The convergence of big data and geospatial computing has brought forth challenges and opportunities to Geographic Information Science with regard to geospatial data management, processing, analysis, modeling, and visualization. This book highlights recent advancements in integrating new computing approaches, spatial methods, and data management strategies to tackle geospatial big data challenges, and at the same time demonstrates opportunities for using big data for geospatial applications. Crucial to the advancements highlighted in this book is the integration of computational thinking and spatial thinking and the transformation of abstract ideas and models into concrete data structures and algorithms.

    A template-based approach for the specification of 3D topological constraints

    Several different models have been defined in the literature for the definition of 3D scenes that include a geometrical representation of objects together with a semantic classification of them. Such semantic characterization encapsulates important details about object properties and behavior and often includes spatial relations that are defined only implicitly or through natural language, such as “an external access shall be in touch with the building only when it is classified as a direct access”. The problem of ensuring the coherence between geometric and semantic information is well known in the literature. Many attempts exist which try to extend the OCL to allow the representation of spatial integrity constraints in a UML model. However, this approach requires a deep knowledge of the OCL formalism and the implementation of ad-hoc procedures to validate the constraints specified at the conceptual level. Therefore, a new approach is needed that helps designers define complex OCL constraints and at the same time allows the automatic generation of the code to test them on a given dataset. The aim of this paper is to propose a set of predefined templates to express, on the classes of a UML data model, a family of 3D spatial integrity constraints based on topological relations, all without requiring domain experts to know any formal language and supporting their automatic translation into validation procedures.
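    A minimal sketch of the template idea, not the paper's generated code: a declarative constraint of the form "every <A> classified as <kind> must TOUCH some <B>" is compiled into a validation routine over a dataset. The dataset layout, the box-based geometry, and the touches() test are illustrative simplifications of a full 3D topological evaluation.

```python
def touches(box_a, box_b):
    """True if two axis-aligned 3D boxes share a boundary but no interior.
    Boxes are (xmin, ymin, zmin, xmax, ymax, zmax)."""
    overlap = [min(box_a[i + 3], box_b[i + 3]) - max(box_a[i], box_b[i])
               for i in range(3)]
    if any(o < 0 for o in overlap):      # disjoint along some axis
        return False
    return any(o == 0 for o in overlap)  # contact only on a face, edge, or vertex

def make_constraint(source_class, kind, target_class):
    """Compile the template into a checker over a list of features, where a
    feature is a dict with 'class', 'kind', and 'box' entries."""
    def check(features):
        targets = [f for f in features if f["class"] == target_class]
        violations = []
        for f in features:
            if f["class"] == source_class and f["kind"] == kind:
                if not any(touches(f["box"], t["box"]) for t in targets):
                    violations.append(f)
        return violations
    return check

# Template instance for the example constraint: "an external access classified
# as a direct access shall be in touch with the building".
direct_access_rule = make_constraint("ExternalAccess", "direct", "Building")
```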

    Adaptive Trip Recommendation System

    Travel recommendation systems provide suggestions to the users based on different information, such as user preferences, needs, or constraints. The recommendation may also take into account some characteristics of the points of interest (POIs) to be visited, such as the opening hours or the peak hours. Although a number of studies have been proposed on the topic, most of them tailor the recommendation from the user viewpoint, without evaluating the impact of the suggestions on the system as a whole. This may lead to oscillatory dynamics, where the choices made by the recommendation system generate new peak hours. This paper considers a trip planning problem that takes into account the balancing of users among the different POIs. To this aim, we consider the estimate of the level of crowding at POIs, including both historical data and the effects of the recommendations. We formulate the problem as a multi-objective optimization problem, and we design a recommendation engine that explores the solution space in near real-time, through a distributed version of the Simulated Annealing approach. Through an experimental evaluation on a real dataset of users visiting the POIs of a touristic city, we show that our solution is able to provide high-quality recommendations while keeping the attractions from becoming overcrowded.
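    A minimal sketch of the crowding estimate the abstract refers to: the expected load of a POI in a time slot combines the historical visit count with the recommendations already issued for that slot, so that each new recommendation sees the anticipated effect of the previous ones. The weights, names, and data layout are assumptions for illustration only.

```python
def expected_load(poi, slot, historical, issued, acceptance_rate=0.7):
    """historical: {(poi, slot): average past visitors}
    issued:        {(poi, slot): users recommended so far for that slot}
    A fixed fraction of issued recommendations is assumed to be followed."""
    past = historical.get((poi, slot), 0.0)
    pending = issued.get((poi, slot), 0) * acceptance_rate
    return past + pending

def register_recommendation(trip, slot_of, issued):
    """After recommending a trip, update the issued counters so that the
    next recommendation accounts for the anticipated crowding."""
    for poi in trip:
        key = (poi, slot_of[poi])
        issued[key] = issued.get(key, 0) + 1
```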