99 research outputs found

    A Spatio-Temporal Framework for Managing Archeological Data

    Space and time are two important characteristics of data in many domains. This is particularly true in the archaeological context, where information concerning the discovery location of objects allows one to derive important relations between findings of a specific survey, or even of different surveys, and temporal aspects extend from the excavation time to the dating of archaeological objects. In recent years, several attempts have been made to develop a spatio-temporal information system tailored for archaeological data. The first aim of this paper is to propose a model, called Star, for representing spatio-temporal data in archaeology. In particular, since in this domain dates are often subjective, estimated and imprecise, Star has to incorporate such vague representations by using fuzzy dates and fuzzy relationships among them. Moreover, besides the topological relations, another kind of spatial relation is particularly useful in archaeology: the stratigraphic one. Therefore, this paper defines a set of rules for deriving temporal knowledge from the topological and stratigraphic relations existing between two findings. Finally, considering the process through which objects are usually manually dated by archaeologists, some existing automatic reasoning techniques may be successfully applied to guide this process. For this purpose, the last contribution regards the translation of archaeological temporal data into a Fuzzy Temporal Constraint Network for checking the overall data consistency and reducing the vagueness of some dates based on their relationships with other ones.
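
    As an illustration of the kind of vague dating the abstract mentions, the following is a minimal sketch (not the authors' Star implementation) of a fuzzy date modelled as a trapezoidal possibility distribution over years, with a rough compatibility check between two datings; all class names and thresholds are assumptions.

```python
from dataclasses import dataclass

@dataclass
class FuzzyDate:
    """Trapezoidal fuzzy date: fully possible in [core_start, core_end],
    linearly decreasing to zero outside [support_start, support_end]."""
    support_start: float
    core_start: float
    core_end: float
    support_end: float

    def membership(self, year: float) -> float:
        # Degree (0..1) to which `year` is a possible dating of the finding.
        if year < self.support_start or year > self.support_end:
            return 0.0
        if self.core_start <= year <= self.core_end:
            return 1.0
        if year < self.core_start:
            return (year - self.support_start) / (self.core_start - self.support_start)
        return (self.support_end - year) / (self.support_end - self.core_end)

def compatibility(a: FuzzyDate, b: FuzzyDate, step: float = 1.0) -> float:
    """Highest degree to which two datings can refer to the same period
    (a rough stand-in for a consistency check in a fuzzy constraint network)."""
    lo, hi = max(a.support_start, b.support_start), min(a.support_end, b.support_end)
    best, year = 0.0, lo
    while year <= hi:
        best = max(best, min(a.membership(year), b.membership(year)))
        year += step
    return best

# Example: a finding dated "roughly 1st century AD" vs. one dated "50-150 AD".
print(compatibility(FuzzyDate(-20, 1, 100, 130), FuzzyDate(30, 50, 150, 180)))
```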

    A framework for integrating multi-accuracy spatial data in geographical applications

    In recent years the integration of spatial data coming from different sources has become a crucial issue for many geographical applications, especially in the process of building and maintaining a Spatial Data Infrastructure (SDI). In this context, new methodologies are necessary in order to acquire and update spatial datasets by collecting new measurements from different sources. The traditional approach implemented in GIS systems for updating spatial data does not usually consider the accuracy of these data, but simply replaces the old geometries with the new ones. Applying such an approach in an SDI, where continuous and incremental updates occur, will quickly lead to a spatial dataset that is inconsistent with respect to spatial relations and relative distances among objects. This paper addresses this problem and proposes a framework for representing multi-accuracy spatial databases, based on a statistical representation of the objects' geometry, together with a method for the incremental and consistent update of the objects that applies a customized version of the Kalman filter. Moreover, the framework also considers the spatial relations among objects, since they represent a particular kind of observation that can be derived from geometries or be observed independently in the real world. Spatial relations among objects also need to be compared during spatial data integration, and we show that they are necessary in order to obtain a correct result when merging the objects' geometries.
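
    To make the update principle concrete, here is a minimal sketch, assuming a point feature whose coordinates carry a variance: a scalar Kalman-style update that fuses an existing geometry with a new measurement of different accuracy. The paper applies a customized Kalman filter to full spatial objects; this only illustrates the accuracy-based weighting.

```python
import numpy as np

def kalman_point_update(prior_xy, prior_var, meas_xy, meas_var):
    """Fuse a stored point (prior) with a new measurement, each with its own
    variance; returns the updated position and variance."""
    prior_xy, meas_xy = np.asarray(prior_xy, float), np.asarray(meas_xy, float)
    gain = prior_var / (prior_var + meas_var)   # more weight to the more accurate source
    new_xy = prior_xy + gain * (meas_xy - prior_xy)
    new_var = (1.0 - gain) * prior_var          # accuracy can only improve
    return new_xy, new_var

# Example: a low-accuracy stored point updated with a high-accuracy survey point.
print(kalman_point_update([100.0, 200.0], 4.0, [101.5, 198.8], 0.25))
```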

    A Balanced Solution for the Partition-based Spatial Merge join in MapReduce

    Several MapReduce frameworks have been developed in recent years in order to cope with the need to process an increasing amount of data. Moreover, some extensions of them have been proposed to deal with particular kinds of information, like the spatial one. In this paper we refer to SpatialHadoop, a spatial extension of Apache Hadoop which provides a rich set of spatial data types and operations. In the geo-spatial domain, the spatial join is considered a fundamental operation for performing data analysis. However, the join operation is generally classified as a critical task to be performed in MapReduce, since it requires processing two datasets at a time. Several different solutions have been proposed in the literature for efficiently performing a spatial join, which may or may not require the presence of a spatial index computed on both datasets or on only one of them. As already discussed in the literature, the efficiency of such an operation depends on the ability both to prune unnecessary data as soon as possible and to provide a balanced amount of work to each task executed in parallel. In this paper, we take a step forward in this direction by proposing an evolution of the Partition-based Spatial Merge Join algorithm which tries to fully exploit the benefits of the parallelism provided by the MapReduce framework. In particular, we concentrate on the partition phase, which has to produce filtered, balanced and meaningful subdivisions of the original datasets.
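
    The following is a minimal sketch, not SpatialHadoop code, of the map side of a partition-based spatial merge join: each rectangle (an MBR) is replicated to every grid cell it overlaps so that matching pairs end up in the same partition, and candidate pairs are tested within each cell. The cell size and the toy datasets are illustrative assumptions.

```python
from collections import defaultdict

def overlapping_cells(mbr, cell_size):
    """Yield (col, row) of every grid cell intersected by the MBR (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = mbr
    for cx in range(int(xmin // cell_size), int(xmax // cell_size) + 1):
        for cy in range(int(ymin // cell_size), int(ymax // cell_size) + 1):
            yield (cx, cy)

def partition(dataset_a, dataset_b, cell_size=10.0):
    """'Map' phase: group MBRs of both datasets by grid cell (the partition key)."""
    cells = defaultdict(lambda: ([], []))
    for mbr in dataset_a:
        for cell in overlapping_cells(mbr, cell_size):
            cells[cell][0].append(mbr)
    for mbr in dataset_b:
        for cell in overlapping_cells(mbr, cell_size):
            cells[cell][1].append(mbr)
    return cells

def intersects(a, b):
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

# 'Reduce' phase: within each cell, test candidate pairs (a full implementation
# would also remove duplicates, e.g. with a reference-point check).
A = [(0, 0, 5, 5), (12, 12, 18, 15)]
B = [(3, 3, 8, 8), (40, 40, 45, 45)]
for cell, (left, right) in partition(A, B).items():
    for a in left:
        for b in right:
            if intersects(a, b):
                print(cell, a, b)
```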

    Skewness-Based Partitioning in SpatialHadoop

    In recent years, several extensions of the Hadoop system have been proposed for dealing with spatial data. SpatialHadoop belongs to this group of projects and includes some MapReduce implementations of spatial operators, like range queries and spatial join. The MapReduce paradigm is based on the fundamental principle that a task can be parallelized by partitioning data into chunks and performing the same operation on them (map phase), eventually combining the partial results at the end (reduce phase). Thus, the applied partitioning technique can tremendously affect the performance of a parallel execution, since it is the key point for obtaining balanced map tasks and exploiting the parallelism as much as possible. When uniformly distributed datasets are considered, this goal can easily be achieved by partitioning the geometries of the input dataset with a regular grid covering the whole reference space; conversely, with skewed datasets this might not be the right choice and other techniques have to be applied. For instance, SpatialHadoop can also produce a global index by means of a Quadtree-based or R-tree-based grid, which are in turn more expensive index structures to build. This paper proposes a technique, based on both a box-counting function and a heuristic rooted in theoretical properties and experimental observations, for detecting the degree of skewness of an input spatial dataset and then deciding which partitioning technique to apply in order to improve the performance of subsequent operations as much as possible. Experiments on both synthetic and real datasets are presented to confirm the effectiveness of the proposed approach.
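
    As a rough illustration of the box-counting idea, the sketch below (assuming point data; resolutions and the 1.8 threshold are illustrative, not the paper's calibrated heuristic) counts non-empty cells at two grid resolutions and derives an exponent: a value close to 2 suggests a roughly uniform 2D distribution, for which a regular grid is adequate, while smaller values indicate skewness and favour a more adaptive, index-based partitioning.

```python
import math
import random

def box_count(points, cell_size):
    """Number of grid cells of the given size containing at least one point."""
    return len({(int(x // cell_size), int(y // cell_size)) for x, y in points})

def box_counting_exponent(points, r1=5.0, r2=10.0):
    """Slope of log(count) vs log(1/r) between two resolutions."""
    c1, c2 = box_count(points, r1), box_count(points, r2)
    return (math.log(c1) - math.log(c2)) / (math.log(1 / r1) - math.log(1 / r2))

def choose_partitioning(points, uniform_threshold=1.8):
    e0 = box_counting_exponent(points)
    return "regular-grid" if e0 >= uniform_threshold else "space-partitioning-index"

# Example: clustered points look skewed, scattered points look uniform.
clustered = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(5000)]
scattered = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(5000)]
print(choose_partitioning(clustered), choose_partitioning(scattered))
```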

    Tracking Data Provenance of Archaeological Temporal Information in Presence of Uncertainty

    The interpretation process is one of the main tasks performed by archaeologists who, starting from ground data about evidence and findings, incrementally derive knowledge about ancient objects or events. Very often more than one archaeologist contributes, at different time instants, to discovering details about the same finding, and thus it is important to keep track of the history and provenance of the overall knowledge discovery process. To this aim, we propose a model and a set of derivation rules for tracking and refining data provenance during the archaeological interpretation process. In particular, among all the possible interpretation activities, we concentrate on dating, which archaeologists perform to assign one or more time intervals to a finding in order to define its lifespan on the temporal axis. In this context, we propose a framework to represent and derive updated provenance data about temporal information after the mentioned derivation process. Archaeological data, and in particular their temporal dimension, are typically vague, since many different interpretations can coexist; thus, we use Fuzzy Logic to assign a degree of confidence to values, and Fuzzy Temporal Constraint Networks to model the relationships between the dating of different findings represented as a graph-based dataset. The derivation rules used to infer more precise temporal intervals are enriched to also manage provenance information and its updates after a derivation step. A MapReduce version of the path consistency algorithm is also proposed to improve the efficiency of the refining process on big graph-based datasets.
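
    For readers unfamiliar with path consistency, here is a minimal sketch with the fuzzy part stripped away: each edge (i, j) of the network stores the allowed interval for date_j - date_i, composition is interval addition and revision is intersection. The paper additionally attaches membership degrees and provenance records to each edge and distributes the computation with MapReduce; none of that is shown here.

```python
import itertools

def compose(a, b):
    return (a[0] + b[0], a[1] + b[1])

def intersect(a, b):
    return (max(a[0], b[0]), min(a[1], b[1]))

def path_consistency(n, constraints):
    """constraints: dict {(i, j): (lo, hi)}; returns the tightened network,
    or None if some interval becomes empty (inconsistent dating)."""
    full = (float("-inf"), float("inf"))
    c = {(i, j): constraints.get((i, j), full) for i in range(n) for j in range(n)}
    changed = True
    while changed:
        changed = False
        for i, k, j in itertools.product(range(n), repeat=3):
            tightened = intersect(c[(i, j)], compose(c[(i, k)], c[(k, j)]))
            if tightened[0] > tightened[1]:
                return None
            if tightened != c[(i, j)]:
                c[(i, j)] = tightened
                changed = True
    return c

# Example: finding 0 predates finding 1 by 0..50 years and finding 1 predates
# finding 2 by 10..30 years; the 0->2 relation is tightened to 10..80 years.
result = path_consistency(3, {(0, 1): (0, 50), (1, 2): (10, 30)})
print(result[(0, 2)])
```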

    An Interoperable Spatio-Temporal Model for Archaeological Data Based on ISO Standard 19100

    Archaeological data are characterized by both spatial and temporal dimensions that are often related to each other and are of particular interest during the interpretation process. For this reason, several attempts have been made in recent years to develop a GIS tailored for archaeological data. However, despite the increasing use of information technologies in the archaeological domain, the current situation is that each agency or research group independently develops its own local database and management application, which is isolated from the others. Conversely, the sharing of information and the cooperation between different archaeological agencies or research groups can be particularly useful in order to support the interpretation process by using data discovered in situations that are similar with respect to spatio-temporal or thematic aspects. In the geographical domain, the INSPIRE initiative of the European Union tries to support the development of a Spatial Data Infrastructure (SDI) through which several organizations with overlapping goals, like public bodies or private companies, can share data, resources, tools and competencies in an effective way. The aim of this paper is to lay the basis for the development of an archaeological SDI, starting from the experience acquired during the collaboration among several Italian organizations. In particular, the paper proposes a spatio-temporal conceptual model for archaeological data based on the ISO standards of the 19100 family and promotes the use of the GeoUML methodology in order to put such interoperability into practice. The GeoUML methodology and tools have been enhanced in order to suit the archaeological domain and to automatically produce several useful documents, configuration files and code starting from the conceptual specification. The applicability of the spatio-temporal conceptual model and the usefulness of the produced tools have been tested in three different Italian contexts: Rome, Verona and Isola della Scala.
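
    Purely as an illustration (this is not the paper's GeoUML specification, and all class and attribute names are assumptions), a feature type of such a conceptual model could combine a spatial attribute with valid-time periods in the spirit of the ISO 19100 spatio-temporal schemas:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class TMPeriod:                 # loosely after ISO 19108 TM_Period
    begin: Optional[int]        # year; None if unknown
    end: Optional[int]

@dataclass
class StratigraphicUnit:        # hypothetical feature type of an archaeological SDI
    identifier: str
    geometry: List[Tuple[float, float]]   # polygon boundary (stand-in for GM_Surface)
    excavation_time: TMPeriod             # when the unit was excavated
    dating: TMPeriod                      # estimated lifespan of the unit

unit = StratigraphicUnit("US-1021", [(0, 0), (4, 0), (4, 3), (0, 3)],
                         TMPeriod(2012, 2013), TMPeriod(-50, 120))
print(unit.identifier, unit.dating)
```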

    Distributing Tourists Among POIs with an Adaptive Trip Recommendation System

    Traveling is part of many people's leisure activities, and an increasing fraction of the economy comes from tourism. Given a destination, information about the different attractions, or points of interest (POIs), can be found in many sources. Among these attractions, finding the ones that could be of interest to a specific user is a challenging task. Travel recommendation systems deal with this type of problem. Most of the solutions in the literature do not take into account the impact of the suggestions on the level of crowding of POIs. This paper considers the trip planning problem with a focus on balancing users among the different POIs. To this aim, we consider the effects of the previous recommendations, as well as estimates based on historical data, while devising a new recommendation. The problem is formulated as a multi-objective optimization problem, and a recommendation engine has been designed and implemented for exploring the solution space in near real-time through a distributed version of the Simulated Annealing approach. We test our solution using a real dataset of users visiting the POIs of a touristic city, and we show that we are able to provide high-quality recommendations while keeping the attractions from becoming overcrowded.
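
    A minimal sketch of the optimization idea, not the distributed engine of the paper: simulated annealing over candidate itineraries, with an objective that rewards user interest and penalizes expected crowding. The POIs, scores, crowding estimates and weights below are illustrative assumptions.

```python
import math
import random

pois = {"arena": (0.9, 0.8), "castle": (0.7, 0.3),   # name: (interest, crowding)
        "museum": (0.6, 0.2), "bridge": (0.5, 0.6), "garden": (0.4, 0.1)}

def score(itinerary, alpha=1.0, beta=0.7):
    """Multi-objective score: total interest minus a crowding penalty."""
    return sum(alpha * pois[p][0] - beta * pois[p][1] for p in itinerary)

def neighbour(itinerary):
    """Swap one selected POI with one not yet in the itinerary."""
    incoming = random.choice([p for p in pois if p not in itinerary])
    new = list(itinerary)
    new[random.randrange(len(new))] = incoming
    return new

def simulated_annealing(k=3, steps=2000, t0=1.0, cooling=0.995):
    current, temperature = random.sample(list(pois), k), t0
    best = current
    for _ in range(steps):
        candidate = neighbour(current)
        delta = score(candidate) - score(current)
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current = candidate                      # accept (sometimes worse) moves
        if score(current) > score(best):
            best = current
        temperature *= cooling
    return best

best = simulated_annealing()
print(best, round(score(best), 2))
```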

    Tracking social provenance in chains of retweets

    In the era of massive sharing of information, the term social provenance is used to denote the ownership, source or origin of a piece of information that has been propagated through social media. Tracking the provenance of information is becoming increasingly important as social platforms acquire more relevance as a source of news. In this scenario, Twitter is considered one of the most important social networks for information sharing and dissemination, which can be accelerated through the use of retweets and quotes. However, the Twitter API does not provide a complete tracking of retweet chains, since only the connection between a retweet and the original post is stored, while all the intermediate connections are lost. This can limit the ability to track the diffusion of information, as well as the estimation of the importance of specific users, who can rapidly become influencers in news dissemination. This paper proposes an innovative approach for rebuilding the possible chains of retweets and for estimating the contribution of each user to the spread of information. For this purpose, we define the concept of a Provenance Constraint Network and a modified version of the Path Consistency Algorithm. An application of the proposed technique to a real-world dataset is presented at the end of the paper.
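
    A minimal sketch of the chain-reconstruction problem, not the paper's Provenance Constraint Network: each retweet may descend from the original post or from any earlier retweet whose author the retweeter follows. The timestamps and the "follows" relation below are illustrative assumptions.

```python
retweets = [("bob", 10), ("carol", 25), ("dave", 40)]   # (user, minutes after the original)
original_author = "alice"
follows = {"bob": {"alice"}, "carol": {"alice", "bob"}, "dave": {"carol"}}

def candidate_parents(retweets, original_author, follows):
    """For every retweet, list the accounts it could have been propagated from."""
    parents = {}
    for user, t in retweets:
        options = []
        if original_author in follows.get(user, set()):
            options.append(original_author)
        for other, t_other in retweets:
            if t_other < t and other in follows.get(user, set()):
                options.append(other)
        parents[user] = options or [original_author]    # fallback: direct retweet
    return parents

# bob -> alice; carol -> alice or bob; dave -> carol (the only account he follows).
print(candidate_parents(retweets, original_author, follows))
```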

    Establishing Robustness of a Spatial Dataset in a Tolerance-Based Vector Model

    Spatial data are usually described through a vector model in which geometries are represented by a set of coordinates embedded in a Euclidean space. The use of a finite representation, instead of the real numbers theoretically required, causes many robustness problems which are well known in the literature. Such problems are made even worse in a distributed context, where data are exchanged between different systems and several perturbations can be introduced into the data representation. In this context, a spatial dataset is said to be robust if the evaluation of the spatial relations existing among its objects can be performed in different systems, always producing the same result. In order to discuss the robustness of a spatial dataset, two implementation models have to be distinguished, since they determine different ways of evaluating the relations existing among geometric objects: the identity model and the tolerance model. The robustness of a dataset in the identity model has been widely discussed in [Belussi et al., 2012, Belussi et al., 2013, Belussi et al., 2015a], and some algorithms of the Snap Rounding (SR) family [Hobby, 1999, Halperin and Packer, 2002, Packer, 2008, Belussi et al., 2015b] can be successfully applied in that context. Conversely, this problem has been less explored in the tolerance model. The aim of this paper is to propose an algorithm, inspired by those of the SR family, for establishing or restoring the robustness of a vector dataset in the tolerance model. The main ideas are to introduce an additional operation which spreads geometries instead of snapping them, in order to preserve the original relation between them, and to use a tolerance region for this operation instead of a single snapping location. Finally, some experiments on real-world datasets are presented, which confirm that the proposed algorithm can establish the robustness of a dataset.
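
    A minimal sketch contrasting the two operations (not the paper's algorithm): snapping moves a vertex onto a grid point and can make two nearly coincident but distinct vertices identical, while spreading pushes them apart until their distance exceeds the tolerance, so the tolerance model keeps evaluating them as distinct. The tolerance value and the margin factor are assumptions.

```python
import math

def snap(p, grid=1.0):
    """Snap-rounding style operation: move the vertex to the nearest grid point."""
    return (round(p[0] / grid) * grid, round(p[1] / grid) * grid)

def spread(p, q, tolerance=0.5):
    """Move q away from p along their connecting direction until dist(p, q) > tolerance."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > tolerance:
        return q                              # coincident or already far enough
    factor = (tolerance * 1.01) / d           # small margin beyond the tolerance
    return (p[0] + dx * factor, p[1] + dy * factor)

p, q = (2.0, 2.0), (2.2, 2.1)
print(snap(q), spread(p, q))                  # snapping merges them, spreading keeps them distinct
```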

    Using Deep Learning for Big Spatial Data Partitioning

    This article explores the use of deep learning to choose an appropriate spatial partitioning technique for big data. The exponential increase in the volume of spatial datasets has resulted in the development of big spatial data frameworks. These systems need to partition the data across machines to be able to scale out the computation. Unfortunately, there is currently no method to automatically choose an appropriate partitioning technique based on the input data distribution. This article addresses this problem by using deep learning to train a model that captures the relationship between the data distribution and the quality of the partitioning techniques. We propose a solution that runs in two phases, training and application. The offline training phase generates synthetic data based on diverse distributions, partitions them using six different partitioning techniques, and measures their quality using four quality metrics. At the same time, it summarizes the datasets using a histogram and well-designed skewness measures. The data summaries and the quality metrics are then used to train a deep learning model. The second phase uses this model to predict the best partitioning technique for a new dataset that needs to be partitioned. We run an extensive experimental evaluation on big spatial data, and we experimentally show the applicability of the proposed technique. We show that the proposed model outperforms the baseline method in terms of accuracy in choosing the best partitioning technique by analyzing only the summary of the datasets.
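
    A minimal sketch of the two-phase idea, with made-up features and labels and scikit-learn's small neural network standing in for the paper's deep learning model: a training phase learns a mapping from a dataset summary (a normalized histogram plus a simple skewness measure) to the best partitioning technique, and the application phase predicts the technique for a new dataset from its summary alone. The paper compares six techniques; this toy example uses only two labels.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def summarize(points, bins=4):
    """Dataset summary: normalized 2D histogram plus a peak-to-mean cell-load ratio."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins, range=[[0, 1], [0, 1]])
    hist = hist.flatten() / len(points)
    skew = hist.max() / (hist.mean() + 1e-9)
    return np.append(hist, skew)

rng = np.random.default_rng(0)
X, y = [], []
for _ in range(200):                               # synthetic training phase
    if rng.random() < 0.5:                         # uniform dataset -> label "grid"
        pts, label = rng.random((500, 2)), "grid"
    else:                                          # clustered dataset -> label "quadtree"
        pts, label = np.clip(rng.normal(0.3, 0.05, (500, 2)), 0, 1), "quadtree"
    X.append(summarize(pts))
    y.append(label)

model = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0).fit(X, y)

# Application phase: predict the technique for a new, unseen dataset.
new_points = rng.random((500, 2))
print(model.predict([summarize(new_points)]))
```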