
    Tracing outliers in the dataset of Drosophila suzukii records with the Isolation Forest method

    The analysis of big data is a fundamental challenge for the current and future stream of data coming from many different sources. Geospatial data are currently among the less-investigated of these sources. A typical example of an ever-growing dataset is the distribution data of invasive species in the territories they colonize. The dataset of Drosophila suzukii invasion sites in Europe up to 2011 was used to test a possible method for pinpointing its outliers (anomalies). Our aim was to find a method of analysis able to handle large amounts of data and produce easily readable outputs that summarize, and possibly predict, the status and future development of a biological invasion. To do so, we identified the so-called anomalies of the dataset with a Python script based on the machine learning algorithm "Isolation Forest". We also used the K-Means clustering method to partition the dataset. In our test on a real dataset, the Silhouette method indicated 10 clusters as the best result. The clusters were drawn on the map with a Voronoi tessellation, showing that 8 clusters were centered on industrial harbours, while the remaining two were in the hinterland. This led us to hypothesize that: (1) the main entry mechanism into Europe may be the import of goods through ports, apparently on several separate occasions; (2) the spread inland may be due to road transport of goods; (3) the outliers (anomalies) found with the Isolation Forest method would identify individuals or populations that tend to detach from their original cluster, and hence indicate the lines along which the invasion may spread further. This type of analysis therefore aims to identify the future direction of an invasion, rather than its center of origin as in geographic profiling. Isolation Forest thus provides results complementary to those of geographic profiling (PGP).
The recent records of the invasive species, located mainly close to the positions of the outliers, indicate that the Isolation Forest method can be considered predictive, and it proved to be a useful method for handling large geospatial datasets.
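The abstract does not include the authors' script, so the following is only a minimal from-scratch sketch of the Isolation Forest idea: random axis-aligned splits isolate each point, and points isolated after few splits receive anomaly scores close to 1. The coordinates are hypothetical (longitude, latitude) pairs, not the D. suzukii records, and the helper names (`build_tree`, `isolation_forest_scores`) are illustrative. For real work, scikit-learn's `sklearn.ensemble.IsolationForest` would be the usual choice.

```python
import math
import random

def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup
    among n points; used to normalise isolation depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # simplification: stop if the chosen feature cannot split
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }

def path_length(x, node):
    """Depth at which x reaches a leaf, plus the c() correction for the leaf size."""
    depth = 0
    while "dim" in node:
        node = node["left"] if x[node["dim"]] < node["split"] else node["right"]
        depth += 1
    return depth + c_factor(node["size"])

def isolation_forest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score s(x) = 2 ** (-E[h(x)] / c(psi)); higher means more anomalous.
    Requires at least two points."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(points, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm) for x in points]

# Hypothetical records: a tight 5 x 5 cluster plus one distant point.
points = [(10 + 0.1 * (i % 5), 45 + 0.1 * (i // 5)) for i in range(25)]
points.append((20.0, 60.0))
scores = isolation_forest_scores(points)
outlier = max(range(len(points)), key=scores.__getitem__)
print(points[outlier], round(scores[outlier], 2))
```

The sample size `psi` and the depth limit `ceil(log2(psi))` follow the usual Isolation Forest defaults; the clustering (K-Means, Silhouette) and Voronoi steps described in the abstract are not shown.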

    Acceleration of Computational Geometry Algorithms for High Performance Computing Based Geo-Spatial Big Data Analysis

    Geo-spatial computing and data analysis is the branch of computer science that deals with real-world, location-based data. Computational geometry algorithms, which process geometries and shapes, are one of the pillars of geo-spatial computing. Real-world map and location-based data can be huge, and the data structures used to process them can be extremely large, leading to high computational costs. Furthermore, geo-spatial datasets are growing along all the V's (Volume, Variety, Value, etc.) and are becoming larger and more complex to process, in turn demanding more computational resources. High Performance Computing breaks a problem down so that it can run in parallel on large computers with massive processing power, reducing computing time and delivering the same results much faster. This dissertation explores different techniques to accelerate computational geometry algorithms and geo-spatial computing, such as many-core Graphics Processing Units (GPU), multi-core Central Processing Units (CPU), multi-node setups with the Message Passing Interface (MPI), cache optimizations, memory and communication optimizations, load balancing, algorithmic modifications, directive-based parallelization with OpenMP or OpenACC, and vectorization with compiler intrinsics (AVX). At least one of these techniques has been applied to each of the following problems. A novel method to parallelize plane-sweep-based geometric intersection on GPUs with directives is presented. The parallelization of plane-sweep-based Voronoi construction, of segment tree construction, and of segment tree queries and segment-tree-based operations is also presented, as are spatial autocorrelation and the computation of Getis-Ord hotspots. Acceleration performance and speedup results are presented in each corresponding chapter.
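Of the problems listed, the Getis-Ord hotspot statistic is compact enough to write out. The sketch below computes the standard Gi* z-score with binary distance-band weights that include each location itself; the distance band, the toy transect and the function name are assumptions for illustration, and this serial version is not the dissertation's implementation. Each location's statistic depends only on the global mean and standard deviation plus its own neighbourhood, so the outer loop is embarrassingly parallel, which is what makes it a natural target for GPU, OpenMP or MPI acceleration.

```python
import math

def getis_ord_gi_star(coords, values, band):
    """Gi* z-scores with binary weights: w_ij = 1 if dist(i, j) <= band (j == i included).

    G_i* = (sum_j w_ij x_j - mean * sum_j w_ij)
           / (S * sqrt((n * sum_j w_ij^2 - (sum_j w_ij)^2) / (n - 1)))
    """
    n = len(values)
    mean = sum(values) / n
    # Clamp at 0 to avoid a tiny negative variance from rounding.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean * mean))
    z_scores = []
    for xi, yi in coords:  # iterations are independent, hence parallelisable
        w = [1.0 if math.hypot(xj - xi, yj - yi) <= band else 0.0 for xj, yj in coords]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        numerator = sum(wj * vj for wj, vj in zip(w, values)) - mean * sw
        denominator = s * math.sqrt((n * sw2 - sw * sw) / (n - 1))
        # Undefined when every weight is 1 or all values are equal; report 0.
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores

# Toy transect: nine cells along x, with high values in the first three.
coords = [(float(k), 0.0) for k in range(9)]
values = [10, 10, 10, 1, 1, 1, 1, 1, 1]
z = getis_ord_gi_star(coords, values, band=1.0)
print([round(v, 2) for v in z])
```

Positive z-scores mark clusters of high values (hotspots) and negative ones clusters of low values (coldspots); here cell 1, surrounded by high neighbours, gets the largest score.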

    A Map-algebra-inspired Approach for Interacting With Wireless Sensor Networks, Cyber-physical Systems or Internet of Things

    The typical approach for consuming data from wireless sensor networks (WSN) and the Internet of Things (IoT) has been to send data back to central servers for processing and analysis. This thesis develops an alternative strategy for processing and acting on data directly in the environment, referred to as Active embedded Map Algebra (AeMA). "Active" refers to the near real-time production of data, and "embedded" refers to the architecture of distributed embedded sensor nodes. Network macroprogramming, a style of programming adopted for wireless sensor networks and IoT, addresses the challenges of coordinating the behavior of multiple connected devices through a high-level programming model. Several macroprogramming models have been proposed, but none to date has adopted a comprehensive spatial model. This thesis takes the unique approach of adapting the well-known Map Algebra model from Geographic Information Science to extend the functionality of WSN/IoT and the opportunities for user interaction with them. As an inherently spatial model, the Map Algebra-inspired metaphor supports the types of computation desired from a network of geographically dispersed WSN nodes. The AeMA data model aligns with the conceptual model of GIS layers and with specific layer operations from Map Algebra. A declarative query and network tasking language, based on Map Algebra operations, provides the basis for operations and interactions. As an extension to traditional Map Algebra, the model adds functionality to calculate and store time series and specific temporal summary-type composite objects. AeMA encodes Map Algebra-inspired operations into an extensible virtual machine runtime system called MARS (Map Algebra Runtime System), which supports Map Algebra efficiently and extensibly. Map Algebra-like operations are performed in a distributed manner: data do not leave the network but are analyzed and consumed in place.
As a consequence, collected information is available in situ to drive local actions. The conceptual model and tasking language are designed to direct nodes as active entities able to act on their environment. This Map Algebra-inspired network macroprogramming model has many potential applications for spatially deployed WSN/IoT networks; in particular, the thesis notes its utility for precision agriculture applications.
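To make the Map Algebra vocabulary concrete, the sketch below shows a local operation, a focal operation and a per-node temporal summary on a tiny grid of sensor readings. It is centralised for readability: in AeMA each node would compute its own cell from neighbour messages inside the network, and nothing here reproduces MARS or its tasking language. The layer representation (a dict from grid position to reading), the function names and the irrigation rule are all hypothetical.

```python
from statistics import mean

def local_op(a, b, fn):
    """Local operation: combine two layers cell by cell (nodes present in both)."""
    return {cell: fn(a[cell], b[cell]) for cell in a.keys() & b.keys()}

def focal_mean(layer, radius=1):
    """Focal operation: each node averages the readings of all nodes in its
    (2 * radius + 1) x (2 * radius + 1) window, skipping absent nodes."""
    out = {}
    for r, c in layer:
        window = [
            layer[(r + dr, c + dc)]
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if (r + dr, c + dc) in layer
        ]
        out[(r, c)] = mean(window)
    return out

def temporal_summary(snapshots):
    """Temporal extension: per-node (min, mean, max) over a series of layers."""
    cells = set.intersection(*(set(s) for s in snapshots))
    return {
        cell: (min(s[cell] for s in snapshots),
               mean(s[cell] for s in snapshots),
               max(s[cell] for s in snapshots))
        for cell in cells
    }

# Hypothetical 2 x 2 field of nodes: soil moisture (fraction) and air temperature (deg C).
moisture = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.5}
temperature = {(0, 0): 30, (0, 1): 28, (1, 0): 22, (1, 1): 31}

irrigate = local_op(moisture, temperature, lambda m, t: m < 0.35 and t > 25)
smoothed = focal_mean(moisture)
later = {(0, 0): 0.4, (0, 1): 0.4, (1, 0): 0.1, (1, 1): 0.5}
summary = temporal_summary([moisture, later])
print(irrigate, smoothed, summary)
```

Local operations need only a node's own readings, while focal operations need its neighbours' readings, which is why the focal case is the one that would require in-network communication in a distributed runtime.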

    Supermarket site assessment and the importance of spatial analysis data

    Originally published in "Advances in Doctoral Research in Management", Volume 1, pp. 171-195, February 2006. ISBN 978-981-256-044-5 (Hardcover). This work is part of a dissertation that addresses the supermarket site assessment problem. We propose a three-step method for evaluating store sites. In the first step, analogue groups of existing supermarkets are formed using a clustering procedure. In the second step, classification trees are used to assign new stores to specific analogue groups. Finally, in the third step, a linear regression model forecasts the sales of new sites from several predictor variables, including dummy variables for the analogue groups. To handle the demographic and competition data related to each supermarket, we use neighborhood delimitation techniques. Three alternative delimitation techniques and two aggregation procedures are compared, and the results are evaluated by the proportion of sales turnover variance that the alternative predictors are able to explain. As a result, we select one aggregation procedure; however, none of the delimitation models (shortest-path polygons and first- and second-order multiplicatively weighted Voronoi diagrams) stands out, since they present similar performance. Finally, we compare the relative importance of spatial data predictors in site assessment using Dominance Analysis. The relevance of spatial analysis predictors clearly emerges: they are dominated only by the "trade area" predictor.
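One of the delimitation techniques compared is the multiplicatively weighted Voronoi diagram. A minimal first-order sketch follows: each demand point is assigned to the store that minimises distance divided by store weight, and a demographic variable is then summed per trade area. Using a store attractiveness measure as the weight, Euclidean distance, the function names, and the toy coordinates and populations are all assumptions; the second-order variant, the shortest-path polygons, and the clustering, classification-tree and regression steps are not shown.

```python
import math

def mw_voronoi_assign(demand_points, stores):
    """First-order multiplicatively weighted Voronoi assignment.
    stores: list of (x, y, weight); a point goes to argmin_i d(p, s_i) / w_i."""
    assignment = []
    for px, py in demand_points:
        weighted = [math.hypot(px - sx, py - sy) / w for sx, sy, w in stores]
        assignment.append(weighted.index(min(weighted)))
    return assignment

def aggregate_by_area(assignment, values, n_stores):
    """Sum a demographic variable (e.g. population) over each store's weighted region."""
    totals = [0.0] * n_stores
    for store, value in zip(assignment, values):
        totals[store] += value
    return totals

# Hypothetical example: a small store at the origin and one three times as attractive at x = 10.
stores = [(0.0, 0.0, 1.0), (10.0, 0.0, 3.0)]
blocks = [(2.0, 0.0), (4.0, 0.0), (9.0, 0.0), (-1.0, 0.0)]
population = [100, 200, 300, 50]

areas = mw_voronoi_assign(blocks, stores)
print(areas, aggregate_by_area(areas, population, len(stores)))
```

The block at x = 4 is closer to the small store in plain distance, but the larger weight pulls it into the second store's region; this is the behaviour that distinguishes weighted from ordinary Voronoi trade areas.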

    On the use of multi-sensor digital traces to discover spatio-temporal human behavioral patterns

    134 pp. Technology is already part of our lives, and every time we interact with it, whether in a phone call, when paying with a credit card, or through our activity on social networks, digital traces are stored. In this thesis we are interested in the digital traces that also record people's geolocation while they carry out their daily activities. This information lets us understand how people interact with the city, which is very valuable for urban planning, traffic management, public policy, and even for taking preventive action against natural disasters. This thesis aims to study human behavioral patterns from digital traces. To that end, it uses three massive datasets that record the activity of anonymized users in terms of phone calls, credit card purchases, and social network activity (check-ins, images, comments, and tweets). We propose a methodology for extracting human behavioral patterns using latent semantic models, namely Latent Dirichlet Allocation and Dynamic Topic Models: the former to detect spatial patterns and the latter to detect spatio-temporal patterns. In addition, we propose a set of metrics that provides an objective method for evaluating the patterns obtained.
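A common way to apply Latent Dirichlet Allocation to geolocated traces, assumed here rather than taken from the thesis, is to treat each spatial cell as a "document" and each recorded activity (for example a check-in category) as a "word". The sketch below fits LDA with collapsed Gibbs sampling, a standard inference scheme that may differ from the one the thesis used; Dynamic Topic Models are not covered. The hyperparameters, vocabulary and cells are hypothetical, and for real datasets a library such as gensim's `LdaModel` would be the usual choice.

```python
import random

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.
    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns (theta, phi): per-document topic mixtures and per-topic word distributions."""
    rng = random.Random(seed)
    topics = range(n_topics)
    ndk = [[0] * n_topics for _ in docs]               # tokens of topic k in doc d
    nkw = [[0] * vocab_size for _ in topics]           # times word w is assigned topic k
    nk = [0] * n_topics                                # tokens assigned to topic k
    z = []                                             # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta) for w in range(vocab_size)]
           for k in topics]
    return theta, phi

# Hypothetical cells: activity categories observed in each grid cell.
vocab = ["restaurant", "bar", "office", "meeting"]
cells = {
    "waterfront": ["bar", "restaurant", "bar", "bar", "restaurant"] * 4,
    "business_district": ["office", "meeting", "office", "office", "meeting"] * 4,
    "mixed": ["bar", "office", "restaurant", "meeting"] * 3,
}
word_id = {w: i for i, w in enumerate(vocab)}
docs = [[word_id[w] for w in tokens] for tokens in cells.values()]
theta, phi = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab))
for name, mix in zip(cells, theta):
    print(name, [round(p, 2) for p in mix])
```

Each row of `theta` is a cell's mixture of latent activity patterns, which is what would be mapped to reveal spatial patterns; each row of `phi` describes one pattern in terms of activities.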

    Passive mobile data for studying seasonal tourism mobilities: an application in a Mediterranean Coastal destination

    The article uses passive mobile data to analyse the complex mobilities that occur in a coastal region characterised by seasonal patterns of tourism activity. A large volume of data generated by mobile phone users was selected and processed in order to display the information as visualisations useful for transport and tourism research, policy, and practice. More specifically, the analysis consisted of four steps: (1) a dataset containing records for four days, two in summer and two in winter, was selected; (2) the records were aggregated spatially and temporally, distinguishing trips by local residents, national tourists, and international tourists; (3) origin-destination matrices were built; and (4) graph-based visualisations were created to provide evidence on the nature of the mobilities affecting the study area. The results provide new evidence of how the analysis of passive mobile data can be used to study the effects of tourism seasonality on local mobility patterns.
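Steps (2) and (3) can be sketched as follows, assuming each trip record already carries an origin zone, a destination zone, a visitor category and a season. These field names, the zone names and the minimum-trip threshold for graph edges are hypothetical, and the article's actual aggregation scheme may differ. The edge list at the end is the kind of input a graph-based visualisation in step (4) would consume.

```python
from collections import Counter, defaultdict

def build_od_counts(trips):
    """Group trips by (season, category) and count origin-destination pairs."""
    od = defaultdict(Counter)
    for trip in trips:
        key = (trip["season"], trip["category"])
        od[key][(trip["origin"], trip["destination"])] += 1
    return od

def to_matrix(counts, zones):
    """Dense OD matrix: rows are origins, columns are destinations, in `zones` order."""
    index = {zone: i for i, zone in enumerate(zones)}
    matrix = [[0] * len(zones) for _ in zones]
    for (origin, destination), n in counts.items():
        matrix[index[origin]][index[destination]] += n
    return matrix

def to_edges(counts, min_trips=1):
    """Weighted directed edges for a flow graph, dropping intra-zone trips and weak flows."""
    return sorted((o, d, n) for (o, d), n in counts.items() if o != d and n >= min_trips)

trips = [
    {"origin": "beach", "destination": "old_town", "category": "international", "season": "summer"},
    {"origin": "beach", "destination": "old_town", "category": "international", "season": "summer"},
    {"origin": "old_town", "destination": "beach", "category": "resident", "season": "summer"},
    {"origin": "old_town", "destination": "old_town", "category": "resident", "season": "winter"},
]
zones = ["beach", "old_town"]
od = build_od_counts(trips)
print(to_matrix(od[("summer", "international")], zones))
print(to_edges(od[("summer", "resident")]))
```

Keeping one matrix per (season, category) pair makes the seasonal comparison a direct difference between matrices for the same visitor group.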

    An Agent-Based Variogram Modeller: Investigating Intelligent, Distributed-Component Geographical Information Systems

    Geo-Information Science (GIScience) is the field of study that addresses substantive questions concerning the handling, analysis and visualisation of spatial data. Geo-Information Systems (GIS), including software, data acquisition and organisational arrangements, are the key technologies underpinning GIScience. A GIS is normally tailored to the service it is supposed to perform. However, there is often a need to perform a function that the GIS tool being used does not support. The normal solution in these circumstances is to look for another tool that can provide the service, and often an expert to use that tool. This is expensive, time-consuming and certainly stressful for geographical data analysts. On the other hand, GIS is often used in conjunction with other technologies to form a geocomputational environment. One of the complex tools in geocomputation is geostatistics, one of whose functions is to provide the means to determine the extent of spatial dependencies within geographical data and processes. Spatial datasets are often large and complex. Agent systems are currently being integrated into GIS to offer flexibility and allow better data analysis. The thesis examines the current application of agents within the GIS community and determines whether they are used to represent data or processes, or to act as a service. It sets out to demonstrate the applicability of an agent-oriented paradigm as a service-based GIS, with the potential to provide greater interoperability and reduce resource requirements (human and tools). In particular, analysis was undertaken to determine the need to introduce enhanced features to agents in order to maximise their effectiveness in GIS. This was achieved by addressing the complexity of software agent design and implementation for the GIS environment and by suggesting possible solutions to the problems encountered.
The characteristics and features of software agents (including the dynamic binding of plans to agents to tackle varying levels of complexity and ranges of contexts) were examined, and current GIScience and the applications of agent technology to GIS, with agents as entities, objects and processes, were discussed. These concepts and their functionality for GIS are then analysed and discussed. The extent of agent functionality, an analysis of the gaps, and the use of these technologies to express a distributed service providing an agent-based GIS framework are then presented. Thus, a general agent-based framework for GIS was devised, together with a novel agent-based architecture for a specific part of GIS, the variogram, in order to examine the applicability of the agent-oriented paradigm to GIS. The current mechanisms for constructing variograms, and their underlying processes and functions, were examined, and these processes were then embedded into a novel agent architecture for GIS. Once the software agent implementation had been achieved, the corresponding tool was tested and validated, internally for code errors and externally to determine its functional requirements and whether it enhances the GIS process of dealing with data. Finally, it was compared with other known service-based GIS agents, and its advantages and disadvantages were analysed.
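The variogram that the agent architecture targets starts from the classical empirical semivariogram, γ(h) = 1/(2N(h)) Σ (z(x_i) − z(x_j))², averaged over the N(h) point pairs whose separation falls in lag bin h. The sketch below computes it with fixed-width lag bins; the bin width, the toy transect and the function name are assumptions. Fitting a model (e.g. spherical or exponential) to the resulting points, and any agent or service layer around it, are outside its scope.

```python
import math

def empirical_variogram(coords, values, lag_width, n_lags):
    """Classical (Matheron) estimator:
    gamma(h) = sum over pairs in lag bin h of (z_i - z_j)^2, divided by 2 * N(h).
    Bin k covers separations in [k * lag_width, (k + 1) * lag_width).
    Returns (bin centre, semivariance, number of pairs) for non-empty bins."""
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            k = int(math.dist(coords[i], coords[j]) // lag_width)
            if k < n_lags:  # ignore pairs beyond the last lag bin
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [((k + 0.5) * lag_width, sums[k] / (2 * counts[k]), counts[k])
            for k in range(n_lags) if counts[k] > 0]

# Hypothetical transect: four samples one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
variogram = empirical_variogram(coords, values, lag_width=1.5, n_lags=2)
for lag, gamma, pairs in variogram:
    print(f"h~{lag:.2f}  gamma={gamma:.3f}  pairs={pairs}")
```

Semivariance rising with lag, as in this toy transect, is the signature of spatial dependence that the fitted variogram model then summarises.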