
    Enhancing In-Memory Spatial Indexing with Learned Search

    Spatial data is ubiquitous. Massive amounts of data are generated every day from a plethora of sources such as billions of GPS-enabled devices (e.g., cell phones, cars, and sensors), consumer-based applications (e.g., Uber and Strava), and social media platforms (e.g., location-tagged posts on Facebook, Twitter, and Instagram). This exponential growth in spatial data has led the research community to build systems and applications for efficient spatial data processing. In this study, we apply a recently developed machine-learned search technique for single-dimensional sorted data to spatial indexing. Specifically, we partition spatial data using six traditional spatial partitioning techniques and employ machine-learned search within each partition to support point, range, distance, and spatial join queries. Adhering to the latest research trends, we tune the partitioning techniques to be instance-optimized. By tuning each partitioning technique for optimal performance, we demonstrate that: (i) grid-based index structures outperform tree-based index structures (from 1.23× to 2.47×), (ii) learning-enhanced variants of commonly used spatial index structures outperform their original counterparts (from 1.44× to 53.34× faster), (iii) machine-learned search within a partition is faster than binary search by 11.79% - 39.51% when filtering on one dimension, (iv) the benefit of machine-learned search diminishes in the presence of other compute-intensive operations (e.g., scan costs in higher selectivity queries, Haversine distance computation, and point-in-polygon tests), and (v) index lookup is the bottleneck for tree-based structures, which could potentially be reduced by linearizing the indexed partitions.
    Additional Key Words and Phrases: spatial data, indexing, machine-learning, spatial queries, geospatia
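
To make the contrast between machine-learned search and binary search concrete, here is a minimal sketch of the general idea on one sorted dimension. It is not the paper's implementation: the abstract does not name the learned model, so the sketch assumes a single least-squares line, and the class name `LearnedIndex` and its methods are hypothetical. The model predicts a position, and a binary search then runs only inside a window bounded by the largest prediction error seen on the indexed keys.

```python
import bisect


class LearnedIndex:
    """Hypothetical sketch: one linear model over sorted keys plus a bounded local search."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The largest error on the indexed keys bounds the search window.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Predicted position, clamped to a valid index.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lower_bound(self, key):
        """Index of the first stored key >= key (same contract as bisect_left)."""
        pos = self._predict(key)
        lo = max(pos - self.max_err, 0)
        hi = min(pos + self.max_err + 1, len(self.keys))
        # Binary search restricted to the error window around the prediction.
        return bisect.bisect_left(self.keys, key, lo, hi)

    def range_query(self, low, high):
        """All stored keys k with low <= k < high."""
        return self.keys[self.lower_bound(low):self.lower_bound(high)]


index = LearnedIndex([1, 3, 4, 8, 15, 16, 23, 42, 100, 1000])
print(index.range_query(4, 24))
```

Because the fitted line is non-decreasing in the key, the window is guaranteed to contain the correct position even for keys that are not stored. The narrower the window, the fewer comparisons the search needs, which is where a gain over plain binary search could come from.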

    UUKG: Unified Urban Knowledge Graph Dataset for Urban Spatiotemporal Prediction

    Accurate Urban SpatioTemporal Prediction (USTP) is of great importance to the development and operation of the smart city. As an emerging building block, multi-sourced urban data are usually integrated as urban knowledge graphs (UrbanKGs) to provide critical knowledge for urban spatiotemporal prediction models. However, existing UrbanKGs are often tailored for specific downstream prediction tasks and are not publicly available, which limits the potential advancement. This paper presents UUKG, the unified urban knowledge graph dataset for knowledge-enhanced urban spatiotemporal predictions. Specifically, we first construct UrbanKGs consisting of millions of triplets for two metropolises by connecting heterogeneous urban entities such as administrative boroughs, POIs, and road segments. Moreover, we conduct qualitative and quantitative analysis on constructed UrbanKGs and uncover diverse high-order structural patterns, such as hierarchies and cycles, that can be leveraged to benefit downstream USTP tasks. To validate and facilitate the use of UrbanKGs, we implement and evaluate 15 KG embedding methods on the KG completion task and integrate the learned KG embeddings into 9 spatiotemporal models for five different USTP tasks. The extensive experimental results not only provide benchmarks of knowledge-enhanced USTP models under different task settings but also highlight the potential of state-of-the-art high-order structure-aware UrbanKG embedding methods. We hope the proposed UUKG fosters research on urban knowledge graphs and broad smart city applications. The dataset and source code are available at https://github.com/usail-hkust/UUKG/.
    Comment: NeurIPS 2023 Track on Datasets and Benchmark

    Dynamic Time Warping Under Translation: Approximation Guided by Space-Filling Curves

    The Dynamic Time Warping (DTW) distance is a popular measure of similarity for a variety of sequence data. For comparing polygonal curves π, σ in ℝ^d, it provides a robust, outlier-insensitive alternative to the Fréchet distance. However, like the Fréchet distance, the DTW distance is not invariant under translations. Can we efficiently optimize the DTW distance of π and σ under arbitrary translations, to compare the curves' shape irrespective of their absolute location? There are surprisingly few works in this direction, which may be due to its computational intricacy: For the Euclidean norm, this problem contains as a special case the geometric median problem, which provably admits no exact algebraic algorithm (that is, no algorithm using only addition, multiplication, and k-th roots). We thus investigate exact algorithms for non-Euclidean norms as well as approximation algorithms for the Euclidean norm. For the L₁ norm in ℝ^d, we provide an O(n^{2(d+1)})-time algorithm, i.e., an exact polynomial-time algorithm for constant d. Here and below, n bounds the curves' complexities. For the Euclidean norm in ℝ², we show that a simple problem-specific insight leads to a (1+ε)-approximation in time O(n³/ε²). We then show how to obtain a subcubic Õ(n^{2.5}/ε²) time algorithm with significant new ideas; this time comes close to the well-known quadratic time barrier for computing DTW for fixed translations. Technically, the algorithm is obtained by speeding up repeated DTW distance estimations using a dynamic data structure for maintaining shortest paths in weighted planar digraphs. Crucially, we show how to traverse a candidate set of translations using space-filling curves in a way that incurs only few updates to the data structure. We hope that our results will facilitate the use of DTW under translation both in theory and practice, and inspire similar algorithmic approaches for related geometric optimization problems.
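
For readers unfamiliar with the quadratic baseline the abstract refers to, the following is a minimal sketch of the textbook dynamic program for DTW at a fixed translation, assuming the Euclidean norm and curves given as lists of coordinate tuples. It does not reproduce the paper's algorithms for optimizing over translations; the function name `dtw` and the sample curves are illustrative assumptions.

```python
import math


def dtw(pi, sigma):
    """Textbook DTW distance between two point sequences, in O(len(pi) * len(sigma)) time."""
    m = len(sigma)
    # prev[j] holds the best cost of aligning the rows processed so far with sigma[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for p in pi:
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = math.dist(p, sigma[j - 1])  # Euclidean distance (Python 3.8+)
            # Extend the cheapest of the three admissible predecessor alignments.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


curve = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
shifted = [(x, y + 1.0) for x, y in curve]  # same shape, translated by (0, 1)

print(dtw(curve, curve))    # identical curves
print(dtw(curve, shifted))  # same shape at a different location
```

The second call returns a positive value even though both curves have the same shape, which illustrates why the distance is not translation-invariant and why minimizing over translations is a separate optimization problem.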

    Knowledge-Driven Harmonization of Sensor Observations: Exploiting Linked Open Data for IoT Data Streams

    The rise of the Internet of Things leads to an unprecedented number of continuous sensor observations that are available as IoT data streams. Harmonization of such observations is a labor-intensive task due to heterogeneity in format, syntax, and semantics. We aim to reduce the effort for such harmonization tasks by employing a knowledge-driven approach. To this end, we pursue the idea of exploiting the large body of formalized public knowledge represented as statements in Linked Open Data

    Mining Human Mobility Data and Social Media for Smart Ride Sharing

    CAPES
    People living in highly populated cities increasingly suffer a decline in their quality of life due to pollution and traffic congestion caused by the huge number of circulating vehicles. Indeed, reducing the number of circulating vehicles is one of the most difficult challenges in large metropolitan areas. This PhD thesis proposes a research contribution with the final objective of reducing the number of travelling vehicles. This is pursued in two directions: on the one hand, we aim to improve the efficacy of ride-sharing systems, creating a larger number of ride possibilities based on the passengers' destination activities; on the other hand, we propose a social media analysis method, based on machine learning, to identify transportation demand for an event. Concerning the first research direction, we investigate a novel approach to boost ride-sharing opportunities based not only on fixed destinations, but also on alternative destinations, while preserving the intended activity of the user. We observe that in many cases the activity motivating the use of a private car (e.g., going to a shopping mall) can be performed at many different locations (e.g., all the shopping malls in a given area). Our assumption is that, when there is the possibility of sharing a ride, people may accept visiting an alternative destination to fulfill their needs. Based on this idea, we propose Activity-Based Ride Matching (ABRM), an algorithm aimed at matching ride requests with ride offers to alternative destinations where the intended activity can still be performed. By analyzing two large mobility datasets, we found that our approach increases ride-sharing opportunities by up to 54.69% compared to a traditional fixed-destination-oriented approach. For the second research contribution, we focus on the analysis of social media for inferring the transportation demand for large events such as music festivals and sports games. 
In this context, we investigate the novel problem of exploiting the content of non-geotagged posts to infer users' attendance at large events. We identified three temporal periods: before, during, and after an event. We detail the features used to train the event attendance classifiers on the three temporal periods and report on experiments conducted on two large music festivals in the UK. Our classifiers attained very high accuracy, with the highest result observed for the Creamfields festival (∼91% accuracy in classifying users who will attend the event). Furthermore, we propose an example application of our methodology to event-related transportation, which aims to identify the geographic areas with the highest potential demand for transportation services to an event.
    Document type: Conference objec