10 research outputs found

    Parallel Trajectory-to-Location Join


    Parallel trajectory similarity joins in spatial networks

    © 2018 Springer-Verlag GmbH Germany, part of Springer Nature. The matching of similar pairs of objects, called similarity join, is a fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), including a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a threshold (Formula presented.), the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above (Formula presented.). In contrast, the k-TS-Join does not take a threshold as a parameter, and it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join that rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight into the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.
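
The two-phase structure described in this abstract (independent per-trajectory searches, then a merge) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `similarity` function is a toy Jaccard overlap of visited vertices, not the paper's spatiotemporal measure, and the search is brute force with none of the paper's bounds or scheduling heuristics.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def similarity(t1, t2):
    """Toy similarity (assumption): Jaccard overlap of visited vertex ids."""
    a, b = set(t1), set(t2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search_one(i, traj, others):
    """Phase 1: score one trajectory against every trajectory in `others`."""
    return [(i, j, similarity(traj, other)) for j, other in enumerate(others)]


def _all_scores(P, Q, workers):
    # The per-trajectory searches are independent, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda item: search_one(item[0], item[1], Q),
                              enumerate(P)))
    return [triple for part in parts for triple in part]


def tb_ts_join(P, Q, threshold, workers=4):
    """Phase 2 for the threshold join: keep pairs with similarity above threshold."""
    return sorted(t for t in _all_scores(P, Q, workers) if t[2] > threshold)


def k_ts_join(P, Q, k, workers=4):
    """Phase 2 for the top-k join: merge partial results, keep the k best pairs."""
    return heapq.nlargest(k, _all_scores(P, Q, workers), key=lambda t: t[2])


P = [[1, 2, 3], [4, 5]]
Q = [[1, 2, 3, 4], [4, 5], [9]]
threshold_pairs = tb_ts_join(P, Q, threshold=0.5)
top_pairs = k_ts_join(P, Q, k=2)
```

A thread pool is used only to show where the parallelism sits; for CPU-bound similarity functions in Python a process pool would be the more realistic choice.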

    Clustering-Based Pre-Processing Approaches To Improve Similarity Join Techniques

    Research on similarity join techniques is a growing practical area of study, especially given the increasing electronic availability of vast amounts of digital data from more and more source systems. This research focuses on clustering-based pre-processing techniques that improve existing similarity join approaches. Identifying and extracting the same real-world entities from different data sources is still a big challenge and a significant task in the digital information era. Dissimilar extracts may in fact represent the same real-world entity because of inconsistent values and naming conventions, incorrect or missing data values, or incomplete information. Discovering efficient and accurate approaches to determine the similarity of data objects or values is therefore of theoretical as well as practical significance. Semantic problems arise even over the concept of similarity itself, regarding its usage and foundation. Existing similarity join approaches often take a very specific view of similarity measures and pre-defined predicates, which represents a narrow focus on the context of similarity for a given scenario. The predicates have been assumed to be a group of clustering-related [MSW 72] attributes on the join. Identifying those entities for data integration purposes requires a broader view of similarity; for instance, a number of generic similarity measures are useful in a given data integration system. This study focused on string similarity joins, specifically those based on the Levenshtein (edit) distance and Q-grams. It proposes effective and efficient clustering-based pre-processing techniques that identify clustering-related predicates, based on either attribute value or data value, and that improve existing similarity join techniques in enterprise data integration scenarios.
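
For readers unfamiliar with the two string measures named in this abstract, the following is a minimal sketch of an edit-distance similarity join that uses a Q-gram count filter before exact verification. It is the textbook filter-and-verify pattern, not the clustering-based pre-processing the thesis proposes; the function names, the unpadded Q-gram definition, and the sample strings are all assumptions made for illustration.

```python
from collections import Counter


def qgrams(s, q=2):
    """Multiset of the overlapping substrings of length q (no padding)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def levenshtein(s, t):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution or match
        prev = cur
    return prev[-1]


def similarity_join(R, S, d=1, q=2):
    """Return pairs (r, s) whose edit distance is at most d."""
    result = []
    for r in R:
        grams_r = qgrams(r, q)
        for s in S:
            # Length filter: lengths cannot differ by more than d.
            if abs(len(r) - len(s)) > d:
                continue
            # Count filter: one edit destroys at most q q-grams, so strings
            # within distance d share at least this many q-grams.
            needed = max(len(r), len(s)) - q + 1 - q * d
            shared = sum((grams_r & qgrams(s, q)).values())
            if shared < needed:
                continue
            # Verification: only surviving candidates pay for the exact distance.
            if levenshtein(r, s) <= d:
                result.append((r, s))
    return result


matches = similarity_join(["smith", "jones"], ["smyth", "jonas", "brown"], d=1)
```

The filters never discard a true match; they only reduce how often the quadratic-time distance computation runs.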

    Pesquisa de eventos geográficos semelhantes: trajectórias de objectos em movimento (Searching for similar geographic events: trajectories of moving objects)

    Dissertation submitted for the degree of Master in Computer Engineering (Engenharia Informática). Nowadays it is rare to find a person who does not own a satellite positioning (GPS) device, whether in their car or in their pocket, since the components needed for it to work properly can now be bought at a very attractive price. Owing to the enormous volume of georeferenced data, some of which describes the movements of people and/or environmental and social events such as hurricanes, migrations, traffic, and transport, it has become necessary to discover and refine processes that speed up the effective and efficient handling of similarity searches over this type of data, so that possible catastrophes can be predicted and analysed and strategic decision-making can be supported. This work evaluates the existing techniques for similarity search over geographic events, namely trajectories. To that end, a study was made of all the existing techniques involved in this type of search, in particular the most relevant similarity functions and indexing methods used in this research area. Similarity searches were evaluated in different metric spaces of trajectories using the metric data structures Recursive Lists of Clusters 2 (RLC2) and Metric-Tree (M-Tree). Based on this evaluation, an indexing mechanism for storing trajectories, named SimTraj, was proposed; it speeds up the search for the k most similar trajectories in a metric space of trajectories.

    New directions in the analysis of movement patterns in space and time


    Parallel Processing of Top-K Trajectory Similarity Queries on Big Data Using GPUs

    Through the use of location-sensing devices, it has become possible to collect very large datasets of trajectories. These datasets make it possible to issue spatio-temporal queries with which users can gather information about the characteristics of the movements of objects, derive patterns from that information, and understand the objects themselves. One such query is the top-K trajectory similarity query. This query finds many applications, such as bird migration analysis in ecology and trajectory sharing in social networks. However, the large volumes of the trajectory query sets and databases, along with their associated uncertainty, pose significant computational challenges. One way to address these challenges is through the use of parallel architectures like GPUs, and through the use of models that can produce accurate trajectory estimates. Nevertheless, not much research has been done to design efficient and scalable techniques to process this type of query on parallel architectures. In this dissertation, we propose a novel system to process top-K trajectory similarity queries in parallel on Big Data using GPUs that is capable of handling both certain and uncertain trajectory data. The system consists of four novel algorithms: TKSimGPU to process top-K trajectory similarity queries; Top-KaBT to reduce the size of the candidate set generated by top-K trajectory similarity query algorithms; TrajEstU to estimate the true trajectory when data uncertainty exists; and TraclusGPU to perform local trajectory clustering to aid in the preprocessing stage of TrajEstU. TKSimGPU works by iteratively processing near-join similarity queries, while Top-KaBT calculates the lower and upper bounds of the Hausdorff distance between candidate pairs, and then uses these bounds to remove spurious candidates. Top-KaBT exploits GPUs to improve TKSimGPU by ensuring load balancing across the threads, ensuring memory coalescing, and using special pruning techniques that reduce the size of the candidate set. TrajEstU splits the lifetime of an object’s trajectory into time intervals where the object’s acceleration is nearly constant. Then TrajEstU uses the local trajectory clusters to obtain the movement patterns that are prevalent in the areas where trajectories have low sampling rates, and uses linear regression to fit a constant acceleration model to the observed positions of the moving object. Finally, TraclusGPU helps TrajEstU scalably find those local trajectory clusters that are used in the construction of trajectory models. Extensive theoretical and experimental evaluations of the proposed techniques showed that each of them performs better, in terms of accuracy and execution time, than state-of-the-art techniques when applied to large real-life and synthetic trajectory datasets for Big Data applications.
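
The idea of using bounds on the Hausdorff distance to discard candidates can be illustrated with a small sequential sketch. This is not Top-KaBT or TKSimGPU: it runs on the CPU, uses only a deliberately simple lower bound (the distance from one probe point to the other trajectory), and every function name and sample trajectory is an assumption made for illustration.

```python
import heapq
from math import dist


def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    return max(min(dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two point sequences."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


def lower_bound(A, B):
    """Cheap lower bound: any single point's nearest-neighbour distance
    cannot exceed the Hausdorff distance, so probe one point per side."""
    return max(min(dist(A[0], b) for b in B),
               min(dist(B[0], a) for a in A))


def top_k_similar(query, database, k):
    """Return the k closest trajectories and how many exact distances were computed."""
    best = []          # max-heap on distance, stored as (-distance, index)
    exact_evals = 0
    for idx, traj in enumerate(database):
        # Prune: the candidate cannot beat the current k-th best distance.
        if len(best) == k and lower_bound(query, traj) >= -best[0][0]:
            continue
        d = hausdorff(query, traj)
        exact_evals += 1
        if len(best) < k:
            heapq.heappush(best, (-d, idx))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, idx))
    return sorted((-nd, idx) for nd, idx in best), exact_evals


query = [(0, 0), (1, 0), (2, 0)]
database = [
    [(0, 1), (1, 1), (2, 1)],
    [(0, 0), (1, 0), (2, 0.5)],
    [(10, 10), (11, 10)],
    [(0, 5), (1, 5)],
]
results, exact_evals = top_k_similar(query, database, k=2)
```

In this example the two distant trajectories are rejected by the bound alone, so only two of the four exact distance computations are performed.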