
    Parallel trajectory similarity joins in spatial networks

    © 2018 Springer-Verlag GmbH Germany, part of Springer Nature. The matching of similar pairs of objects, called a similarity join, is a fundamental functionality in data management. We consider two cases of trajectory similarity joins (TS-Joins), a threshold-based join (Tb-TS-Join) and a top-k TS-Join (k-TS-Join), where the objects are trajectories of vehicles moving in road networks. Given two sets of trajectories and a similarity threshold, the Tb-TS-Join returns all pairs of trajectories from the two sets with similarity above that threshold. In contrast, the k-TS-Join does not take a threshold as a parameter; it returns the top-k most similar trajectory pairs from the two sets. The TS-Joins target diverse applications such as trajectory near-duplicate detection, data cleaning, ridesharing recommendation, and traffic congestion prediction. With these applications in mind, we provide purposeful definitions of similarity. To enable efficient processing of the TS-Joins on large sets of trajectories, we develop search space pruning techniques and enable the use of the parallel processing capabilities of modern processors. Specifically, we present a two-phase divide-and-conquer search framework that lays the foundation for the algorithms for the Tb-TS-Join and the k-TS-Join, which rely on different pruning techniques to achieve efficiency. For each trajectory, the algorithms first find similar trajectories. Then they merge the results to obtain the final result. The algorithms for the two joins exploit different upper and lower bounds on the spatiotemporal trajectory similarity and different heuristic scheduling strategies for search space pruning. Their per-trajectory searches are independent of each other and can be performed in parallel, and the mergings have constant cost. An empirical study with real data offers insight into the performance of the algorithms and demonstrates that they are capable of outperforming well-designed baseline algorithms by an order of magnitude.
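The two-phase structure described in this abstract (independent per-trajectory searches followed by a merge) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, trajectories are reduced to lists of network vertex ids, and `similarity` is a toy Jaccard overlap rather than the paper's spatiotemporal similarity. The sketch is brute force and includes none of the paper's bounds or pruning; the thread pool only shows that the per-trajectory searches do not depend on each other.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def similarity(a, b):
    """Toy stand-in (assumption): Jaccard overlap of visited vertex ids."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def search_one(i, p, Q, theta):
    """Phase 1: for one trajectory p, find all partners in Q above theta."""
    hits = []
    for j, q in enumerate(Q):
        s = similarity(p, q)
        if s > theta:
            hits.append((i, j, s))
    return hits


def tb_ts_join(P, Q, theta, workers=4):
    """Threshold-based join: independent searches, then a merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(
            lambda ip: search_one(ip[0], ip[1], Q, theta), enumerate(P)
        )
        # Phase 2: merging is a plain concatenation of per-trajectory results.
        return [pair for part in partials for pair in part]


def k_ts_join(P, Q, k, workers=4):
    """Top-k join: score every pair (no pruning here), keep the k best."""
    scored = tb_ts_join(P, Q, theta=-1.0, workers=workers)
    return heapq.nlargest(k, scored, key=lambda t: t[2])


P = [[1, 2, 3], [4, 5]]
Q = [[1, 2, 3], [2, 3, 4], [9]]
print(tb_ts_join(P, Q, theta=0.4))
print(k_ts_join(P, Q, k=2))
```

Each result is a triple of (index in `P`, index in `Q`, similarity). A pruning-based version would skip candidate pairs whose upper bound on similarity already falls below the threshold or the current k-th best score.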

    Parallel Trajectory-to-Location Join


    An XML-based implementation of the parametric model for ad-hoc query of temporal and spatiotemporal data

    The parametric model is one of the data models for dimensional data. Values in the parametric model are defined as functions. This modeling concept helps one achieve a one-to-one correspondence between objects in the real world and records in a database. One of the important requirements is that domains of values should be closed under set-theoretic operations such as union, intersection, and complementation. Because of this, ParaSQL, a query language of the parametric model, is able to mimic natural languages more closely. In this dissertation we validate and implement the parametric model for temporal and spatiotemporal data. We also develop a preliminary prototype for the users of NC-94, an interesting dataset in agriculture. Viewing values as functions leads to variable-length tuples. Potentially, such values vary in size, ranging from a few bytes to gigabytes and beyond. This makes implementation of the parametric model a challenging problem. To meet the challenge, we develop an XML-based storage and deploy it in our implementation. Incidentally, XML is also used for interfacing various modules and artifacts such as the parse tree, the expression tree, and iterators that fetch data from disk. The NC-94 dataset, mentioned above, contains the most complete record of spatiotemporal variables that characterize the dynamics of agriculture covering the north central region of the United States. To support ad-hoc query of data in its geospatial context, a novel hybrid structure is designed and implemented. We use GML to describe geospatial information. GML is a good match because it is XML-based. More importantly, it meets the set-theoretic closure requirements proposed by the parametric model. The validation and implementation methodologies introduced in this dissertation will contribute to the database and GIS communities. The validation demonstrates the ease of use and efficiency of the parametric model for temporal and spatiotemporal data. This should help settle a debate in the temporal database community that has continued since the mid-1980s. The findings also extend to spatial and spatiotemporal data. It is an important baby step toward a full-fledged implementation of the parametric model. We hope that this work will also help bring the database and GIS communities together.
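The closure requirement mentioned in this abstract (domains closed under union, intersection, and complementation) can be illustrated with a small sketch. This is not the dissertation's XML storage or ParaSQL: the class `TemporalElement`, the helper `domain`, the integer time instants, and the sample `salary` value are all hypothetical choices made for illustration. A domain is modeled as a finite union of half-open intervals, and every operation returns another value of the same type.

```python
class TemporalElement:
    """Finite union of half-open intervals [start, end) over integer instants."""

    def __init__(self, intervals=()):
        self.intervals = self._normalize(intervals)

    @staticmethod
    def _normalize(intervals):
        # Drop empty intervals, sort, and merge overlapping or adjacent ones.
        merged = []
        for s, e in sorted((s, e) for s, e in intervals if s < e):
            if merged and s <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        return tuple(merged)

    def union(self, other):
        return TemporalElement(self.intervals + other.intervals)

    def intersection(self, other):
        # Pairwise overlaps; empty overlaps are discarded by _normalize.
        out = []
        for s1, e1 in self.intervals:
            for s2, e2 in other.intervals:
                out.append((max(s1, s2), min(e1, e2)))
        return TemporalElement(out)

    def complement(self, universe):
        """Complement relative to a bounding interval (lo, hi)."""
        lo, hi = universe
        clipped = self.intersection(TemporalElement([universe])).intervals
        gaps, cursor = [], lo
        for s, e in clipped:
            gaps.append((cursor, s))
            cursor = e
        gaps.append((cursor, hi))
        return TemporalElement(gaps)


def domain(value):
    """Domain of a function-valued attribute given as (element, value) pieces."""
    result = TemporalElement()
    for element, _ in value:
        result = result.union(element)
    return result


# A value defined as a function of time: 30 on [0, 5), then 40 on [5, 9).
salary = [(TemporalElement([(0, 5)]), 30), (TemporalElement([(5, 9)]), 40)]
print(domain(salary).intervals)
print(domain(salary).complement((0, 12)).intervals)
```

Because each operation yields a `TemporalElement` again, results can be fed into further operations without leaving the type, which is the property the abstract refers to as closure.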