2 research outputs found

    A Framework for Spatio-Temporal Trajectory Data Segmentation and Query

    Trajectory segmentation is a technique for dividing sequential trajectory data into segments. These segments are building blocks for various applications of big trajectory data, so a system framework is essential to support trajectory segment indexing, storage, and query. When the size of the segments exceeds the computing capacity of a single processing node, a distributed solution is needed. In this thesis, a distributed trajectory segmentation framework that includes a greedy-split segmentation method is created. The framework consists of distributed in-memory processing and a cluster of graph storage. For fast trajectory queries, a distributed spatial R-tree index of trajectory segments is applied. Using the trajectory indexes, the framework builds queries of segments from in-memory processing and from the graph storage. Based on this segmentation framework, two metrics are defined to measure trajectory similarity and chance of collision. These two metrics are further applied to identify moving groups of trajectories. This study quantitatively evaluates the effects of data partition, parallelism, and data size on the system. It identifies the bottleneck factors at the data partition stage and validates two mitigation solutions. The evaluation demonstrates that the distributed segmentation method and the system framework scale with the growth of the workload and the size of the parallel cluster.
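
The abstract does not state the criterion used by the greedy-split method, so the following is only a minimal single-node sketch of one common greedy approach, not the thesis's algorithm. It assumes 2D points without timestamps and a hypothetical `tolerance` parameter: a segment keeps growing while every intermediate point stays within `tolerance` of the chord from the segment's first point to the candidate end point. The names `point_to_chord_distance` and `greedy_split` are illustrative.

```python
import math


def point_to_chord_distance(p, a, b):
    """Distance from point p to the line segment a-b (all 2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate chord: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the foot stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def greedy_split(points, tolerance):
    """Split a trajectory into (start_index, end_index) segments.

    Assumed rule (not taken from the thesis): extend the current segment
    one point at a time and close it as soon as any intermediate point
    deviates from the start-to-candidate chord by more than `tolerance`.
    Consecutive segments share their boundary point.
    """
    if len(points) < 2:
        return []
    segments = []
    start, end = 0, 1
    while end < len(points) - 1:
        candidate = end + 1
        fits = all(
            point_to_chord_distance(points[i], points[start], points[candidate])
            <= tolerance
            for i in range(start + 1, candidate)
        )
        if fits:
            end = candidate
        else:
            segments.append((start, end))
            start, end = end, end + 1
    segments.append((start, end))
    return segments


# An L-shaped path splits at the corner.
path = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
segments = greedy_split(path, tolerance=0.1)
```

In a distributed setting such a routine would presumably run per trajectory inside each partition, with the resulting segments then indexed; that wiring is outside what the abstract describes.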

    Streaming Data Algorithm Design for Big Trajectory Data Analysis

    Trajectory streams consist of large volumes of time-stamped spatial data that are constantly generated from diverse and geographically distributed sources. Discovery of traveling patterns on trajectory streams, such as gatherings and companies, needs to process each record when it arrives and correlate across multiple records in near real time. Thus, techniques for handling high-speed trajectory streams should scale on distributed cluster computing. The main issues span three aspects, namely a data model to represent the continuous trajectory data, the parallelism of a discovery algorithm, and end-to-end performance improvement. In this thesis, I propose two parallel discovery methods, namely the snapshot model and the slot model, each of which consists of 1) a model for partitioning trajectories sampled at different time intervals; 2) a definition of distance measurements between trajectories; and 3) a parallel discovery algorithm. I develop these methods in a stream processing workflow. I evaluate the solution with a public dataset on an Amazon Web Services (AWS) cloud cluster. From a parallelization point of view, I investigate system performance, scalability, and stability, and pinpoint the principal operations that contribute most to the run-time cost of computation and data shuffling. I improve data locality with fine-tuned data partition and data aggregation techniques. I observe that both models can scale on a cluster of nodes as the intensity of trajectory data streams grows. Generally, the snapshot model has higher throughput and thus lower latency, while the slot model produces more accurate trajectory discovery.
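
The abstract names the three parts of each model but not their details, so the sketch below only illustrates the general idea of snapshot-style partitioning and is not the thesis's snapshot model. It assumes records of the form `(object_id, t, x, y)`, a hypothetical fixed `interval` that defines the snapshots, and Euclidean distance with a hypothetical `max_distance` threshold as the closeness measure. The functions `partition_into_snapshots` and `close_pairs` are illustrative names.

```python
import math
from collections import defaultdict
from itertools import combinations


def partition_into_snapshots(records, interval):
    """Group (object_id, t, x, y) records into fixed-width time snapshots.

    Returns {snapshot_index: {object_id: (x, y)}}. When an object reports
    several times within one snapshot, its latest position is kept.
    """
    snapshots = defaultdict(dict)
    for object_id, t, x, y in sorted(records, key=lambda r: r[1]):
        snapshots[int(t // interval)][object_id] = (x, y)
    return dict(snapshots)


def close_pairs(snapshot, max_distance):
    """Return pairs of object ids lying within max_distance of each other."""
    pairs = []
    for (id_a, pos_a), (id_b, pos_b) in combinations(sorted(snapshot.items()), 2):
        if math.dist(pos_a, pos_b) <= max_distance:
            pairs.append((id_a, id_b))
    return pairs


records = [
    ("a", 0, 0.0, 0.0),
    ("b", 1, 0.5, 0.0),
    ("c", 2, 10.0, 10.0),
    ("a", 11, 5.0, 5.0),
    ("c", 12, 5.2, 5.0),
]
snapshots = partition_into_snapshots(records, interval=10)
pairs_by_snapshot = {k: close_pairs(v, max_distance=1.0) for k, v in snapshots.items()}
```

Because each snapshot is processed independently here, the per-snapshot step is the natural unit to distribute across workers; how the thesis actually parallelizes discovery and correlates results across snapshots is not stated in the abstract.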