685 research outputs found

    Multi-scale window specification over streaming trajectories

    Enormous amounts of positional information are collected by monitoring applications in domains such as fleet management, cargo transport, and wildlife protection. With the advent of modern location-based services, processing such data mostly focuses on providing real-time responses to a variety of user requests in a continuous and scalable fashion. An important class of such queries concerns evolving trajectories that continuously trace the streaming locations of moving objects, such as GPS-equipped vehicles, commodities with RFIDs, and people with smartphones. In this work, we propose an advanced windowing operator that enables online, incremental examination of recent motion paths at multiple resolutions for numerous point entities. When applied against incoming positions, this window can abstract trajectories at coarser representations towards the past while retaining progressively finer features closer to the present. We explain the semantics of such multi-scale sliding windows through parameterized functions that reflect the sequential nature of trajectories and can effectively capture their spatiotemporal properties. Such a window specification goes beyond its usual role in non-blocking processing of multiple concurrent queries: it can offer concrete subsequences from each trajectory, thus preserving continuity in time and contiguity in space along the respective segments. Further, we suggest language extensions for expressing characteristic spatiotemporal queries using windows. Finally, we discuss algorithms for nested maintenance of multi-scale windows and evaluate their efficiency against streaming positional data, offering empirical evidence of their benefits to online trajectory processing.
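    The idea of keeping finer detail near the present and coarser detail towards the past can be illustrated with a minimal sketch. It is not the operator defined in the paper: the function name `multi_scale_window`, the `(range, step)` level encoding, and the rule of thinning by a minimum time gap are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # (timestamp, x, y)
Level = Tuple[float, float]         # (range: how far back it reaches, step: min gap between kept points)


def multi_scale_window(points: List[Point], now: float, levels: List[Level]) -> List[Point]:
    """Thin one trajectory so that older positions are kept more sparsely.

    `points` must be ordered by ascending timestamp. Each level covers ages
    up to `range` and keeps positions at least `step` time units apart.
    Positions older than the widest range are treated as expired.
    (Hypothetical encoding, chosen only for this sketch.)
    """
    levels = sorted(levels)  # narrowest (finest) level first
    kept: List[Point] = []
    last_kept_t: Optional[float] = None
    for t, x, y in points:
        age = now - t
        if age < 0:
            continue  # position lies in the future relative to `now`
        # Pick the finest level whose range still covers this age.
        step = next((s for r, s in levels if age <= r), None)
        if step is None:
            continue  # outside every level: expired
        if last_kept_t is None or t - last_kept_t >= step:
            kept.append((t, x, y))
            last_kept_t = t
    return kept
```

    With one position per second and levels such as `[(10, 1), (30, 5), (90, 20)]`, the result keeps every position from the last 10 seconds, roughly one per 5 seconds up to 30 seconds back, and roughly one per 20 seconds up to 90 seconds back. The retained positions stay in temporal order, so each level yields a contiguous subsequence of the original path.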

    Continuous Spatial Query Processing in Mobile Information Systems

    Nowadays, many mobile applications provide location-based services that allow users to access location-related information from anywhere, whenever they desire. A moving user can issue queries to access information about moving or static objects. Continuous spatial query processing systems are used for this type of application. We propose two query processing strategies for location-based services. The objectives of our strategies are to reduce (1) the server workload, (2) the data transmission cost, and (3) the query response time while providing an answer for a continuous region query. We compare our first strategy with a brute-force strategy and find that it significantly reduces the server workload and data transmission cost. We then compare our improved strategy with the original strategy and the brute-force strategy. The experimental results show that the improved strategy achieves a lower query response time than both.

    Towards an Efficient, Scalable Stream Query Operator Framework for Representing and Analyzing Continuous Fields

    Advances in sensor technology have made it less expensive to deploy massive numbers of sensors that observe continuous geographic phenomena at high sample rates and stream live observations. This has raised new challenges, since sensor streams have pushed the limits of traditional geo-sensor data management technology. Data Stream Engines (DSEs) provide facilities for near real-time processing of streams; however, algorithms for representing and analyzing Spatio-Temporal (ST) phenomena are limited. This dissertation investigates near real-time representation and analysis of continuous ST phenomena, observed by large numbers of mobile, asynchronously sampling sensors, using a DSE, and proposes two novel stream query operator frameworks. First, the ST Interpolation Stream Query Operator Framework (STI-SQO framework) continuously transforms sensor streams into rasters using a novel set of stream query operators that perform ST-IDW interpolation. A key component of the STI-SQO framework is the 3D, main memory-based ST Grid Index, which enables high-performance ST insertion and deletion of massive numbers of sensor observations through Isotropic Time Cell and Time Block-based partitioning. The ST Grid Index facilitates fast ST search for samples using ST shell-based neighborhood search templates, namely the Cylindrical Shell Template and the Nested Shell Template. Furthermore, the framework contains the stream-based ST-IDW algorithms ST Shell and ST ak-Shell for high-performance, parallel grid cell interpolation. Second, the proposed ST Predicate Stream Query Operator Framework (STP-SQO framework) efficiently evaluates value predicates over ST streams of ST continuous phenomena. The framework contains several stream-based predicate evaluation algorithms, including Region-Growing, Tile-based, and Phenomenon-Aware algorithms, that target predicate evaluation to regions with seed points and minimize the number of raster cells that are interpolated when evaluating value predicates. The performance of the proposed frameworks was assessed with regard to prediction accuracy of output results and runtime. The STI-SQO framework achieved a processing throughput of 250,000 observations in 2.5 s with a Normalized Root Mean Square Error under 0.19 using a 500×500 grid. The STP-SQO framework processed over 250,000 observations in under 0.25 s for predicate results covering less than 40% of the observation area, and the Scan Line Region Growing algorithm was consistently the fastest algorithm tested.
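    As a rough illustration of what inverse distance weighting over space and time involves, the sketch below estimates a value at one query location and time from a list of samples. It is a plain brute-force formulation, not the dissertation's ST Shell or ST ak-Shell algorithms and not its grid index. The function name `st_idw` and the `time_scale` factor that converts a time difference into a distance are assumptions.

```python
import math
from typing import Iterable, Tuple

Sample = Tuple[float, float, float, float]  # (x, y, t, observed value)


def st_idw(samples: Iterable[Sample], qx: float, qy: float, qt: float,
           time_scale: float = 1.0, power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at (qx, qy, qt).

    Time differences are multiplied by `time_scale` (an assumed parameter)
    so that space and time share one distance measure.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, t, value in samples:
        dist = math.sqrt((x - qx) ** 2 + (y - qy) ** 2
                         + (time_scale * (t - qt)) ** 2)
        if dist == 0.0:
            return value  # query coincides with a sample
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("st_idw needs at least one sample")
    return weighted_sum / weight_total
```

    Filling a raster amounts to calling such a function once per grid cell. Doing that over every sample is expensive, which is presumably why the framework restricts the search to a neighborhood around each cell.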

    Semi-Lazy Learning Approach to Dynamic Spatio-Temporal Data Analysis

    Ph.D. (Doctor of Philosophy)

    Sparse Coding for Event Tracking and Image Retrieval

    Comparing regions of images is a fundamental task both in similarity-based object tracking and in retrieval of images from image datasets, where an exemplar image is used as the query. In this thesis, we focus on creating a method of comparison for images produced by NASA's Solar Dynamics Observatory (SDO) mission. This mission has been in operation for several years and produces almost 700 gigabytes of data per day from the Atmospheric Imaging Assembly (AIA) instrument alone. This has created a massive repository of high-quality solar images to analyze and categorize. To this end, we are concerned with creating image region descriptors that are selective enough to differentiate between highly similar images, compact enough to be compared efficiently, and indexable with current indexing technology. We produce such descriptors by pooling sparse coding vectors produced by spanning learned basis dictionaries. Various pooled vectors are used to describe regions of images in event tracking, as entire-image descriptors for image comparison in content-based image retrieval, and as region descriptors in a content-based image retrieval system on the SDO AIA image pipeline.

    Fast Incremental Density-Based Clustering over Sliding Windows

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2022. 문봉기.
    Given the prevalence of mobile and IoT devices, continuous clustering against streaming data has become an essential tool of increasing importance for data analytics. Among many clustering approaches, density-based clustering has garnered much attention due to its unique advantage that it can detect clusters of arbitrary shape when noise exists. However, when the clusters need to be updated continuously along with an evolving input dataset, a relatively high computational cost is required. In particular, deleting data points from the clusters causes severe performance degradation. This dissertation addresses the performance limits of incremental density-based clustering over sliding windows and proposes two algorithms, DISC and DenForest. The first algorithm, DISC, is an incremental density-based clustering algorithm that efficiently produces the same clustering results as DBSCAN over sliding windows. It focuses on redundancy issues that occur when updating clusters: when multiple data points are inserted or deleted individually, surrounding data points are explored and retrieved redundantly. DISC addresses these issues and improves performance by updating multiple points in a batch. It also presents several optimization techniques. The second algorithm, DenForest, is an incremental density-based clustering algorithm that primarily focuses on the deletion process. Unlike previous methods that manage clusters as a graph, DenForest manages clusters as a group of spanning trees, which contributes to very efficient deletion performance. Moreover, it provides a batch-optimized technique to improve insertion performance. To demonstrate the effectiveness of the two algorithms, extensive evaluations were conducted, showing that DISC and DenForest significantly outperform the state-of-the-art density-based clustering algorithms.
    Contents:
    1 Introduction: 1.1 Overview of Dissertation
    2 Related Works: 2.1 Clustering; 2.2 Density-Based Clustering for Static Datasets (2.2.1 Extension of DBSCAN; 2.2.2 Approximation of Density-Based Clustering; 2.2.3 Parallelization of Density-Based Clustering); 2.3 Incremental Density-Based Clustering (2.3.1 Approximated Density-Based Clustering for Dynamic Datasets); 2.4 Density-Based Clustering for Data Streams (2.4.1 Micro-clusters; 2.4.2 Density-Based Clustering in Damped Window Model; 2.4.3 Density-Based Clustering in Sliding Window Model); 2.5 Non-Density-Based Clustering (2.5.1 Partitional Clustering and Hierarchical Clustering; 2.5.2 Distribution-Based Clustering; 2.5.3 High-Dimensional Data Clustering; 2.5.4 Spectral Clustering)
    3 Background: 3.1 DBSCAN (3.1.1 Reformulation of Density-Based Clustering); 3.2 Incremental DBSCAN; 3.3 Sliding Windows (3.3.1 Density-Based Clustering over Sliding Windows; 3.3.2 Slow Deletion Problem)
    4 Avoiding Redundant Searches in Updating Clusters: 4.1 The DISC Algorithm (4.1.1 Overview of DISC; 4.1.2 COLLECT; 4.1.3 CLUSTER, with 4.1.3.1 Splitting a Cluster and 4.1.3.2 Merging Clusters; 4.1.4 Horizontal Manner vs. Vertical Manner); 4.2 Checking Reachability (4.2.1 Multi-Starter BFS; 4.2.2 Epoch-Based Probing of R-tree Index); 4.3 Updating Labels
    5 Avoiding Graph Traversals in Updating Clusters: 5.1 The DenForest Algorithm (5.1.1 Overview of DenForest, with 5.1.1.1 Supported Types of the Sliding Window Model; 5.1.2 Nostalgic Core and Density-based Clusters, with 5.1.2.1 Cluster Membership of Border; 5.1.3 DenTree); 5.2 Operations of DenForest (5.2.1 Insertion, with 5.2.1.1 MST based on Link-Cut Tree and 5.2.1.2 Time Complexity of Insert Operation; 5.2.2 Deletion, with 5.2.2.1 Time Complexity of Delete Operation; 5.2.3 Insertion/Deletion Examples; 5.2.4 Cluster Membership; 5.2.5 Batch-Optimized Update); 5.3 Clustering Quality of DenForest (5.3.1 Clustering Quality for Static Data; 5.3.2 Discussion; 5.3.3 Replaceability, with 5.3.3.1 Nostalgic Cores and Density and 5.3.3.2 Nostalgic Cores and Quality; 5.3.4 1D Example)
    6 Evaluation: 6.1 Real-World Datasets; 6.2 Competing Methods (6.2.1 Exact Methods; 6.2.2 Non-Exact Methods); 6.3 Experimental Settings; 6.4 Evaluation of DISC (6.4.1 Parameters; 6.4.2 Baseline Evaluation; 6.4.3 Drilled-Down Evaluation, with 6.4.3.1 Effects of Threshold Values, 6.4.3.2 Insertions vs. Deletions, 6.4.3.3 Range Searches, and 6.4.3.4 MS-BFS and Epoch-Based Probing; 6.4.4 Comparison with Summarization/Approximation-Based Methods); 6.5 Evaluation of DenForest (6.5.1 Parameters; 6.5.2 Baseline Evaluation; 6.5.3 Drilled-Down Evaluation, with 6.5.3.1 Varying Size of Window/Stride, 6.5.3.2 Effect of Density and Distance Thresholds, 6.5.3.3 Memory Usage, 6.5.3.4 Clustering Quality over Sliding Windows, 6.5.3.5 Clustering Quality under Various Density and Distance Thresholds, and 6.5.3.6 Relaxed Parameter Settings; 6.5.4 Comparison with Summarization-Based Methods)
    7 Future Work: Extension to Varying/Relative Densities
    8 Conclusion
    Abstract (In Korean)
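    To make the cost problem concrete, the sketch below shows the naive baseline that incremental methods try to avoid: re-running plain DBSCAN from scratch every time a count-based window slides. It is neither DISC nor DenForest. The names `dbscan` and `sliding_window_clusters`, and the `window`/`stride` parameters, are assumptions for illustration, and neighbors are found by a linear scan rather than an index.

```python
from collections import deque
from typing import Iterable, List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]


def dbscan(points: Sequence[Point2D], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)

    def neighbors(i: int) -> List[int]:
        xi, yi = points[i]
        # Linear scan; the point itself counts as its own neighbor.
        return [j for j in range(n)
                if (points[j][0] - xi) ** 2 + (points[j][1] - yi) ** 2 <= eps * eps]

    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point: keep expanding
    return [label if label is not None else -1 for label in labels]


def sliding_window_clusters(stream: Iterable[Point2D], window: int, stride: int,
                            eps: float, min_pts: int) -> List[List[int]]:
    """Re-cluster the latest `window` points after every `stride` arrivals."""
    buf: deque = deque(maxlen=window)  # the oldest point drops out automatically
    results = []
    for k, p in enumerate(stream, 1):
        buf.append(p)
        if k >= window and (k - window) % stride == 0:
            results.append(dbscan(list(buf), eps, min_pts))
    return results
```

    Every slide here repeats all range searches for points that did not change, which is the redundancy the abstract describes. An incremental algorithm instead updates only the clusters affected by the inserted and expired points.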

    Sequence queries on temporal graphs

    Graphs that evolve over time are called temporal graphs. They can be used to describe and represent real-world networks, including transportation networks, social networks, and communication networks, with higher fidelity and accuracy. However, research is still limited on how to manage large scale temporal graphs and execute queries over these graphs efficiently and effectively. This thesis investigates the problems of temporal graph data management related to node and edge sequence queries. In temporal graphs, nodes and edges can evolve over time. Therefore, sequence queries on nodes and edges can be key components in managing temporal graphs. In this thesis, the node sequence query decomposes into two parts: graph node similarity and subsequence matching. For node similarity, this thesis proposes a modified tree edit distance that is metric and polynomially computable and has a natural, intuitive interpretation. Note that the proposed node similarity works even for inter-graph nodes and therefore can be used for graph de-anonymization, network transfer learning, and cross-network mining, among other tasks. The subsequence matching query proposed in this thesis is a framework that can be adopted to index generic sequence and time-series data, including trajectory data and even DNA sequences for subsequence retrieval. For edge sequence queries, this thesis proposes an efficient storage and optimized indexing technique that allows for efficient retrieval of temporal subgraphs that satisfy certain temporal predicates. For this problem, this thesis develops a lightweight data management engine prototype that can support time-sensitive temporal graph analytics efficiently even on a single PC

    A study of two problems in data mining: anomaly monitoring and privacy preservation.

    Bu, Yingyi. Thesis (M.Phil.), Chinese University of Hong Kong, 2008. Includes bibliographical references (leaves 89-94). Abstracts in English and Chinese.
    Contents:
    Abstract
    Acknowledgement
    Chapter 1 Introduction: 1.1 Anomaly Monitoring; 1.2 Privacy Preservation (1.2.1 Motivation; 1.2.2 Contribution)
    Chapter 2 Anomaly Monitoring: 2.1 Problem Statement; 2.2 A Preliminary Solution: Simple Pruning; 2.3 Efficient Monitoring by Local Clusters (2.3.1 Incremental Local Clustering; 2.3.2 Batch Monitoring by Cluster Join; 2.3.3 Cost Analysis and Optimization); 2.4 Piecewise Index and Query Reschedule (2.4.1 Piecewise VP-trees; 2.4.2 Candidate Rescheduling; 2.4.3 Cost Analysis); 2.5 Upper Bound Lemma: For Dynamic Time Warping Distance; 2.6 Experimental Evaluations (2.6.1 Effectiveness; 2.6.2 Efficiency); 2.7 Related Work
    Chapter 3 Privacy Preservation: 3.1 Problem Definition; 3.2 HD-Composition (3.2.1 Role-based Partition; 3.2.2 Cohort-based Partition; 3.2.3 Privacy Guarantee; 3.2.4 Refinement of HD-composition; 3.2.5 Anonymization Algorithm); 3.3 Experiments (3.3.1 Failures of Conventional Generalizations; 3.3.2 Evaluations of HD-Composition); 3.4 Related Work
    Chapter 4 Conclusions
    Bibliography