    Knowledge discovery from trajectories

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. As a newly proliferating field of study, knowledge discovery from trajectories has attracted a growing number of researchers from different backgrounds. However, there is as yet no theoretical framework that gives researchers a systematic view of the research under way. The complexity of spatial and temporal information, and of their combination, gives rise to numerous spatio-temporal patterns. In addition, the same pattern may well be defined and mined differently by researchers from different backgrounds, such as Geographic Information Science, Data Mining, Databases, and Computational Geometry. How can these patterns be defined systematically, so that the whole community can make better use of previous research? This paper tackles this challenge in three steps. First, the input trajectory data is classified; second, a taxonomy of spatio-temporal patterns is developed from a data mining point of view; lastly, the spatio-temporal patterns that appear in previous publications are discussed and placed into the theoretical framework. In this way, researchers can easily find the methodology needed to mine a specific pattern within the framework, and the algorithms that still need to be developed can be identified for further research. Under the guidance of this framework, the approach is applied to a real data set from the Starkey Project. Two questions are answered by applying data mining algorithms: first, where the elk prefer to stay within their whole range, and second, whether there are corridors among these regions of interest.
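The abstract does not say which algorithm locates the places where the elk prefer to stay. As a minimal, hypothetical sketch of one common way to find such regions of interest, the code below counts GPS fixes per square grid cell and keeps the dense cells. The function name `dense_cells` and the parameters `cell_size` and `min_count` are illustrative assumptions, not the dissertation's method.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def dense_cells(
    fixes: Iterable[Tuple[float, float]], cell_size: float, min_count: int
) -> Dict[Cell, int]:
    """Hypothetical region-of-interest detector: count GPS fixes per grid cell.

    Each (x, y) fix is snapped to the square cell that contains it (floor
    division also handles negative coordinates), and only cells holding at
    least `min_count` fixes are returned.
    """
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in fixes)
    return {cell: n for cell, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    fixes = [(1.0, 1.0), (2.0, 3.0), (4.0, 4.0), (15.0, 2.0), (16.0, 3.0), (30.0, 30.0)]
    print(dense_cells(fixes, cell_size=10.0, min_count=2))
```

Cell size and threshold control the trade-off between spatial resolution and noise; adjacent dense cells could then be merged into larger regions before looking for corridors between them.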

    Co-movement Pattern Mining from Videos

    Co-movement pattern mining from GPS trajectories has been an intriguing subject in spatial-temporal data mining. In this paper, we extend this research line by moving the data source from GPS sensors to surveillance cameras, presenting the first investigation into co-movement pattern mining from videos. We formulate the new problem, redefine the spatial-temporal proximity constraints for cameras deployed in a road network, and theoretically prove its hardness. Due to the lack of readily applicable solutions, we adapt existing techniques and propose two competitive baselines, using an Apriori-based enumerator and the CMC algorithm, respectively. As the principal technical contribution, we introduce a novel index called the temporal-cluster suffix tree (TCS-tree), which performs two-level temporal clustering within each camera and constructs a suffix tree from the resulting clusters. Moreover, we present a sequence-ahead pruning framework based on the TCS-tree, which allows all pattern constraints to be leveraged simultaneously to filter candidate paths. Finally, to reduce the verification cost on the candidate paths, we propose a sliding-window based co-movement pattern enumeration strategy and a hashing-based dominance eliminator, both of which are effective in avoiding redundant operations. We conduct extensive experiments for scalability and effectiveness analysis. Our results validate the efficiency of the proposed index and mining algorithm, which runs substantially faster than the two baseline methods. Additionally, we construct a video database with 1169 cameras and perform an end-to-end pipeline analysis to study the performance gap between GPS-driven and video-driven methods. Our results demonstrate that the patterns derived from the video-driven approach are similar to those derived from ground-truth trajectories, providing evidence of its effectiveness.
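The abstract does not spell out the redefined proximity constraints. To make the idea of verifying a candidate path concrete, here is a hypothetical, simplified check, not the paper's definition: each object is a time-ordered list of (camera, timestamp) sightings, and a group co-moves along a camera path if it has at least `min_size` objects, the path has at least `min_len` cameras, every object passes those cameras consecutively, and at each camera the group's timestamps lie within `eps` seconds of each other. All names and parameters are assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Visit = Tuple[str, float]  # (camera_id, timestamp in seconds), hypothetical format


def path_times(traj: Sequence[Visit], path: Sequence[str]) -> Optional[List[float]]:
    """Timestamps of the first contiguous occurrence of `path` in `traj`, else None.

    Simplification: only the first occurrence is checked, so an object that
    traverses the same path twice may be evaluated on the wrong pass.
    """
    k = len(path)
    cams = [c for c, _ in traj]
    for i in range(len(traj) - k + 1):
        if cams[i:i + k] == list(path):
            return [t for _, t in traj[i:i + k]]
    return None


def is_co_moving(
    trajs: Dict[str, Sequence[Visit]],
    group: Sequence[str],
    path: Sequence[str],
    eps: float,
    min_size: int,
    min_len: int,
) -> bool:
    """Hypothetical verification of one (group, camera path) candidate."""
    if len(group) < min_size or len(path) < min_len:
        return False
    times = [path_times(trajs[o], path) for o in group]
    if any(t is None for t in times):
        return False
    # zip(*times) yields, for each camera on the path, every member's timestamp there.
    return all(max(col) - min(col) <= eps for col in zip(*times))
```

A brute-force check like this would be run on every surviving candidate; the point of an index and pruning framework such as the one described above is to shrink that candidate set first.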

    NEW METHODS FOR MINING SEQUENTIAL AND TIME SERIES DATA

    Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns according to the requirements of the domain, including association rule mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-temporal (ST) and time series data (TSD) is the rationale for developing specialized techniques to mine such data. In spatial data mining, the spatial co-location rule problem differs from the association rule problem, since there is no natural notion of transactions in spatial datasets embedded in continuous geographic space. We therefore propose an efficient algorithm (GridClique) to mine interesting spatial co-location patterns (maximal cliques). These patterns are used as the raw transactions for an association rule mining technique to discover complex co-location rules. Our proposal includes certain types of complex relationships, especially negative relationships, in the patterns. These relationships can be obtained from the maximal clique patterns alone, which had not been used for this purpose before. Our approach is applied to a well-known astronomy dataset obtained from the Sloan Digital Sky Survey (SDSS). ST data is continuously collected and made accessible in the public domain. We present an approach to mine and query large ST data with the aim of finding interesting patterns and understanding the underlying process of data generation. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a predefined time. One approach to processing a “flock query” is to map ST data into a high-dimensional space and reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure; however, the performance of spatial indexing structures deteriorates rapidly in high-dimensional space. This thesis sets out a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We use probabilistic arguments to prove the accuracy of the projection, and we present experimental results showing that the curse of dimensionality can be managed in an ST setting by combining random projections with traditional data structures. In time series data mining, we devised a new space-efficient algorithm (SparseDTW) to compute the dynamic time warping (DTW) distance between two time series, which always yields the optimal result. This is in contrast to other approaches, which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the similarity and/or correlation between the time series: the more similar the time series are, the less space is required to compute the DTW distance between them. Other techniques for speeding up DTW impose a priori constraints and do not exploit similarity characteristics that may be present in the data. Our experiments demonstrate that SparseDTW outperforms these approaches. By applying SparseDTW to a large stock-market dataset of daily index prices from the Australian Stock Exchange (ASX) from 1980 to 2002, we discover an interesting pattern: “pairs trading”.
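For reference, the quantity that SparseDTW computes optimally is the ordinary DTW distance. The sketch below is the textbook dynamic-programming recurrence with a full (n+1)×(m+1) cost matrix. It is not SparseDTW: it uses exactly the quadratic space that SparseDTW is designed to avoid. The absolute-difference local cost is an assumption; some formulations use the squared difference instead.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard full-matrix DTW distance with |a_i - b_j| as the local cost.

    D[i][j] is the cheapest cost of aligning a[:i] with b[:j]; each cell extends
    the best of the three neighbouring alignments (insertion, deletion, match).
    Uses O(len(a) * len(b)) time and space.
    """
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


if __name__ == "__main__":
    print(dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))
```

Warping lets the repeated 2.0 in the second series align with the single 2.0 in the first, so the distance is 0 even though the lengths differ. When two series are very similar, the optimal warping path hugs a narrow region of this matrix, which is the kind of structure a sparse method can exploit.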

    Developing new approaches for the analysis of movement data : a sport-oriented application

    Probabilistic Model To Identify Movement Patterns In Geospatial Data

    The task of determining the movement patterns of objects from available databases is a daunting one. Tracking the movement of these dynamic objects is important in many areas for understanding the higher-order patterns of movement that carry special meaning for a target application. However, this is still a largely unsolved problem, and recent work has focused on the relationships of moving point objects with stationary objects or landmarks on a map. The Global Positioning System (GPS) is a widely used satellite-based navigation system. Popular use of GPS devices has produced large collections of data, some of which have been archived. These archived data sets, and sometimes real-time GPS data, are now readily available over the internet, and their analysis through computational methods can generate meaningful insights that, when applied appropriately, can be used in everyday life. The purpose of this research is to make the case that automated analysis can provide insights that would otherwise be difficult to obtain because of the large volume and noisy character of GPS data. We present experiments performed on one of these archived databases, which contains GPS traces of 536 yellow cabs in the San Francisco Bay Area. Using data analysis, we determine the most visited tourist destinations within the San Francisco Bay Area during the period covered by the captured data. We also propose a probabilistic framework that determines the probability of a new routing pattern from previous patterns. We use simulated routing patterns, built in the same data format as the San Francisco cab data, to predict the possible routes to be taken by a vehicle. All probability calculations are performed using Bayes’ theorem of conditional probability.
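The abstract gives no formulas for the routing framework. As one hypothetical reading, the sketch below treats each previous route as a sequence of location labels, estimates conditional probabilities from counts via P(next | current) = P(current, next) / P(current), and scores a new route as P(start) × ∏ P(next | current). The first-order (Markov) assumption, the function name and the label format are all assumptions, not the thesis's model.

```python
from collections import Counter
from typing import Sequence, Tuple


def route_probability(history: Sequence[Sequence[str]], route: Sequence[str]) -> float:
    """Hypothetical first-order estimate of a route's probability from past routes.

    P(start) is the share of past routes beginning at route[0]; each transition
    probability is count(cur -> nxt) / count(transitions leaving cur).
    Unseen starts or transitions give probability 0 (no smoothing).
    """
    starts: Counter = Counter(r[0] for r in history if r)
    pair_counts: Counter = Counter()
    from_counts: Counter = Counter()
    for r in history:
        for cur, nxt in zip(r, r[1:]):
            pair_counts[(cur, nxt)] += 1
            from_counts[cur] += 1
    if not route or not starts:
        return 0.0
    p = starts[route[0]] / sum(starts.values())
    for cur, nxt in zip(route, route[1:]):
        if from_counts[cur] == 0:
            return 0.0
        p *= pair_counts[(cur, nxt)] / from_counts[cur]
    return p


if __name__ == "__main__":
    past = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
    print(route_probability(past, ["A", "B", "C"]))  # 1 * (2/3) * (1/2)
```

Without smoothing, any route containing a transition never seen in the history gets probability zero; a Laplace correction on the counts would be one simple way to soften that.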

    Reporting flock patterns

    Data representing moving objects is rapidly becoming more widely available, especially in the area of wildlife GPS tracking. It is a central belief that information is hidden in large data sets in the form of interesting patterns. One of the most commonly sought spatio-temporal patterns is the flock. A flock is a large enough subset of objects moving along paths close to each other for a certain predefined time. We give a new definition that we argue is more realistic than previous ones, and, using techniques from computational geometry, we present fast algorithms to detect and report flocks. The algorithms are analysed both theoretically and experimentally.
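The paper's new flock definition is not given in the abstract. The sketch below checks a simplified version of the widely used (m, k, r) notion instead: a candidate group of at least `m` objects must fit inside a disc of radius `r` for at least `k` consecutive timestamps. As a further simplification, the disc is centred on the group's centroid. This is a sufficient but not necessary test, because a disc with a different centre might still cover the group. Synchronised sampling (one position per object per timestamp index) and all names are assumptions.

```python
import math
from typing import Mapping, Sequence, Tuple

Point = Tuple[float, float]


def within_centroid_disc(points: Sequence[Point], r: float) -> bool:
    """True if every point lies within distance r of the points' centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return all(math.hypot(x - cx, y - cy) <= r for x, y in points)


def is_flock(
    tracks: Mapping[str, Sequence[Point]],
    group: Sequence[str],
    m: int,
    k: int,
    r: float,
) -> bool:
    """Hypothetical (m, k, r) flock check for one candidate group.

    tracks[obj][t] is obj's position at timestamp index t. The group qualifies
    if it has at least m members and the centroid-disc test holds for at
    least k consecutive timestamps.
    """
    if len(group) < m or k <= 0:
        return False
    run = 0
    horizon = min(len(tracks[o]) for o in group)
    for t in range(horizon):
        if within_centroid_disc([tracks[o][t] for o in group], r):
            run += 1
            if run >= k:
                return True
        else:
            run = 0  # the consecutive run is broken; start counting again
    return False
```

Checking a single candidate is cheap; the hard part, which the computational-geometry algorithms in the paper address, is finding the candidate groups without enumerating every subset of objects.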