    Modeling Individual Cyclic Variation in Human Behavior

    Cycles are fundamental to human health and behavior. However, modeling cycles in time series data is challenging because in most cases the cycles are not labeled or directly observed and must be inferred from multidimensional measurements taken over time. Here, we present CyHMMs, a cyclic hidden Markov model method for detecting and modeling cycles in a collection of multidimensional heterogeneous time series. In contrast to previous cycle modeling methods, CyHMMs address several challenges encountered in modeling real-world cycles: they can model multivariate data with both discrete and continuous dimensions; they explicitly model, and are robust to, missing data; and they can share information across individuals to model variation both within and between individual time series. Experiments on synthetic and real-world health-tracking data demonstrate that CyHMMs infer cycle lengths more accurately than existing methods, with 58% lower error on simulated data and 63% lower error on real-world data than the best-performing baseline. CyHMMs can also perform tasks that the baselines cannot: they can model the progression of individual features/symptoms over the course of the cycle, identify the most variable features, and cluster individual time series into groups with distinct characteristics. Applying CyHMMs to two real-world health-tracking datasets, one of menstrual cycle symptoms and one of physical activity tracking data, yields important insights, including which symptoms to expect at each point during the cycle. We also find that people fall into several groups with distinct cycle patterns, and that these groups differ along dimensions not provided to the model. For example, by modeling missing data in the menstrual cycles dataset, we are able to discover a medically relevant group of birth control users even though information on birth control is not given to the model. Comment: Accepted at WWW 201
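To make the cyclic structure concrete, here is a minimal sketch of a cyclic hidden Markov model; it is not the CyHMM implementation. It assumes a single discrete feature and K hidden states arranged in a ring, where each state either stays put or advances to the next state (the last wraps back to the first), with one stay probability shared by all states. All function names, parameters and the example data are hypothetical. Missing observations are simply marginalised out (they contribute probability 1), which is a simpler treatment than the explicit modelling of missingness described above; the sketch also omits multivariate emissions and sharing across individuals.

```python
import math


def cyclic_transition_matrix(num_states, stay_prob):
    """K-state ring: each state stays with stay_prob or advances to the next
    state (the last state wraps back to state 0)."""
    A = [[0.0] * num_states for _ in range(num_states)]
    for k in range(num_states):
        A[k][k] += stay_prob
        A[k][(k + 1) % num_states] += 1.0 - stay_prob
    return A


def forward_log_likelihood(obs, A, start, emit):
    """Scaled forward algorithm for a discrete-emission HMM.

    obs[t] is a symbol index, or None when the observation is missing;
    a missing observation is marginalised out (emission factor of 1).
    emit[k][s] is the probability that state k emits symbol s.
    """
    K = len(start)
    alpha = list(start)
    log_lik = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            # Propagate the state distribution one step around the ring.
            alpha = [sum(alpha[i] * A[i][j] for i in range(K)) for j in range(K)]
        if o is not None:
            alpha = [alpha[j] * emit[j][o] for j in range(K)]
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]  # rescale to avoid underflow
    return log_lik


def expected_cycle_length(num_states, stay_prob):
    """Each state's dwell time is geometric with mean 1 / (1 - stay_prob)."""
    return num_states / (1.0 - stay_prob)


# Hypothetical example: 4 cycle phases, one binary symptom (0 = absent, 1 = present).
A = cyclic_transition_matrix(4, 0.75)
start = [0.25] * 4
emit = [[0.1, 0.9], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3]]
obs = [1, 1, None, 0, 0, None, None, 0, 1, 1]
print(forward_log_likelihood(obs, A, start, emit))
print(expected_cycle_length(4, 0.75))  # 16.0
```

Because each state's dwell time in this toy model is geometric, the expected cycle length is K / (1 - stay_prob), so 4 states with stay probability 0.75 give an expected cycle of 16 time steps.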

    DESQ: Frequent Sequence Mining with Subsequence Constraints

    Frequent sequence mining methods often use constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this paper, we show that many subsequence constraints, including and beyond those considered in the literature, can be unified in a single framework. A unified treatment allows researchers to study many types of subsequence constraints jointly (instead of each one individually) and helps improve the usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive "pattern expressions" to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions into compressed finite state transducers, which we use as the computational model, and simulate these transducers in a way suited to frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms, although more general, are competitive with existing state-of-the-art algorithms. Comment: Long version of the paper accepted at the IEEE ICDM 2016 conference
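The semantics of two of the constraint types listed above can be illustrated with a deliberately naive miner. This is not the transducer-based approach described in the abstract; it is a brute-force sketch with hypothetical function names that supports only a maximum subsequence length and a maximum gap (the number of items that may be skipped between consecutive matched items). Support is counted as the number of input sequences that contain the subsequence at least once.

```python
from collections import Counter


def constrained_subsequences(seq, max_len, max_gap):
    """All distinct subsequences of seq with length <= max_len in which at most
    max_gap items are skipped between consecutive chosen positions."""
    found = set()

    def extend(prefix, last_pos):
        found.add(tuple(prefix))
        if len(prefix) == max_len:
            return
        # The next chosen position may skip 0..max_gap items after last_pos.
        for nxt in range(last_pos + 1, min(len(seq), last_pos + max_gap + 2)):
            prefix.append(seq[nxt])
            extend(prefix, nxt)
            prefix.pop()

    for start in range(len(seq)):
        extend([seq[start]], start)
    return found


def mine_frequent(db, min_support, max_len, max_gap):
    """Return {subsequence: support} for subsequences occurring in at least
    min_support sequences of db under the length and gap constraints."""
    support = Counter()
    for seq in db:
        support.update(constrained_subsequences(seq, max_len, max_gap))
    return {p: c for p, c in support.items() if c >= min_support}


db = ["abcd", "acbd", "abd"]
contiguous = mine_frequent(db, min_support=2, max_len=2, max_gap=0)
gapped = mine_frequent(db, min_support=2, max_len=2, max_gap=1)
print(sorted(p for p in contiguous if len(p) == 2))  # [('a', 'b'), ('b', 'd')]
print(sorted(p for p in gapped if len(p) == 2))
# [('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')]
```

Relaxing the gap from 0 to 1 makes ('a', 'c') and ('c', 'd') frequent. Because the number of candidates grows combinatorially with sequence length, max_len and max_gap, this enumeration is only practical for toy inputs, which is what motivates specialised algorithms.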

    Periodic pattern mining from spatio-temporal trajectory data

    Rapid development in GPS tracking techniques produces large volumes of spatio-temporal trajectory data. Analysing these data provides a new opportunity to discover useful behavioural patterns. Spatio-temporal periodic pattern mining is employed to find temporal regularities at interesting places. Mining periodic patterns from spatio-temporal trajectories can reveal useful, important and valuable information about people's regular and recurrent movements and behaviours. Previous studies have extracted people's regular and repeating movement behaviour from spatio-temporal trajectories, and these approaches address three issues: (1) long individual trajectories; (2) spatial fuzziness; and (3) temporal fuzziness. First, periodic pattern mining differs from other pattern mining tasks, such as association rule mining and sequential pattern mining, in that it requires a very long trajectory from a single individual so that the regular period, for example a one-month or one-year period, can be extracted from that trajectory. Second, spatial fuzziness means that although a moving object can regularly move along a similar route, it is virtually impossible for it to appear at exactly the same location. For instance, Bob goes to work every day, and although he follows a similar path from home to his workplace, the exact same location is not repeated across different days. Third, temporal fuzziness means that periodicity is complicated, involving partial time spans and multiple interleaving periods. In reality, periods are partial: a period is highly unlikely to persist throughout the whole movement of the object. Instead, the moving object has only a few periods, such as a daily period for work or a yearly period for holidays. However, considering these three issues alone is insufficient to find effective periodic patterns. 
This thesis aims to develop a new framework to extract more effective, understandable and meaningful periodic patterns by taking more features of spatio-temporal trajectories into account. The first feature is trajectory sequence: GPS trajectory data are temporally ordered sequences of geolocations that can be represented as consecutive trajectory segments, where each entry in a trajectory segment is closely related to the previous sampled point (trajectory node) and the next one, rather than being isolated. Existing approaches disregard this important sequential nature of trajectories. Furthermore, they introduce both false positive and false negative reference spots. The second feature is the combination of spatial and temporal aspects. GPS trajectory data can be represented as triples (x, y, t), where x and y represent longitude and latitude respectively and t is the corresponding time at that location. Space and time are thus the two key factors, yet existing methods do not consider them together in periodic pattern mining. Irregular time intervals are the third feature of spatio-temporal trajectories. In reality, because of weather conditions, device malfunctions, or battery issues, trajectory data are not always regularly sampled. Existing algorithms cannot deal with this directly: they either require a computationally expensive trajectory interpolation step or assume that the trajectory has regular time intervals. The fourth feature is the hierarchy of space. Hierarchy is an inherent property of spatial data that can be expressed at different levels: for example, a country includes many states, and a shopping mall comprises many shops. Taking the hierarchy of space into account can uncover more hidden and valuable periodic patterns, yet existing studies do not consider this inherent property of trajectories. Hidden background semantic information is the final feature. 
Aspatial semantic information is an important feature of spatio-temporal data and is embedded in the trajectory data. If this background semantic information is considered, more meaningful, understandable and useful periodic patterns can be extracted. However, existing methods do not consider the geographical information underlying trajectories. In addition, some applications call for periodic patterns among trajectory paths rather than trajectory nodes, yet existing approaches to periodic pattern mining focus on trajectory nodes rather than paths. In summary, this thesis investigates solutions to these problems in periodic pattern mining in order to extract more meaningful and understandable periodic patterns. Each of three chapters addresses a different problem and proposes solutions to problems not addressed in existing studies. Finally, this thesis proposes a new framework that addresses all of these problems. First, we investigated a path-based solution that targets trajectory sequence and spatio-temporal aspects. We proposed an algorithm called Traclus (spatio-temporal), which takes spatial and temporal aspects into account at the same time instead of considering only the spatial aspect. Using two real-world trajectories, the results indicated that our method produced more effective periodic patterns based on trajectory paths than existing node-based methods. To account for the hierarchy of space, we investigated existing hierarchical clustering approaches to obtain hierarchical reference spots (trajectory paths) for periodic pattern mining. 
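Treating space and time jointly can be illustrated at the level of individual trajectory nodes. The sketch below is a point-based clustering in the spirit of ST-DBSCAN, not the thesis's Traclus (spatio-temporal), which works on trajectory segments: two GPS fixes (x, y, t) count as neighbours only if they are close in the plane and close in time. As an assumption suited to periodic behaviour, the optional period argument measures temporal distance circularly (for example modulo one day, so that visits at similar times on different days fall together). Coordinates are treated as planar, and all names, thresholds and data are hypothetical.

```python
import math


def st_dbscan(points, eps_space, eps_time, min_pts, period=None):
    """Cluster (x, y, t) points; returns a label per point (-1 = noise).

    Two points are neighbours only if they lie within eps_space in the plane
    AND within eps_time in time. If period is given, the time distance is
    measured circularly modulo period (e.g. 86400 s for time of day).
    """
    def time_dist(t1, t2):
        d = abs(t1 - t2)
        if period is not None:
            d %= period
            d = min(d, period - d)
        return d

    def neighbours(i):
        xi, yi, ti = points[i]
        return [j for j, (x, y, t) in enumerate(points)
                if math.hypot(x - xi, y - yi) <= eps_space
                and time_dist(t, ti) <= eps_time]

    labels = [None] * len(points)  # None = unvisited, -1 = noise
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs)
        cluster += 1
    return labels


day = 86400
fixes = [(0, 0, 8 * 3600), (5, 5, day + 8 * 3600 + 600),
         (3, 1, 2 * day + 8 * 3600 - 300), (1000, 1000, 8 * 3600)]
print(st_dbscan(fixes, eps_space=50, eps_time=1800, min_pts=3, period=day))  # [0, 0, 0, -1]
print(st_dbscan(fixes, eps_space=50, eps_time=1800, min_pts=3))  # [-1, -1, -1, -1]
```

Without the circular time-of-day distance, the three morning visits are days apart in absolute time and are all labelled noise; with it, they form one candidate reference spot, while the distant fix remains noise.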
HDBSCAN is a hierarchical extension of DBSCAN that can handle clusters with different densities: it builds a hierarchical clustering using the single-linkage method and then automatically extracts clusters from the resulting tree. We therefore replaced the traditional DBSCAN clustering step in Traclus (spatio-temporal) with HDBSCAN to extract hierarchical reference spots. The results are convincing and reveal more periodic patterns than existing methods. Second, we introduced a stop/move method to annotate each spatio-temporal entry with a semantic label, such as restaurant, university or hospital. This method enriches a trajectory with background semantic information so that people's repeating behaviours can be inferred easily. In addition, existing methods use interpolation to make a trajectory regular and then apply the Fourier transform and autocorrelation to detect the period of each reference spot automatically. Because interpolation increases the number of trajectory nodes, running time grows rapidly. We therefore employed the Lomb-Scargle periodogram, which works directly on unevenly sampled data, to detect the period of each reference spot from the raw trajectory without any interpolation. On two real datasets, the results showed that our method outperformed existing approaches in both effectiveness and efficiency. For the hierarchical aspect, we extended this work to find hierarchical semantic periodic patterns by applying HDBSCAN, with promising results. Third, we applied our methodology to a case study, which revealed many interesting periodic patterns of medical relevance. These patterns can help explore human movement behaviours in support of positive medical outcomes. In summary, this research proposes a new framework that incrementally targets the problems existing methods cannot handle. 
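The advantage of the Lomb-Scargle periodogram is that it fits sinusoids directly at the observed, unevenly spaced timestamps, so no interpolation onto a regular grid is needed. Below is a minimal pure-Python sketch of the classic Lomb-Scargle power, applied to a hypothetical binary series that is 1 when a person is observed at a reference spot (here, assumed working hours 08:00 to 17:00) and 0 otherwise, sampled at random times over 30 days. The data, the function name and the candidate period grid are illustrative assumptions, not the thesis's implementation; in practice a library routine such as `scipy.signal.lombscargle` or astropy's `LombScargle` would normally be used.

```python
import math
import random


def lomb_scargle(t, y, periods):
    """Classic (unnormalised) Lomb-Scargle power of samples y taken at
    arbitrary times t, evaluated at each candidate period."""
    mean = sum(y) / len(y)
    yc = [v - mean for v in y]  # centre the data
    power = []
    for p in periods:
        w = 2 * math.pi / p  # angular frequency
        # The time offset tau makes the sine and cosine terms orthogonal.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        cc = sum(v * v for v in c)
        ss = sum(v * v for v in s)
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        pw = 0.0
        if cc > 0:
            pw += yc_c ** 2 / cc
        if ss > 0:
            pw += yc_s ** 2 / ss
        power.append(0.5 * pw)
    return power


rng = random.Random(0)
# Irregular sampling: 400 random timestamps (in hours) over 30 days.
times = sorted(rng.uniform(0, 30 * 24) for _ in range(400))
# 1.0 if the timestamp falls in the assumed 08:00-17:00 window, else 0.0.
present = [1.0 if 8 <= t % 24 < 17 else 0.0 for t in times]
candidates = [4 + 0.25 * k for k in range(273)]  # 4 h to 72 h in 0.25 h steps
power = lomb_scargle(times, present, candidates)
best = candidates[power.index(max(power))]
print(f"strongest period: {best} h")
```

For a daily presence pattern like this one, the strongest peak should fall at or very near the 24-hour candidate, with a weaker response at the 12-hour harmonic.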
These include how to account for trajectory sequence, how to consider spatial and temporal aspects together, how to handle trajectories with irregular time intervals, how to account for the hierarchy of space, and how to extract the semantic information behind trajectories. After addressing all these problems, the experimental results demonstrate that our method finds more understandable, meaningful and effective periodic patterns than existing approaches.

    Efficiently Mining Temporal Patterns in Time Series Using Information Theory


    Improved Periodicity Mining in Time Series Databases

    Time series data represent information about real-world phenomena, and periodicity mining explores the interesting periodic behavior inherent in the data. Periodicity mining has numerous applications, such as weather forecasting, stock market prediction and analysis, and pattern recognition. Recently, the suffix tree, a powerful data structure that efficiently solves many string-related problems, has been used to gather information about repeated substrings in the text and then perform periodicity mining. However, periodicity mining deals with large amounts of data, which makes it difficult to perform mining in main memory because of the space requirements of the suffix tree. Thus, we first propose the use of the Compressed Suffix Tree (CST) for space-efficient periodicity mining in very large datasets. Given the time-space trade-off that comes with any practical use of the CST, we provide a comprehensive empirical analysis of the practical usage of CSTs and traditional suffix trees for periodicity mining.
Noise is an inherent part of practical time series data, and it is important to mine periods in spite of the noise. This leads to the problem of approximate periodicity mining. Existing algorithms have dealt with noise introduced between the occurrences of the periodic pattern, but not with noise introduced in the structure of the pattern itself. We present a taxonomy for approximate periodicity and then propose an algorithm that performs periodicity mining in the presence of noise introduced simultaneously in both the structure of the pattern and between the periodic occurrences of the pattern.
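Periodicity mining over a discretised symbol sequence can be illustrated with a brute-force sketch that scores every (symbol, period, starting position) triple by a confidence value: the fraction of positions start, start + p, start + 2p, ... that hold the symbol. The sketch uses no suffix tree or compressed suffix tree, which is what lets the methods described above scale to large datasets. A threshold below 1 tolerates a few corrupted occurrences, but the sketch makes no attempt at the approximate periodicity mining described in the abstract. Function names, thresholds and the example series are hypothetical.

```python
from collections import Counter


def find_periods(series, min_conf, max_period=None):
    """Return (symbol, period, start, confidence) for every symbol whose
    confidence at positions start, start + period, ... is >= min_conf."""
    n = len(series)
    if max_period is None:
        max_period = n // 2
    found = []
    for period in range(1, max_period + 1):
        for start in range(min(period, n)):
            positions = range(start, n, period)
            counts = Counter(series[i] for i in positions)
            for symbol, hits in counts.items():
                conf = hits / len(positions)
                if conf >= min_conf:
                    found.append((symbol, period, start, conf))
    return found


# A discretised series with period 3 and one corrupted symbol (position 5).
series = "abcabbabcabc"
for row in find_periods(series, min_conf=0.7, max_period=3):
    print(row)
# ('a', 3, 0, 1.0)
# ('b', 3, 1, 1.0)
# ('c', 3, 2, 0.75)
```

The corrupted 'b' at position 5 lowers the confidence of 'c' at period 3 to 0.75, but it still clears the 0.7 threshold; the cost is a scan over every period and offset, which is exactly the work suffix-tree methods avoid.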