
    Piecewise Trend Approximation: A Ratio-Based Time Series Representation

    A time series representation, piecewise trend approximation (PTA), is proposed to improve the efficiency of time series data mining in high-dimensional, large databases. PTA represents a time series in concise form while retaining the main trends of the original series; the dimensionality of the original data is thereby reduced while its key features are preserved. Unlike representations based on the original data space, PTA transforms the original data space into the feature space of ratios between consecutive data points, whose sign and magnitude indicate the direction and degree of the local trend, respectively. In this ratio-based feature space, segmentation is performed so that every two conjoint segments have different trends, and each segment is then approximated by the ratio between its first and last points. To validate PTA, it is compared with the classical time series representations PAA and APCA on two classical datasets using the widely used K-NN classification algorithm. PTA achieves 3.55% and 2.33% higher classification accuracy than PAA and APCA, respectively, on the ControlChart dataset, and 8.94% and 7.07% higher on the Mixed-BagShapes dataset. These results indicate that PTA is effective for high-dimensional time series data mining.
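
    As a concrete illustration of the ratio-based representation described above, here is a minimal Python sketch (not from the paper): the function name, the near-zero guard, and the exact segmentation rule are our assumptions.

```python
import numpy as np

def pta(series, eps=1e-12):
    """Sketch of Piecewise Trend Approximation (PTA): map the series
    into the space of ratios between consecutive points, cut a new
    segment whenever the local trend changes direction, and represent
    each segment by the ratio between its last and first points."""
    x = np.asarray(series, dtype=float)
    # Ratio feature space: sign indicates trend direction, magnitude
    # indicates the degree of the local trend (for positive-valued data).
    denom = np.where(np.abs(x[:-1]) < eps, eps, x[:-1])
    ratios = x[1:] / denom
    trend = np.sign(ratios - 1.0)  # rising (+1), flat (0), falling (-1)

    segments, start = [], 0
    for i in range(1, len(trend)):
        if trend[i] != trend[i - 1]:   # conjoint segments must differ in trend
            segments.append((start, i))
            start = i
    segments.append((start, len(x) - 1))

    # Approximate each segment by the last/first point ratio.
    return [x[e] / (x[s] if abs(x[s]) > eps else eps) for s, e in segments]

print(pta([1.0, 1.2, 1.5, 1.4, 1.1, 1.3]))  # -> [1.5, 0.733..., 1.181...]
```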

    A New Time Series Similarity Measurement Method Based on Fluctuation Features

    Time series similarity measurement is one of the fundamental tasks in time series data mining, and many similarity measurement methods have been studied. However, most of them can only compute the distance between equal-length time series and do not adequately reflect the fluctuation features of the series. To address this problem, a new time series similarity measurement method based on fluctuation features is proposed in this paper. First, a method for extracting the fluctuation features of a time series is introduced: by defining and identifying fluctuation points, a fluctuation-point sequence is obtained that represents the original time series in subsequent analysis. Then, a new similarity measure (D_SM) is put forward to compute the distance between fluctuation-point sequences. This measure can compare unequal-length time series and involves two main steps: similarity matching and a distance calculation based on that matching. Finally, experiments are performed on public time series datasets using agglomerative hierarchical clustering based on D_SM. Compared with traditional time series similarity measures, the clustering results show that the proposed method effectively distinguishes time series with similar shapes that belong to different classes and yields a visible improvement in clustering accuracy in terms of F-measure.
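
    Below is a hedged Python sketch of the two ingredients the abstract describes: extracting fluctuation points and computing a distance between the resulting unequal-length sequences. The extrema-based fluctuation rule and the dynamic-programming alignment are stand-ins of ours; the paper's actual definitions and similarity-matching step differ in detail.

```python
import numpy as np

def fluctuation_points(series):
    """Local extrema plus the endpoints, as a crude stand-in for the
    paper's fluctuation points."""
    x = np.asarray(series, dtype=float)
    pts = [(0, x[0])]
    for i in range(1, len(x) - 1):
        if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0:  # direction change
            pts.append((i, x[i]))
    pts.append((len(x) - 1, x[-1]))
    return pts

def d_sm(a, b):
    """Toy distance between unequal-length fluctuation-point sequences
    via dynamic-programming alignment; the paper's similarity-matching
    step is different in detail."""
    p, q = fluctuation_points(a), fluctuation_points(b)
    D = np.full((len(p) + 1, len(q) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(p) + 1):
        for j in range(1, len(q) + 1):
            cost = abs(p[i - 1][1] - q[j - 1][1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

# Sequences of different lengths can be compared directly.
print(d_sm([0, 1, 0, 1, 0], [0, 2, 0, 2, 0, 2, 0]))
```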

    Diffeomorphic Transformations for Time Series Analysis: An Efficient Approach to Nonlinear Warping

    Full text link
    The proliferation and ubiquity of temporal data across many disciplines has sparked interest in similarity, classification, and clustering methods specifically designed to handle time series data. A core issue when dealing with time series is determining their pairwise similarity, i.e., the degree to which a given time series resembles another. Traditional distance measures such as the Euclidean distance are not well suited due to the time-dependent nature of the data. Elastic metrics such as dynamic time warping (DTW) offer a promising approach but are limited by their computational complexity, non-differentiability, and sensitivity to noise and outliers. This thesis proposes novel elastic alignment methods that use parametric and diffeomorphic warping transformations to overcome the shortcomings of DTW-based metrics. The proposed method is differentiable and invertible, well suited for deep learning architectures, robust to noise and outliers, computationally efficient, and expressive and flexible enough to capture complex patterns. Furthermore, a closed-form solution was developed for the gradient of these diffeomorphic transformations, which allows an efficient search in the parameter space, leading to better solutions at convergence. Leveraging the benefits of these closed-form diffeomorphic transformations, this thesis proposes a suite of advancements: (a) an enhanced temporal transformer network for time series alignment and averaging; (b) a deep-learning-based time series classification model that simultaneously aligns and classifies signals with high accuracy; (c) an incremental time series clustering algorithm that is warping-invariant, scalable, and able to operate under limited computational and time resources; and (d) a normalizing flow model that enhances the flexibility of affine transformations in coupling and autoregressive layers. Comment: PhD thesis, defended at the University of Navarra on July 17, 2023. 277 pages, 8 chapters, 1 appendix.
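
    To make the idea of a parametric, differentiable, invertible warp concrete, here is a toy Python sketch using a one-parameter diffeomorphism of [0, 1]; the thesis employs far richer transformation families, so this is purely illustrative and the names are ours.

```python
import numpy as np

def warp(t, a):
    """One-parameter diffeomorphism of [0, 1]:
    gamma_a(t) = (exp(a*t) - 1) / (exp(a) - 1), with gamma_0 = identity.
    Strictly increasing, smooth, invertible, and differentiable in a,
    so the warp parameter can be found by gradient-based search."""
    t = np.asarray(t, dtype=float)
    if abs(a) < 1e-8:
        return t
    return np.expm1(a * t) / np.expm1(a)

def align(x, a):
    """Resample series x (assumed uniformly sampled on [0, 1])
    under the warp gamma_a."""
    t = np.linspace(0.0, 1.0, len(x))
    return np.interp(warp(t, a), t, x)

x = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 100))
y = align(x, 2.0)  # a nonlinearly time-warped copy of x
```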

    Predefined pattern detection in large time series

    Detecting predefined patterns in time series is an interesting and challenging task. To reduce its computational cost and increase its effectiveness, a number of time series representation methods and similarity measures have been proposed. Most existing methods focus on full-sequence matching, that is, on sequences with clearly defined beginnings and endings in which all data points contribute to the match. These methods, however, do not account for temporal and magnitude deformations in the data and prove ineffective in several real-world scenarios where noise and external phenomena introduce diversity into the class of patterns to be matched. In this paper, we present a novel pattern detection method based on the notions of templates, landmarks, constraints, and trust regions. We employ the Minimum Description Length (MDL) principle in the time series preprocessing step, which helps preserve all prominent features and prevents the template from overfitting. Templates are provided by ordinary users or domain experts and represent interesting patterns we want to detect in a time series. Instead of using templates to match all potential subsequences of the time series, we translate both the time series and the templates into landmark sequences and detect patterns in the landmark sequence of the time series. By defining constraints within the template landmark sequence, we extract all matching landmark subsequences from the time series landmark sequence and obtain a number of landmark segments (time series subsequences, or instances). We model each landmark segment by scaling the template in both the temporal and magnitude dimensions. To suppress the influence of noise, we introduce the concept of a trust region, which helps both to achieve an improved instance model and to catch the accurate boundaries of instances of the given template. Based on the similarities derived from the instance models, we use a probability density function to calculate a similarity threshold, which is used to judge whether a landmark segment is a true instance of the given template. To evaluate the effectiveness and efficiency of the proposed method, we apply it to two real-world datasets. The results show that our method is capable of detecting patterns under temporal and magnitude deformations with competitive performance.
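
    A rough Python sketch of the landmark pipeline described above (translate series and template into landmark sequences, then test scaled windows against a tolerance) follows; the extrema-based landmark definition and the single tolerance standing in for the paper's constraints and trust regions are our simplifications, not the paper's method.

```python
import numpy as np

def landmarks(series):
    """Landmarks as local minima/maxima (one common choice; the
    paper's landmark definition may differ)."""
    x = np.asarray(series, dtype=float)
    return [(i, x[i]) for i in range(1, len(x) - 1)
            if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0]

def candidate_segments(series, template, tol=0.3):
    """Slide the template's landmark pattern over the series' landmark
    sequence; a window is a candidate instance if, after magnitude
    normalisation, its landmark values agree within tol (a crude
    stand-in for the paper's constraints and trust regions)."""
    s, t = landmarks(series), landmarks(template)
    tv = np.array([v for _, v in t])
    tv = (tv - tv.mean()) / (tv.std() + 1e-12)
    k, hits = len(t), []
    for j in range(len(s) - k + 1):
        wv = np.array([v for _, v in s[j:j + k]])
        wv = (wv - wv.mean()) / (wv.std() + 1e-12)
        if np.max(np.abs(wv - tv)) < tol:
            hits.append((s[j][0], s[j + k - 1][0]))  # instance boundaries
    return hits

u = np.linspace(0.0, 1.0, 50)
template = np.sin(2 * np.pi * 3 * u)
series = np.concatenate([np.zeros(30), template, np.zeros(30)])
print(candidate_segments(series, template))  # one detected instance
```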

    Master of Science

    Traffic simulations, which attempt to describe how individual vehicles move on road segments in a network, rely on mathematical traffic flow models developed from empirical vehicle trajectory data (position, speed, acceleration, etc.). Many of these microscopic traffic flow models are described as car-following models, which assume that a driver responds to the actions of the driver(s) or vehicle(s) in front of them (stimulus-response behavior). Model calibration can be performed using regression and/or optimization techniques, but the process is often complicated by uncertainty and variation in human behavior, which can be described as driver heterogeneity. Driver heterogeneity is based on the idea that different drivers may react differently to the same stimulus (interdriver heterogeneity) and that an individual driver may react differently to the same type of stimulus at different times (intradriver heterogeneity). To capture interdriver heterogeneity, car-following model parameters must be estimated for each driver/vehicle in the dataset and then used to describe a probability distribution over those model parameters. Capturing intradriver heterogeneity requires going one step further, calculating those same model parameters over much smaller time periods (i.e., seconds, or fractions of seconds) within a single vehicle's trajectory. This significantly reduces the amount of data available for calibration, limiting the ability to use traditional calibration procedures.
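
    As a toy illustration of the calibration issue described above, the following Python sketch fits a minimal linear stimulus-response car-following model per driver (interdriver heterogeneity) and then refits it over short windows of one trajectory (intradriver heterogeneity); the model form and the synthetic data are our assumptions, not the thesis's.

```python
import numpy as np

def fit_stimulus_response(delta_v, accel):
    """Least-squares fit of a minimal linear stimulus-response
    car-following model, a(t) = alpha * dv(t), for one driver.
    (A toy stand-in for fuller car-following models.)"""
    dv = np.asarray(delta_v, dtype=float)[:, None]
    a = np.asarray(accel, dtype=float)
    alpha, *_ = np.linalg.lstsq(dv, a, rcond=None)
    return float(alpha[0])

rng = np.random.default_rng(0)

# Interdriver heterogeneity: fit one alpha per (synthetic) driver and
# inspect the distribution of the fitted parameters across drivers.
alphas = []
for true_alpha in rng.normal(0.5, 0.1, size=20):        # 20 drivers
    dv = rng.normal(0.0, 2.0, size=200)                 # relative speed (m/s)
    acc = true_alpha * dv + rng.normal(0.0, 0.05, 200)  # observed accel.
    alphas.append(fit_stimulus_response(dv, acc))
print(np.mean(alphas), np.std(alphas))

# Intradriver heterogeneity: refit alpha over short windows within one
# trajectory; with few samples per window the estimates become noisy,
# which is exactly the calibration difficulty the abstract points to.
window = 20
intra = [fit_stimulus_response(dv[i:i + window], acc[i:i + window])
         for i in range(0, len(dv) - window + 1, window)]
print(intra)
```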