56 research outputs found

    Efficient Retrieval of Similar Time Sequences Using DFT

    We propose an improvement to the known DFT-based indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index, since every coefficient at the end is the complex conjugate of a coefficient at the beginning and is as strong as its counterpart. We show analytically that this observation can speed up index search by more than a factor of two. This result was confirmed by our experiments, which were carried out on real stock prices and synthetic data.
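The symmetry the abstract relies on can be illustrated with a minimal sketch. It is not the authors' index; it only assumes real-valued sequences and a unitary DFT, and the function name `dft_lower_bounds` and the parameter `k` (number of stored leading coefficients) are hypothetical names chosen for this illustration. Because the coefficient at position n - f is the complex conjugate of the one at position f, each stored coefficient after the first can be counted twice when bounding the Euclidean distance from below.

```python
import numpy as np

def dft_lower_bounds(x, y, k):
    """Return two lower bounds on the Euclidean distance between x and y.

    Both bounds use only the first k unitary-DFT coefficients.
    The second also counts the mirrored (conjugate) coefficients,
    which never need to be stored for real-valued input.
    """
    n = len(x)
    if not 1 <= k <= n // 2:
        raise ValueError("k must satisfy 1 <= k <= n // 2")
    # norm="ortho" makes the DFT unitary, so distances are preserved (Parseval).
    X = np.fft.fft(x, norm="ortho")
    Y = np.fft.fft(y, norm="ortho")
    d2 = np.abs(X[:k] - Y[:k]) ** 2
    plain = np.sqrt(d2.sum())
    # Coefficient 0 has no mirror; coefficients 1..k-1 each have one.
    mirrored = np.sqrt(d2[0] + 2.0 * d2[1:].sum())
    return plain, mirrored

# Illustrative data: two random walks (an assumption, not the paper's data).
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=128))
y = np.cumsum(rng.normal(size=128))
plain, mirrored = dft_lower_bounds(x, y, k=4)
true_dist = np.linalg.norm(x - y)
print(plain, mirrored, true_dist)
```

The mirrored bound is never smaller than the plain one and never exceeds the true distance, so a search that prunes with it should discard at least as many candidates without introducing false dismissals.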

    Work in progress: Data explorer - Assessment data integration, analytics, and visualization for STEM education research

    Citation: Weese, J. L., & Hsu, W. H. (2016). Work in progress: Data explorer - Assessment data integration, analytics, and visualization for STEM education research.

    We describe a comprehensive system for comparative evaluation of uploaded and preprocessed data in physics education research with applicability to standardized assessments for discipline-based education research, especially in science, technology, mathematics, and engineering. Views are provided for inspection of aggregate statistics about student scores, comparison over time within one course, or comparison across multiple years. The design of this system includes a search facility for retrieving anonymized data from classes similar to the uploader's own. These visualizations include tracking of student performance on a range of standardized assessments. These assessments can be viewed as pre- and post-tests with comparative statistics (e.g., normalized gain), decomposed by answer in the case of multiple-choice questions, and manipulated using pre-specified data transformations such as aggregation and refinement (drill down and roll up). Furthermore, the system is designed to incorporate a scalable framework for machine learning-based analytics, including clustering and similarity-based retrieval, time series prediction, and probabilistic reasoning. © American Society for Engineering Education, 2016.
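The abstract names normalized gain as one of the comparative statistics but does not define it. A minimal sketch follows, assuming the definition commonly used in physics education research (gain achieved divided by gain still possible); the function name `normalized_gain` and the `max_score` parameter are hypothetical and do not come from the system described.

```python
def normalized_gain(pre, post, max_score=100.0):
    """Fraction of the possible improvement that was actually achieved.

    pre and post are scores on the same scale; max_score is the top of
    that scale (assumed to be 100 for percentage scores).
    """
    if pre >= max_score:
        raise ValueError("pre-test score leaves no room for gain")
    return (post - pre) / (max_score - pre)

# A class averaging 40% before and 70% after instruction.
print(normalized_gain(40.0, 70.0))  # 0.5
```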

    Implications of Z-normalization in the matrix profile

    Companies are increasingly measuring their products and services, resulting in a growing amount of available time series data and a need for techniques that extract usable information from it. One state-of-the-art technique for time series is the Matrix Profile, which has been used for various applications including motif/discord discovery, visualizations and semantic segmentation. Internally, the Matrix Profile utilizes the z-normalized Euclidean distance to compare the shape of subsequences between two series. However, when comparing subsequences that are relatively flat and contain noise, the resulting distance is high despite the visual similarity of these subsequences. This property violates some of the assumptions made by Matrix Profile based techniques, resulting in worse performance when series contain flat and noisy subsequences. By studying the properties of the z-normalized Euclidean distance, we derived a method to eliminate this effect requiring only an estimate of the standard deviation of the noise. In this paper we describe various practical properties of the z-normalized Euclidean distance and show how these can be used to correct the performance of Matrix Profile related techniques. We demonstrate our techniques on anomaly detection using a Yahoo! Webscope anomaly dataset, on semantic segmentation using the PAMAP2 activity dataset, and on data visualization using a UCI activity dataset, all containing real-world data, and obtain overall better results after applying our technique. Our technique is a straightforward extension of the distance calculation in the Matrix Profile and will benefit any derived technique dealing with time series containing flat and noisy subsequences.
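The flat-and-noisy effect described in this abstract can be reproduced with a short sketch. It shows only the problem, not the authors' correction, which the abstract does not spell out. The helper name `znorm_euclidean`, the subsequence length, and the noise level are assumptions made for illustration.

```python
import numpy as np

def znorm_euclidean(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return np.linalg.norm(a - b)

rng = np.random.default_rng(0)
m = 100          # subsequence length (assumed)
noise_sd = 0.1   # noise standard deviation (assumed)

# Two flat subsequences that differ only by noise.
flat_a = rng.normal(0.0, noise_sd, m)
flat_b = rng.normal(0.0, noise_sd, m)

# Two subsequences sharing a clear shape, with the same noise level.
t = np.linspace(0.0, 2.0 * np.pi, m)
sine_a = np.sin(t) + rng.normal(0.0, noise_sd, m)
sine_b = np.sin(t) + rng.normal(0.0, noise_sd, m)

d_flat = znorm_euclidean(flat_a, flat_b)
d_sine = znorm_euclidean(sine_a, sine_b)
print(d_flat, d_sine)
```

Z-normalization rescales the noise in the flat pair to unit variance, so the two flat subsequences end up looking like unrelated signals and their distance is large, while the sine pair stays close because the shared shape dominates the noise.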

    Comparing Time Series Through Event Clustering

    The comparison of two time series and the extraction of subsequences that are common to the two is a complex data mining problem. Many existing techniques, like the Discrete Fourier Transform (DFT), offer solutions for comparing two whole time series. Often, however, the important thing is to analyse certain regions, known as events, rather than the whole time series. This applies to domains like the stock market, seismography or medicine. In this paper, we propose a method for comparing two time series by analysing the events present in the two. The proposed method is applied to time series generated by stabilometric and posturographic systems within a branch of medicine studying balance-related functions in human beings.

    Symbol Extraction Method and Symbolic Distance for Analysing Medical Time Series

    The analysis of time series databases is very important in the area of medicine. Most of the approaches that address this problem are based on numerical algorithms that calculate distances, clusters, index trees, etc. However, a symbolic rather than numerical analysis is sometimes needed to search for the characteristics of the time series. Symbolic information helps users to efficiently analyse and compare time series in the same or in a similar way as a domain expert would. This paper focuses on the process of transforming numerical time series into a symbolic domain and on the definition of both this domain and a distance for comparing symbolic temporal sequences. The work is applied to the isokinetics domain within an application called I4.

    Semantic Reference Model in Medical Time Series

    The analysis of time series databases is very important in the area of medicine. Most of the approaches that address this problem are based on numerical algorithms that calculate distances, clusters, index trees, etc. However, a domain-dependent analysis sometimes needs to be conducted to search for the symbolic rather than numerical characteristics of the time series. This paper focuses on our work on the discovery of reference models in time series of isokinetics data and a technique that transforms the numerical time series into symbolic series. We briefly describe the algorithm used to create reference models for population groups and its application in the real world. Then, we describe a method based on extracting semantic information from a numerical series. This symbolic information helps users to efficiently analyze and compare time series in the same or similar way as a domain expert would.

    Aggregation and comparison of trajectories
