Search CORE

2,289 research outputs found

Recommended from our members

The Swiss army knife of time series data mining: ten useful things you can do with the matrix profile and ten lines of code

Author: Almaslukh Abdulaziz
Dau Hoang Anh
Funning Gareth
Gharghabi Shaghayegh
Kamgar Kaveh
Keogh Eamonn
Mueen Abdullah
Shakibay Senobari Nader
Silva Diego Furtado
Yeh Chin-Chia Michael
Zhu Yan
Zimmerman Zachary
Publication venue: eScholarship, University of California
Publication date: 01/07/2020
Field of study

eScholarship - University of California

Ensembles of Randomized Time Series Shapelets Provide Improved Accuracy while Reducing Computational Costs

Author: Kramer Stefan
Raza Atif
Publication venue
Publication date: 22/02/2017
Field of study

Shapelets are discriminative time series subsequences that allow generation of interpretable classification models, which provide faster and generally better classification than the nearest neighbor approach. However, the shapelet discovery process requires the evaluation of all possible subsequences of all time series in the training set, making it extremely computation intensive. Consequently, shapelet discovery for large time series datasets quickly becomes intractable. A number of improvements have been proposed to reduce the training time. These techniques use approximation or discretization and often lead to reduced classification accuracy compared to the exact method. We are proposing the use of ensembles of shapelet-based classifiers obtained using random sampling of the shapelet candidates. Using random sampling reduces the number of evaluated candidates and consequently the required computational cost, while the classification accuracy of the resulting models is also not significantly different than that of the exact algorithm. The combination of randomized classifiers rectifies the inaccuracies of individual models because of the diversity of the solutions. Based on the experiments performed, it is shown that the proposed approach of using an ensemble of inexpensive classifiers provides better classification accuracy compared to the exact method at a significantly lesser computational cost

arXiv.org e-Print Archive

FigShare

Personalized Purchase Prediction of Market Baskets with Wasserstein-Based Sequence Matching

Author: Agrawal Rakesh
Hidasi Balázs
Kusner Matt
Rendle Steffen
Sakurai Yasushi
Tran Nhat-Quang
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2019
Field of study

Personalization in marketing aims at improving the shopping experience of customers by tailoring services to individuals. In order to achieve this, businesses must be able to make personalized predictions regarding the next purchase. That is, one must forecast the exact list of items that will comprise the next purchase, i.e., the so-called market basket. Despite its relevance to firm operations, this problem has received surprisingly little attention in prior research, largely due to its inherent complexity. In fact, state-of-the-art approaches are limited to intuitive decision rules for pattern extraction. However, the simplicity of the pre-coded rules impedes performance, since decision rules operate in an autoregressive fashion: the rules can only make inferences from past purchases of a single customer without taking into account the knowledge transfer that takes place between customers. In contrast, our research overcomes the limitations of pre-set rules by contributing a novel predictor of market baskets from sequential purchase histories: our predictions are based on similarity matching in order to identify similar purchase habits among the complete shopping histories of all customers. Our contributions are as follows: (1) We propose similarity matching based on subsequential dynamic time warping (SDTW) as a novel predictor of market baskets. Thereby, we can effectively identify cross-customer patterns. (2) We leverage the Wasserstein distance for measuring the similarity among embedded purchase histories. (3) We develop a fast approximation algorithm for computing a lower bound of the Wasserstein distance in our setting. An extensive series of computational experiments demonstrates the effectiveness of our approach. The accuracy of identifying the exact market baskets based on state-of-the-art decision rules from the literature is outperformed by a factor of 4.0.Comment: Accepted for oral presentation at 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019

arXiv.org e-Print Archive

Crossref

The Influence of Global Constraints on Similarity Measures for Time-Series Databases

Author: Geler Zoltan
Ivanović Mirjana
Kurbalija Vladimir
Radovanović Miloš
Publication venue
Publication date: 25/12/2013
Field of study

A time series consists of a series of values or events obtained over repeated measurements in time. Analysis of time series represents and important tool in many application areas, such as stock market analysis, process and quality control, observation of natural phenomena, medical treatments, etc. A vital component in many types of time-series analysis is the choice of an appropriate distance/similarity measure. Numerous measures have been proposed to date, with the most successful ones based on dynamic programming. Being of quadratic time complexity, however, global constraints are often employed to limit the search space in the matrix during the dynamic programming procedure, in order to speed up computation. Furthermore, it has been reported that such constrained measures can also achieve better accuracy. In this paper, we investigate two representative time-series distance/similarity measures based on dynamic programming, Dynamic Time Warping (DTW) and Longest Common Subsequence (LCS), and the effects of global constraints on them. Through extensive experiments on a large number of time-series data sets, we demonstrate how global constrains can significantly reduce the computation time of DTW and LCS. We also show that, if the constraint parameter is tight enough (less than 10-15% of time-series length), the constrained measure becomes significantly different from its unconstrained counterpart, in the sense of producing qualitatively different 1-nearest neighbor graphs. This observation explains the potential for accuracy gains when using constrained measures, highlighting the need for careful tuning of constraint parameters in order to achieve a good trade-off between speed and accuracy

arXiv.org e-Print Archive

CiteSeerX