Existing systems dealing with the increasing volume of data series cannot
guarantee interactive response times, even for fundamental tasks such as
similarity search. Therefore, it is necessary to develop analytic approaches
that support exploration and decision making by providing progressive results,
before the final and exact ones have been computed. Prior work lacks both
efficiency and accuracy when applied to large-scale data series collections. We
present and experimentally evaluate ProS, a new probabilistic learning-based
method that provides quality guarantees for progressive Nearest Neighbor (NN)
query answering. We develop our method for k-NN queries and demonstrate how it
can be applied with the two most popular distance measures, namely, Euclidean
and Dynamic Time Warping (DTW). We provide both initial and progressive
estimates of the final answer that improve as the similarity search
proceeds, as well as suitable stopping criteria for the progressive queries.
Moreover, we describe how this method can be used to develop a
progressive algorithm for data series classification (based on a k-NN
classifier), and we additionally propose a method designed specifically for the
classification task. Experiments with several diverse synthetic and real
datasets demonstrate that our prediction methods constitute the first practical
solutions to the problem, significantly outperforming competing approaches.
This paper was published in the VLDB Journal (2022).
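
To make the notion of progressive k-NN query answering concrete, the following is a minimal sketch, not the ProS method itself. It scans a collection in batches under Euclidean distance and yields the best-so-far k-NN answer after each batch. The names progressive_knn, batch_size, and patience are hypothetical, and the patience rule is only a naive placeholder for a stopping criterion; it does not provide the probabilistic quality guarantees described above.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def progressive_knn(query, collection, k=1, batch_size=2, patience=None):
    """Yield (series_visited, best_so_far) after each batch of the scan.

    best_so_far is a list of (distance, index) pairs in ascending order.
    patience is a hypothetical placeholder stopping rule: stop once that
    many consecutive batches fail to improve the current answer.
    """
    heap = []  # max-heap via negated distances, holding the k best so far
    stale = 0
    n = len(collection)
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        improved = False
        for idx in range(start, end):
            d = euclidean(query, collection[idx])
            if len(heap) < k:
                heapq.heappush(heap, (-d, idx))
                improved = True
            elif d < -heap[0][0]:
                # Closer than the current k-th neighbor: replace it.
                heapq.heapreplace(heap, (-d, idx))
                improved = True
        yield end, sorted((-neg_d, idx) for neg_d, idx in heap)
        stale = 0 if improved else stale + 1
        if patience is not None and stale >= patience:
            return


if __name__ == "__main__":
    data = [[0, 0], [3, 4], [1, 0], [0, 0.5], [10, 10]]
    for visited, best in progressive_knn([0, 0], data, k=2):
        print(visited, best)
```

Each yielded answer is an intermediate estimate that a user could inspect before the scan finishes. A learned model, as in the work summarized here, would replace the patience heuristic with a probabilistic estimate of how close the current answer is to the exact one.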