Figure 1. Comparison of the number of matches

Authors: Tingjian Ge, Zheng Li
Publication date: 06/03/2020

ABSTRACT Text data is prevalent in everyday life. Some of this data is uncertain and is best modeled by probability distributions. Examples include biological sequence data and automatic ECG annotations, among others. Approximate substring matching over uncertain texts is a largely unexplored problem in data management. In this paper, we study this intriguing question. We propose a semantics called (k, τ)-matching queries and argue that it is more suitable in this context than a related semantics that has been proposed previously. Since uncertainty incurs considerable overhead in both indexing and the final verification of a match, we devise techniques for both. For indexing, we propose a multilevel filtering technique based on measuring signature distance; for verification, we design two algorithms that give upper and lower bounds and significantly reduce the costs. We validate our algorithms with a systematic evaluation on two real-world datasets and some synthetic datasets.
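
The abstract names (k, τ)-matching queries without defining them. As an illustration only, the sketch below assumes one plausible reading: a substring of the uncertain text matches a pattern if the probability that its edit distance to the pattern is at most k is at least τ. It further assumes character-level uncertainty with independent positions, and it verifies candidates by brute-force enumeration of possible worlds. This is not the paper's signature-based filtering or its bound-based verification, and every name here (`edit_distance`, `match_probability`, `kt_matches`) is hypothetical.

```python
from itertools import product
from math import prod


def edit_distance(a, b):
    """Standard Levenshtein distance between two deterministic strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


def match_probability(pattern, positions, k):
    """Pr[edit_distance(pattern, S) <= k] for an uncertain string S.

    ASSUMPTION: `positions` is a list of {char: probability} dicts, one per
    position, and positions are independent. Cost is exponential in the
    number of uncertain positions, so this is only usable on tiny inputs.
    """
    total = 0.0
    for world in product(*(d.items() for d in positions)):
        candidate = "".join(ch for ch, _ in world)
        weight = prod(p for _, p in world)
        if edit_distance(pattern, candidate) <= k:
            total += weight
    return total


def kt_matches(text, pattern, k, tau):
    """Return (start, end, probability) for substrings that satisfy the
    assumed (k, tau) condition. Only lengths within k of the pattern length
    can possibly be within edit distance k."""
    m = len(pattern)
    hits = []
    for start in range(len(text)):
        for length in range(max(1, m - k), m + k + 1):
            end = start + length
            if end > len(text):
                break
            p = match_probability(pattern, text[start:end], k)
            if p >= tau:
                hits.append((start, end, p))
    return hits


# Hypothetical uncertain DNA text: position 1 is C with 0.7 or G with 0.3.
text = [{"A": 1.0}, {"C": 0.7, "G": 0.3}, {"G": 1.0}, {"T": 1.0}]
exact_hits = kt_matches(text, "ACGT", k=0, tau=0.5)
```

With k = 0 the only candidate is the whole text, which equals the pattern in the possible world where position 1 is C, so it qualifies for any τ up to 0.7. Raising k to 1 also admits the world where position 1 is G.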

CiteSeerX