Practical Regular Expression Mining And Its Information Quality Applications

Author: Sergei Savchenko
Publication date: 01/01/2002

Abstract: Regular expressions are convenient devices for representing common patterns in collections of text strings, and they can be used as filters that ensure information quality in textual data. An algorithm that induces a representative regular expression from a set of text strings (possibly containing errors) is described. Such an algorithm is useful for estimating information quality and for automated cleansing of legacy data or of data obtained by automated sensing (e.g. OCR). A number of practical heuristics that improve the algorithm's real-life performance are introduced. A framework employing this algorithm is outlined.
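
To make the idea in the abstract concrete, here is a minimal Python sketch of inducing a regular expression from noisy strings and then using it as a quality filter. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a much simpler technique: each string is collapsed into runs of character classes, and the most frequent resulting pattern is taken as representative. The names `signature`, `induce_pattern`, and `flag_outliers` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from itertools import groupby


def char_class(ch):
    """Map one character to a regex fragment (an assumed, coarse generalization)."""
    if ch.isdecimal():
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)  # everything else is matched literally


def signature(s):
    """Collapse a string into character-class runs with exact repeat counts."""
    parts = []
    for cls, run in groupby(s, key=char_class):
        n = len(list(run))
        parts.append(cls if n == 1 else f"{cls}{{{n}}}")
    return "".join(parts)


def induce_pattern(strings):
    """Return the most common signature and the fraction of strings sharing it."""
    counts = Counter(signature(s) for s in strings)
    pattern, support = counts.most_common(1)[0]
    return pattern, support / len(strings)


def flag_outliers(strings, pattern):
    """Return the strings that the induced pattern rejects (candidate errors)."""
    rx = re.compile(pattern)
    return [s for s in strings if not rx.fullmatch(s)]


if __name__ == "__main__":
    # Hypothetical sample: three well-formed dates and one OCR-style error
    # (letter O in place of digit 0).
    data = ["2002-01-01", "1999-12-31", "2001-07-15", "2OO2-01-01"]
    pattern, support = induce_pattern(data)
    print(pattern, support)
    print(flag_outliers(data, pattern))
```

In this sketch the share of strings matching the induced pattern serves as a rough quality estimate, and the rejected strings are the candidates for cleansing. A practical system would need to generalize across strings of varying length and merge near-identical patterns, which this simplification does not attempt.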

CiteSeerX