    Practical Regular Expression Mining And Its Information Quality Applications

    Abstract: Regular expressions are convenient devices for representing common patterns in collections of text strings, and they can be used as filters ensuring information quality in textual data. An algorithm that induces a representative regular expression from a set of text strings (possibly containing errors) is described. Such an algorithm is useful for estimating information quality and for automated cleansing of legacy data or data obtained by means of automated sensing (e.g., OCR). A number of practical heuristics that improve the algorithm’s real-life performance are introduced. A framework employing this algorithm is outlined.