1 research outputs found

    Approximate membership localization (AML) for web-based join

    No full text
    In this paper, we propose a search-based approach to join two tables in the absence of clean join attributes. Non-structured documents from the web are used to express the correlations between a given query and a reference list. To implement this approach, a major challenge we meet is how to efficiently determine the number of times and the locations of each clean reference from the reference list that is approximately mentioned in the retrieved documents. We formalize the Approximate Membership Localization (AML) problem and propose an efficient partial pruning algorithm to solve it. A study using real-word data sets demonstrates the effectiveness of our search-based approach, and the efficiency of our AML algorithm
    corecore