20 research outputs found

    Wrapping PDF Documents Exploiting Uncertain Knowledge

    No full text

    Information Extraction in Structured Documents using Tree Automata Induction

    No full text
    Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work for IE from structured documents formatted in HTML or XML uses techniques for IE from strings, such as grammar and automata induction. However, HTML and XML documents have a tree structure