Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents formatted in HTML or XML uses techniques for IE from strings, such as grammar and automata induction. However, HTML and XML documents have a tree structure