ESSENCE: a portable methodology for building information extraction systems

Abstract

One of the most important issues when constructing an Information Extraction System is how to obtain the knowledge needed for identifying relevant information in a document. A manual approach is not only expensive but also reduces the portability of the system across domains. Automating the knowledge acquisition process may partially solve this problem, even if a human expert still takes part in it for specific tasks. This work presents a methodology (ESSENCE) for automatically learning information extraction patterns from an unrestricted text corpus representative of the domain. The methodology comprises several steps, of which we stress the pattern generalization process. Generalization reduces the pattern base and therefore the amount of information an expert must validate. As we will see, lexical knowledge, together with the lexico-semantic relations from WordNet, is our basic knowledge source, especially for the generalization process.

Postprint (published version)
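The idea of generalization mentioned in the abstract can be illustrated with a minimal sketch. It is not the ESSENCE algorithm itself, which the abstract does not detail. It assumes that a pattern is a (verb, slot filler) pair and that two patterns with the same verb may be merged when their fillers share a hypernym. The `HYPERNYMS` table is a hypothetical toy stand-in for WordNet, and the function names are illustrative only.

```python
# Toy hypernym table standing in for WordNet (hypothetical data, not from the paper).
HYPERNYMS = {
    "bomb": "explosive",
    "grenade": "explosive",
    "explosive": "weapon",
    "rifle": "firearm",
    "firearm": "weapon",
    "weapon": "artifact",
}


def ancestors(word):
    """Return the word followed by its chain of hypernyms, most specific first."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def lowest_common_hypernym(a, b):
    """Return the most specific concept found in both chains, or None."""
    b_chain = set(ancestors(b))
    for concept in ancestors(a):
        if concept in b_chain:
            return concept
    return None


def generalize(patterns):
    """Merge (verb, filler) patterns that share a verb and a common hypernym.

    Assumption: this merging rule is illustrative, not the paper's procedure.
    """
    merged = {}
    for verb, filler in patterns:
        fillers = merged.setdefault(verb, [])
        for i, existing in enumerate(fillers):
            common = lowest_common_hypernym(existing, filler)
            if common is not None:
                fillers[i] = common  # replace with the more general concept
                break
        else:
            fillers.append(filler)  # nothing to merge with, keep as-is
    return [(verb, f) for verb, fillers in merged.items() for f in fillers]


patterns = [
    ("detonate", "bomb"),
    ("detonate", "grenade"),
    ("fire", "rifle"),
    ("detonate", "document"),
]
generalized = generalize(patterns)
```

Here four patterns shrink to three, because "bomb" and "grenade" are merged under "explosive". A rule this simple can over-generalize when every filler eventually reaches a very broad concept, which is one reason a human expert still validates the resulting patterns.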
