One of the central knowledge sources of an information extraction system is a
dictionary of linguistic patterns that can be used to identify the conceptual
content of a text. This paper describes CRYSTAL, a system which automatically
induces a dictionary of "concept-node definitions" sufficient to identify
relevant information from a training corpus. Each of these concept-node
definitions is generalized as far as possible without producing errors, so that
a minimum number of dictionary entries cover the positive training instances.
Because it tests the accuracy of each proposed definition, CRYSTAL can often
surpass human intuitions in creating reliable extraction rules.Comment: 6 pages, Postscript, IJCAI-95
http://ciir.cs.umass.edu/info/psfiles/tepubs/tepubs.htm