A speech understanding system typically includes a natural language understanding module that defines concepts, i.e., groups of semantically related words. Building a set of concepts for a new domain is challenging when prior knowledge and training data are limited. In our work, concepts are induced automatically from unannotated training data by grouping semantically similar words and phrases into concept classes. Four context-dependent similarity metrics are proposed, and their performance for auto-inducing concepts is evaluated. Two of these metrics are based on the Kullback-Leibler (KL) distance measure, a third is the Manhattan norm, and the fourth is the vector product (VP) similarity measure. The KL and VP metrics consistently underperform the other metrics on the four tasks investigated: movie information, a children's game, travel reservations, and Wall Street Journal news articles. Correct concept classification rates reach up to 90% for the movie task.

In [...], the idea of auto-inducing semantic classes using a similarity metric was proposed. The choice of the metric used to determine the degree of similarity between two candidate words for a semantic class is clearly a critical issue. In this paper, we compare the performance of four metrics used for auto-induction: the Kullback-Leibler distance, the Information-Radius distance, the Manhattan-Norm distance, and the Vector-Product similarity [6, 10]. The metrics are evaluated on four application domains: a movie information retrieval service, the Carmen Sandiego computer game, a travel reservation system, and the Wall Street Journal. The WSJ is a large, text-based corpus; the other three are small corpora of transcribed dialogues between human subjects and agents. Each metric is evaluated by comparing automatically induced semantic classes against manual annotation.

2. AUTO-INDUCTION OF CONCEPTS
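To make the four metrics concrete, here is a minimal Python sketch. It rests on assumptions not taken from this paper: each word is represented by smoothed probability distributions over its immediate left and right neighbors, the left and right scores are simply summed, "KL" is taken to be the symmetric KL divergence, and "VP" is taken to be the cosine of the two context vectors. The helper names (`context_distribution`, `pair_scores`, `most_similar_pair`) and the smoothing constant `alpha` are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def context_distribution(tokens, target, side, vocab, alpha=0.1):
    """Smoothed P(neighbor | target) over `vocab` for the left or right neighbor.

    Additive smoothing with `alpha` is an assumption of this sketch; it keeps
    every probability positive so that the KL divergence stays finite.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        j = i - 1 if side == "left" else i + 1
        if 0 <= j < len(tokens):
            counts[tokens[j]] += 1
    total = sum(counts.values()) + alpha * len(vocab)
    if total == 0:
        raise ValueError(f"no context observed for {target!r}; use alpha > 0")
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl(p, q):
    """KL divergence D(p || q); terms with p(w) = 0 contribute nothing."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if pw > 0)


def symmetric_kl(p, q):
    """Assumed symmetric form: D(p || q) + D(q || p)."""
    return kl(p, q) + kl(q, p)


def information_radius(p, q):
    """D(p || m) + D(q || m), with m the average of p and q (finite even unsmoothed)."""
    m = {w: (p[w] + q[w]) / 2 for w in p}
    return kl(p, m) + kl(q, m)


def manhattan(p, q):
    """L1 norm of the difference between the two distributions."""
    return sum(abs(p[w] - q[w]) for w in p)


def vector_product(p, q):
    """Assumed here to be the cosine of the angle between the context vectors."""
    dot = sum(p[w] * q[w] for w in p)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)


METRICS = {"kl": symmetric_kl, "ir": information_radius,
           "manhattan": manhattan, "vp": vector_product}


def pair_scores(tokens, a, b, alpha=0.1):
    """Score words a and b with every metric, summing left- and right-context scores."""
    vocab = sorted(set(tokens))
    scores = dict.fromkeys(METRICS, 0.0)
    for side in ("left", "right"):
        p = context_distribution(tokens, a, side, vocab, alpha)
        q = context_distribution(tokens, b, side, vocab, alpha)
        for name, fn in METRICS.items():
            scores[name] += fn(p, q)
    return scores


def most_similar_pair(tokens, candidates, metric="ir"):
    """Return the candidate pair that the chosen metric rates as most similar.

    KL, IR, and Manhattan are distances (smaller means closer); VP is a
    similarity (larger means closer), so its sign is flipped before comparing.
    """
    best_key, best_pair = None, None
    for a, b in combinations(candidates, 2):
        score = pair_scores(tokens, a, b)[metric]
        key = -score if metric == "vp" else score
        if best_key is None or key < best_key:
            best_key, best_pair = key, (a, b)
    return best_pair


tokens = "show me a movie tonight show me a film tonight i want a ticket".split()
print(most_similar_pair(tokens, ["movie", "film", "ticket", "show"], "manhattan"))
# ('movie', 'film')
```

In this toy corpus "movie" and "film" share identical left and right contexts, so every metric ranks them as the closest pair. The sketch also shows one practical difference between the metrics: plain KL is infinite whenever one word has a context the other never shows, which is why smoothing is applied, whereas the information radius, Manhattan norm, and cosine remain well defined without it.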