    Semantic annotation of a Japanese speech corpus

    This paper describes the semantic annotations we are performing on the CallHome Japanese corpus of spontaneous, unscripted telephone conversations (LDC, 1996). Our annotations include (i) semantic classes for all nouns and verbs; (ii) verb senses for all main verbs; and (iii) relations between main verbs and their complements in the same utterance. Our semantic tagset is taken from NTT's Goi-Taikei semantic lexicon and ontology (Ikehara et al., 1997). A pilot study demonstrates that the verb sense tagging can be efficiently performed by native Japanese speakers using computer-generated HTML forms, and that good inter-annotator reliability can be obtained in the right conditions.

    1 Introduction

    Semantic annotations have proved valuable for a variety of NLP tasks, including parsing, word sense disambiguation, coreference resolution, summarization, and information retrieval and extraction. The most challenging domain for all these tasks is spontaneous spoken language, which tends to be more..