Perspectives of Turning Prague Dependency Treebank into a Knowledge Base

Authors: Jan Hajič, Václav Novák

The Prague Dependency Treebank 2.0 (PDT 2.0) has recently emerged as the largest text corpus annotated on the level of tectogrammatical representation (“linguistic meaning”) described in Sgall et al. (2004); it contains about 0.8 million words (see Hajič (2004)). We hope that this level of annotation is close enough to the meaning of the utterances in the corpus that it will enable us to automatically transform the texts into a knowledge base usable for information extraction, question answering, summarization, etc. Multilayered Extended Semantic Networks (MultiNet), described in Helbig (2006), can serve as the target formalism. In this paper we discuss the suitability of such an approach and some of the main issues that will arise in the process. Section 1 introduces the formalisms underlying PDT 2.0 and MultiNet, Section 2 describes the role MultiNet can play in the system of Functional Generative Description (FGD), Section 3 discusses issues of automatic conversion to MultiNet, and Section 4 gives some conclusions.

CiteSeerX