Multiple hierarchies : new aspects of an old solution
In this paper, we present the Multiple Annotation approach, which solves two problems: the problem of annotating overlapping structures, and the problem that occurs when documents should be annotated according to different, possibly heterogeneous tag sets. This approach has many advantages: it is based on XML, the modeling of alternative annotations is possible, each level can be viewed separately, and new levels can be added at any time. The files can be regarded as an interrelated unit, with the text serving as the implicit link. Two representations of the information contained in the multiple files (one in Prolog and one in XML) are described. These representations serve as a base for several applications
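The idea that separate annotation levels share one text as their implicit link can be illustrated with a minimal sketch. It is not the representation used in the paper (which describes Prolog and XML representations); the sample text, the layer names, the `Span` type, and the helper functions are all hypothetical, and each level is simplified to a list of character-offset spans over the same text.

```python
from typing import List, NamedTuple, Tuple


class Span(NamedTuple):
    """One annotated stretch of the shared text (hypothetical structure)."""
    layer: str   # name of the annotation level
    tag: str     # element name within that level
    start: int   # offset of the first character
    end: int     # offset one past the last character


# The shared text is the only link between the annotation levels.
TEXT = "one two three four"

# Two independent levels, each a well-formed hierarchy on its own.
LINES = [Span("lines", "line", 0, 13)]            # "one two three"
PHRASES = [
    Span("phrases", "phrase", 0, 7),              # "one two" (nested in the line)
    Span("phrases", "phrase", 8, 18),             # "three four" (crosses the line)
]


def crosses(a: Span, b: Span) -> bool:
    """True if the spans overlap without one containing the other."""
    return (a.start < b.start < a.end < b.end) or (b.start < a.start < b.end < a.end)


def find_crossings(layer_a: List[Span], layer_b: List[Span]) -> List[Tuple[Span, Span]]:
    """Pairs of spans that could not be merged into a single tree."""
    return [(a, b) for a in layer_a for b in layer_b if crosses(a, b)]


for a, b in find_crossings(LINES, PHRASES):
    print(repr(TEXT[a.start:a.end]), "crosses", repr(TEXT[b.start:b.end]))
# prints: 'one two three' crosses 'three four'
```

Because each level is stored separately, the crossing pair causes no problem within either level; it only becomes visible when the levels are compared through the offsets of the shared text.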
Guideline: Multiple Hierarchies
As the title of the Dagstuhl Seminar "Digital Historical Corpora - Architecture, Annotation, and Retrieval" already suggests, corpus architecture and corpus annotation are important topics for representing (historical) texts. In particular, the limitation of SGML-based markup languages to tree-structured annotations raises a special problem when dealing with manuscripts: how can overlap be represented? This problem has been addressed by the Text Encoding Initiative (TEI) and by several scholars. This text gives an overview of several techniques for handling the overlap problem
Co-reference in Japanese Task-oriented Dialogues: A Contribution to the Development of Language-specific and Language-general Annotation Schemes and Resources
This paper describes a corpus of Japanese task-oriented dialogues: its data, annotations, analysis methodology and preliminary results for the modeling of co-referential phenomena. Current corpus-based approaches to co-reference concentrate on textual data from English or other European languages. Hence, the emerging language-general models of co-reference lack input from dialogue data of non-European languages. We aim to fill this gap and contribute to a model of co-reference on various language-specific and language-general levels
Informationsinfrastrukturen am Institut für Deutsche Sprache
This paper describes the efforts of the Institut für Deutsche Sprache (IDS), the central research institution for the German language, in connection with Information and Communications Technology (ICT). The use of ICT in a language research institute is twofold. On the one hand, ICT provides basic services for researchers to accomplish their daily work. On the other hand, several national and international institutions have a strong interest in ICT. Therefore, ICT can also be seen as an amplifier for language research. The first part of this paper reports on the activities of the IDS in internal and external ICT-related projects and initiatives. The second part describes a general ICT strategy that could be useful both for the IDS and for other national language institutes. We think such a general strategy is necessary to create a strong foundation not only for the ICT-related projects, but also for a modern research institute
Méthodes pour la représentation informatisée de données lexicales / Methoden der Speicherung lexikalischer Daten
In recent years, new developments in the area of lexicography have altered not only the management, processing and publishing of lexicographical data, but also created new types of products such as electronic dictionaries and thesauri. These expand the range of possible uses of lexical data and support users with more flexibility, for instance in assisting human translation. In this article, we give a short and easy-to-understand introduction to the problematic nature of the storage, display and interpretation of lexical data. We then describe the main methods and specifications used to build and represent lexical data. This contribution presents two accounts of the storage of lexical data in two different languages. Although the texts cover the same topics in a parallel structure, they are not direct translations of each other. This chapter addresses several target groups: besides linguists and lexicographers, it is also aimed at computer scientists and computational linguists who would like to learn more about the fundamentals of modeling and representing digital dictionaries. We regard this chapter as a possible starting point for those who want to begin lexicographic projects, and we argue for a thorough understanding of the problems involved in storing lexical data
CLARIN. The infrastructure for language resources
CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future.
The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)