Search CORE

5,775 research outputs found

Corpora and evaluation tools for multilingual named entity grammar development

Author: Bering Christian
Droźdźyński Witold
Erbach Gregor
Guasch Clara
Homola Petr
Krieger Hans-Ulrich
Lehmann Sabine
Li Hong
Piskorski Jakub
Schäfer Ulrich
Shimada Atsuko
Siegel Melanie
Xu Feiyu
Ziegler-Eisele Dorothee
Publication venue
Publication date: 14/12/2011
Field of study

We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats

Treebank-based multilingual unification-grammar development

Author: Cahill Aoife
Forst Martin
McCarthy Mairéad
O'Donovan Ruth
Rohrer Christian
van Genabith Josef
Way Andy
Publication venue
Publication date: 01/01/2003
Field of study

Broad-coverage, deep unification grammar development is time-consuming and costly. This problem can be exacerbated in multilingual grammar development scenarios. Recently (Cahill et al., 2002) presented a treebank-based methodology to semi-automatically create broadcoverage, deep, unification grammar resources for English. In this paper we present a project which adapts this model to a multilingual grammar development scenario to obtain robust, wide-coverage, probabilistic Lexical-Functional Grammars (LFGs) for English and German via automatic f-structure annotation algorithms based on the Penn-II and TIGER treebanks. We outline our method used to extract a probabilistic LFG from the TIGER treebank and report on the quality of the f-structures produced. We achieve an f-score of 66.23 on the evaluation of 100 random sentences against a manually constructed gold standard

CiteSeerX

Data-Oriented Language Processing. An Overview

Author: Bod Rens
Scha Remko
Publication venue
Publication date: 01/01/1996
Field of study

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et al. 1994; Bod 1992-96; Bod et al. 1996a/b; Bonnema 1996; Charniak 1996a/b; Goodman 1996; Kaplan 1996; Rajman 1995a/b; Scha 1990-92; Sekine & Grishman 1995; Sima'an et al. 1994; Sima'an 1995-96; Tugwell 1995). This approach, which we will call "data-oriented processing" or "DOP", embodies the assumption that human language perception and production works with representations of concrete past language experiences, rather than with abstract linguistic rules. The models that instantiate this approach therefore maintain large corpora of linguistic representations of previously occurring utterances. When processing a new input utterance, analyses of this utterance are constructed by combining fragments from the corpus; the occurrence-frequencies of the fragments are used to estimate which analysis is the most probable one. In this paper we give an in-depth discussion of a data-oriented processing model which employs a corpus of labelled phrase-structure trees. Then we review some other models that instantiate the DOP approach. Many of these models also employ labelled phrase-structure trees, but use different criteria for extracting fragments from the corpus or employ different disambiguation strategies (Bod 1996b; Charniak 1996a/b; Goodman 1996; Rajman 1995a/b; Sekine & Grishman 1995; Sima'an 1995-96); other models use richer formalisms for their corpus annotations (van den Berg et al. 1994; Bod et al., 1996a/b; Bonnema 1996; Kaplan 1996; Tugwell 1995).Comment: 34 pages, Postscrip

arXiv.org e-Print Archive

CiteSeerX

Transformation of Attributed Structures with Cloning (Long Version)

Author: Duval Dominique
Echahed Rachid
Prost Frederic
Ribeiro Leila
Publication venue
Publication date: 13/01/2014
Field of study

Copying, or cloning, is a basic operation used in the specification of many applications in computer science. However, when dealing with complex structures, like graphs, cloning is not a straightforward operation since a copy of a single vertex may involve (implicitly)copying many edges. Therefore, most graph transformation approaches forbid the possibility of cloning. We tackle this problem by providing a framework for graph transformations with cloning. We use attributed graphs and allow rules to change attributes. These two features (cloning/changing attributes) together give rise to a powerful formal specification approach. In order to handle different kinds of graphs and attributes, we first define the notion of attributed structures in an abstract way. Then we generalise the sesqui-pushout approach of graph transformation in the proposed general framework and give appropriate conditions under which attributed structures can be transformed. Finally, we instantiate our general framework with different examples, showing that many structures can be handled and that the proposed framework allows one to specify complex operations in a natural way

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Generalising tree traversals and tree transformations to DAGs:Exploiting sharing without the pain

Author: Abbott
Aho
Anantharaman
Axelsson
Axelsson
Bahr
Bahr
Bahr
Bahr
Bahr
Bahr
Barr
Bille
Bird
Bird
Buneman
Busatto
Charatonik
Charikar
Chlipala
Downey
Eisenberg
Ekman
Emil Axelsson
Engelfriet
Engelfriet
Fila
Fokker
Fors
Fujiyoshi
Fülöp
Gibbons
Gill
Hasuo
Hedin
Hidaka
Johann
Kamimura
Khedker
Khedker
Kiselyov
Knuth
Knuth
Kobayashi
Kuiper
Lerner
Lewis
Lohrey
Magnusson
Maneth
Meijer
Middelkoop
Moor
Morrison
Oliveira
Oliveira
Patrick Bahr
Peyton Jones
Pierce
Quernheim
Raoult
Rompf
Rosendahl
Sakr
Saraiva
Sloane
Uustalu
Viera
Vogt
Wilson
Yakushev
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016
Field of study

We present a recursion scheme based on attribute grammars that can be transparently applied to trees and acyclic graphs. Our recursion scheme allows the programmer to implement a tree traversal or a tree transformation and then apply it to compact graph representations of trees instead. The resulting graph traversal or graph transformation avoids recomputation of intermediate results for shared nodes – even if intermediate results are used in different contexts. Consequently, this approach leads to asymptotic speedup proportional to the compression provided by the graph representation. In general, however, this sharing of intermediate results is not sound. Therefore, we complement our implementation of the recursion scheme with a number of correspondence theorems that ensure soundness for various classes of traversals. We illustrate the practical applicability of the implementation as well as the complementing theory with a number of examples

CiteSeerX

Chalmers Research

Chalmers Publication Library