TuLiPA : towards a multi-formalism parsing environment for grammar engineering

Author: Dellert Johannes
Evang Kilian
Kallmeyer Laura
Lichte Timm
Maier Wolfgang
Parmentier Yannick
Publication date: 01/01/2008

In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German

Hochschulschriftenserver - Universität Frankfurt am Main

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

The Unsupervised Acquisition of a Lexicon from Continuous Speech

Author: de Marcken Carl
Publication date: 01/01/1995

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency. Comment: 27 page technical repor

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

Description of the LTG system used for MUC-7

Author: Grover Claire
Mikheev Andrei
Moens Marc
Publication date: 01/01/1998

The basic building blocks in our MUC system are reusable text handling tools which we have been developing and using for a number of years at the Language Technology Group. They are modular tools with stream input/output; each tool does a very specific job, but can be combined with other tools in a Unix pipeline. Different combinations of the same tools can thus be used in a pipeline for completing different tasks. Our architecture imposes an additional constraint on the input/output streams: they should have a common syntactic format. For this common format we chose eXtensible Markup Language (XML). XML is an official, simplified version of Standard Generalised Markup Language (SGML), simplified to make processing easier [3]. We were involved in the development of the XML standard, building on our expertise in the design of our own Normalised SGML (NSL) and NSL tool LT NSL [10], and our XML tool LT XML [11]. A detailed comparison of this SGML-oriented architecture with more traditional database-oriented architectures can be found in [9]. A tool in our architecture is thus a piece of software which uses an API for all its access to XML and SGML data and performs a particular task: exploiting markup which has previously been added by other tools, removing markup, or adding new markup to the stream(s) without destroying the previously adde

CiteSeerX

Edinburgh Research Explorer

Filling Knowledge Gaps in a Broad-Coverage Machine Translation System

Author: Chander Ishwar
Haines Matthew
Hatzivassiloglou Vasileios
Hovy Eduard
Iida Masayo
Knight Kevin
Luk Steve K.
Whitney Richard
Yamada Kenji
Publication date: 01/01/1995

Knowledge-based machine translation (KBMT) techniques yield high quality in domains with detailed semantic models, limited vocabulary, and controlled input grammar. Scaling up along these dimensions means acquiring large knowledge resources. It also means behaving reasonably when definitive knowledge is not yet available. This paper describes how we can fill various KBMT knowledge gaps, often using robust statistical techniques. We describe quantitative and qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT system. Comment: 7 pages, Compressed and uuencoded postscript. To appear: IJCAI-9

arXiv.org e-Print Archive

CiteSeerX

A Lexicalized Tree-Adjoining Grammar for Vietnamese

Author: Hong Phuong L.
Nguyen T.
Romary L.
Roussanaly A.
Publication date: 24/05/2006

In this paper, we present the first sizable grammar built for Vietnamese using LTAG, developed over the past two years, named vnLTAG. This grammar aims at modelling written language and is general enough to be both application- and domain-independent. It can be used for the morpho-syntactic tagging and syntactic parsing of Vietnamese texts, as well as text generation. We then present a robust parsing scheme using vnLTAG and a parser for the grammar. We finish with an evaluation using a test suite

INRIA a CCSD electronic archive server

HAL Descartes

MPG.PuRe

Hal-Diderot

The linguistics of gender

Author: Van Berkum J.
Publication date: 01/01/1996

This chapter explores grammatical gender as a linguistic phenomenon. First, I define gender in terms of agreement, and look at the parts of speech that can take gender agreement. Because it relates to assumptions underlying much psycholinguistic gender research, I also examine the reasons why gender systems are thought to emerge, change, and disappear. Then, I describe the gender system of Dutch. The frequent confusion about the number of genders in Dutch will be resolved by looking at the history of the system, and the role of pronominal reference therein. In addition, I report on three lexical-statistical analyses of the distribution of genders in the language. After having dealt with Dutch, I look at whether the genders of Dutch and other languages are more or less randomly assigned, or whether there is some system to it. In contrast to what many people think, regularities do indeed exist. Native speakers could in principle exploit such regularities to compute rather than memorize gender, at least in part. Although this should be taken into account as a possibility, I will also argue that it is by no means a necessary implication

MPG.PuRe