33,694 research outputs found

    TuLiPA : towards a multi-formalism parsing environment for grammar engineering

    Get PDF
    In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German

    TuLiPA : towards a multi-formalism parsing environment for grammar engineering

    Get PDF
    In this paper, we present an open-source parsing environment (Tübingen Linguistic Parsing Architecture, TuLiPA) which uses Range Concatenation Grammar (RCG) as a pivot formalism, thus opening the way to the parsing of several mildly context-sensitive formalisms. This environment currently supports tree-based grammars (namely Tree-Adjoining Grammars (TAG) and Multi-Component Tree-Adjoining Grammars with Tree Tuples (TT-MCTAG)) and allows computation not only of syntactic structures, but also of the corresponding semantic representations. It is used for the development of a tree-based grammar for German

    The Unsupervised Acquisition of a Lexicon from Continuous Speech

    Get PDF
    We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency.Comment: 27 page technical repor

    Description of the LTG system used for MUC-7

    Get PDF
    The basic building blocks in our muc system are reusable text handling tools which wehave been developing and using for a number of years at the Language Technology Group. They are modular tools with stream input/output; each tooldoesavery speci c job, but can be combined with other tools in a unix pipeline. Di erent combinations of the same tools can thus be used in a pipeline for completing di erent tasks. Our architecture imposes an additional constraint on the input/output streams: they should have a common syntactic format. For this common format we chose eXtensible Markup Language (xml). xml is an o cial, simpli ed version of Standard Generalised Markup Language (sgml), simpli ed to make processing easier [3]. Wewere involved in the developmentofthexml standard, building on our expertise in the design of our own Normalised sgml (nsl) and nsl tool lt nsl [10], and our xml tool lt xml [11]. A detailed comparison of this sgml-oriented architecture with more traditional data-base oriented architectures can be found in [9]. A tool in our architecture is thus a piece of software which uses an api for all its access to xml and sgml data and performs a particular task: exploiting markup which has previously been added by other tools, removing markup, or adding new markup to the stream(s) without destroying the previously adde

    Filling Knowledge Gaps in a Broad-Coverage Machine Translation System

    Full text link
    Knowledge-based machine translation (KBMT) techniques yield high quality in domains with detailed semantic models, limited vocabulary, and controlled input grammar. Scaling up along these dimensions means acquiring large knowledge resources. It also means behaving reasonably when definitive knowledge is not yet available. This paper describes how we can fill various KBMT knowledge gaps, often using robust statistical techniques. We describe quantitative and qualitative results from JAPANGLOSS, a broad-coverage Japanese-English MT system.Comment: 7 pages, Compressed and uuencoded postscript. To appear: IJCAI-9

    A Lexicalized Tree-Adjoining Grammar for Vietnamese

    Get PDF
    In this paper, we present the first sizable grammar built for Vietnamese using LTAG, developed over the past two years, named vnLTAG. This grammar aims at modelling written language and is general enough to be both application- and domain-independent. It can be used for the morpho-syntactic tagging and syntactic parsing of Vietnamese texts, as well as text generation. We then present a robust parsing scheme using vnLTAG and a parser for the grammar. We finish with an evaluation using a test suite

    The linguistics of gender

    Get PDF
    This chapter explores grammatical gender as a linguistic phenomenon. First, I define gender in terms of agreement, and look at the parts of speech that can take gender agreement. Because it relates to assumptions underlying much psycholinguistic gender research, I also examine the reasons why gender systems are thought to emerge, change, and disappear. Then, I describe the gender system of Dutch. The frequent confusion about the number of genders in Dutch will be resolved by looking at the history of the system, and the role of pronominal reference therein. In addition, I report on three lexical- statistical analyses of the distribution of genders in the language. After having dealt with Dutch, I look at whether the genders of Dutch and other languages are more or less randomly assigned, or whether there is some system to it. In contrast to what many people think, regularities do indeed exist. Native speakers could in principle exploit such regularities to compute rather than memorize gender, at least in part. Although this should be taken into account as a possibility, I will also argue that it is by no means a necessary implication
    • …
    corecore