147 research outputs found

Language Technology Methods Inspired by an Agglutinative, Free Phrase-Order Language

Author: Merényi Csaba
Prószéky Gábor
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2012

Rapid Development of Morphological Descriptions for Full Language Processing Systems

Author: Carter David
Publication date: 01/01/1995

I describe a compiler and development environment for feature-augmented two-level morphology rules integrated into a full NLP system. The compiler is optimized for a class of languages including many or most European ones, and for rapid development and debugging of descriptions of new languages. The key design decision is to compose morphophonological and morphosyntactic information, but not the lexicon, when compiling the description. This results in typical compilation times of about a minute, and has allowed a reasonably full, feature-based description of French inflectional morphology to be developed in about a month by a linguist new to the system. Comment: 8 pages, LaTeX (2.09 preferred); eaclap.sty; Procs of Euro ACL-9

arXiv.org e-Print Archive

CiteSeerX

Crossref

Learning tree patterns for syntactic parsing

Author: Hócza András
Publication date: 01/01/2006

This paper presents a method for parsing Hungarian texts using a machine learning approach. The method collects the initial grammar for a learner from an annotated corpus with the help of tree shapes. The PGS algorithm, an improved version of the RGLearn algorithm, was developed and applied to learning tree patterns with various phrase types described by regular expressions. The method also calculates the probability values of the learned tree patterns. A syntactic parser built on the learned grammar uses the Viterbi algorithm to search quickly for the most probable derivation of a sentence. The results were built into an information extraction pipeline.

University of Szeged

A Feature-Based Lexicalized Tree Adjoining Grammar for Korean

Author: Han Chung-hye
Kim Nari
Palmer Martha
Yoon Juntae
Publication venue: ScholarlyCommons
Publication date: 01/09/2000

This document describes an ongoing project of developing a grammar of Korean, the Korean XTAG grammar, written in the TAG formalism and implemented for use with the XTAG system enriched with a Korean morphological analyzer. The Korean XTAG grammar described in this report is based on the TAG formalism (Joshi et al. (1975)), which has been extended to include lexicalization (Schabes et al. (1988)) and unification-based feature structures (Vijay-Shanker and Joshi (1991)). The document first describes the modifications that we have made to the XTAG system (The XTAG-Group (1998)) to handle rich inflectional morphology in Korean. Then various syntactic phenomena that can currently be handled are described, including adverb modification, relative clauses, complex noun phrases, auxiliary verb constructions, gerunds and adjunct clauses. The work reported here is a first step towards the development of an implemented TAG grammar for Korean, which is continuously updated with the addition of new analyses and modification of old ones.

ScholarlyCommons@Penn

Translating While Parsing

Author: Prószéky Gábor
Publication venue: The Linguistic Association of Finland
Publication date: 01/01/2006

Repository of the Academy's Library

Recognition Assistance

Author: Kis Balázs
Naszódi Mátyás
Prószéky Gábor
Publication venue: Morgan Kaufmann
Publication date: 01/01/2002

Repository of the Academy's Library

Corpus based evaluation of stemmers

Author: Endrédy István
Publication venue: Uniwersytet im. Adama Mickiewicza w Poznaniu
Publication date: 01/01/2015

Repository of the Academy's Library

Wide-coverage parsing for Turkish

Author: Çakici Ruket
Publication venue: The University of Edinburgh
Publication date: 01/01/2009

Wide-coverage parsing is an area that attracts much attention in natural language processing research. This is because it is the first step to many other applications in natural language understanding, such as question answering. Supervised learning using human-labelled data is currently the best performing method. Therefore, there is great demand for annotated data. However, human annotation is very expensive, and the amount of annotated data is always much less than is needed to train well-performing parsers. This is the motivation behind making the best use of the data available. Turkish presents a challenge both because syntactically annotated Turkish data is relatively scarce and because Turkish is highly agglutinative, hence unusually sparse at the whole-word level. The METU-Sabancı Treebank is a dependency treebank of 5620 sentences with surface dependency relations and morphological analyses for words. We show that including even the crudest forms of morphological information extracted from the data boosts the performance of both generative and discriminative parsers, contrary to received opinion concerning English. We induce word-based and morpheme-based CCG grammars from the Turkish dependency treebank. We use these grammars to train a state-of-the-art CCG parser that predicts long-distance dependencies in addition to the ones that other parsers are capable of predicting. We also use the correct CCG categories as simple features in a graph-based dependency parser and show that this improves the parsing results. We show that a morpheme-based CCG lexicon for Turkish is able to solve many problems, such as conflicts of semantic scope, recovering long-range dependencies, and obtaining smoother statistics from the models. CCG handles linguistic phenomena, i.e. local and long-range dependencies, more naturally and effectively than other linguistic theories while potentially supporting semantic interpretation in parallel.
Using morphological information and a morpheme-cluster based lexicon improves the performance both quantitatively and qualitatively for Turkish. We also provide an improved version of the treebank, which will be released by kind permission of METU and Sabancı.

Edinburgh Research Archive

MetaMorpho: A Pattern-Based Machine Translation System

Author: Prószéky Gábor
Tihanyi László
Publication date: 01/01/2002

Repository of the Academy's Library

Hungarian-Somali-English Online Dictionary and Taxonomy

Author: Endrédy István
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2014

Repository of the Academy's Library