328 research outputs found

    Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation

    We present a system for verbal Word Sense Disambiguation (WSD) that is able to exploit additional information from parallel texts and lexicons. It is an extension of our previous WSD method, which gave promising results but used only monolingual features. In the follow-up work described here, we have explored two additional ideas: using English-Czech bilingual resources (as features only - the task itself remains a monolingual WSD task), and using a 'hybrid' approach, adding features extracted both from a parallel corpus and from manually aligned bilingual valency lexicon entries, which contain subcategorization information. Although not all types of features proved useful, both additions have led to significant improvements for both languages explored.

    Towards English-to-Czech MT via Tectogrammatical Layer

    Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories. Editors: Koenraad De Smedt, Jan Hajič and Sandra Kübler. NEALT Proceedings Series, Vol. 1 (2007), 7-18. © 2007 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/4476

    Comparing Czech and English AMRs

    This paper compares Czech and English annotation using the Abstract Meaning Representation (AMR) formalism.

    Treebanking in the world of Thucydides. Linguistic annotation for the Hellespont Project

    The Hellespont project (DAI, Tufts University) aims to structure the text of a passage from the ancient Greek historian Thucydides (1.89-118), in order to highlight the events, persons and peoples that populate the world of the author and to connect the different digital sources available for their study. Event annotation in particular requires an in-depth linguistic analysis of morphology, syntax and semantics. However, the available resources for Ancient Greek do not provide adequate standards to support the encoding of semantic and pragmatic phenomena in Ancient Greek texts. In this paper, we discuss the motivation of the project and how we adapted the so-called tectogrammatical annotation of the Prague Dependency Treebank to identify the events and describe their structure. The linguistic notion of valency, which is central to tectogrammatical sentence representation, proves very useful for this analysis of Ancient Greek.

    Sastavljanje Hrvatske ovisnosne banke stabala: početne etape

    The paper presents work in progress on building the Croatian Dependency Treebank. Its design principles, procedures and the pilot corpus used are described. Perspectives for further development of the Croatian Dependency Treebank are presented at the end.

    Statistical parsing of morphologically rich languages (SPMRL): what, how and whither

    The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that, despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state of affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations.

    Proceedings

    Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), vi+69 pp. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

    Důležitá slova. Podklady ke kolokačnímu švédsko-českému slovníku základních sloves

    Basic verbs, i.e. very common verbs that typically denote physical movements, locations, states or actions, undergo various semantic shifts and acquire different secondary uses. In extreme cases, the distribution of secondary uses grows so general that they are regarded as auxiliary verbs (go and to be going to), phase verbs (turn, grow), etc. These uses are usually well-documented by grammars and language textbooks, and so are idiomatic expressions (phraseologisms) in dictionaries. There is, however, a grey area in between, which is extremely difficult to learn for non-native speakers. This consists of secondary uses with limited collocability, in particular light verb constructions, and secondary meanings that only get activated under particular morphosyntactic conditions. The basic-verb secondary uses and constructions are usually semantically transparent, such that they do not pose understanding problems, but they are generally unpredictable and language-specific, such that they easily become an issue in non-native text production. In this thesis, Swedish basic verbs are approached from the contrastive point of view of an advanced Czech learner of Swedish. A selection of Swedish constructions with basic verbs is explored. The observations result in a proposal for the structure of a machine-readable Swedish-Czech...
    Basic verbs, i.e. frequent lexical verbs that typically describe physical motion, location, state, or action, undergo a series of semantic shifts through which they come to express secondary, figurative meanings. In extreme cases the verb becomes an auxiliary, modal, or phase verb, and the collocational restrictions that apply to the verb in its primary (i.e. literal) meaning cease to hold for it. These uses are mostly well documented in grammars and textbooks, just as good dictionaries give detailed information on the use of these verbs in fixed phraseological expressions. Between fully grammaticalized use on the one hand and idiomatic, phraseological use on the other, however, lies a whole range of figurative uses of basic verbs that are considerably difficult for a non-native speaker to master: figurative uses with limited collocability. These are above all verbo-nominal constructions, sometimes called analytic predicates (light verb constructions), but also uses that, under certain restricted morphosyntactic conditions (e.g. only under negation), activate abstract semantic features in other predicates, e.g. they intensify the meaning or imply that the event has already lasted a long time, and the like. These secondary uses of lexical verbs...
    Institute of Germanic Studies, Faculty of Art

    Extending an Event-type Ontology: Adding Verbs and Classes Using Fine-tuned LLMs Suggestions

    In this project, we have investigated the use of advanced machine learning methods, specifically fine-tuned large language models, for pre-annotating data for a lexical extension task, namely adding descriptive words (verbs) to an existing (but as yet incomplete) ontology of event types. We have focused on several research questions, from investigating possible heuristics that give annotators at least hints as to which verbs to include and which fall outside the current version of the ontology, to the possible use of the automatic scores to help annotators find, more efficiently, a threshold for identifying verbs that cannot be assigned to any existing class and are therefore to be used as seeds for a new class. We have also carefully examined the correlation of the automatic scores with the human annotation. While the correlation turned out to be strong, its influence on the annotation proper is modest due to its near linearity, even though the mere fact of such pre-annotation leads to relatively short annotation times. Comment: Accepted to LAW-XVII @ ACL 202