16 research outputs found

    The genitive case with postpositions in Turkish

    There is a diverse set of postpositions in Turkish that take genitive-marked complements. The genitive case found on these postpositional complements is idiosyncratic, suggesting that it is a lexical case rather than a structural one (Öztürk & Taylan 2016; Satik 2021; Kornfilt 1985; Baker 2015). The lexical genitive case exhibits distinctive behavioral patterns that we do not observe in the structural genitive case. That is, it is only overt on bare pronominals; otherwise, it is zero-marked. The overt form of the lexical genitive case is syncretic with the structural genitive case. Through an analysis of these behaviors, we explore the relationship between the lexical and the structural genitive case.

    The role of verb semantics in Hungarian verb-object order

    Hungarian is often referred to as a discourse-configurational language, since the structural position of constituents is determined by their logical function (topic or comment) rather than their grammatical function (e.g., subject or object). We build on work by Komlósy (1989) and argue that in addition to discourse context, the lexical semantics of the verb also plays a significant role in determining Hungarian word order. To investigate this, we conduct a large-scale, data-driven analysis of the ordering of 380 transitive verbs and their objects, as observed in hundreds of thousands of examples extracted from the Hungarian Gigaword Corpus. We test the effect of lexical semantics on the ordering of verbs and their objects by grouping verbs into 11 semantic classes. In addition to the semantic class of the verb, we include two control features related to information structure, object definiteness and object NP weight, chosen to allow a comparison of their effect size to that of verb semantics. Our results suggest that all three features have a significant effect on verb-object ordering in Hungarian and that, among these features, the semantic class of the verb has the largest effect. Specifically, we find that stative verbs, such as fed 'cover', jelent 'mean' and övez 'surround', tend to be OV-preferring (with the exception of psych verbs, which are strongly VO-preferring), while non-stative verbs, such as bírál 'judge', csökkent 'reduce' and csókol 'kiss', tend to be VO-preferring. These findings support our hypothesis that lexical semantic factors influence word order in Hungarian.

    Vonzatkeretek vizsgálata orvostudományi tárgyú, angol nyelvű szabadalmi szövegeken [An investigation of subcategorization frames in English-language medical patent texts]

    We examined the subcategorization frames of verbs and nouns occurring in English-language patent texts on medical subjects. Based on their frequency of occurrence, we compiled a lexicon of subcategorization frames specifically characteristic of medical patent texts, which can be used in building syntactic and semantic parsers for texts on similar subjects.

    Félig kompozicionális szerkezetek a SzegedParalell angol-magyar párhuzamos korpuszban [Semi-compositional constructions in the SzegedParalell English-Hungarian parallel corpus]

    In natural language processing, one of the hardest problems is the identification and appropriate handling of multiword expressions. To facilitate this, we manually annotated semi-compositional constructions in a part of the SzegedParalell English-Hungarian parallel corpus. The constructions were annotated in both languages, which enables the automatic pairing of English and Hungarian constructions. The annotated corpus is well suited as training data for both monolingual and multilingual applications, and it can also be used in contrastive linguistics, stylistics and language teaching.

    4FX: Light Verb Constructions in a Multilingual Parallel Corpus

    In this paper, we describe 4FX, a quadrilingual (English-Spanish-German-Hungarian) parallel corpus annotated for light verb constructions (LVCs). We present the annotation process and report statistical data on the frequency of LVCs in each language. We also offer inter-annotator agreement rates and highlight some interesting facts and tendencies on the basis of comparing multilingual data from the four corpora. According to the frequency of LVC categories and the calculated Kendall's coefficient for the four corpora, we found that Spanish and German are very similar to each other, Hungarian is also similar to both, but German differs from all these three. The qualitative and quantitative data analysis might prove useful in theoretical linguistic research for all four languages. Moreover, the corpus will be an excellent testbed for the development and evaluation of machine-learning-based methods aiming at extracting or identifying light verb constructions in these four languages.

    The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions

    Multiword expressions (MWEs) are known as a "pain in the neck" for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one's heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as "words with spaces". We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.