716 research outputs found

    LEXICOGRAPHICAL TREATMENT OF SLOVAK ADJECTIVE-NOUN COLLOCATIONS

    In the article, we focus on the conceptual treatment of adjective-noun collocations in the Dictionary of Slovak Collocations. Adjectives. We analyse the formal and semantic aspects of collocations, especially those that lie on the border between fixed and free word combinations. We evaluate the formal side of collocations from the perspective of their formal preference and observe how it is reflected in their changed word-class features. We analyse the semantic side of the chosen collocations using their restrictions in collocability. We demonstrate that the formal and collocability preference of words is reflected in their statistical characteristics, which is why we consider this preference an important lexicographical criterion for choosing collocation candidates.

    Uvid u automatsko izlučivanje metaforičkih kolokacija (An Insight into the Automatic Extraction of Metaphorical Collocations)

    Collocations have been the subject of much scientific research over the years. The focus of this research is on a subset of collocations, namely metaphorical collocations. In metaphorical collocations, a semantic shift has taken place in one of the components, i.e., one of the components takes on a transferred meaning. The main goal of this paper is to review the existing literature and provide a systematic overview of the existing research on collocation extraction, as well as an overview of existing methods, measures, and resources. The existing research is classified according to the approach (statistical, hybrid, and distributional semantics) and presented in three separate sections. Various association measures and existing ways of evaluating the results of automatic collocation extraction are also described. The insights gained from existing research serve as a first step in exploring the possibility of developing a method for automatic extraction of metaphorical collocations. The methods, tools, and resources that may prove useful for future work are highlighted.

    A Computational Lexicon and Representational Model for Arabic Multiword Expressions

    The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these areas has emphasised the primary role of MWEs in the process of analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high-precision outputs. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representation. This research aims to remedy this deficiency by extending knowledge of AMWEs and making noteworthy contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, this study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of an adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, for the AMWE extraction and discovery tasks, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods for discovering multiple types of AMWEs. Third, this thesis presents a representation system for AMWEs which consists of multilayer encoding of extensive linguistic descriptions. This project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in standard Arabic. The implications of this research are related to the vital role of the AMWE lexicon, as a new lexical resource, in the improvement of various Arabic NLP (ANLP) tasks and the potential opportunities this lexicon provides for linguists to analyse and explore AMWE phenomena.

    Chapter 3 - Corpora and corpus linguistics


    Modelling morphosyntactic variation in World Englishes from a register perspective

    This paper addresses Miller’s (2000) and Brown and Miller’s (2017) hypothesis that the adverbs just, (n)ever and yet are becoming markers of perfect meaning in spoken English, with the have + past participle periphrasis consequently weakening semantically and declining in use. The hypothesis is tested in eight varieties of Present-Day English from the perspective of Usage-Based Theory (Bybee 2006, 2011, 2013) and with a corpus-based, onomasiological methodology. The results confirm the hypothesis only partially; crucially, the data reveal that in order to model morphosyntactic variation rigorously we need to adopt a register perspective such as that used by Biber and associates (e.g. Biber and Gray 2016), who demonstrate that language variation and change are mediated by register variation.

    D6.1: Technologies and Tools for Lexical Acquisition

    This report describes the technologies and tools to be used for Lexical Acquisition in PANACEA. It includes descriptions of existing technologies and tools which can be built on and improved within PANACEA, as well as of new technologies and tools to be developed and integrated in the PANACEA platform. The report also specifies the Lexical Resources to be produced. Four main areas of lexical acquisition are included: Subcategorization frames (SCFs), Selectional Preferences (SPs), Lexical-semantic Classes (LCs), for both nouns and verbs, and Multi-Word Expressions (MWEs).

    Principles of event framing: genetic stability in grammar and discourse

    Ever since Wilhelm von Humboldt’s (1836) pioneering study of Nahuatl, linguists have recurrently recognized that languages differ fundamentally in the syntactic weight they attribute to noun phrases as the arguments of a verb. Currently, the most prominent attempts to turn this intuition into a precise hypothesis revolve around the notion of ‘configurationality’.

    Register-Specific Collocational Constructions in English and Spanish: A Usage-Based Approach

    Constructions are usage-based, conventionalised pairings of form and function within a cline of complexity and schematisation. Most research within Construction Grammar has focused on the monolingual description of schematic constructions, mainly in English but to a lesser extent in other languages as well. By contrast, very few constructional analyses have been carried out across languages. In this study we focus on a type of partially substantive construction from the point of view of contrastive analysis and translation; to the best of our knowledge, this is one of the first studies of this kind. The first half of the article lays down the theoretical foundations of the study and introduces Construction Grammar as well as other formalisms used in the literature in order to provide a construal account of collocations, a pervasive phenomenon in language. The experimental part describes the case study of V NP collocations with disease/enfermedad in comparable corpora in English and Spanish, both in the general domain and in the specialised medical domain. A comparative analysis of these constructions across domains and languages is provided in terms of token-type ratio (constructional restriction-rate), lexical function, type of determiner, frequency ranking of the verbal collocate and domain specificity of collocates, among others. New measures to assess construal bondness are put forward (lexical filledness rate and individual productivity rate), and special attention is paid to register-dependent equivalent semantic-functional counterparts in English and Spanish and to mismatches.

    Verb similarity: comparing corpus and psycholinguistic data

    Similarity, which plays a key role in fields like cognitive science, psycholinguistics and natural language processing, is a broad and multifaceted concept. In this work we analyse how two approaches that belong to different perspectives, the corpus view and the psycholinguistic view, articulate similarity between verb senses in Spanish. Specifically, we compare the similarity between verb senses based on their argument structure, which is captured through semantic roles, with their similarity defined by word associations. We address the question of whether verb argument structure, which reflects the expression of the events, and word associations, which are related to the speakers' organization of the mental lexicon, shape similarity between verbs in a congruent manner, a topic which has not been explored previously. While we find significant correlations between verb sense similarities obtained from these two approaches, our findings also highlight some discrepancies between them and the importance of the degree of abstraction of the corpus annotation and psycholinguistic representations.