    Designing a Russian Idiom-Annotated Corpus

    This paper describes the development of an idiom-annotated corpus of Russian. The corpus is compiled from freely available online resources and contains texts of different genres. The paper outlines the idiom extraction and annotation procedures as well as a pilot experiment using the new corpus. Given the scarcity of publicly available annotated corpora of Russian, the corpus is a much-needed resource that can be used for literary and linguistic studies and pedagogy, as well as for various Natural Language Processing tasks.

    Corpus-oriented lexicographic database for Beserman Udmurt

    The Beserman Udmurt documentation project is a long-term undertaking aimed primarily at collecting lexicographic and corpus data in the field. During our work on the project, we developed a pipeline for collecting, annotating and publishing our data. In this paper, we describe this pipeline and present the online web interface we developed to provide public access to Beserman materials. We use the TLex lexicographic software for working on the dictionary and FieldWorks FLEx for annotating the corpus. After the data have been annotated, they are exported to XML and imported into the online web interface, where the two types of data become interconnected and searchable. We propose solutions to challenges that arise in projects of this kind and reflect on the constraints imposed on lexicographic databases developed in long-term projects aimed at describing under-resourced languages. We suggest that the proposed pipeline and web interface could be employed by similar projects dealing with other minority languages. The web interface, based on the database and a corpus of oral Beserman texts, is available online at beserman.ru.
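The abstract does not describe the export schema. Assuming a hypothetical, simplified XML format in which both dictionary entries and corpus tokens carry a `lemma` attribute, the sketch below shows one way the two exports could be interconnected: each dictionary entry is linked to every corpus token sharing its lemma, so an entry leads to its attested examples. The element names, the placeholder lemmas and the `link_corpus_to_dictionary` helper are illustrative assumptions, not the project's actual implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified export formats; the real TLex and FLEx
# XML exports use their own, richer schemas.
DICTIONARY_XML = """<dictionary>
  <entry id="e1" lemma="alpha"><gloss>first gloss</gloss></entry>
  <entry id="e2" lemma="beta"><gloss>second gloss</gloss></entry>
</dictionary>"""

CORPUS_XML = """<corpus>
  <text id="t1">
    <w lemma="alpha">alphas</w>
    <w lemma="beta">beta</w>
    <w lemma="alpha">alpha</w>
  </text>
</corpus>"""


def link_corpus_to_dictionary(dict_xml: str, corpus_xml: str):
    """Return (entries, occurrences): a map lemma -> entry id, and a map
    entry id -> list of (text id, token index, word form) occurrences."""
    entries = {
        e.get("lemma"): e.get("id")
        for e in ET.fromstring(dict_xml).iter("entry")
    }
    occurrences = defaultdict(list)
    for text in ET.fromstring(corpus_xml).iter("text"):
        for i, w in enumerate(text.iter("w")):
            entry_id = entries.get(w.get("lemma"))
            if entry_id is not None:  # tokens without an entry stay unlinked
                occurrences[entry_id].append((text.get("id"), i, w.text))
    return entries, dict(occurrences)


entries, occurrences = link_corpus_to_dictionary(DICTIONARY_XML, CORPUS_XML)
print(occurrences["e1"])
```

Indexing both sides on the lemma is the simplest join; a real database would also need to handle homonymous entries that share a lemma, which this sketch ignores.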

    The treatment and representation of verb collocations in the specialized language of adventure tourism

    A collocation is a frequent co-occurrence of two words that hold a syntactic relationship and whose elements have a different status. Because collocations are perceived as units of language, access to the prominent word (the base) involves immediate access to the other item (the collocate). In terms of meaning, some combinations tend to be more transparent than others. The pervasiveness of these word associations has sparked strong research interest in recent decades. A compelling reason may be that native speakers produce them naturally, whereas non-native speakers must learn them actively. This has led not only to their study in general language but also to their recognition as a legitimate field of study in a wide range of specialized languages, such as those of the environment, computing, law or tourism, which is the object of this study. As a consequence, specialized knowledge resources covering these word combinations have appeared, with the primary purpose of offering extra help to people who work with this type of language, such as translators, linguists and other professionals. Nevertheless, much remains to be done. Taking this into account, the hypothesis is that verb collocations in the specialized language of adventure tourism convey specialized meaning that is worth collecting in terminological products. The main purpose of this work is therefore to analyze in depth the verb collocations in this specialized domain and to implement them in the entries for motion verbs in DicoAdventure, a specialized dictionary of adventure tourism conceived to highlight the significant role of verbs in the linguistic expression of concepts. 
Accordingly, the following theoretical objectives were set: first, to review the linguistic branches that influence specialized lexicography; second, to define the concept of specialized collocation; and third, to examine a large number of lexicographical and terminological resources in order to identify the items of information needed for an adequate representation of collocations in a specialized dictionary, and then to design a model for this task. The practical objectives were: first, to extract the motion verbs that would serve as the bases of the implemented collocations; second, to retrieve the lexical collocations of these verbs; and third, to classify the resulting collocations according to the meaning expressed, that is, actual motion or fictive (or metaphorical) motion. The practical work was based on ADVENCOR, an English monolingual specialized corpus of promotional texts about adventure tourism, and on corpus management software. The results of the theoretical work can be summarized as follows: (1) the specialized language of adventure tourism must be considered as specialized as any other; (2) collocations are not usually encoded in verb entries in dictionaries; and (3) a specialized collocation carries specialized knowledge that must be covered in terminological products. As for the practical work, 12% of the extracted verbs were selected because they expressed motion. However, only 46.61% of these produced collocations under the established extraction criteria. Finally, after stricter criteria were applied to classify the collocations, only 25.42% of the verbs, together with their collocations, were included in the dictionary. In addition, the theory of Frame Semantics proved useful for understanding the meaning of the verbs and their collocates. 
As for their implementation, which was the primary objective of this doctoral dissertation, including verb collocations proved essential for identifying the distinct meanings a verb expresses in different contexts, since the collocates conveyed subtle nuances of meaning. Finally, it was concluded that explaining the combinations in lay terms makes the entries easier to understand for any type of user, from experts to laypersons. DicoAdventure is therefore a terminological product that can offer valuable assistance to people with different kinds of specialized expertise.
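The abstract does not state which association measure or thresholds were used to retrieve the collocations. As an illustration only, the sketch below ranks the collocates of each verb base with logDice (Rychlý 2008), an association score used, for example, in Sketch Engine. The toy verb-object pairs, the `rank_verb_collocates` helper and its `min_freq`/`min_score` thresholds are hypothetical and are not the criteria applied to ADVENCOR.

```python
import math
from collections import Counter, defaultdict


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice (Rychlý 2008): 14 + log2(2*f_xy / (f_x + f_y)).
    The maximum, 14, is reached when x and y only ever occur together."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_verb_collocates(pairs, min_freq=2, min_score=7.0):
    """Rank the collocates of each verb base.

    pairs: list of (verb_lemma, collocate_lemma) co-occurrences, e.g.
    verb-object pairs taken from a parsed corpus. The thresholds are
    illustrative placeholders (hypothetical), not the thesis's criteria.
    """
    pair_freq = Counter(pairs)
    verb_freq = Counter(v for v, _ in pairs)
    coll_freq = Counter(c for _, c in pairs)
    ranked = defaultdict(list)
    for (verb, coll), f_xy in pair_freq.items():
        if f_xy < min_freq:
            continue  # too rare to count as a frequent co-occurrence
        score = log_dice(f_xy, verb_freq[verb], coll_freq[coll])
        if score >= min_score:
            ranked[verb].append((coll, score))
    # Strongest collocates first within each verb
    return {v: sorted(cs, key=lambda x: -x[1]) for v, cs in ranked.items()}


# Toy, invented verb-object pairs (not ADVENCOR data)
pairs = ([("paddle", "river")] * 3 + [("paddle", "kayak")]
         + [("climb", "peak")] * 2 + [("climb", "river")])
for verb, colls in rank_verb_collocates(pairs).items():
    print(verb, [(c, round(s, 2)) for c, s in colls])
```

A frequency cut-off plus an association score is only the automatic first pass; the distinction between actual and fictive motion described above still requires manual classification.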

    Formulaic language: Theories and methods

    The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have flourished, and the notion itself has been approached from various methodological and theoretical perspectives and for various purposes. The linguistic approach to formulaicity is still developing rapidly, and this volume aims to present current explorations in the field. The papers collected here make numerous suggestions for further developing the field and are arranged in three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights and their practical application in custom-designed software tools for identifying and exploring formulaic language in texts. The two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone studying formulaic language from either a theoretical or a practical perspective.

    Current trends

    Deep parsing is the fundamental process of representing the syntactic structure of phrases and sentences. In the traditional methodology, this process relies on lexicons and grammars that represent, roughly, the properties of words and the interactions between words and structures in sentences. Several linguistic frameworks, such as Head-driven Phrase Structure Grammar (HPSG), Lexical Functional Grammar (LFG), Tree Adjoining Grammar (TAG) and Combinatory Categorial Grammar (CCG), offer different structures and combining operations for building grammar rules. These frameworks already contain mechanisms for expressing the properties of multiword expressions (MWEs), but these mechanisms need improvement in how they account both for the idiosyncrasies of MWEs and for their similarities to regular structures. This collaborative book surveys various attempts at representing and parsing MWEs in the context of linguistic theories and applications.

    Representation and parsing of multiword expressions

    This book consists of contributions on the definition, representation and parsing of MWEs, which reflect current trends in the representation and processing of MWEs. The contributions cover various categories of MWEs (e.g. verbal, adverbial and nominal MWEs), various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages (including English, French, Modern Greek, Hebrew and Norwegian) and various applications (namely MWE detection, parsing and automatic translation), using both symbolic and statistical approaches.
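As a minimal illustration of the symbolic end of MWE detection, the sketch below matches contiguous lemma sequences from a small MWE lexicon using greedy longest match. The `detect_mwes` function and its lexicon are hypothetical and are not a method from the book; real systems must also handle discontinuity, inflection and syntactic variation, which this sketch deliberately ignores.

```python
def detect_mwes(tokens, lexicon):
    """Greedy longest-match detection of contiguous MWEs.

    tokens: list of lemmas; lexicon: set of MWEs as tuples of lemmas.
    Returns (start, end) spans with end exclusive. Discontinuous
    occurrences (e.g. "take it into account") are not detected.
    """
    max_len = max((len(m) for m in lexicon), default=0)
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, so a longer MWE wins over
        # any shorter one starting at the same token (length >= 2 only).
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in lexicon:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no MWE starts at this token
    return spans


lexicon = {("kick", "the", "bucket"), ("by", "and", "large"),
           ("take", "into", "account")}
tokens = "by and large he will kick the bucket".split()
print(detect_mwes(tokens, lexicon))
```

Because matching is strictly contiguous, "take it into account" is missed, which is exactly the kind of idiosyncrasy that grammar-based representations of MWEs aim to capture.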

    Clitics in the wild

    This collective monograph is the first data-oriented, empirical, in-depth study of the clitic system of Bosnian, Croatian and Serbian. It fills a gap left by the theoretical and normative literature by including solid data on the variation found in dialects and spoken language, obtained from massive web corpora and from speakers’ acceptability judgements. The authors investigate three primary sources of variation: inventory, placement and morphonological processes. A separate part of the book is dedicated to clitic climbing, a major challenge for any syntactic theory. The theory of complexity serves to explain the very diverse constraints on clitic climbing established in the empirical studies. It makes it possible to construct a series of hierarchies in which the factors relevant for predicting clitic climbing interact with each other. The study thus moves our understanding of clitics beyond fine-grained descriptions and syntactic generalisations towards probabilistic modelling of syntax.