    A Corpus-Driven Study of Idioms in Croatian

    Since their advent, electronic corpora have been considered a useful linguistic tool, and today they are virtually indispensable in research conducted in most linguistic subdisciplines, including phraseology. Idioms, the subject matter of phraseology, are characterized by conventionality, (in)variability and desemantization. Their systematic study requires not only numerous examples of their use but also a tool that provides statistical data, including their frequency, characteristic combinations and most frequent environments, and, ideally, instantaneous access to their wider context. This paper presents a study of the characteristics and behaviour of idioms based on data from the largest Croatian electronic corpus, the web corpus hrWaC. The study was conducted within the project to compile the Dictionary of Croatian Idioms (Frazeološki rječnik hrvatskoga jezika). The aim of the paper is to determine the characteristics of idioms used by speakers of Croatian, particularly their variability, meaning and use, and thereby to contribute to a deeper and more complete insight into Croatian phraseology.

    «A Baptism of Fire»: Towards a Practical Hybrid Approach for the Lexicographic Indexation of Phraseological Units with Religious Lexical Components in English and Spanish

    Traditionally, researchers have had a particular interest in the study of the relationship between phraseology and lexicography [e.g., Alonso Ramos (2006); Mellado Blanco (2008); Buendía Castro and Faber (2015); Paquot (2015); Nuccorini (2020)], to the point of labeling it a «scientific marriage» (Leroyer 2006). In addition, scholars have been increasingly interested in the semantic analysis of phraseological units (henceforth PUs) [e.g., Grčić Simeunović and de Santiago (2016) and Torijano and Recio (2019)]. Among the problems that these and several other studies have pointed out is the recurrent reference to inaccuracy and difficulty in indexing PUs in lexicographic resources. Although some scholars consider onomasiological approaches an interesting starting point [e.g., Bosque (2017) and Siepmann (2008)], a systematic methodology in phraseology that includes both the semantic analysis of the entries and their indexation is still needed.
    We intend to address that need here through the analysis of 242 idioms (199 in Spanish and 43 in English) extracted from a 21,045-idiom database that was compiled from two phraseological dictionaries: the Diccionario fraseológico documentado del español actual (henceforth DFDEA) (Seco, Andrés et al., 2004), and the Collins COBUILD Dictionary of Idioms (henceforth CCDOI) (Sinclair and Moon 1997). The criteria employed to select the units of analysis were: (i) they had to include at least one lexical component related to religion, and (ii) the idiom had to be nominal or verbal. The religious component was identified semi-automatically by using the UCREL Semantic Analysis System (USAS) (Archer et al., 2002).
    The contributions of this paper are as follows: (i) it presents a lexicographic analysis of the macrostructure and microstructure of the two phraseological resources mentioned above; (ii) it offers a model of semantic analysis for PUs with religion-related components; (iii) it proposes an alternative method for indexing PUs in lexicographic resources that involves semasiological and onomasiological approaches; and finally, (iv) it shows a systematic way to use semantic and pragmatic information in order to create semantic entries for PUs.
    In conclusion, by closely examining this set of phraseological entries, this study sheds light on the semantic composition of PUs. It also suggests a systematic hybrid approach for their lexicographic indexation in English and Spanish.
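
    The two selection criteria stated in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Idiom` record, the toy entries, their category labels and the placeholder tag "X" are invented for illustration, and "S9" is assumed here to be the USAS code for the religion field, which should be checked against the tagset before use.

```python
from dataclasses import dataclass, field

RELIGION_TAG = "S9"  # assumption: USAS code for the religion field; verify against the tagset
ELIGIBLE_CATEGORIES = {"nominal", "verbal"}  # criterion (ii) in the abstract


@dataclass
class Idiom:
    """Hypothetical database record; the real database layout is not described."""
    text: str
    language: str
    category: str
    # Maps each lexical component to its semantic tags (as a tagger might return them).
    component_tags: dict = field(default_factory=dict)


def has_religious_component(idiom):
    """Criterion (i): at least one component carries a religion-related tag."""
    return any(
        tag.startswith(RELIGION_TAG)
        for tags in idiom.component_tags.values()
        for tag in tags
    )


def select_units(idioms):
    """Keep idioms that satisfy both criteria."""
    return [
        idiom for idiom in idioms
        if idiom.category in ELIGIBLE_CATEGORIES and has_religious_component(idiom)
    ]


# Toy entries with hand-assigned tags; "X" is a placeholder, not a USAS tag.
toy_database = [
    Idiom("a baptism of fire", "en", "nominal", {"baptism": ["S9"], "fire": ["X"]}),
    Idiom("kick the bucket", "en", "verbal", {"kick": ["X"], "bucket": ["X"]}),
    Idiom("como Dios manda", "es", "adverbial", {"Dios": ["S9"], "manda": ["X"]}),
]

selected = select_units(toy_database)
print([idiom.text for idiom in selected])
```

    In this toy run only the first entry passes both filters: the second has no religion-tagged component and the third is neither nominal nor verbal. Because the abstract describes the identification as semi-automatic, the output of such a filter would still be reviewed by hand.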

    “The First Swallow”: Avian Metaphors in Nikita Khrushchev’s Political Discourse

    Birds are a rich source of metaphors in paremias, which are known to be a significant rhetorical force in various modes of communication. This article deals with the repertoire of ornithological proverbial texts used in the Soviet leader's public speeches and memoirs, as well as in their English translations. The metaphor human is bird, which rests on various grounds of comparison, is explored. The peculiarities of using avian metaphors in the context of the original and the ways they are translated into English are scrutinized as well. The analysis of the material shows that the main features shared by the Target (human) and the Source (bird species) are grounded in physiological characteristics and behavioral traits and have a negative slant. Equivalent and literal translation are the main methods of rendition. Of particular interest are the metaphorical “animalistic metamorphoses” found in translation.

    Estudio basado en corpus de 4-gramas en el artículo científico

    The analysis of phraseology in the specialized discourse of science has sparked researchers' interest in the last few decades, probably because the use of word groupings in specific registers can provide information about certain typical features of the genre. For instance, Gledhill (2009) explores colligations of tenses in scientific articles and discovers that the present tense is used for qualitative and empirical expressions, while the past tense provides quantitative and research-oriented descriptions; Pérez-Llantada (2014) investigates 4-word lexical bundles in research articles, finding that these multiword combinations express referential meaning and organize the text; finally, Jiménez-Navarro (2019) analyzes adjective + noun collocations in a corpus of scientific papers and concludes that these phraseological units convey specific meanings when used in this genre, since they represent the contents of research articles. The aim of the current study is to contribute to the analysis of 4-grams in the language of science. To this end, two specific objectives are defined: first, to ascertain the structure of 4-grams; second, to analyze the function they perform. The methodology was corpus-based and entailed five major steps: (1) a specialized corpus of research articles was built, (2) a list of 4-grams was automatically extracted using the software Sketch Engine, (3) the resulting list was manually verified in order to remove inaccurate candidates, (4) the selected units were classified according to their structure, and (5) the selected units were categorized according to their function in the text. The findings show that, in terms of the first objective, the most typical 4-grams were noun phrases; as for the second objective, the sequences examined mostly concerned the research conducted and the authorship of the texts. All in all, the 4-grams identified were structures that were specific to the genre under study but could also be used in other domains.
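
    Step (2) of the methodology, the automatic extraction of 4-grams, can be sketched as follows. The study itself used Sketch Engine; this standard-library stand-in is only an illustration, and the function name, the simple tokenizer, the frequency threshold and the two sample sentences are all assumptions rather than details from the study.

```python
import re
from collections import Counter


def extract_ngrams(texts, n=4, min_freq=2):
    """Count contiguous n-word sequences and keep those seen at least min_freq times."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer: lowercase alphabetic words, optional internal apostrophe.
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {" ".join(gram): freq for gram, freq in counts.items() if freq >= min_freq}


# Invented sample sentences standing in for a corpus of research articles.
sample = [
    "The results of the study show that the results of the analysis differ.",
    "On the other hand, the results of the survey were inconclusive.",
]

bundles = extract_ngrams(sample, n=4, min_freq=2)
print(bundles)
```

    On this sample only "the results of the" recurs, three times in total. A candidate list produced this way would still need the manual verification described in step (3), since raw frequency alone also returns fragments that cross phrase boundaries.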

    Descripción y usabilidad de HARTA, una herramienta de ayuda para la redacción de textos académicos en español

    This article presents HARTA (http://www.dicesp.com:8083/), an online tool that combines dictionary and corpus, in line with the trend of recent years in lexicography. HARTA focuses on academic lexical combinations (ALCs) in Spanish. ALCs cover phenomena of a varied nature: both collocations (confirmar/refutar una hipótesis ‘to confirm/to refute a hypothesis’) and what we have grouped under the term formulas (sin embargo ‘however’, por otra parte ‘on the other hand’, como ya hemos señalado ‘as we have already noted’, etc.). By the term ALC, then, we refer to recurrent word segments in academic discourse, which may or may not be compositional and which may fulfil a discursive function (‘to compare’, ‘to reformulate’, ‘to express certainty or possibility’, etc.), as is the case with formulas. For their description, in addition to relying on Meaning-Text Theory, we provide quantitative data from the academic corpus from which we extracted the list of ALCs (frequency and distribution across different scientific disciplines). After presenting the methodology used to obtain the data, we describe the architecture of HARTA to show different ALC entries and the various ways of accessing the information. Before concluding with the lines of research in progress, we offer a short experimental study on the usability of the tool.

    Established phrasal idioms from the field of music in contemporary French

    This bachelor thesis is devoted to the analysis of established phrases in French, with a focus on the field of music. The theoretical part of the thesis is divided into three chapters. The first chapter deals with phraseology and covers its historical development, interdisciplinary aspects and the terminology related to this field. The second chapter focuses on the different perspectives and approaches that authors take to phraseology. It presents the differing theoretical concepts of authors engaged in this discipline. The third chapter briefly describes the connection between music, speech and linguistics. The practical part of this bachelor thesis concentrates on the definition and description of selected established phrases from the field of music. Dictionary definitions of the selected expressions are taken into account in this chapter. The practical part also includes an analysis of the responses of francophone respondents, which were obtained through a questionnaire survey.
    Key words: phraseology, phraseme, musical expressions, idiomatics, idiom, music, French
    Institute of Romance Studies, Faculty of Arts

    You are driving me up the wall! A corpus-based study of a special class of resultative constructions

    This paper focuses on resultative constructions from a computational and corpus-based approach. We claim that the array of expressions (traditionally classed as idioms, collocations, free word combinations, etc.) that are used to convey a person's change of mental state (typically negative) are basically instances of the same resultative construction. The first part of the study will introduce basic tenets of Construction Grammar and resultatives. Then, our corpus-based methodology will be spelled out, including a description of the two giga-token corpora used and a detailed account of our protocolised heuristic strategies and tasks. Distributional analysis of matrix slot fillers will be presented next, together with a discussion on restrictions, novel instances, and productivity. A final section will round off our study, with special attention to notions like “idiomaticity”, “productivity” and “variability” of the pairings of form and meaning analysed. To the best of our knowledge, this is one of the first studies based on giga-token corpora that explores idioms as integral parts of higher-order resultative constructions.
    This study was carried out within the framework of several research projects on language technologies (ref. PID2020-112818GB-I00, E3/04/21, UMA-CEIATECH-04).
    © 2022 The Authors. Published by Université Jean Moulin - Lyon 3. This is an open access article available under a Creative Commons licence. The published version can be accessed at the following link on the publisher's website: https://doi.org/10.4000/lexis.6343

    Multidisciplinary analysis of the phenomenon of phraseological variation in translation and interpreting

    Special Issue 6 (2020). Multidisciplinary analysis of the phenomenon of phraseological variation in translation and interpreting / Análisis multidisciplinar del fenómeno de la variación en traducción e interpretación. Pedro Mogorrón Huerta (Ed.)

    Die Verwendung von "ganz" bei der Thematisierung von Emotionen im Korpus "Emigrantendeutsch in Israel: Wiener in Jerusalem"

    The corpus Emigrantendeutsch in Israel: Wiener in Jerusalem includes interviews with so-called “Jeckes”, Jewish women and men who were born and grew up in Vienna and left Austria for Mandate Palestine after the Anschluss, mostly without their parents. The interviews cover the biographies of the speakers before, during and after emigration: their childhood and youth in Austria, their experiences of anti-Semitism, their emigration experiences, their new beginning and their cultural reorientation. During the interviews, participants mainly elaborate on emotions from the past. Sometimes they address these emotions explicitly by naming and describing them in the narrative; sometimes the emotions are implicit and can also be detected through paralinguistic features. The present study complements the previous analysis of the emotion vocabulary with findings on a new topic, intensifiers, which has so far received little attention in research. In particular, we will carry out a corpus-based analysis of the intensifier ganz. Focusing on different language levels (mainly semantics, syntax, prosody and other paraverbal features), we will investigate its role in the description and expression of emotions.