71 research outputs found

Common Scientific Lexicon for Automatic Discourse Analysis of Scientific and Technical Texts

Author: Bolshakova Elena
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2008

The paper reports preliminary results of ongoing research aimed at developing an automatic procedure for recognizing the discourse-compositional structure of scientific and technical texts, a capability required in many NLP applications. As discourse markers, the procedure exploits various domain-independent words and expressions that are specific to scientific and technical texts and that organize scientific discourse. The paper discusses the features of scientific discourse and of the common scientific lexicon comprising such words and expressions. It considers methodological issues in developing a computer dictionary of the common scientific lexicon and describes the basic principles of its organization. Finally, it outlines the main steps of the discourse-analysis procedure, which is based on the dictionary and on surface syntactic analysis.
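The abstract does not give the dictionary format or the exact procedure steps. As a rough sketch only, assuming a tiny hypothetical dictionary (`MARKERS`) that maps common-scientific-lexicon expressions to assumed discourse functions, dictionary-based marker recognition over crudely segmented sentences could look like the following. The surface syntactic analysis the paper relies on is omitted here.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-dictionary: domain-independent expression -> assumed
# discourse function. The paper's actual dictionary is larger and richer.
MARKERS = {
    "the aim of this paper": "GOAL",
    "we propose": "PROPOSAL",
    "in contrast to": "CONTRAST",
    "as a result": "CONSEQUENCE",
    "in conclusion": "CONCLUSION",
}

@dataclass
class Tagged:
    sentence: str
    functions: list

def split_sentences(text):
    # Crude surface segmentation on sentence-final punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tag_discourse(text):
    """Tag each sentence with the discourse functions of the markers it contains."""
    result = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        found = [fn for expr, fn in MARKERS.items()
                 if re.search(r"\b" + re.escape(expr) + r"\b", lowered)]
        result.append(Tagged(sentence, found))
    return result
```

For example, `tag_discourse("The aim of this paper is to test X. We propose a new method.")` tags the first sentence `GOAL` and the second `PROPOSAL`. Word boundaries (`\b`) keep a marker from matching inside a longer word.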

Bulgarian Digital Mathematics Library at IMI-BAS

Semantic frames and semantic networks in the Health Science Corpus

Author: Verdaguer Clavera Isabel
Publication date: 01/01/2020

The aim of this paper is to apply frame semantics principles to the analysis of a specialized corpus, the Health Science Corpus, implemented in the lexical database SciE-Lex. Taking FrameNet as the basis for this research, I will assign frame semantic features to SciE-Lex data in order to highlight the shared semantic and syntactic background of related words in the biomedical register, motivate their collocational patterns, and establish frame-based semantic networks of related lexical units.
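The idea of a frame-based semantic network can be sketched as a graph in which lexical units are linked when they evoke the same frame. This is a minimal illustration, assuming a hypothetical word-to-frame mapping (`FRAME_OF`) with FrameNet-style frame names; it is not SciE-Lex data or the paper's annotation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical frame assignments: lexical unit -> frame name.
# Words and frame labels are illustrative only.
FRAME_OF = {
    "cure": "Cure",
    "treat": "Cure",
    "therapy": "Cure",
    "infect": "Infecting",
    "infection": "Infecting",
}

def frame_network(frame_of):
    """Group units by frame and link every pair of units sharing a frame."""
    by_frame = defaultdict(list)
    for unit, frame in frame_of.items():
        by_frame[frame].append(unit)
    edges = defaultdict(set)
    for units in by_frame.values():
        for a, b in combinations(sorted(units), 2):
            edges[a].add(b)
            edges[b].add(a)
    return by_frame, edges
```

Here `frame_network(FRAME_OF)` links "cure" to "treat" and "therapy" but not to "infect". A richer version could also weight edges by shared collocates, which the abstract mentions but does not specify.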

Dipòsit Digital de Documents de la UAB

BOP Serials

PLPrepare: A Grammar Checker for Challenging Cases

Author: Hoyos Jacob
Publication venue: Digital Commons @ East Tennessee State University
Publication date: 01/05/2021

This study investigates one of the most arbitrary case forms in Polish: the genitive singular of masculine inanimate nouns. It collects and ranks several guidelines to help language learners determine its proper usage and introduces a framework for providing detailed feedback on arbitrary cases. The study tests this framework by implementing and evaluating a hybrid grammar checker called PLPrepare. PLPrepare performs comparably to other grammar checkers; it can detect genitive case usage and provide feedback based on a number of error classifications.
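The abstract does not describe PLPrepare's rules or architecture. As a hypothetical sketch of the kind of classified feedback it mentions, the following lookup-based checker compares a learner's genitive singular against a toy lexicon. Masculine inanimate nouns take either -a or -u in the genitive singular; the lexicon (`GENITIVE_ENDING`), the error classes, and `check_genitive` are illustrative assumptions, not PLPrepare's.

```python
from enum import Enum

class ErrorClass(Enum):
    OK = "correct"
    WRONG_ENDING = "wrong genitive ending (-a vs. -u)"
    OTHER = "unrecognized form"
    UNKNOWN_NOUN = "noun not in lexicon"

# Hypothetical toy lexicon: nominative singular -> genitive singular ending.
# Only nouns without stem alternation are listed, so genitive = nominative + ending.
GENITIVE_ENDING = {
    "telefon": "u",   # telefonu
    "zeszyt": "u",    # zeszytu
    "czas": "u",      # czasu
    "komputer": "a",  # komputera
    "chleb": "a",     # chleba
    "ser": "a",       # sera
}

def check_genitive(nominative, learner_form):
    """Return (error class, expected form) for a learner's genitive singular."""
    ending = GENITIVE_ENDING.get(nominative)
    if ending is None:
        return ErrorClass.UNKNOWN_NOUN, None
    expected = nominative + ending
    if learner_form == expected:
        return ErrorClass.OK, expected
    if learner_form in (nominative + "a", nominative + "u"):
        # The learner chose the other of the two competing endings.
        return ErrorClass.WRONG_ENDING, expected
    return ErrorClass.OTHER, expected
```

Separating `WRONG_ENDING` from `OTHER` lets feedback target the arbitrary -a/-u choice specifically. A realistic checker would also need stem alternations (e.g. stół, stołu) and the usage guidelines the study ranks.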

East Tennessee State University

Semantic frames and semantic networks in the Health Science Corpus

Author: Verdaguer Isabel
Publication venue: University of Bern
Publication date: 14/02/2023

The aim of this paper is to apply frame semantics principles to the analysis of a specialized corpus, the Health Science Corpus, implemented in the lexical database SciE-Lex. Taking FrameNet as the basis for this research, I will assign frame semantic features to SciE-Lex data in order to highlight the shared semantic and syntactic background of related words in the biomedical register, motivate their collocational patterns, and establish frame-based semantic networks of related lexical units.

Dipòsit Digital de la Universitat de Barcelona

Multiword expressions

Multiword expressions (MWEs) are a challenge for both natural language applications and linguistic theory because they often defy the machinery developed for free combinations, where the default assumption is that the meaning of an utterance can be predicted from its structure. There is a rich body of primarily descriptive work on MWEs for many European languages, but comparative work is scarce. This volume brings together MWE experts to explore the benefits of a multilingual perspective on MWEs. Its ten contributions look at MWEs in Bulgarian, English, French, German, Maori, Modern Greek, Romanian, Serbian, and Spanish. They discuss prominent issues in MWE research, such as the classification of MWEs, their formal grammatical modeling, and the description of individual MWE types from the point of view of different theoretical frameworks, such as Dependency Grammar, Generative Grammar, Head-driven Phrase Structure Grammar, Lexical Functional Grammar, Lexicon Grammar

OAPEN Library

Foreign language words and the idiom principle (Finnish title: Vieraan kielen sanat ja idiomiperiaate)

Author: Vetchinnikova Svetlana
Publication venue: University of Helsinki Libraries
Publication date: 29/08/2014

This work sets out to examine how second language (L2) users of English acquire, use and process lexical items. For this purpose, three types of data were collected from five non-native students at the University of Helsinki. First, each student's drafts of Master's thesis chapters, written over a period of time, were compiled into a language usage corpus. Second, the academic publications a student referred to in her thesis were compiled into a corpus representing her language exposure. Third, several hundred words a student used in her thesis were presented to her as stimuli in word association tasks to obtain psycholinguistic data on the mental representation of the patterns. Lexical usage patterns, conceived of in accordance with John Sinclair's conceptualisation of lexis and meaning, were then compared to (1) language exposure and (2) word association responses. The results of this triangulation show that, contrary to mainstream thinking in SLA, language production on the idiom principle, i.e. by retrieving holistic patterns glued by syntagmatic association rather than constructing them word by word, is available to L2 users to a much larger degree than is often claimed. More than half of the significant multi-word units used by the students also occur in the language they were exposed to. The idiosyncratic multi-word units are often a result of approximation or fixing. Approximation is a process through which a more or less fixed pattern loosens and becomes variable on the semantic or grammatical axis due to frequency effects and the properties of human memory. Fixing, on the other hand, is the reverse process, through which repeated usage makes the wording of a pattern overly fixed. Neither process damages the communicated meaning in any way.
Word association responses also support the main conclusion regarding the availability of the idiom principle: they show that the multi-word units used are also represented holistically in the mind, confirming the continuity between exposure, usage and psycholinguistic representation. Furthermore, they suggest that the model of a unit of meaning developed by Sinclair has psycholinguistic reality, as representations of lexical items in the mind seem to mirror the components of a unit of meaning: collocation, colligation and semantic preference. This work offers an in-depth discussion of Sinclair's conceptualisation of meaning and a novel methodology for studying units of meaning in L2 use both quantitatively and qualitatively by triangulating usage, exposure and word association data. It is hoped that the dissertation will be of interest to scholars specialising in second language acquisition and use, English as a lingua franca, the phraseological view of language, and corpus linguistic methodology.

Why does it sometimes feel as though one can never speak a second language without errors? This study shows that the structures of a speaker's vocabulary and the processes of word use in a foreign language are very similar to those in the mother tongue and in language change. The object of study is the English vocabulary used by University of Helsinki students of different language backgrounds in their own texts and in word association tests. The study applies a model of the multi-word unit of meaning that makes it possible to observe variation and change within the unit. With the model developed in the study, one can observe how a shift of meaning takes place in a free word combination so that it crystallizes into a new multi-word unit of meaning, and how this unit continues to become fixed and to change along a continuum of meaning, even as far as an idiom.
A unit of meaning can also change in the reverse direction: instead of becoming fixed, it can loosen without breaking down completely. This variation can be explained cognitively by a frequency effect: the more common a unit is, the better we master its exact use, and conversely, the rarer it is, the more likely it is that we will not produce it word for word. Rarer units are more likely to be produced as an approximation, that is, by replacing a few of their components with a more abstract component. The phenomenon of an expression becoming fixed is familiar to anyone who has delivered the same text, for example the same lecture or speech, several times: one ends up repeating the same expressions in almost the same words. Approximation, on the other hand, is at work when, say, one searches a library for a book whose title one remembers only imprecisely: was it "Looking at the Sun" or "Gazing at the Sun", when in fact it is "Staring at the Sun"? It is reasonable to assume that the same process is at work when a second language user says "so to say" instead of "so to speak", "the hen or the egg" instead of "the chicken or the egg", or "to my head" instead of "to my mind", because we remember meaning better than linguistic form. Second language use is therefore, for the most part, not erroneous but merely somewhat less precise: an approximate use of linguistic forms.
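The exposure comparison ("more than half of significant multi-word units ... also occur in the language they were exposed to") can be illustrated with a crude stand-in. The study identified significant multi-word units following Sinclair's model; this sketch instead assumes that recurrent contiguous n-grams (count ≥ `min_count`) approximate such units, and computes what share of them also appear in an exposure corpus. `exposure_overlap` and its parameters are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; apostrophes kept inside words.
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def exposure_overlap(usage_text, exposure_text, n=2, min_count=2):
    """Share of recurrent usage n-grams that also occur in the exposure corpus."""
    usage = Counter(ngrams(tokenize(usage_text), n))
    exposure = set(ngrams(tokenize(exposure_text), n))
    recurrent = [g for g, c in usage.items() if c >= min_count]
    if not recurrent:
        return 0.0, []
    shared = [g for g in recurrent if g in exposure]
    return len(shared) / len(recurrent), shared
```

Frequency thresholds are only a rough proxy: a faithful replication would use an association measure and Sinclair-style units that allow internal variation, which contiguous n-grams cannot capture.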

Helsingin yliopiston digitaalinen arkisto

Promoting multiword expressions in A* TAG parsing

Author: Parmentier Yannick
Savary Agata
Waszczuk Jakub
Publication venue: HAL CCSD
Publication date: 13/12/2016

Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types the idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars that benefits from this knowledge by promoting MWE-oriented analyses. This strategy substantially reduces the parsing search space for true positive MWE occurrences, while avoiding parsing failures for false positives.
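The intuition behind promoting MWE analyses in A* search can be shown with a toy analogue. This is not TAG parsing and not the paper's heuristic: it segments a token sequence into single words or entries from a hypothetical MWE lexicon, with MWE tokens assumed cheaper (`MWE_TOKEN_COST`) than single words, so A* reaches MWE-based analyses first. The heuristic charges each remaining token the minimum per-token cost, which keeps it admissible.

```python
import heapq

# Hypothetical lexicon and costs; not the paper's grammar or weights.
MWE_LEXICON = {("by", "and", "large"), ("kick", "the", "bucket")}
WORD_COST = 1.0
MWE_TOKEN_COST = 0.5  # per token: cheaper, so MWE analyses are promoted

def astar_segment(tokens):
    """Return (segmentation, cost, number of agenda pops)."""
    n = len(tokens)
    max_len = max(len(m) for m in MWE_LEXICON)

    def h(i):
        # Admissible: each remaining token costs at least MWE_TOKEN_COST.
        return (n - i) * MWE_TOKEN_COST

    agenda = [(h(0), 0.0, 0, ())]  # (f = g + h, g, position, segments)
    best_g = {0: 0.0}
    popped = 0
    while agenda:
        _, g, i, seg = heapq.heappop(agenda)
        popped += 1
        if i == n:
            return seg, g, popped
        if g > best_g.get(i, float("inf")):
            continue  # stale agenda entry
        # A single-word step is always possible, so false-positive MWE
        # candidates can never make the search fail.
        moves = [(1, WORD_COST)]
        for length in range(2, max_len + 1):
            if tuple(tokens[i:i + length]) in MWE_LEXICON:
                moves.append((length, length * MWE_TOKEN_COST))
        for length, cost in moves:
            j, g2 = i + length, g + cost
            if g2 < best_g.get(j, float("inf")):
                best_g[j] = g2
                heapq.heappush(agenda, (g2 + h(j), g2, j, seg + (tuple(tokens[i:j]),)))
    return None, float("inf"), popped
```

With this sketch, `astar_segment("he did kick the bucket".split())` returns a segmentation ending in `("kick", "the", "bucket")` with cost 3.5, while a sentence with no lexicon match falls back to word-by-word segmentation.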

HAL Université de Tours

Proceedings of the LREC workshop on partial parsing: between chunk parsing and deep parsing

Author: Kübler Sandra
Piskorski Jakub
Przepiorkowski Adam
Publication date: 03/11/2008

Hochschulschriftenserver - Universität Frankfurt am Main

Criteria for the validation of specialized verb equivalents: application in bilingual terminography

Author: Pimentel Janine
Publication date: 01/05/2012

Multilingual terminological resources do not always include valid equivalents of legal terms, for two main reasons. First, legal systems can differ from one language community to another, and even from one country to another, because each has its own history and traditions. As a result, the non-isomorphism between legal and linguistic systems can make identifying equivalents particularly challenging. Second, by focusing primarily on the definition of equivalence, a notion widely discussed in translation studies but not in terminology, the literature does not offer solid and systematic methodologies for assigning terminological equivalents. Consequently, there is a lack of criteria to guide both terminologists and translators in searching for and validating equivalent terms. This problem is even more evident in the case of predicative units such as verbs. Although some terminologists (L’Homme 1998; Lerat 2002; Lorente 2007) have worked on specialized verbs, terminological equivalence between units of this part of speech would benefit from a thorough study. By proposing a novel methodology for assigning the equivalents of specialized verbs, this research aims to define validation criteria for predicative units of this kind, so as to contribute to a better understanding of terminological equivalence and to the development of multilingual terminography in general and legal terminography in particular.

The study uses a Portuguese-English comparable corpus consisting of a single text genre, namely Supreme Court judgments, from which 100 Portuguese and 100 English specialized verbs were selected. The description of the verbs is based on the theory of Frame Semantics (Fillmore 1976, 1977, 1982, 1985; Fillmore and Atkins 1992), on the FrameNet methodology (Ruppenhofer et al. 2010), and on the methodology for compiling specialized lexical resources such as DiCoInfo (L’Homme 2008), developed at the Observatoire de linguistique Sens-Texte at the Université de Montréal. The research reviews contributions that have applied the same theoretical and methodological framework to the compilation of lexical resources and adapts it to the specific objectives of the project. In contrast to the top-down approach adopted by FrameNet lexicographers, the approach described here is bottom-up: verbs are first analyzed and then grouped into frames, for each language separately. Specialized verbs are said to evoke a semantic frame, a kind of conceptual scenario in which a number of mandatory elements (core Frame Elements) play specific roles (e.g. ARGUER, JUDGE, LAW), but they are often accompanied by optional information (non-core Frame Elements), such as the criteria and reasons the judge uses to reach a decision (statutes, codes, previous decisions). The information about the semantic frame each verb evokes was encoded in an XML editor, and about twenty contexts illustrating how each specialized verb evokes a given frame were semantically and syntactically annotated. The labels assigned to each semantic frame (e.g. [Compliance], [Verdict]) were used to group together synonyms, antonyms, and equivalent terms.

The research identified 165 pairs of candidate equivalents among the 200 Portuguese and English terms, which were grouped into 76 frames. Of these pairs, 71% were considered full equivalents because the verbs not only evoke the same conceptual scenario but also have similar actantial structures, linguistic realizations of the actants, and syntactic patterns. The remaining 29% did not entirely meet these criteria and were considered partial equivalents. Reasons for partial equivalence are provided along with illustrative examples. Finally, the study describes the semasiological and onomasiological entry points that JuriDiCo, the bilingual lexical resource compiled during the project, offers to future users.
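The pairing and validation step can be sketched roughly as follows: verbs already grouped into frames per language are paired across languages when they share a frame label, and a pair counts as full only if further criteria also match. This is a simplified sketch under stated assumptions. The `Verb` record, the verb entries, and their actants are hypothetical, and the study's separate criteria (actantial structure, linguistic realizations of actants, syntactic patterns) are collapsed here into two fields.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Verb:
    lemma: str
    lang: str       # "pt" or "en"
    frame: str      # frame label, e.g. "Verdict", "Compliance"
    actants: tuple  # hypothetical core frame elements
    pattern: str    # hypothetical simplified syntactic pattern

def pair_equivalents(verbs):
    """Pair pt/en verbs evoking the same frame; label each pair full or partial."""
    pt = [v for v in verbs if v.lang == "pt"]
    en = [v for v in verbs if v.lang == "en"]
    pairs = []
    for a, b in product(pt, en):
        if a.frame != b.frame:
            continue  # different conceptual scenario: not candidate equivalents
        full = a.actants == b.actants and a.pattern == b.pattern
        pairs.append((a.lemma, b.lemma, "full" if full else "partial"))
    return pairs

# Hypothetical entries for illustration only.
VERBS = [
    Verb("condenar", "pt", "Verdict", ("JUDGE", "ARGUER"), "NP V NP"),
    Verb("convict", "en", "Verdict", ("JUDGE", "ARGUER"), "NP V NP"),
    Verb("cumprir", "pt", "Compliance", ("ARGUER", "LAW"), "NP V NP"),
    Verb("comply", "en", "Compliance", ("ARGUER", "LAW"), "NP V with NP"),
]
```

Calling `pair_equivalents(VERBS)` yields one full pair (condenar/convict) and one partial pair (cumprir/comply), because the second pair differs in its syntactic pattern.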

Dépôt Institutionnel Numérique