71 research outputs found

    Common Scientific Lexicon for Automatic Discourse Analysis of Scientific and Technical Texts

    The paper reports preliminary results of ongoing research aimed at developing an automatic procedure for recognizing the discourse-compositional structure of scientific and technical texts, which is required in many NLP applications. As discourse markers, the procedure exploits various domain-independent words and expressions that are specific to scientific and technical texts and organize scientific discourse. The paper discusses features of scientific discourse and the common scientific lexicon comprising such words and expressions. It addresses methodological issues in developing a computer dictionary for the common scientific lexicon and describes the basic principles of its organization. The main steps of the discourse-analyzing procedure, which is based on the dictionary and surface syntactic analysis, are outlined.

    Semantic frames and semantic networks in the Health Science Corpus

    The aim of this paper is to apply frame semantics principles to the analysis of a specialized corpus, the Health Science Corpus, implemented in the lexical database SciE-Lex. Taking FrameNet as the basis for this research, I will assign frame semantic features to SciE-Lex data in order to highlight the shared semantic and syntactic background of related words in the biomedical register, motivate their patterns of collocates and establish frame-based semantic networks of related lexical units.

    PLPrepare: A Grammar Checker for Challenging Cases

    This study investigates one of the Polish language’s most arbitrary cases: the genitive masculine inanimate singular. It collects and ranks several guidelines to help language learners discern its proper usage and also introduces a framework for providing detailed feedback on arbitrary cases. The study tests this framework by implementing and evaluating a hybrid grammar checker called PLPrepare. PLPrepare performs similarly to other grammar checkers and is able to detect genitive case usages and provide feedback based on a number of error classifications.

    Multiword expressions

    Multiword expressions (MWEs) are a challenge for both natural language applications and linguistic theory because they often defy the machinery developed for free combinations, where the default is that the meaning of an utterance can be predicted from its structure. There is a rich body of primarily descriptive work on MWEs for many European languages, but comparative work is scarce. The volume brings together MWE experts to explore the benefits of a multilingual perspective on MWEs. The ten contributions in this volume look at MWEs in Bulgarian, English, French, German, Maori, Modern Greek, Romanian, Serbian, and Spanish. They discuss prominent issues in MWE research such as the classification of MWEs, their formal grammatical modeling, and the description of individual MWE types from the point of view of different theoretical frameworks, such as Dependency Grammar, Generative Grammar, Head-driven Phrase Structure Grammar, Lexical Functional Grammar, and Lexicon Grammar.

    Vieraan kielen sanat ja idiomiperiaate

    This work sets out to examine how second language (L2) users of English acquire, use and process lexical items. For this purpose three types of data were collected from five non-native students of the University of Helsinki. First, each student's drafts of Master's thesis chapters written over a period of time were compiled into a language usage corpus. Second, academic publications a student referred to in her thesis were compiled into a corpus representing her language exposure. Third, several hundred words a student used in her thesis were presented to her as stimuli in word association tasks to obtain psycholinguistic data on the representation of the patterns in the mind. Lexical usage patterns, conceived of in accordance with John Sinclair's conceptualisation of lexis and meaning, were then compared to (1) language exposure and (2) word association responses. The results of this triangulation show that, contrary to mainstream thinking in SLA, language production on the idiom principle, i.e. by retrieving holistic patterns glued by syntagmatic association rather than constructing them word by word, is available to L2 users to a much larger degree than is often claimed. More than half of the significant multi-word units used by the students also occur in the language they were exposed to. The idiosyncratic multi-word units are often a result of approximation or fixing. Approximation is a process through which a more or less fixed pattern loosens and becomes variable on the semantic or grammatical axis due to frequency effects and the properties of human memory. Fixing, on the other hand, is the reverse process, in which the wording of the pattern becomes overly fixed through repeated usage. Neither process damages the meaning communicated in any way.
Word association responses also support the main conclusion that the idiom principle is available, showing that the multi-word units used are also represented holistically in the mind and so confirming the continuity between exposure, usage and psycholinguistic representation. Furthermore, they suggest that the model of a unit of meaning developed by Sinclair has psycholinguistic reality, as representations of lexical items in the mind seem to mirror the components of a unit of meaning: collocation, colligation and semantic preference. This work offers an in-depth discussion of Sinclair's conceptualisation of meaning and a novel methodology for studying units of meaning in L2 use both quantitatively and qualitatively by triangulating usage, exposure and word association data. It is hoped that the dissertation will be of interest to scholars specialising in second language acquisition and use, English as a lingua franca, the phraseological view of language and corpus linguistic methodology.
Why does it sometimes feel as though one can never speak a second language without errors? This study shows that the structures of a speaker's lexicon and the processes of word use in a foreign language closely resemble those in the native language and in language change. The object of study is the English vocabulary used by University of Helsinki students from different language backgrounds in their own texts and in word association tests. The study applies to the analysis of language a model of the multi-word unit of meaning that makes it possible to observe variation and change within the unit. With the model developed in the study, one can observe how a shift of meaning takes place in a free word combination so that it crystallises into a new multi-word unit of meaning, and how this unit continues to become fixed and to change along the meaning continuum, even all the way to an idiom.
A unit of meaning can also change in the reverse direction and, instead of becoming fixed, loosen without falling apart completely. This variation can be explained cognitively by the frequency effect: the more common a unit is, the better we command its exact use, and conversely, the rarer it is, the more likely it is that we will not produce it verbatim. Rarer units are more likely to be produced as approximations, that is, by replacing a few of their components with a more abstract component. The fixing of an expression is familiar to anyone who has delivered the same text, for example the same lecture or speech, several times: one ends up repeating the same expressions in almost the same words. Approximation of an expression, in turn, is at work when, say, one looks in a library for a book whose title one remembers only imprecisely: was it Looking at the Sun or Gazing at the Sun, when in fact it is Staring at the Sun. It is reasonable to assume that the same process operates when a second language user says "so to say" instead of "so to speak", "the hen or the egg" instead of "the chicken or the egg", or "to my head" instead of "to my mind", because we remember meaning better than linguistic form. For this reason, second language use is for the most part not erroneous but only somewhat vaguer, an approximate use of the forms of the language.

    Promoting multiword expressions in A* TAG parsing

    Multiword expressions (MWEs) are pervasive in natural languages and often have both idiomatic and compositional readings, which leads to high syntactic ambiguity. We show that for some MWE types idiomatic readings are usually the correct ones. We propose a heuristic for an A* parser for Tree Adjoining Grammars which benefits from this knowledge by promoting MWE-oriented analyses. This strategy leads to a substantial reduction in the parsing search space in the case of true positive MWE occurrences, while avoiding parsing failures in the case of false positives.

    Criteria for the validation of specialized verb equivalents : application in bilingual terminography

    Multilingual terminological resources do not always include valid equivalents of legal terms for two main reasons. Firstly, legal systems can differ from one language community to another and even from one country to another because each has its own history and traditions. As a result, the non-isomorphism between legal and linguistic systems may render the identification of equivalents a particularly challenging task. Secondly, by focusing primarily on the definition of equivalence, a notion widely discussed in translation but not in terminology, the literature does not offer solid and systematic methodologies for assigning terminological equivalents. As a result, there is a lack of criteria to guide both terminologists and translators in the search for and validation of equivalent terms. This problem is even more evident in the case of predicative units, such as verbs. Although some terminologists (L'Homme 1998; Lerat 2002; Lorente 2007) have worked on specialized verbs, terminological equivalence between units that belong to this part of speech would benefit from a thorough study. By proposing a novel methodology to assign the equivalents of specialized verbs, this research aims at defining validation criteria for this kind of predicative unit, so as to contribute to a better understanding of the phenomenon of terminological equivalence as well as to the development of multilingual terminography in general, and of legal terminography in particular. The study uses a Portuguese-English comparable corpus that consists of a single genre of texts, i.e. Supreme Court judgments, from which 100 Portuguese and 100 English specialized verbs were selected. The description of the verbs is based on the theory of Frame Semantics (Fillmore 1976, 1977, 1982, 1985; Fillmore and Atkins 1992), on the FrameNet methodology (Ruppenhofer et al. 2010), as well as on the methodology for compiling specialized lexical resources, such as DiCoInfo (L'Homme 2008), developed in the Observatoire de linguistique Sens-Texte at the Université de Montréal. The research reviews contributions that have adopted the same theoretical and methodological framework for the compilation of lexical resources and proposes adaptations to the specific objectives of the project. In contrast to the top-down approach adopted by FrameNet lexicographers, the approach described here is bottom-up, i.e. verbs are first analyzed and then grouped into frames for each language separately. Specialized verbs are said to evoke a semantic frame, a sort of conceptual scenario in which a number of mandatory elements (core Frame Elements) play specific roles (e.g. ARGUER, JUDGE, LAW), but specialized verbs are often accompanied by other optional information (non-core Frame Elements), such as the criteria and reasons used by the judge to reach a decision (statutes, codes, previous decisions). The information concerning the semantic frame that each verb evokes was encoded in an XML editor, and about twenty contexts illustrating the specific way each specialized verb evokes a given frame were semantically and syntactically annotated. The labels attributed to each semantic frame (e.g. [Compliance], [Verdict]) were used to group together certain synonyms, antonyms and equivalent terms. The research identified 165 pairs of candidate equivalents among the 200 Portuguese and English terms, which were grouped together into 76 frames. 71% of the pairs of equivalents were considered full equivalents because not only do the verbs evoke the same conceptual scenario, but their actantial structures, the linguistic realizations of the actants and their syntactic patterns were similar. 29% of the pairs of equivalents did not entirely meet these criteria and were considered partial equivalents.
Reasons for partial equivalence are provided along with illustrative examples. Finally, the study describes the semasiological and onomasiological entry points that JuriDiCo, the bilingual lexical resource compiled during the project, offers to future users.