2,072 research outputs found

    Exploiting a Multilingual Web-based Encyclopedia for Bilingual Terminology Extraction

    METRICC: Harnessing Comparable Corpora for Multilingual Lexicon Development

    Research on comparable corpora has grown in recent years, opening the possibility of building corpus-driven multilingual dictionaries from them. To date, this issue has not been widely addressed. This paper focuses on the mechanism of collocational networks proposed by Williams (1998) as a means of exploiting comparable corpora. It first describes the METRICC project, which aims at the automatic creation of comparable corpora, along with one of the crawlers developed for building them, and then discusses the power of collocational networks for multilingual corpus-driven dictionary development.

    In no uncertain terms : a dataset for monolingual and multilingual automatic term extraction from comparable corpora

    Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation. This is an arduous task, made even more difficult by the lack of a clear distinction between terms and general language, which results in low inter-annotator agreement. There is a great need for well-documented, manually validated datasets, especially in the rising field of multilingual term extraction from comparable corpora, which presents a unique new set of challenges. In this paper, a new approach is presented for both monolingual and multilingual term annotation in comparable corpora. The detailed guidelines with different term labels, the domain- and language-independent methodology, and the large volumes annotated in three different languages and four different domains make this a rich resource. The resulting datasets are not just suited for evaluation purposes but can also serve as a general source of information about terms and even as training data for supervised methods. Moreover, the gold standard for multilingual term extraction from comparable corpora contains information about term variants and translation equivalents, which allows an in-depth, nuanced evaluation.

    Comparability measurement for terminology extraction

    Proceedings of the Workshop CHAT 2011: Creation, Harmonization and Application of Terminology Resources. Editors: Tatiana Gornostay and Andrejs Vasiļjevs. NEALT Proceedings Series, Vol. 12 (2011), 3-10. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/16956

    The invisibility of the translator in environmental translation

    The visibility of the translator has been widely discussed in translation studies from different ideological positions, especially during the so-called post-structuralist period. Unlike in other types of translation, such as audiovisual or literary translation, in specialized translation the translator's name rarely appears, as demonstrated in previous research in which, in an ambidirectional corpus of environmental texts in Catalan, the translator's name was made explicit in only 16% of cases (Bracho, 2004, p. 318). In the present article, therefore, we study a current sample with features similar to those of the original corpus, with the aim of analyzing its profile and determining the behaviour in this respect more than a decade after our previous conclusions. This article has received financial support from research project FFI2015-68867-P, funded by the Spanish Ministry of Economy and Competitiveness. Bracho Lapiedra, L.; Mac Donald, P. (2017). The invisibility of the translator in environmental translation. Revista Española de Lingüística Aplicada/Spanish Journal of Applied Linguistics. 30(2):440-464. https://doi.org/10.1075/resla.00002.bra

    A framework of analysis for the evaluation of automatic term extractors

    Following previous research on automatic term extraction, the primary aim of this paper is to propose a more robust and consistent framework of analysis for the comparative evaluation of term extractors. Among the different views of software quality outlined in ISO standards, our proposal focuses on the criterion of external quality, and in particular on the characteristics of functionality, usability and efficiency, together with the subcharacteristics of suitability, precision, operability and time behavior. The evaluation phase is completed by comparing four online open-access automatic term extractors: TermoStat, GaleXtract, BioTex and DEXTER. The last of these forms part of the virtual functional laboratory for natural language processing (FUNK Lab) developed by our research group. Furthermore, the results obtained from the comparative analysis are discussed. Financial support for this research has been provided by the Spanish Ministry of Economy, Competitiveness and Science, grant FFI2014-53788-C3-1-P. Periñán-Pascual, C.; Mairal-Usón, R. (2018). A framework of analysis for the evaluation of automatic term extractors. VIAL. Vigo International Journal of Applied Linguistics. 15:105-125. https://doi.org/10.35869/vial.v0i15.88

    Simplification in specialized and non-specialized discourse: Broadening CBTS to multi-discourse analysis

    This paper intends to contribute to research on the simplification hypothesis by incorporating a multi-discourse analysis. The study compares non-specialized and academic specialized discourse with the aim of describing their similarities and differences in terms of syntactic and stylistic simplification. Considering two variables (non-specialized/specialized discourse and original/translated texts) allows for examination of which has a greater influence on the tendency towards simplification. Following a corpus-based methodology, four corpora are compiled, comprising original and translated English texts representing non-specialized and academic discourse. Then, simplification-related features (lexical variety, lexical density, mean sentence length, use of subordination and non-finite sentences) are determined and identified in each corpus. The comparison of the results across the corpora shows signs of simplification in both types of discourse. However, each presents different linguistic features, suggesting that simplification is more closely related to the type of discourse than to the original or translated nature of the analyzed texts.