    Comparing human and automatic thesaurus mapping approaches in the agricultural domain

    Knowledge organization systems (KOS), like thesauri and other controlled vocabularies, are used to provide subject access to information systems across the web. Due to the heterogeneity of these systems, mapping between vocabularies becomes crucial for retrieving relevant information. However, mapping thesauri is a laborious task, and thus considerable effort is being made to automate the mapping process. This paper examines two mapping approaches involving the agricultural thesaurus AGROVOC, one machine-created and one human-created. We address the basic question "What are the pros and cons of human and automatic mapping and how can they complement each other?" By pointing out the difficulties in specific cases or groups of cases and grouping the sample into simple and difficult types of mappings, we show the limitations of current automatic methods and come up with some basic recommendations on what approach to use when. Comment: 10 pages, Int'l Conf. on Dublin Core and Metadata Applications 200

    Future trends in translation memory

    This article looks at some of the latest advances in translation memory technology and how a corpus-linguistic approach could be applied to extend them further and make them more appealing. It also explores how the nature of the translation industry can affect whether new technologies are widely adopted or not.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    FRASIMED: a Clinical French Annotated Resource Produced through Crosslingual BERT-Based Annotation Projection

    Natural language processing (NLP) applications such as named entity recognition (NER) for low-resource corpora do not benefit from recent advances in the development of large language models (LLMs), because larger annotated datasets are still needed. This research article introduces a methodology for generating translated versions of annotated datasets through crosslingual annotation projection. Leveraging a language-agnostic BERT-based approach, it is an efficient way to enlarge low-resource corpora with little human effort, using only open data resources that are already available. Quantitative and qualitative evaluations are often lacking when it comes to assessing the quality and effectiveness of semi-automatic data generation strategies. The evaluation of our crosslingual annotation projection approach showed both effectiveness and high accuracy in the resulting dataset. As a practical application of this methodology, we present the creation of the French Annotated Resource with Semantic Information for Medical Entities Detection (FRASIMED), an annotated corpus comprising 2,051 synthetic clinical cases in French. The corpus is now available for researchers and practitioners to develop and refine French NLP applications in the clinical field (https://zenodo.org/record/8355629), making it the largest open annotated corpus with linked medical concepts in French.

    Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation

    This paper surveys the current state of the art in Natural Language Generation (NLG), defined as the task of generating text or speech from non-linguistic input. A survey of NLG is timely in view of the changes that the field has undergone over the past decade or so, especially in relation to new (usually data-driven) methods, as well as new applications of NLG technology. This survey therefore aims to (a) give an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organised; (b) highlight a number of relatively recent research topics that have arisen partly as a result of growing synergies between NLG and other areas of artificial intelligence; (c) draw attention to the challenges in NLG evaluation, relating them to similar challenges faced in other areas of Natural Language Processing, with an emphasis on different evaluation methods and the relationships between them. Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table

    Benchmarking the performance of two automated term-extraction systems : LOGOS and ATAO

    Thesis digitized by the Direction des bibliothèques de l'Université de Montréal. To consult the document accompanying the thesis, please contact the Centre de conservation Lionel-Groulx de l'Université de Montréal ([email protected])

    Post-editing machine translated text in a commercial setting: Observation and statistical analysis

    Machine translation systems, when used in a commercial context for publishing purposes, are usually combined with human post-editing. Understanding human post-editing behaviour is therefore crucial in order to maximise the benefit of machine translation systems. Though a number of studies have been carried out on human post-editing to date, there is a lack of large-scale studies on post-editing in industrial contexts which focus on the activity in real-life settings. This study observes professional Japanese post-editors’ work and examines the effect of the amount of editing made during post-editing, source text characteristics, and post-editing behaviour on the amount of post-editing effort. A mixed-methods approach was employed to analyse the data both quantitatively and qualitatively and to gain detailed insights into the post-editing activity from various viewpoints. The results indicate that a number of factors, such as sentence structure, document component types, use of product-specific terms, and post-editing patterns and behaviour, affect the amount of post-editing effort in an intertwined manner. The findings will contribute to a better utilisation of machine translation systems in the industry as well as to the development of the skills and strategies of post-editors.

    WikiTrans: The English Wikipedia in Esperanto

    Proceedings of the NODALIDA 2011 Workshop Constraint Grammar Applications. Editors: Eckhard Bick, Kristin Hagen, Kaili Müürisep, Trond Trosterud. NEALT Proceedings Series, Vol. 14 (2011), 8–16. © 2011 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/19231

    Language data and project specialist: A new modular profile for graduates in language-related disciplines. UPSKILLS Intellectual output 1.6

    The UPSKILLS needs analysis explored the current academic offer in language- and linguistics-related fields (modern languages and cultures, translation, general linguistics, etc.) and the requirements the job market has for graduates in these areas. The analysis highlighted the need for a new skill set and a new mind frame to meet the demands as well as the professional challenges of the industry. Taking into consideration the results of the individual components of the needs analysis, this final report outlines a new professional profile, that of the language data and project specialist, and includes a detailed description of the knowledge, skills and competences that present-day and future graduates in languages and linguistics should obtain to improve their employability in the digital business sector