5 research outputs found

Incorporating Human Translator Style into English-Turkish Literary Machine Translation

Author: Dallı Harun
Dursun Olgun
Güngör Tunga
Gürses Sabri
Hodzik Ena
Yirmibeşoğlu Zeynep
Şahin Mehmet
Publication date: 21/07/2023

Although machine translation systems are mostly designed for the general domain, there is a growing tendency to adapt them to other domains such as literary translation. In this paper, we focus on English-Turkish literary translation and develop machine translation models that take into account the stylistic features of translators. We fine-tune a pre-trained machine translation model on the manually aligned works of a particular translator. We analyze in detail the effects of manual and automatic alignment, data augmentation methods, and corpus size on the translations. We propose an approach based on stylistic features to evaluate the style of a translator in the output translations. We show that the human translator's style can be recreated to a large extent in the target machine translations by adapting the models to the style of the translator.

arXiv.org e-Print Archive

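Creation and Simulation of Methodologies for the Analysis, Classification, and Integration of New Requirements into Proprietary Software (Creación y Simulación de Metodologías de Análisis, Clasificación e Integración de Nuevos Requerimientos a Software Propietario)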

Author: Aduriz Itziar
Antoine Jean-Yves
Barbu Mititelu Verginica
Berk Gozde
Bhatia Archna
Candito Marie
Carlino Carola
Caruso Valeria
Chen Jia
Constant Matthieu
Cordeiro Silvio Ricardo
de Medeiros Caseli Helena
Di Buono Maria Pia
Ehren Rafael
Elyovitch Hevi
Erden Berna
Estarrona Ainara
Foster Jennifer
Fotopoulou Aggeliki
Foufi Vassiliki
Ge Xiaomin
Giouli Voula
Gonzalez Itziar
Guillaume Bruno
Gurrutxaga Antton
Güngör Tunga
Ha-Cohen Kerner Yaakov
Hu Fangyuan
Hu Sha
Ionescu Mihaela
Iñurrieta Uxoa
Jain Kanishka
Jiang Menghan
Li Minli
Lichte Timm
Liebeskind Chaya
Liu Siyuan
Louizou Sevasti
Lynn Teresa
Malka Ruth
Markantonatou Stella
Miranda Isaac
Monti Johanna
Onofrei Mihaela
Palka-Binkiewicz Emilia
Papadelli Stella
Parmentier Yannick
Pascucci Antonio
Pasquer Caroline
Puri Vandana
Qin Zhenzhen
Rademaker Alexandre
Raffone Annalisa
Ramisch Carlos
Ramisch Renata
Ratori Shraddha
Riccio Anna
Rizea Monica-Mihaela
Sangati Federico
Savary Agata
Shukla Vishakha
Speranza Giulia
Srivastava Shubham
Stymne Sara
Sun Ruilong
Uria Larraitz
Urizar Ruben
Vaidya Ashwini
Vale Oto
Villavicencio Aline
Walsh Abigail
Wang Chenweng
Waszczuk Jakub
Wick Pedro Gabriela
Wilkens Rodrigo
Xiao Huangyang
Xu Hongzhi
Yan Peiyi
Yih Tsy
Yirmibeşoğlu Zeynep
Yu Ke
Yu Songping
Zeng Si
Zhang Yongchen
Zhao Yun
Zilio Leonardo
Publication date: 15/06/2016

Prioritizing the new requirements to be implemented in proprietary software is fundamental to its maintenance, to preserving quality, and to observing the company's business rules and standards. Although prioritization tools based on proven and recognized techniques exist, they require each requirement to be rated beforehand. When a company receives requests from several customers of the same product, the number of factors affecting the company increases; the available tools do not account for these aspects, which makes the rating task far more complex. This research work comprises a survey of the methods for prioritizing and selecting new requirements used by companies in the Rosario area, and the definition of a methodology for selecting a new requirement, which involves analyzing and evaluating all the implications for the software product and the company while respecting its business rules. The resulting methodology leads to the definition of the processes for building a tool for rating and prioritizing new requirements in proprietary software that receives requests from several customers at the same time. The tool will have rating instruments that consider all related aspects, will provide current prioritization techniques, and will issue customized reports according to different perspectives of the company. Track: Software Engineering. Red de Universidades con Carreras en Informática (RedUNCI)

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

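Evaluation of the Immunoprotective Capacities of the Recombinant Fimbrial Protein X (FimX), Putative Peptidyl-Prolyl Cis-Trans Isomerase (PPPIase), Glutamine-Binding Periplasmic Protein (GlnBP), Putative Peptidoglycan-Binding Protein, and Chaperonin 10 (Hsp 10) Proteins of Bordetella pertussis (Bordetella pertussis e Ait Rekombinant Fimbrial Protein X (FimX), Putatif Peptidil Prolil Sis-Trans İzomeraz (PPPIase), Glutamin Bağlayıcı Periplasmik Protein (GlnBP), Putatif Peptidoglikan Bağlayıcı Protein ve Şaperonin 10 (Hsp 10) Proteinlerinin Immün Koruyucu Kapasitelerinin Değerlendirilmesi)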

Author: Ak Eran Zeynep
Yılmaz Çiğdem
Yirmibeşoğlu Side Selin Su
Çiçek Mustafa
Özcengiz Gülay
Publication date: 15/04/2015

OpenMETU (Middle East Technical University)

Morpho-syntactically annotated corpora provided for the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)

This multilingual resource contains corpora for 14 languages, gathered on the occasion of edition 1.2 of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). These corpora were meant to serve as additional "raw" corpora to help discover unseen verbal MWEs. The corpora are provided in the CoNLL-U format (https://universaldependencies.org/format.html). They contain morphosyntactic annotations (parts of speech, lemmas, morphological features, and syntactic dependencies). Depending on the language, the information comes from treebanks (mostly Universal Dependencies v2.x) or from automatic parsers trained on UD v2.x treebanks (e.g., UDPipe). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For the 1.2 shared task edition, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

PARSEME corpora annotated for verbal multiword expressions (version 1.3)

Author: Aceta Cristina
Adalı Kübra
Aduriz Itziar
Antić Anđela
Antoine Jean-Yves
Arhar Holdt Špela
Attard Greta
Azzopardi Kirsty
Barbu Mititelu Verginica
Bejček Eduard
Ben Khelil Chérifa
Berk Gözde
Bhatia Archna
Bielinskienė Agnė
Blagus Goranka
Boizou Loïc
Bonial Claire
Bonnici Janice
Boz Mert
Buljan Maja
Busuttil Jael
Butler Alexandra
Bărbulescu Elena-Andreea
Candito Marie
Cap Fabienne
Carlino Carola
Caruso Valeria
Chen Jia
Cherchi Manuela
Constant Matthieu
Cook Paul
Cordeiro Silvio Ricardo
Cristescu Mihaela
de Medeiros Caseli Helena
De Santis Anna
Di Buono Maria Pia
Diab Mona
Dimitrova Tsvetana
Dinç Tutkum
Ehren Rafael
El Maarouf Ismail
Elbadrashiny Mohamed
Elyovich Hevi
Erden Berna
Erenmalm Elsa
Eryiğit Gülşen
Estarrona Ainara
Fabri Ray
Farrugia Alison
Findlay Jamie
Finnveden Gustav
Foster Jennifer
Fotopoulou Aggeliki
Foufi Vassiliki
Galea Luke
Galea Sara Anne
Gantar Polona
Gatt Albert
Gatt Anabelle
Ge Xiaomin
Giouli Voula
Gonzalez Itziar
Griciūtė Bernadeta
Guillaume Bruno
Gurrutxaga Antton
Güngör Tunga
Ha-Cohen Kerner Yaakov
Hadj Mohamed Najet
Hawwari Abdelati
Herrero Carlos
Hu Fangyuan
Hu Sha
Ibrahim Rehab
Iñurrieta Uxoa
Jagfeld Glorianna
Jain Kanishka
Jaknić Isidora
Jazbec Ivo-Pavao
Jiang Menghan
Kavčič Teja
Kovalevskaitė Jolanta
Kovács Viktória
Krek Simon
Krstev Cvetana
Kuzman Taja
Leseva Svetlozara
Li Minli
Lichte Timm
Liebeskind Chaya
Lindqvist Ellinor
Liu Siyuan
Ljubešić Nikola
Louizou Sevasti
Lynn Teresa
Maldonado Alfredo
Malka Ruth
Markantonatou Stella
Martínez Alonso Héctor
Matas Ivana
McCrae John
Miral Ayşenur
Miranda Isaac
Monti Johanna
Muscat Amanda
Nivre Joakim
Onofrei Mihaela
Palka-Binkiewicz Emilia
Papadelli Stella
Parmentier Yannick
Parra Escartín Carla
Pascucci Antonio
Pasquer Caroline
Petterson Eva
Pickard Thomas
Priego Sanchez Belem
Puri Vandana
QasemiZadeh Behrang
Qin Zhenzhen
Rademaker Alexandre
Raffone Annalisa
Ramisch Carlos
Ramisch Renata
Ratori Shraddha
Riccio Anna
Rimkute Erika
Rizea Monica-Mihaela
Sangati Federico
Sarlak Mahtab
Savary Agata
Schneider Nathan
Shamsfard Mehrnoush
Shukla Vishakha
Simkó Katalin
Somers Clarissa
Spagnol Michael
Speranza Giulia
Srivastava Shubham
Stanković Ranka
Stefanova Valentina
Stoyanova Ivelina
Stymne Sara
Sun Ruilong
Tabone Nicole
Tajalli Vahide
Tanti Marc
Taslimipoor Shiva
Theoxari Natasa
Todorova Maria
Urešová Zdeňka
Uria Larraitz
Urizar Ruben
Vaidya Ashwini
Vale Oto
van der Plas Lonneke
Villavicencio Aline
Vincze Veronika
Walles Rinat
Walsh Abigail
Wang Chenweng
Waszczuk Jakub
Wick Pedro Gabriela
Wilkens Rodrigo
Xiao Huangyang
Xu Hongzhi
Yan Peiyi
Yarandi Yalda
Yih Tsy
Yirmibeşoğlu Zeynep
Yu Ke
Yu Songping
Zeng Si
Zgreabăn Bianca-Mădălina
Zhang Yongchen
Zhao Yun
Zilio Leonardo
Öztürk Yağmur
Šnajder Jan
Publication venue: PARSEME
Publication date: 10/05/2023

This multilingual resource contains corpora in which verbal MWEs have been manually annotated. VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). This is the first release of the corpora without an associated shared task. The previous version (1.2) was associated with the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). The data covers 26 languages, combining the corpora from all three previous editions (1.0, 1.1 and 1.2). VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, the information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2. The annotation guidelines are available online: https://parsemefr.lis-lab.fr/parseme-st-guidelines/1.3 The .cupt format is detailed here: https://multiword.sourceforge.net/cupt-format

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University