Generating Text from Anonymised Structures

Author: Colin Emilie
Gardent Claire
Publication venue: HAL CCSD
Publication date: 01/01/2019

Surface realisation maps a meaning representation (MR) to a text, usually a single sentence. In this paper, we introduce a new parallel dataset of deep meaning representations and French sentences, and we present a novel method for MR-to-text generation which seeks to generalise by abstracting away from lexical content. Most current work on natural language generation focuses on generating text that matches a reference, using BLEU as the evaluation criterion. In this paper, we additionally consider the model's ability to reintroduce the function words that are absent from the deep input meaning representations. We show that our approach increases both the BLEU score and the scores used to assess function-word generation.

Crossref

INRIA a CCSD electronic archive server

Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

Author: Benzitoun Christophe
Braud Chloé
Huber Laurine
Langlois David
Ouni Slim
Pogodalla Sylvain
Schneider Stéphane
Publication venue: AFCP
Publication date: 01/01/2020

6th joint conference: JEP-TALN-RECITAL 2020. No abstract.

INRIA a CCSD electronic archive server

Generating Text from Anonymised Structures

Author: Colin Emilie
Gardent Claire
Publication venue: HAL CCSD
Publication date: 29/10/2019

INRIA a CCSD electronic archive server

Corpus and Models for Lemmatisation and POS-tagging of Classical French Theatre

Author: Cafiero Florian
Camps Jean-Baptiste
Clérice Thibault
Fièvre Paul
Gabay Simon
Publication venue: Centre pour la Communication Scientifique Directe (CCSD)
Publication date: 01/02/2021

This paper describes the process of building an annotated corpus and training models for classical French literature, with a focus on theatre, and particularly comedies in verse. It was originally developed as a preliminary step to the stylometric analyses presented in Cafiero and Camps [2019]. The use of a recent lemmatiser based on neural networks and a CRF tagger achieves accuracies beyond the current state of the art on the in-domain test, and proves robust in out-of-domain tests, i.e. on texts up to 20th-century novels.

arXiv.org e-Print Archive

Crossref

Episciences.org

Directory of Open Access Journals

Indirectly Named Entity Recognition

Author: Atanassova Iana
Cardey Sylviane
Gaudinat Arnaud
Greenfield Peter
Kauffmann Alexis
Madinier Hélène
Rey François-Claude
Publication venue: Universitat Politecnica de Valencia
Publication date: 13/12/2021

We define here indirectly named entities, as a term to denote multiword expressions referring to known named entities by means of periphrasis. While named entity recognition is a classical task in natural language processing, little attention has been paid to indirectly named entities and their treatment. In this paper, we try to address this gap, describing issues related to the detection and understanding of indirectly named entities in texts. We introduce a proof of concept for retrieving both lexicalised and non-lexicalised indirectly named entities in French texts. We also show example cases where this proof of concept is applied, and discuss future perspectives. We have initiated the creation of a first lexicon of 712 indirectly named entity entries that is available for future research.

This research has been funded by the FEDER (Fonds européen de développement régional) and selected by the French-Swiss programme Interreg V. We would like to thank Claire Wuillemin for her preliminary work in the DecRIPT project about the State-of-the-Art in NER and SER in 2020. We would also like to thank for their advice Gilles Falquet, Luka Nerima, Eric Wehrli and Jean-Philippe Goldman at the University of Geneva.

Kauffmann, A.; Rey, F.; Atanassova, I.; Gaudinat, A.; Greenfield, P.; Madinier, H.; Cardey, S. (2021). Indirectly Named Entity Recognition. Journal of Computer-Assisted Linguistic Research. 5(1):27-46. https://doi.org/10.4995/jclr.2021.15922

HAL - Université de Franche-Comté

Hes-so: ArODES Open Archive (University of Applied Sciences and Arts Western Switzerland / Haute école spécialisée de Suisse occidentale / FH Westschweiz)

RiuNet

When Collaborative Treebank Curation Meets Graph Grammars: Arborator With a Grew Back-End

Author: Courtin Marine
Gerdes Kim
Guibon Gaël
Guillaume Bruno
Publication venue: HAL CCSD
Publication date: 11/05/2020

In this paper we present Arborator-Grew, a collaborative annotation tool for treebank development. Arborator-Grew combines the features of two preexisting tools: Arborator and Grew. Arborator is a widely used collaborative graphical online dependency treebank annotation tool. Grew is a tool for graph querying and rewriting specialized in structures needed in NLP, i.e. syntactic and semantic dependency trees and graphs. Grew also has an online version, Grew-match, where all Universal Dependencies treebanks in their classical, deep and surface-syntactic flavors can be queried. Arborator-Grew is a complete redevelopment and modernization of Arborator, replacing its own internal database storage with a new Grew API, which adds a powerful query tool to Arborator's existing treebank creation and correction features. This includes complex access control for parallel expert and crowd-sourced annotation, tree comparison visualization, and various exercise modes for teaching and training of annotators. Arborator-Grew opens up new paths for collectively creating, updating, maintaining, and curating syntactic treebanks and semantic graph banks.

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1

Computer vision methods for unconstrained gesture recognition in the context of sign language annotation

Author: Gonzalez Preciado Matilde
Publication date: 24/09/2012

This PhD thesis concerns the study of computer vision methods for the automatic recognition of unconstrained gestures in the context of sign language annotation. Sign Language (SL) is a visual-gestural language developed by deaf communities. Continuous SL consists of a sequence of signs performed one after another, involving manual and non-manual features that convey simultaneous information. Even though standard signs are defined in dictionaries, there is huge variability caused by the context-dependency of signs. In addition, signs are often linked by movement epenthesis, the meaningless gesture between signs. This variability and the co-articulation effect represent a challenging problem for automatic SL processing. Numerous annotated video corpora are needed in order to train statistical machine translators and study this language. Generally, the annotation of SL video corpora is performed manually by linguists or computer scientists experienced in SL. However, manual annotation is error-prone, unreproducible and time-consuming. In addition, the quality of the results depends on the annotator's knowledge of SL. Combining annotator knowledge with image processing techniques facilitates the annotation task, increasing robustness and reducing the time required. The goal of this research is the study and development of image processing techniques to assist the annotation of SL video corpora: body tracking, hand segmentation, temporal segmentation, and gloss recognition. Throughout this PhD thesis we address the problem of gloss annotation of SL video corpora. First, we aim to detect the limits corresponding to the beginning and end of a sign. This annotation method requires several low-level approaches for performing temporal segmentation and for extracting motion and hand shape features. We first propose a particle-filter-based approach for tracking the hands and face that is robust to occlusions. Then a segmentation method was developed for extracting the hand when it is in front of the face. Motion is used to segment signs, and hand shape is then used to improve the results: hand shape makes it possible to delete limits detected in the middle of a sign. Once signs have been segmented, we proceed to gloss recognition using a lexical description of signs. We have evaluated our algorithms on international corpora in order to show their advantages and limitations. The evaluation has shown the robustness of the proposed methods with respect to high dynamics and numerous occlusions between body parts. The resulting annotation is independent of the annotator and represents a gain in annotation consistency.

Thèses en ligne de l'Université Toulouse III - Paul Sabatier

Speech Recognition and Scholarly Research: Usability and Sustainability

Author: Ordelman Roeland J.F.
van Hessen Adrianus J.
Publication date: 10/10/2018

University of Twente Research Information

Media Suite: Unlocking Archives for Mixed Media Scholarly Research

Author: Martínez Ortíz Carlos
Melgar Estrada Liliana
Noordegraaf Julia
Ordelman Roeland J.F.
Publication date: 10/10/2018

University of Twente Research Information

An investigation of English-Irish machine translation and associated resources

Author: Dowling Meghan
Publication venue: Dublin City University. ADAPT
Publication date: 01/02/2022

Because Irish is an official language in both Ireland and the European Union (EU), there is a high demand for English-Irish (EN-GA) translation in public administration. The difficulty that translators currently face in meeting this demand leads to the need for reliable domain-specific user-driven EN-GA machine translation (MT). This landscape provides a timely opportunity to address some research questions surrounding MT for the EN-GA language pair. To this end, we assess the corpora available for training data-driven MT systems, including publicly available data, data collected through EU-supported data collection efforts and web-crawling, showing that though Irish is a low-resource language it is possible to increase the corpora available through concerted data collection efforts. We investigate how increased corpora affect domain-specific (public administration) statistical MT (SMT) and neural MT (NMT) systems using automatic metrics. The effect that different SMT and NMT parameters have on these automatic values is also explored, using sentence-level metrics to identify specific areas where output differs greatly between MT systems and providing a linguistic analysis of each. With EN-GA SMT and NMT automatic evaluation scores showing inconclusive results, we investigate the usefulness of EN-GA hybrid MT through the use of monolingual data as a source of artificial data creation via backtranslation. We evaluate these results using automatic metrics and linguistic analysis. Although results indicate that the addition of artificial data did not have a positive impact on EN-GA MT, repeated experiments involving Scottish Gaelic show that the method holds promise, given suitable conditions. Finally, given that the intended use-case of EN-GA MT is in the workflow of a professional translator, we conduct an in-depth human evaluation study for EN-GA SMT and NMT, providing a human-derived assessment of EN-GA MT quality and comparison of EN-GA SMT and NMT. 
We include a survey of translator opinions and recommendations surrounding EN-GA SMT and NMT as well as an analysis of data gathered through the post-editing of MT output. We compare these results to those generated automatically and provide recommendations for future work on EN-GA MT, in particular with regard to its use in a professional translation workflow within public administration.

DCU Online Research Access Service