4,114 research outputs found
Analysis of Machine Translation Systems' Errors in Tense, Aspect, and Modality
PACLIC 19 / Taipei, Taiwan / December 1-3, 200
Noise or music? Investigating the usefulness of normalisation for robust sentiment analysis on social media data
In the past decade, sentiment analysis research has thrived, especially on social media. While this data genre is suitable for extracting opinions and sentiment, it is known to be noisy. Complex normalisation methods have been developed to transform noisy text into its standard form, but their effect on tasks like sentiment analysis remains underinvestigated. Sentiment analysis approaches mostly include spell checking or rule-based normalisation as preprocessing and rarely investigate its impact on task performance. We present an optimised sentiment classifier and investigate to what extent its performance can be enhanced by integrating SMT-based normalisation as preprocessing. Experiments on a test set comprising a variety of user-generated content genres revealed that normalisation improves sentiment classification performance on tweets and blog posts, showing the model's ability to generalise to other data genres.
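The pipeline the abstract describes, normalising noisy social-media text before classifying its sentiment, can be sketched as follows. This is a minimal illustration, not the paper's SMT-based system: the lookup table and the lexicon classifier are hypothetical stand-ins.

```python
# Hypothetical sketch: lexical normalisation as a preprocessing step
# before sentiment classification. The lookup table and the toy
# classifier below are illustrative, not the paper's SMT-based system.

NORMALISATION_TABLE = {
    "u": "you", "gr8": "great", "luv": "love", "2day": "today",
}

POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "awful", "hate"}

def normalise(text: str) -> str:
    """Map noisy tokens to their standard forms where known."""
    return " ".join(NORMALISATION_TABLE.get(tok, tok) for tok in text.lower().split())

def sentiment(text: str) -> str:
    """Toy lexicon classifier: counts positive vs. negative tokens."""
    toks = text.split()
    score = sum(t in POSITIVE for t in toks) - sum(t in NEGATIVE for t in toks)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

noisy = "luv this, gr8 day"
print(sentiment(noisy))             # misses "luv"/"gr8" -> neutral
print(sentiment(normalise(noisy)))  # after normalisation -> positive
```

The same comparison, classifying raw versus normalised input, is what the paper's experiments measure at scale across content genres.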
Towards a Digital Assessment: Artificial Intelligence Assisted Error Analysis in ESL
The study we present here explores the possibilities that new Artificial Intelligence tools offer teachers for designing assessments that improve the written proficiency of students of English as a Foreign Language (the participants in this study predominantly have Spanish as their L1) in a university English language course with a CEFR B2 objective. The group we monitor is typical of the Spanish university system: more than sixty students, with diverse backgrounds and unequal proficiency in English. Under such conditions, the teacher must be very attentive to meet the needs of all learners and, at the same time, keep track of successes and failures in the designed study plans. One of the most notable causes of subject failure and dropout in such a scenario is the performance in, and time devoted to, written competence (Cabrera, 2014; López Urdaneta, 2011). Consequently, we explore whether combining the theoretical foundations of Error Analysis, one of the most notable tools for enhancing the writing competence of learners of English as a Foreign Language, with new intelligent technologies can provide new perspectives and strategies to help those learners produce more appropriate (more natural) written texts, and, at the same time, whether an AI-assisted, Error Analysis-based assessment produces better results in error avoidance and rule application in the collected writing samples.
Multi-modal post-editing of machine translation
As MT quality continues to improve, more and more translators are switching from traditional translation from scratch to post-editing (PE) of MT output, which has been shown to save time and reduce errors. Instead of mainly generating text, translators are now asked to correct errors within otherwise helpful translation proposals, where repetitive MT errors make the process tiresome and hard-to-spot errors make PE a cognitively demanding activity. Our contribution is three-fold. First, we explore whether interaction modalities other than mouse and keyboard could better support PE by creating and testing the MMPE translation environment. MMPE allows translators to cross out or hand-write text, drag and drop words for reordering, use spoken commands or hand gestures to manipulate text, or combine any of these input modalities. Second, our interviews revealed that translators see value in automatically receiving additional translation support when a high cognitive load (CL) is detected during PE. We therefore developed a sensor framework using a wide range of physiological and behavioural data to estimate perceived CL and tested it in three studies, showing that multi-modal eye, heart, and skin measures can be used to make translation environments cognition-aware. Third, we present two multi-encoder Transformer architectures for automatic post-editing (APE) and discuss how these can adapt MT output to a domain and thereby avoid correcting repetitive MT errors.
Deutsche Forschungsgemeinschaft (DFG), Projekt MMP
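The multi-encoder idea mentioned in the third contribution, a decoder that attends to both the source sentence and the raw MT output, can be sketched with plain attention arithmetic. This is an illustrative sketch, not the thesis architecture: real systems use learned Transformer layers, and the averaging fusion below is just one of several possible strategies.

```python
# Illustrative sketch (not the thesis implementation): a multi-encoder
# set-up for automatic post-editing, where the decoder attends to two
# encoded sequences -- the source sentence and the raw MT output -- and
# their contexts are combined by simple averaging.
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention over one encoded sequence."""
    scores = query @ keys.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

def dual_encoder_context(query, src_enc, mt_enc):
    """Combine context from the source encoder and the MT encoder."""
    src_ctx = attention(query, src_enc, src_enc)
    mt_ctx = attention(query, mt_enc, mt_enc)
    return (src_ctx + mt_ctx) / 2  # averaging is one possible fusion strategy

rng = np.random.default_rng(0)
query = rng.normal(size=(3, 8))    # 3 target positions, hidden dim 8
src_enc = rng.normal(size=(5, 8))  # 5 encoded source tokens
mt_enc = rng.normal(size=(6, 8))   # 6 encoded MT-output tokens
ctx = dual_encoder_context(query, src_enc, mt_enc)
print(ctx.shape)  # (3, 8): one fused context vector per target position
```

Because the decoder sees the MT output directly, a repeated MT error can be learned as a systematic rewrite rather than corrected anew each time, which is the behaviour the abstract highlights.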
Interactive Pattern Recognition applied to Natural Language Processing
This thesis is about Pattern Recognition. In the last decades, huge efforts have been
made to develop automatic systems able to rival human capabilities in this field. Although
these systems achieve high productivity rates, they are not precise enough in
most situations. Humans, on the contrary, are very accurate but comparatively much
slower. This raises an interesting question: can we benefit from both
worlds by constructing cooperative systems?
This thesis presents diverse contributions to this kind of collaborative approach.
The point is to improve the Pattern Recognition systems by properly introducing a
human operator into the system. We call this Interactive Pattern Recognition (IPR).
Firstly, a general proposal for IPR will be stated. The aim is to develop a framework
to easily derive new applications in this area. Some interesting IPR issues are
also introduced; multi-modality and adaptive learning are examples of extensions that
fit naturally into IPR.
Second, we focus on a specific application: a novel method to
obtain high-quality speech transcriptions (CAST, Computer Assisted Speech Transcription).
We start by proposing a CAST formalisation and then address
different implementation alternatives. Practical issues, such as system response
time, are also taken into account, in order to allow for a practical implementation
of CAST. Word graphs and probabilistic error-correcting parsing are the tools
used to reach an alternative formulation that allows CAST to be used in a real
scenario.
Afterwards, a special application within the general IPR framework is discussed.
It is intended to test IPR capabilities in an extreme environment, where
no input pattern is available and the system has access only to the user's actions to produce
a hypothesis. Specifically, we focus here on providing assistance in the
problem of text generation.
Rodríguez Ruiz, L. (2010). Interactive Pattern Recognition applied to Natural Language Processing [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8479
Validating Multimedia Content Moderation Software via Semantic Fusion
The exponential growth of social media platforms, such as Facebook and
TikTok, has revolutionized communication and content publication in human
society. Users on these platforms can publish multimedia content that delivers
information via the combination of text, audio, images, and video. Meanwhile,
the multimedia content release facility has been increasingly exploited to
propagate toxic content, such as hate speech, malicious advertisements, and
pornography. To this end, content moderation software has been widely deployed
on these platforms to detect and block toxic content. However, due to the
complexity of content moderation models and the difficulty of understanding
information across multiple modalities, existing content moderation software
can fail to detect toxic content, which often leads to extremely negative
impacts.
We introduce Semantic Fusion, a general, effective methodology for validating
multimedia content moderation software. Our key idea is to fuse two or more
existing single-modal inputs (e.g., a textual sentence and an image) into a new
input that combines the semantics of its ancestors in a novel manner and is
toxic by construction. This fused input is then used for validating
multimedia content moderation software. We realized Semantic Fusion as DUO, a
practical content moderation software testing tool. In our evaluation, we
employ DUO to test five commercial content moderation systems and two
state-of-the-art models against three kinds of toxic content. The results show
that DUO achieves up to 100% error finding rate (EFR) when testing moderation
software. In addition, we leverage the test cases generated by DUO to retrain
the two models we explored, which largely improves model robustness while
maintaining accuracy on the original test set.
Comment: Accepted by ISSTA 202
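The fusion idea, combining single-modal inputs so that the toxicity emerges only from the cross-modal whole, can be illustrated with a deliberately simple split. This sketch is hypothetical and is not the DUO tool: it merely splits a blocked phrase across a caption and image-rendered text so that a naive per-modality filter misses it.

```python
# Illustrative sketch of the fusion idea (not the actual DUO tool): a
# toxic sentence is split across two modalities -- part rendered as image
# text, part kept as the caption -- so each single-modal channel looks
# benign on its own while the fused post carries the combined semantics.

def fuse(sentence: str) -> dict:
    """Split a sentence into a caption part and an image-text part."""
    words = sentence.split()
    half = len(words) // 2
    return {
        "caption": " ".join(words[:half]),
        "image_text": " ".join(words[half:]),  # would be rendered into an image
    }

def naive_single_modal_filter(text: str, blocklist: set) -> bool:
    """Toy moderator: flags text only if a full blocked phrase appears."""
    return any(phrase in text for phrase in blocklist)

blocklist = {"buy illegal goods"}
post = fuse("want to buy illegal goods cheap")
# Each modality alone passes the naive per-channel filter...
print(naive_single_modal_filter(post["caption"], blocklist))     # False
print(naive_single_modal_filter(post["image_text"], blocklist))  # False
# ...but the recombined content still matches the blocked phrase.
print(naive_single_modal_filter(post["caption"] + " " + post["image_text"], blocklist))  # True
```

Test cases built this way expose moderators that score modalities independently instead of reasoning over their fused semantics, which is the failure mode the paper's error-finding rate measures.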
Online Incremental Machine Translation
In this thesis we investigate the automatic improvement of statistical machine translation systems at runtime based on user feedback. We also propose a framework for using the proposed algorithms in large-scale translation settings.
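Runtime adaptation from user feedback, the core idea of the abstract, can be sketched in its simplest form as a feedback store consulted before the baseline system. This is a hypothetical stand-in: real online SMT adaptation updates model parameters incrementally, whereas here a cache of corrections illustrates the interaction.

```python
# Hypothetical sketch of runtime adaptation from user feedback: corrected
# translations are stored and reused, so the system stops repeating an
# error once a user has fixed it. Real online SMT adaptation updates the
# model incrementally; a feedback cache is the simplest stand-in.

class AdaptiveTranslator:
    def __init__(self, base_translate):
        self.base_translate = base_translate  # the underlying MT system
        self.feedback = {}  # source sentence -> user-corrected translation

    def translate(self, src: str) -> str:
        """Prefer a stored user correction over the baseline output."""
        return self.feedback.get(src, self.base_translate(src))

    def learn(self, src: str, corrected: str) -> None:
        """Incorporate a user's post-edit at runtime."""
        self.feedback[src] = corrected

mt = AdaptiveTranslator(lambda s: s.upper())  # toy "baseline MT"
print(mt.translate("hola"))   # baseline output: HOLA
mt.learn("hola", "hello")
print(mt.translate("hola"))   # adapted output: hello
```

Scaling this loop, keeping updates cheap and consistent while many translators feed corrections back concurrently, is what a large-scale framework has to address.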