9 research outputs found

    A Visual Analytics System for evaluating dataset of Neural Machine Translation

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2023. Jinwook Seo.
    The factor that most influences the training of a Neural Machine Translation (NMT) model is the quality of its training data, the parallel corpora. Improving the quality of parallel corpora is therefore essential, and although various refinement methods have been introduced so far, much room for improvement remains. This thesis introduces a visual analytics system that can help improve the quality of the parallel corpora needed for training machine translation. To quickly discover and select noise in parallel corpora, our system uses machine learning techniques to extract various metrics and, based on them, provides interactive visual analysis techniques. Through our system, users can easily identify noisy data, inspect its details, and then remove it. To demonstrate the efficiency and usefulness of the system, we conducted a usability evaluation with eight users in total, including four experts, and we close by discussing points for improvement based on the evaluation results.
    The most important part of training a Neural Machine Translation model is maintaining good-quality parallel corpora, which are composed of paired texts in different languages. Various refinement tasks have therefore been introduced to improve the quality of parallel corpora, but there is still much room for improvement. This paper introduces a visual analysis system that helps improve the quality of parallel corpora for machine translation training. Our system provides nine different metrics to discover and select noise in parallel corpora. Based on our metrics and visualization techniques, users can find and check noisy parallel corpora easily. Our system's effectiveness and usefulness are demonstrated through a qualitative user study with a total of eight users, including four experts.
    Contents: Chapter 1 Introduction; Chapter 2 Related Work; Chapter 3 Design Requirements; Chapter 4 Data Preprocessing; Chapter 5 Visualization Design (Section 1 Distribution View, Section 2 Ranking View, Section 3 Text Compare View, Section 4 Ruleset View); Chapter 6 Usability Evaluation (Section 1 Results, Section 2 Post-study Interview); Chapter 7 Discussion; Chapter 8 Conclusion; References; Abstract.

    A visual analytics approach for explainability of deep neural networks

    Deep Learning has advanced the state of the art in many fields, including machine translation, where Neural Machine Translation (NMT) has become the dominant approach in recent years. However, NMT still faces many challenges such as domain adaptation, over- and under-translation, and handling long sentences, making the need for human translators apparent. Additionally, NMT systems pose the problems of explainability, interpretability, and interaction with the user, creating a need for better analytics systems. This thesis introduces NMTVis, an integrated Visual Analytics system for NMT aimed at translators. The system supports users in multiple tasks during translation: finding, filtering, and selecting machine-generated translations that possibly contain translation errors, interactive post-editing of machine translations, and domain adaptation from user corrections to improve the NMT model. Multiple metrics are proposed as a proxy for translation quality to allow users to quickly find sentences for correction using a parallel coordinates plot. Interactive, dynamic graph visualizations are used to enable exploration and post-editing of translation hypotheses by visualizing beam search and attention weights generated by the NMT model. A web-based user study showed that a majority of participants rated the system positively regarding functional effectiveness, ease of interaction, and intuitiveness of visualizations. The user study also revealed a preference for NMTVis over traditional text-based translation systems, especially for large documents. Additionally, automated experiments were conducted which showed that using the system can reduce post-editing effort and improve translation quality for domain-specific documents.
    Deep Learning has advanced the state of the art in many fields, including machine translation. In recent years, Neural Machine Translation (NMT) has become the dominant approach to machine translation. However, NMT still faces numerous challenges, such as domain adaptation, over- and under-translation, and the handling of long sentences. In addition, NMT systems have problems of explainability, interpretability, and interaction with end users, which creates a need for better analysis systems. This thesis presents NMTVis, a Visual Analytics system for NMT aimed at translators. The system supports users in a variety of tasks: finding, filtering, and selecting erroneous machine translations, interactively post-editing translations, and adapting the NMT model to a domain through user corrections. Several metrics are used to detect erroneous translations and are visualized with parallel coordinates. Interactive, dynamic graph visualizations are used for analyzing translation hypotheses and for post-editing, visualizing the beam search and attention weights of the NMT model. A web-based user study showed that a majority of participants rated the system positively with regard to effectiveness, usability, and intuitiveness of the visualizations. The user study additionally showed a preference for NMTVis over traditional text-based translation systems, especially for large documents. Several automated experiments further showed that the system can reduce post-editing effort and improve translation quality for domain-specific documents.

    Visualizing Neural Machine Translation Attention and Confidence

    No full text
    In this article, we describe a tool for visualizing the output and attention weights of neural machine translation systems and for estimating confidence in the output based on the attention. Our aim is to help researchers and developers better understand the behaviour of their NMT systems without the need for any reference translations. Our tool includes command-line and web-based interfaces that allow users to systematically evaluate translation outputs from various engines and experiments. We also present a web demo of our tool with examples of good and bad translations: http://ej.uz/nmt-attentio