
    Dictionary-based Data Generation for Fine-Tuning Bert for Adverbial Paraphrasing Tasks

    Recent advances in natural language processing have led to the emergence of large, deep pre-trained neural networks. These networks are used primarily for transfer learning: retraining or fine-tuning a pre-trained network to achieve state-of-the-art performance on a variety of challenging natural language processing/understanding (NLP/NLU) tasks. In this thesis, we focus on identifying paraphrases at the sentence level using Bidirectional Encoder Representations from Transformers (BERT). It is well understood that in deep learning the volume and quality of training data are a determining factor of performance. The objective of this thesis is to develop a methodology for the algorithmic generation of high-quality training data for the paraphrasing task, an important NLU task, and to evaluate the resulting data by fine-tuning BERT to identify paraphrases. We focus here on elementary adverbial paraphrases, but the methodology extends to the general case. In this work, training data for adverbial paraphrasing was generated using an Oxford synonym dictionary, and we used the generated data to retrain BERT for the paraphrasing task with strong results, achieving a validation accuracy of 96.875%.
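The abstract does not spell out the generation procedure; a minimal sketch of one plausible dictionary-based approach is shown below. The function name and the toy synonym table are hypothetical illustrations, not taken from the thesis.

```python
# Hypothetical sketch: generate positive paraphrase pairs by substituting
# an adverb with a synonym from a dictionary (toy stand-in for an
# Oxford-style synonym dictionary).
ADVERB_SYNONYMS = {
    "quickly": ["rapidly", "swiftly"],
    "quietly": ["silently", "softly"],
}

def generate_paraphrase_pairs(sentence):
    """Yield (sentence, paraphrase, label=1) triples, swapping one adverb."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        for syn in ADVERB_SYNONYMS.get(tok.lower(), []):
            paraphrase = tokens[:i] + [syn] + tokens[i + 1:]
            yield sentence, " ".join(paraphrase), 1

pairs = list(generate_paraphrase_pairs("she left quickly after the talk"))
```

Pairs like these (plus mismatched negatives) could then be fed to a sentence-pair classifier during BERT fine-tuning.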

    Multi-modal post-editing of machine translation

    As MT quality continues to improve, more and more translators are switching from traditional translation from scratch to post-editing (PE) of MT output, which has been shown to save time and reduce errors. Instead of mainly generating text, translators are now asked to correct errors within otherwise helpful translation proposals, where repetitive MT errors make the process tiresome and hard-to-spot errors make PE a cognitively demanding activity. Our contribution is three-fold: first, we explore whether interaction modalities other than mouse and keyboard could better support PE by creating and testing the MMPE translation environment. MMPE allows translators to cross out or hand-write text, drag and drop words for reordering, use spoken commands or hand gestures to manipulate text, or combine any of these input modalities. Second, our interviews revealed that translators see value in automatically receiving additional translation support when a high cognitive load (CL) is detected during PE. We therefore developed a sensor framework that uses a wide range of physiological and behavioral data to estimate perceived CL and tested it in three studies, showing that multi-modal eye, heart, and skin measures can be used to make translation environments cognition-aware. Third, we present two multi-encoder Transformer architectures for automatic post-editing (APE) and discuss how these can adapt MT output to a domain and thereby avoid correcting repetitive MT errors. (Funded by the German Research Foundation (DFG), project MMP.)

    Educational leadership: A description of Saudi female principals in the Eastern Province of Saudi Arabia

    Scope and Method of Study: This is a qualitative study using a descriptive case study methodology. Twelve Saudi female principals in the Eastern Province of Saudi Arabia were interviewed and asked to describe their leadership role and their perspectives of that role. Findings and Conclusions: Saudi female principals' leadership role and perspectives of that role are highly impacted by the Saudi national religion, Islam, and influenced by the Saudi Ministry of Education. Western leadership theories do not adequately explain the leadership of these Eastern educators; societal culture contributes to the leadership role and perspectives of the role held by these Saudi principals.

    Graph spectral domain feature learning with application to in-air hand-drawn number and shape recognition

    This paper addresses the problem of recognition of dynamic shapes by representing the structure in a shape as a graph and learning graph spectral domain features. Our proposed method includes pre-processing for converting the dynamic shapes into a fully connected graph, followed by analysis of the eigenvectors of the normalized Laplacian of the graph adjacency matrix for forming the feature vectors. The method uses the eigenvector corresponding to the lowest eigenvalue for formulating the feature vectors, as it captures the details of the structure of the graph. The use of the proposed graph spectral domain representation has been demonstrated in in-air hand-drawn number and symbol recognition applications. It has achieved average accuracy rates of 99.56% and 99.44% for numbers and symbols, respectively, outperforming the existing methods for all datasets used. It also has the added benefits of fast real-time operation and invariance to rotation and flipping, making the recognition system robust to different writing and drawing variations.
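The spectral step described above can be sketched as follows. This is an illustrative sketch assuming the symmetric normalized Laplacian L = I − D^(−1/2) A D^(−1/2); the paper's full pipeline (shape-to-graph conversion, feature formulation) is not reproduced here.

```python
import numpy as np

def spectral_feature(adjacency):
    """Return the eigenvector of the smallest eigenvalue of the
    symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(adjacency, dtype=float)
    d = A.sum(axis=1)                                  # node degrees
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))   # guard isolated nodes
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)               # ascending eigenvalues
    return eigvecs[:, 0]                               # lowest-eigenvalue vector

# Fully connected 4-node graph, as produced by the pre-processing step.
A = np.ones((4, 4)) - np.eye(4)
f = spectral_feature(A)
```

`np.linalg.eigh` returns eigenvalues in ascending order, so column 0 is the eigenvector the method selects.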

    Message in the Bottle


    Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution

    Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the scenario where monolingual data is limited as well, finding that current unsupervised methods suffer in performance under this stricter setting. We find that the performance loss originates from the poor quality of the pretrained monolingual embeddings, and we propose using linguistic information in the embedding training scheme. To support this, we look at two linguistic features that may help improve alignment quality: dependency information and sub-word information. Using dependency-based embeddings results in a complementary word representation which offers a boost in performance of around 1.5 BLEU points compared to standard WORD2VEC when monolingual data is limited to 1 million sentences per language. We also find that the inclusion of sub-word information is crucial to improving the quality of the embeddings.
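One common way to inject sub-word information into embedding training is fastText-style character n-grams; the sketch below is a hypothetical illustration of that idea, not the paper's actual scheme.

```python
# Illustrative sketch: boundary-marked character n-grams, the fastText-style
# sub-word units often summed with word vectors during embedding training.
def char_ngrams(word, n_min=3, n_max=5):
    """Return character n-grams of a word, marked with < and > boundaries."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

grams = char_ngrams("where")
```

Sharing n-grams such as `<wh` across rare and frequent words is what lets sub-word models produce better embeddings when monolingual data is scarce.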