Adapting Sequence Models for Sentence Correction

Authors: Kim, Yoon; Rush, Alexander M.; Schmaltz, Allen; Shieber, Stuart M.
Publication date: 01/01/2017

In a controlled experiment of sequence-to-sequence approaches for the task of sentence correction, we find that character-based models are generally more effective than word-based models and models that encode subword information via convolutions, and that modeling the output data as a series of diffs improves effectiveness over standard approaches. Our strongest sequence-to-sequence model improves over our strongest phrase-based statistical machine translation model, with access to the same data, by 6 M2 (0.5 GLEU) points. Additionally, in the data environment of the standard CoNLL-2014 setup, we demonstrate that modeling (and tuning against) diffs yields similar or better M2 scores with simpler models and/or significantly less data than previous sequence-to-sequence approaches.

Comment: EMNLP 201
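The abstract above mentions modeling the output as a series of diffs. As a rough illustration of what such a target could look like, the following sketch rewrites a corrected sentence as the source sentence with changed tokens wrapped in markers. The function name `to_diff_tags`, the `<del>`/`<ins>` marker strings, and the whitespace tokenization are assumptions made for this example; the listing does not specify the paper's exact encoding.

```python
import difflib


def to_diff_tags(source: str, target: str) -> str:
    """Express `target` as `source` plus marked deletions and insertions.

    Hypothetical helper: tokenizes on whitespace and uses illustrative
    <del>/<ins> markers; not taken from the paper.
    """
    src, tgt = source.split(), target.split()
    out = []
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Unchanged tokens are copied through as-is.
            out.extend(src[i1:i2])
            continue
        if i2 > i1:
            # Source tokens that do not survive (delete or replace).
            out.append("<del> " + " ".join(src[i1:i2]) + " </del>")
        if j2 > j1:
            # Target tokens that are new (insert or replace).
            out.append("<ins> " + " ".join(tgt[j1:j2]) + " </ins>")
    return " ".join(out)


print(to_diff_tags("He go to school yesterday", "He went to school yesterday"))
# prints: He <del> go </del> <ins> went </ins> to school yesterday
```

Because most tokens in a correction task are unchanged, a target of this shape keeps the copied text explicit and isolates the edits, which is one plausible reading of why a diff representation might help.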

arXiv.org e-Print Archive

Crossref

Graph-to-Sequence Learning using Gated Graph Neural Networks

Authors: Beck, Daniel; Cohn, Trevor; Haffari, Gholamreza
Publication date: 01/01/2018

Many NLP applications can be framed as a graph-to-sequence learning problem. Previous work proposing neural architectures for this setting obtained promising results compared to grammar-based approaches but still relies on linearisation heuristics and/or standard recurrent networks to achieve the best performance. In this work, we propose a new model that encodes the full structural information contained in the graph. Our architecture couples the recently proposed Gated Graph Neural Networks with an input transformation that allows nodes and edges to have their own hidden representations, while tackling the parameter explosion problem present in previous work. Experimental results show that our model outperforms strong baselines in generation from AMR graphs and syntax-based neural machine translation.

Comment: ACL 201

arXiv.org e-Print Archive

Crossref

Monash University Research Portal

Differences between Human and Machine-generated Institutional Translations: A comparative analysis using quantitative methods

Author: Bourou, Maria (Μπούρου Μαρία)
Publication date: 01/01/2019

Machine translation (MT) has gained popularity in recent years; however, it has not yet reached the quality and naturalness of human writing. The present thesis aims to explore how human and automatic English translations of Greek institutional texts differ by comparing quantitative characteristics of the two translation types.
Statistical analysis using independent samples t-tests revealed that the two corpora differed in a range of linguistic features including descriptive characteristics (e.g. word length), word information (e.g. parts of speech, word frequency), lexical diversity, syntax and cohesion; however, the degree of variation was not striking. In a follow-up examination using a Multilayer Perceptron neural network, the classifier correctly labelled almost 82% of the texts as automatic or human-produced. These results suggest that the differences between human translation (HT) and MT in the subgenre in question are detectable using machine learning techniques, but the distinction is not as clear-cut as expected. Further research is needed to determine whether the text properties that differ most between the two corpora can be used effectively as predictors of translation quality.
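The abstract above relies on independent samples t-tests to compare one linguistic feature across two corpora. A minimal sketch of such a comparison is given below, assuming Welch's unequal-variance variant (the abstract does not say which variant was used). The function name `welch_t` and the feature values are made up for illustration and are not data from the thesis.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / size).
    se_a = variance(sample_a) / n_a
    se_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Made-up mean word lengths per text for two small corpora (illustrative only).
human = [4.6, 4.9, 5.1, 4.8, 5.0]
machine = [4.4, 4.7, 4.5, 4.9, 4.6]

t_stat, dof = welch_t(human, machine)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The sketch stops at the test statistic; turning it into a p-value requires the Student t distribution, which a statistics library would normally provide.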

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

Randomized Significance Tests in Machine Translation

Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2014

Crossref