Improving subject-verb agreement in SMT
Ensuring agreement between the subject and the main verb is crucial for the correctness of the information that a sentence conveys. While generating correct subject-verb agreement is relatively straightforward in rule-based approaches to Machine Translation (RBMT), today's leading statistical Machine Translation (SMT) systems often fail to produce correct subject-verb agreement, especially when the target language is morphologically richer than the source language. The main problem is that one surface verb form in the source language corresponds to many surface verb forms in the target language. To deal with subject-verb agreement, we built a hybrid SMT system that augments source verbs with extra linguistic information drawn from their source-language context. This information, in the form of labels attached to verbs that indicate person and number, creates a closer association between a verb in the source language and a verb in the target language. We applied our preprocessing approach with English as the source language and built an SMT system for translation into French. In a range of experiments, the results show improvements in translation quality for our augmented SMT system over a Moses baseline engine, on both automatic and manual evaluations, for the majority of cases where the subject-verb agreement was previously translated incorrectly.
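A minimal sketch of the kind of source-side augmentation the abstract describes: verbs are tagged with person/number labels so that one English surface form maps more tightly onto the correct French inflection. The label format ("|P3.SG", "|PL") and the Penn-tag-based rules are illustrative assumptions, not the paper's exact scheme.

def augment_verbs(tokens, pos_tags):
    """Append person/number labels to verbs, inferred from POS context."""
    out = []
    for tok, pos in zip(tokens, pos_tags):
        if pos == "VBZ":          # Penn tag: 3rd-person singular present
            out.append(tok + "|P3.SG")
        elif pos == "VBP":        # non-3rd-singular present
            out.append(tok + "|PL")
        else:
            out.append(tok)
    return " ".join(out)

# "The committee approves the plan" -> "The committee approves|P3.SG the plan"
print(augment_verbs(
    ["The", "committee", "approves", "the", "plan"],
    ["DT", "NN", "VBZ", "DT", "NN"],
))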
Reordering of Source Side for a Factored English to Manipuri SMT System
Similar language pairs with massive parallel corpora are readily handled by large-scale systems using either Statistical Machine Translation (SMT) or Neural Machine Translation (NMT). Translation involving low-resource language pairs with linguistic divergence has always been a challenge. We consider one such pair, English-Manipuri, which shows linguistic divergence and belongs to the low-resource category. For such language pairs, SMT tends to perform better than NMT. However, SMT's predominant phrase-based model uses groupings of surface word forms treated as phrases for translation, so without any linguistic knowledge it fails to learn a proper mapping between the source and target language symbols. Our model adopts a factored SMT model (FSMT3*) with a part-of-speech (POS) tag as a factor to incorporate linguistic information about the languages, followed by hand-coded reordering. Reordering the source sentences makes them structurally similar to the target language, allowing a better mapping between source and target symbols; it also converts long-distance reordering problems into the monotone reordering that SMT models handle better, reducing the load at decoding time. Additionally, we find that adding POS factor data improves the system's precision. Experimental results using automatic evaluation metrics show that our model improves over phrase-based and other factored models using the lexicalised Moses reordering options. Our FSMT3* model improves the automatic scores of the translation result over the factored model with lexicalised phrase reordering (FSMT2) by 11.05% BLEU (Bilingual Evaluation Understudy), 5.46% F1, 9.35% precision, and 2.56% recall.
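A toy illustration of hand-coded source reordering toward a verb-final target: Manipuri is SOV, so moving the English verb group to the clause end makes the source order monotone with the target. The single rule and the word|POS factored notation shown here are illustrative assumptions, not the paper's full rule set.

def reorder_svo_to_sov(tagged):
    """tagged: list of (word, pos) pairs for one clause."""
    verbs = [(w, p) for w, p in tagged if p.startswith("VB")]
    rest = [(w, p) for w, p in tagged if not p.startswith("VB")]
    return rest + verbs

clause = [("John", "NNP"), ("ate", "VBD"), ("rice", "NN")]
# Factored representation: word|POS, now in target-like SOV order.
print(" ".join(f"{w}|{p}" for w, p in reorder_svo_to_sov(clause)))
# -> "John|NNP rice|NN ate|VBD"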
Correcting input noise in SMT as a char-based translation problem
Misspelled words have a direct impact on the final quality obtained by Statistical Machine Translation (SMT) systems, as the input becomes noisy and unpredictable. This paper presents improvement strategies for translating real-life noisy input. The proposed strategies are based on a preprocessing step consisting of a character-based translator.
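A sketch of what recasting noise correction as a character-level translation problem looks like in preprocessing: the noisy string is split into character tokens (with a visible space symbol) so a char-based translator trained on (noisy, clean) pairs can map it to its corrected form before word-level translation. The space marker "_" is an assumption.

def to_char_tokens(text):
    return " ".join("_" if ch == " " else ch for ch in text)

def from_char_tokens(tokens):
    return "".join(" " if t == "_" else t for t in tokens.split())

noisy = "I recieved teh letter"
src = to_char_tokens(noisy)            # "I _ r e c i e v e d _ t e h ..."
# The char-based translator would map src to the corrected character
# sequence; here we only show the reversible encoding round trip.
print(from_char_tokens(src) == noisy)  # True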
An Empirical Comparison of Parsing Methods for Stanford Dependencies
Stanford typed dependencies are a widely desired representation of natural language sentences, but parsing is one of the major computational bottlenecks in text analysis systems. In light of the evolving definition of the Stanford dependencies and developments in statistical dependency parsing algorithms, this paper revisits the question of Cer et al. (2010): what is the tradeoff between accuracy and speed in obtaining Stanford dependencies in particular? We also explore the effects of input representations on this tradeoff: part-of-speech tags, the novel use of an alternative dependency representation as input, and distributional representations of words. We find that direct dependency parsing is a more viable solution than it was found to be in the past. An accompanying software release can be found at: http://www.ark.cs.cmu.edu/TBSD
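For readers unfamiliar with the "direct" route the abstract refers to, the sketch below shows a single pass from raw text to labelled dependencies with no intermediate constituency parse. spaCy is used here as a stand-in parser (assuming the en_core_web_sm model is installed); its label set is not identical to Stanford typed dependencies.

# Direct dependency parsing: text -> labelled head/dependent arcs.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Parsing is one of the major computational bottlenecks.")
for token in doc:
    print(f"{token.dep_}({token.head.text}, {token.text})")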
Using parse features for preposition selection and error detection
We evaluate the effect of adding parse features to a leading model of preposition usage. Results show a significant improvement in the preposition selection task on native-speaker text and a modest increment in precision and recall in an ESL error detection task. Analysis of the parser output indicates that it is robust enough in the face of noisy non-native writing to extract useful information.
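A rough sketch of the kind of parse features such a model might consume: for each preposition, the lemma of its syntactic head (what the PP attaches to) and of its object. The feature names are assumptions, and spaCy stands in for the parser used in the paper.

import spacy

nlp = spacy.load("en_core_web_sm")   # assumes this model is installed

def preposition_features(text):
    feats = []
    for tok in nlp(text):
        if tok.pos_ == "ADP":        # preposition
            obj = next((c for c in tok.children if c.dep_ == "pobj"), None)
            feats.append({
                "prep": tok.lower_,
                "head_lemma": tok.head.lemma_,     # attachment site
                "obj_lemma": obj.lemma_ if obj else None,
            })
    return feats

print(preposition_features("She arrived at the station in the morning"))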
Neural Sequence-Labelling Models for Grammatical Error Correction
We propose an approach to N-best list reranking using neural sequence-labelling models. We train a compositional model for error detection that calculates the probability of each token in a sentence being correct or incorrect, utilising the full sentence as context. Using the error detection model, we then re-rank the N best hypotheses generated by statistical machine translation systems. Our approach achieves state-of-the-art results on error correction for three different datasets, and it has the additional advantage of only using a small set of easily computed features that require no linguistic input.
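A minimal sketch of the reranking step described above. score_correct() is a placeholder for the trained sequence labeller, which would return a per-token probability of correctness given the full sentence; the linear combination weight is also an assumption.

def rerank(nbest, score_correct, weight=0.5):
    """nbest: list of (hypothesis_tokens, smt_score) pairs.
    Returns hypotheses sorted best-first by a combined score."""
    def combined(hyp, smt_score):
        probs = score_correct(hyp)            # one probability per token
        detection = sum(probs) / len(probs)   # average correctness prob
        return weight * smt_score + (1 - weight) * detection
    return sorted(nbest, key=lambda h: combined(*h), reverse=True)

# Toy stand-in for the trained model: penalise the token "teh".
def toy_model(hyp):
    return [0.1 if t == "teh" else 0.9 for t in hyp]

nbest = [(["I", "saw", "teh", "cat"], 0.8),
         (["I", "saw", "the", "cat"], 0.7)]
print(rerank(nbest, toy_model)[0][0])   # ['I', 'saw', 'the', 'cat']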