
    Integrating optical character recognition and machine translation of historical documents

    Machine Translation (MT) plays a critical role in expanding capacity in the translation industry. However, many valuable documents, including digital documents, are encoded in formats that are inaccessible to machine processing (e.g., historical or legal documents). Such documents must be passed through a process of Optical Character Recognition (OCR) to render the text suitable for MT. No matter how good the OCR is, this process introduces recognition errors, which often render MT ineffective. In this paper, we propose a new OCR-to-MT framework that adds an OCR error-correction module to enhance the overall quality of translation. Experiments show that our correction system, which combines language-modeling and translation methods, outperforms the baseline system by nearly 30% relative improvement.
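The abstract only names the ingredients of the correction module, so the following is a minimal illustrative sketch of one common way to combine them: a noisy-channel post-corrector that generates candidate fixes for known OCR confusions and lets a language model select among them. The confusion table, the toy unigram frequencies, and all function names are invented for illustration; they are not the paper's actual components.

```python
# Illustrative OCR post-correction sketch (not the paper's system):
# undo common OCR character confusions, then let a toy "language
# model" (unigram frequency table) pick the most plausible candidate.

OCR_CONFUSIONS = {"0": "o", "1": "l", "rn": "m", "vv": "w"}

# Stand-in for a real language model trained on clean text.
LM_FREQ = {"modern": 9, "modem": 2, "world": 8, "word": 5}

def candidates(token):
    """Generate correction candidates by undoing one OCR confusion."""
    cands = {token}
    for wrong, right in OCR_CONFUSIONS.items():
        if wrong in token:
            cands.add(token.replace(wrong, right))
    return cands

def correct(token):
    """Select the candidate the language model scores highest."""
    return max(candidates(token), key=lambda c: LM_FREQ.get(c, 0))

def correct_line(line):
    """Correct a whitespace-tokenized line before handing it to MT."""
    return " ".join(correct(t) for t in line.split())
```

For example, `correct_line("m0dern w0rld")` maps the OCR digit-for-letter confusions back to `"modern world"`; a real system would use a full statistical LM and translation-derived scores instead of the unigram table.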

    Theorizing EFL Teachers’ Perspectives and Rationales on Providing Corrective Feedback

    Researchers condemn teachers by saying that tradition, rather than research findings, drives their practice, while teachers condemn researchers by saying that their research findings are universal generalizations that fail in practice. To turn mutual distrust into mutual trust, this data-driven study aims at theorizing practice rather than enlightening practice through theory-driven research. Theoretical sampling of twenty EFL teachers' perspectives on corrective feedback, together with the rigorous coding schemes of grounded theory, yielded a set of context-sensitive corrective-feedback techniques: direct feedback; indirect feedback, such as recasts, providing an alternative, asking other students, pausing before the error, providing the rule, using the correct structure, and showing surprise; feedback through other language skills, including writing and listening; and no correction, on cognitive, affective, and information-processing grounds. Moreover, the analysis uncovered a set of specifications on when, where, and why to use these techniques. Not only do the findings help practitioners gain insights and improve how they provide feedback, but they also help researchers modify their hypotheses before testing them through quantitative research aimed at generalization.

    Holaaa!! Writin like u talk is kewl but kinda hard 4 NLP

    We present work in progress aiming to build tools for the normalization of User-Generated Content (UGC). As we will see, the task requires revisiting the initial steps of NLP processing, since UGC (micro-blog, blog, and, generally, Web 2.0 user texts) presents a number of non-standard communicative and linguistic characteristics, and is in fact much closer to oral and colloquial language than to edited text. We present and characterize a corpus of UGC text in Spanish from three different sources: Twitter, consumer reviews, and blogs. We motivate the need for UGC text normalization by analyzing the problems found when processing this type of text through a conventional language-processing pipeline, particularly in the tasks of lemmatization and morphosyntactic tagging, and finally we propose a strategy for automatically normalizing UGC using a selector of correct forms on top of a pre-existing spell-checker.
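The "selector on top of a pre-existing spell-checker" strategy can be sketched roughly as follows, under stated assumptions: a toy edit-distance spell-checker proposes in-lexicon candidates, an abbreviation table handles texting-style forms, and a frequency-based selector picks among candidates. The abbreviation table, lexicon, and frequencies below are invented for illustration and are not the authors' resources.

```python
# Illustrative UGC normalization sketch (not the paper's system):
# expand texting-style abbreviations, otherwise generate
# edit-distance-1 candidates and select the best in-lexicon form.

# Hypothetical Spanish texting abbreviations and toy lexicon.
ABBREVIATIONS = {"q": "que", "xq": "porque", "tb": "también"}
LEXICON_FREQ = {"hola": 10, "que": 9, "bueno": 6, "también": 5}

def edits1(word):
    """All strings one deletion, substitution, or insertion away."""
    letters = "abcdefghijklmnopqrstuvwxyzáéíóúñ"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    subs = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return deletes | subs | inserts

def normalize(token):
    """Expand known abbreviations, else select the best lexicon form."""
    if token in ABBREVIATIONS:
        return ABBREVIATIONS[token]
    if token in LEXICON_FREQ:
        return token  # already a correct form
    cands = edits1(token) & LEXICON_FREQ.keys()
    return max(cands, key=LEXICON_FREQ.get) if cands else token
```

Here `normalize("xq")` yields `"porque"` via the abbreviation table, while `normalize("holq")` yields `"hola"` via the spell-checker candidates plus the frequency selector; forms the sketch cannot resolve are passed through unchanged.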