3,180 research outputs found

    Towards a collocation writing assistant for learners of Spanish

    Get PDF
    This paper describes the process followed in creating a tool aimed at helping learners produce collocations in Spanish. First we present the Diccionario de colocaciones del español (DiCE), an online collocation dictionary, which represents the first stage of this process. The following section focuses on the potential user of a collocation learning tool: we examine the usability problems DiCE presents in this respect, and explore the actual learner needs through a learner corpus study of collocation errors. Next, we review how collocation production problems of English language learners can be solved using a variety of electronic tools devised for that language. Finally, taking all the above into account, we present a new tool aimed at assisting learners of Spanish in writing texts, with particular attention being paid to the use of collocations in this language

    Systems Combination for Grammatical Error Correction

    Get PDF
    Master'sMASTER OF SCIENC

    Analysis of errors in the automatic translation of questions for translingual QA systems

    Get PDF
    Purpose – This study aims to focus on the evaluation of systems for the automatic translation of questions destined to translingual question-answer (QA) systems. The efficacy of online translators when performing as tools in QA systems is analysed using a collection of documents in the Spanish language. Design/methodology/approach – Automatic translation is evaluated in terms of the functionality of actual translations produced by three online translators (Google Translator, Promt Translator, and Worldlingo) by means of objective and subjective evaluation measures, and the typology of errors produced was identified. For this purpose, a comparative study of the quality of the translation of factual questions of the CLEF collection of queries was carried out, from German and French to Spanish. Findings – It was observed that the rates of error for the three systems evaluated here are greater in the translations pertaining to the language pair German-Spanish. Promt was identified as the most reliable translator of the three (on average) for the two linguistic combinations evaluated. However, for the Spanish-German pair, a good assessment of the Google online translator was obtained as well. Most errors (46.38 percent) tended to be of a lexical nature, followed by those due to a poor translation of the interrogative particle of the query (31.16 percent). Originality/value – The evaluation methodology applied focuses above all on the finality of the translation. That is, does the resulting question serve as effective input into a translingual QA system? Thus, instead of searching for “perfection”, the functionality of the question and its capacity to lead one to an adequate response are appraised. The results obtained contribute to the development of improved translingual QA systems

    Detecting grammatical errors with treebank-induced, probabilistic parsers

    Get PDF
    Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem to be unsuitable for grammar checking as they massively over-generate and fail to reject ungrammatical input due to their high robustness. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. The results are compared to two traditional approaches, one that uses a hand-crafted, discriminative grammar, the XLE ParGram English LFG, and one based on part-of-speech n-grams. In addition, the baseline methods and the new methods are combined in a machine learning-based framework, yielding further improvements

    The Speech-Language Interface in the Spoken Language Translator

    Full text link
    The Spoken Language Translator is a prototype for practically useful systems capable of translating continuous spoken language within restricted domains. The prototype system translates air travel (ATIS) queries from spoken English to spoken Swedish and to French. It is constructed, with as few modifications as possible, from existing pieces of speech and language processing software. The speech recognizer and language understander are connected by a fairly conventional pipelined N-best interface. This paper focuses on the ways in which the language processor makes intelligent use of the sentence hypotheses delivered by the recognizer. These ways include (1) producing modified hypotheses to reflect the possible presence of repairs in the uttered word sequence; (2) fast parsing with a version of the grammar automatically specialized to the more frequent constructions in the training corpus; and (3) allowing syntactic and semantic factors to interact with acoustic ones in the choice of a meaning structure for translation, so that the acoustically preferred hypothesis is not always selected even if it is within linguistic coverage.Comment: 9 pages, LaTeX. Published: Proceedings of TWLT-8, December 199
    corecore