
    POS tagging for German : how important is the right context?

    Part-of-Speech tagging is generally performed by Markov models, based on bigram or trigram models. While Markov models have a strong concentration on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then changing the direction of analysis and going from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
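
To make the left/right context contrast concrete, here is a minimal sketch of how a feature window around a focus word might be assembled for a memory-based tagger. The function name `context_features`, the window sizes, and the padding symbol are illustrative assumptions; this is not MBT's actual feature format.

```python
from typing import List

PAD = "<pad>"  # hypothetical padding symbol for positions outside the sentence


def context_features(tokens: List[str], i: int, left: int = 2, right: int = 2) -> List[str]:
    """Return the focus token at index i with `left` tokens before and `right` tokens after it."""
    feats = []
    for j in range(i - left, i + right + 1):
        # Positions before the start or past the end of the sentence are padded.
        feats.append(tokens[j] if 0 <= j < len(tokens) else PAD)
    return feats


sentence = ["Die", "Frau", "sieht", "den", "Mann"]
left_only = context_features(sentence, 2, left=2, right=0)   # left context only
both_sides = context_features(sentence, 2, left=2, right=2)  # left and right context
```

With `right=0` the classifier sees only what precedes the focus word; a nonzero `right` adds the following tokens, which is the kind of information the abstract reports as helpful for German.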

    SINICA CORPUS : Design Methodology for Balanced Corpora

    SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks

    In this paper, we describe a so-called screening approach for learning robust processing of spontaneously spoken language. A screening approach is a flat analysis which uses shallow sequences of category representations for analyzing an utterance at various syntactic, semantic and dialog levels. Rather than using a deeply structured symbolic analysis, we use a flat connectionist analysis. This screening approach aims at supporting speech and language processing by using (1) data-driven learning and (2) robustness of connectionist networks. In order to test this approach, we have developed the SCREEN system which is based on this new robust, learned and flat analysis. In this paper, we focus on a detailed description of SCREEN's architecture, the flat syntactic and semantic analysis, the interaction with a speech recognizer, and a detailed evaluation analysis of the robustness under the influence of noisy or incomplete input. The main result of this paper is that flat representations allow more robust processing of spontaneous spoken language than deeply structured representations. In particular, we show how the fault-tolerance and learning capability of connectionist networks can support a flat analysis for providing more robust spoken-language processing within an overall hybrid symbolic/connectionist framework. Comment: 51 pages, Postscript. To be published in Journal of Artificial Intelligence Research 6(1), 199

    VP²: The Role of User Modeling in Correcting Errors in Second Language Learning

    This paper describes a system, VP2, that has been implemented to tutor non-native speakers in English. The system applies Artificial Intelligence techniques developed in Natural Language research. In particular, it differs from standard approaches by employing a model of its users to customize instruction based on knowledge of the student's native language. The system focuses on the acquisition of English verb-particle and verb-prepositional phrase constructions. It diagnoses errors that students make due to interference of their native language. VP2 recognizes syntactic variation in English sentences, allowing freer translation. VP2 is a modular system: its model of a user's native language can easily be replaced by a model of another language. Its correction strategy is based upon comparison of the native language model with a model of English. The problems and solutions presented in this paper are related to the more general question of how modeling previous knowledge facilitates instruction in a new skill.

    Detecting grammatical errors with treebank-induced, probabilistic parsers

    Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem to be unsuitable for grammar checking as they massively over-generate and fail to reject ungrammatical input due to their high robustness. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. 
    The results are compared to two traditional approaches, one that uses a hand-crafted, discriminative grammar, the XLE ParGram English LFG, and one based on part-of-speech n-grams. In addition, the baseline methods and the new methods are combined in a machine learning-based framework, yielding further improvements.
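
The decision rule of the second approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `flag_ungrammatical`, the use of log-probabilities, and the default margin are hypothetical, and the estimator that produces the expected parse probability is taken as given rather than implemented.

```python
def flag_ungrammatical(estimated_logprob: float,
                       actual_logprob: float,
                       margin: float = 2.0) -> bool:
    """Flag a sentence when the estimated log-probability of its most likely
    parse exceeds the actual parse log-probability by more than `margin`.

    Assumptions: both values are natural-log probabilities, and `margin`
    stands in for the "certain amount" mentioned in the abstract.
    """
    return (estimated_logprob - actual_logprob) > margin


# Hypothetical values: the parser assigns a much lower probability than expected.
suspicious = flag_ungrammatical(estimated_logprob=-40.0, actual_logprob=-45.0)
# Hypothetical values: the actual probability is close to the estimate.
unremarkable = flag_ungrammatical(estimated_logprob=-40.0, actual_logprob=-41.0)
```

In practice the margin would have to be tuned on held-out data, trading off missed errors against false alarms.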

    An Analysis of Grammatical Errors in Speech Joko Widodo, Presidential of Indonesia, at APEC CEO Summit ‘’YouTube Video by APEC’’

    Language is a means of communication for every human being. There are several types of language, such as regional languages, state languages and international languages, of which English is one. English is generally taught from elementary school through college. Many adults still make mistakes in using good and correct English and therefore need additional learning. One way to improve English through vocabulary is speaking, for example by giving a speech. This study takes a speech as its data source. The researcher aims to analyse the grammatical errors in the speaker's speech, using a qualitative descriptive method. The findings are presented descriptively, in words and sentences rather than as percentages or numerical values, and the errors are analysed according to their error class category. The researcher found 20 data items, consisting of errors in auxiliary verbs and tenses.

    Addressing the grammar needs of Chinese EAP students: an account of a CALL materials development project

    This study investigated the grammar needs of Chinese EAP Foundation students and developed electronic self-access grammar materials for them. The research process consisted of three phases. In the first phase, a corpus linguistics based error analysis was conducted, in which 50 student essays were compiled and scrutinized for formal errors. A tagging system was specially devised and employed in the analysis. The EA results, together with an examination of Foundation tutors’ perceptions of error frequency and gravity led me to prioritise article errors for treatment; in the second phase, remedial materials were drafted based on the EA results and insights drawn from my investigations into four research areas (article pedagogy, SLA theory, grammar teaching approaches and CALL methodologies) and existing grammar materials; in the third phase, the materials were refined and evaluated for their effectiveness as a means of improving the Chinese Foundation students’ use of the article. Findings confirm the claim that L2 learner errors are systematic in nature and lend support to the value of Error Analysis. L1 transfer appears to be one of the main contributing factors in L2 errors. The salient errors identified in the Chinese Foundation corpus show that mismanagement of the article system is the most frequent cause of grammatical errors; Foundation tutors, however, perceive article errors to be neither frequent nor serious. An examination of existing materials reveals that the article is given low priority in ELT textbooks and treatments provided in pedagogical grammar books are inappropriate in terms of presentation, language and exercise types. The devised remedial materials employ both consciousness-raising activities and production exercises, using EAP language and authentic learner errors. 
    Preliminary evaluation results suggest that the EA-informed customised materials have the potential to help learners to perform better in proofreading article errors in academic texts.