POS tagging for German: how important is the right context?

Author: Ivanova Steliana; Kübler Sandra
Publication date: 01/01/2008

Part-of-Speech tagging is generally performed by Markov models, based on bigram or trigram models. While Markov models have a strong concentration on the left context of a word, many languages require the inclusion of right context for correct disambiguation. We show for German that the best results are reached by a combination of left and right context. If only left context is available, then changing the direction of analysis and going from right to left improves the results. In a version of MBT (Daelemans et al., 1996) with default parameter settings, the inclusion of the right context improved POS tagging accuracy from 94.00% to 96.08%, thus corroborating our hypothesis. The version with optimized parameters reaches 96.73%.
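
The contrast between left-only and two-sided context described in this abstract can be illustrated with a minimal sketch. It only shows how a context window around a focus word might be built; it is not MBT's actual feature set, and the function name `context_window`, the padding symbol, and the example sentence are assumptions made for illustration.

```python
from typing import List, Tuple

PAD = "_"  # hypothetical padding symbol for positions outside the sentence


def context_window(tokens: List[str], i: int, left: int = 2, right: int = 2) -> Tuple[str, ...]:
    """Return `left` words before position i, the focus word, and `right` words after it."""
    padded = [PAD] * left + list(tokens) + [PAD] * right
    centre = i + left  # index of the focus word inside the padded list
    return tuple(padded[centre - left : centre + right + 1])


# Illustrative German sentence (not taken from the paper's data).
sentence = ["die", "Frau", "sieht", "den", "Mann"]

# Left context only: the tagger sees nothing after the focus word "die".
left_only = context_window(sentence, 0, left=2, right=0)

# Left and right context: the following words "Frau" and "sieht" are visible too.
both_sides = context_window(sentence, 0, left=2, right=2)
```

For a sentence-initial word, the left-only window holds nothing but padding and the word itself, while the two-sided window also exposes the following words, which is the kind of evidence the abstract reports as helpful for German.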

Hochschulschriftenserver - Universität Frankfurt am Main

SINICA CORPUS: Design Methodology for Balanced Corpora

Author: Chang Li-Ping; Chen Keh-Jiann; Hsu Hui-Li; Huang Chu-Ren
Publication venue: Institute for the Study of Language and Information, Kyung Hee University
Publication date: 01/01/1996

Waseda University Repository

SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks

Author: Weber Volker; Wermter Stefan
Publication date: 31/12/1996

In this paper, we describe a so-called screening approach for learning robust processing of spontaneously spoken language. A screening approach is a flat analysis which uses shallow sequences of category representations for analyzing an utterance at various syntactic, semantic and dialog levels. Rather than using a deeply structured symbolic analysis, we use a flat connectionist analysis. This screening approach aims at supporting speech and language processing by using (1) data-driven learning and (2) robustness of connectionist networks. In order to test this approach, we have developed the SCREEN system which is based on this new robust, learned and flat analysis. In this paper, we focus on a detailed description of SCREEN's architecture, the flat syntactic and semantic analysis, the interaction with a speech recognizer, and a detailed evaluation analysis of the robustness under the influence of noisy or incomplete input. The main result of this paper is that flat representations allow more robust processing of spontaneous spoken language than deeply structured representations. In particular, we show how the fault-tolerance and learning capability of connectionist networks can support a flat analysis for providing more robust spoken-language processing within an overall hybrid symbolic/connectionist framework. Comment: 51 pages, Postscript. To be published in Journal of Artificial Intelligence Research 6(1), 199

arXiv.org e-Print Archive

CiteSeerX

Universaar

Acronym

VP²: The Role of User Modeling in Correcting Errors in Second Language Learning

Author: Schuster Ethel
Publication venue: ScholarlyCommons
Publication date: 01/12/1984

This paper describes a system, VP2, that has been implemented to tutor non-native speakers in English. The system applies Artificial Intelligence techniques developed in Natural Language research. In particular, it differs from standard approaches by employing a model of its users to customize instruction based on knowledge of the student's native language. The system focuses on the acquisition of English verb-particle and verb-prepositional phrase constructions. It diagnoses errors that students make due to interference of their native language. VP2 recognizes syntactic variation in English sentences, allowing freer translation. VP2 is a modular system: its model of a user's native language can easily be replaced by a model of another language. Its correction strategy is based upon comparison of the native language model with a model of English. The problems and solutions presented in this paper are related to the more general question of how modeling previous knowledge facilitates instruction in a new skill.

ScholarlyCommons@Penn

Detecting grammatical errors with treebank-induced, probabilistic parsers

Author: Wagner Joachim
Publication venue: Dublin City University. School of Computing
Publication date: 01/03/2012

Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem to be unsuitable for grammar checking as they massively over-generate and fail to reject ungrammatical input due to their high robustness. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. The results are compared to two traditional approaches, one that uses a hand-crafted, discriminative grammar, the XLE ParGram English LFG, and one based on part-of-speech n-grams. In addition, the baseline methods and the new methods are combined in a machine learning-based framework, yielding further improvements.
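
The decision rule of the second approach in this abstract (flag a sentence when its estimated parse probability exceeds the actual one by a certain amount) might be sketched as follows. This is only an illustration under stated assumptions: working in log-probabilities, measuring the gap as a simple difference, and the names `is_flagged_ungrammatical` and `margin` are not taken from the thesis, and the numbers are invented.

```python
def is_flagged_ungrammatical(estimated_logprob: float,
                             actual_logprob: float,
                             margin: float) -> bool:
    """Flag a sentence when the estimator expects a clearly more probable parse
    than the parser actually produced (all values are log-probabilities)."""
    return estimated_logprob - actual_logprob > margin


# Invented values: the estimator predicts -40.0, the parser's best parse scores -55.0.
flagged = is_flagged_ungrammatical(-40.0, -55.0, margin=10.0)      # gap of 15 exceeds the margin
not_flagged = is_flagged_ungrammatical(-40.0, -45.0, margin=10.0)  # gap of 5 does not
```

How the margin is chosen is left open here; the abstract only states that the estimated probability must be higher "by a certain amount".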

Irish Universities

DCU Online Research Access Service

An Analysis of Grammatical Errors in Speech Joko Widodo, Presidential of Indonesia, at APEC CEO Summit “YouTube Video by APEC”

Author: Handayani Nurma Dhona; Hutagaol Jelly; Silaen Elperinda
Publication venue: Universitas Pendidikan Muhammadiyah (UNIMUDA) Sorong
Publication date: 10/10/2022

Language is a means of communication for every human being. There are several types of language, such as regional, national and international languages, and English is an international language. English is generally taught from elementary school through college, yet many adults still use it incorrectly and need further study. One way of improving English, including vocabulary, is speaking practice such as giving a speech. This study takes a speech as its source and aims to analyse the speaker's grammatical errors using a qualitative descriptive method. The findings are presented descriptively, in words and sentences rather than percentages or numerical values, and the researcher analyses each error according to its error class category. The study found 20 data items, consisting of errors in auxiliary verbs and tenses.

Scientific & Charity Journal UNIMUDA (Universitas Pendidikan Muhammadiyah Sorong)

Addressing the grammar needs of Chinese EAP students: an account of a CALL materials development project

Author: Chuang Fei-Yu

This study investigated the grammar needs of Chinese EAP Foundation students and developed electronic self-access grammar materials for them. The research process consisted of three phases. In the first phase, a corpus linguistics based error analysis was conducted, in which 50 student essays were compiled and scrutinized for formal errors. A tagging system was specially devised and employed in the analysis. The EA results, together with an examination of Foundation tutors’ perceptions of error frequency and gravity led me to prioritise article errors for treatment; in the second phase, remedial materials were drafted based on the EA results and insights drawn from my investigations into four research areas (article pedagogy, SLA theory, grammar teaching approaches and CALL methodologies) and existing grammar materials; in the third phase, the materials were refined and evaluated for their effectiveness as a means of improving the Chinese Foundation students’ use of the article. Findings confirm the claim that L2 learner errors are systematic in nature and lend support to the value of Error Analysis. L1 transfer appears to be one of the main contributing factors in L2 errors. The salient errors identified in the Chinese Foundation corpus show that mismanagement of the article system is the most frequent cause of grammatical errors; Foundation tutors, however, perceive article errors to be neither frequent nor serious. An examination of existing materials reveals that the article is given low priority in ELT textbooks and treatments provided in pedagogical grammar books are inappropriate in terms of presentation, language and exercise types. The devised remedial materials employ both consciousness-raising activities and production exercises, using EAP language and authentic learner errors. Preliminary evaluation results suggest that the EA-informed customised materials have the potential to help learners to perform better in proofreading article errors in academic texts.

Warwick Research Archives Portal Repository

Extracting Arabic composite names using genitive principles of Arabic grammar

Author: Khalil H; Miltan M; Osman T
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/06/2020

Named Entity Recognition (NER) is a basic prerequisite of using Natural Language Processing (NLP) for information retrieval. Arabic NER is especially challenging as the language is morphologically rich and has short vowels with no capitalisation convention. This article presents a novel rule-based approach that uses linguistic grammar-based techniques to extract Arabic composite names from Arabic text. Our approach uniquely exploits the genitive Arabic grammar rules; in particular, the rules regarding the identification of definite nouns (معرفة) and indefinite nouns (نكرة) to support the process of extracting composite names. Based on domain knowledge and Arabic Genitive Rules (AGR), the developed approach formalises a set of syntactical rules and linguistic patterns that initially use genitive patterns to classify definiteness within phrases and then extracts proper composite names from the unstructured text. The developed novel approach does not place any constraints on the length of the Arabic composite name and our initial experimentation demonstrated high recall and precision results when the NER algorithm was applied to a financial domain corpus.

Nottingham Trent Institutional Repository (IRep)