
    Metalinguistic Knowledge and Language Ability in University-Level L2 Learners

    Existing research indicates that instructed learners' L2 proficiency and their metalinguistic knowledge are moderately correlated. However, the operationalization of the construct of metalinguistic knowledge has varied somewhat across studies. Metalinguistic knowledge has typically been operationalized as learners' ability to correct, describe, and explain L2 errors. More recently, this operationalization has been extended to additionally include learners' L1 language-analytic ability as measured by tests traditionally used to assess components of language learning aptitude. This article reports on a study which employed a narrowly focused measure of L2 proficiency and incorporated L2 language-analytic ability into a measure of metalinguistic knowledge. It was found that the linguistic and metalinguistic knowledge of advanced university-level L1 English learners of L2 German correlated strongly. Moreover, the outcome of a principal components analysis suggests that learners' ability to correct, describe, and explain highlighted L2 errors and their L2 language-analytic ability may constitute components of the same construct. The theoretical implications of these findings for the concept of metalinguistic knowledge in L2 learning are considered. © Oxford University Press 2007

    Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR

    The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: (1) how to create a simple, language-independent corpus-based stemmer, (2) how to identify sub-words and which types of sub-words are suitable as indexing units, and (3) how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful when few language-specific resources are available. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best; for Bengali and Marathi, overlapping 3-grams obtain the best result; and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length. The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages.
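    As a rough illustration of two of the sub-word indexing units named in this abstract (overlapping character n-grams and fixed-length word prefixes), here is a minimal Python sketch. It is not the paper's implementation: the function names, the whitespace tokenization, and the handling of words shorter than n are assumptions made for this example only.

```python
def overlapping_ngrams(word: str, n: int = 3) -> list[str]:
    """Return all overlapping character n-grams of a word.

    Assumption: a word no longer than n is kept whole as a single term.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def k_prefix(word: str, k: int = 4) -> str:
    """Return the first k characters of a word (the whole word if shorter)."""
    return word[:k]


def index_terms(text: str, unit: str = "3-gram") -> list[str]:
    """Turn a text into index terms for the chosen indexing unit.

    Assumption: lowercasing and whitespace splitting stand in for a real
    tokenizer. Supported units here are "word", "3-gram", and "4-prefix".
    """
    terms: list[str] = []
    for word in text.lower().split():
        if unit == "word":
            terms.append(word)
        elif unit == "3-gram":
            terms.extend(overlapping_ngrams(word, 3))
        elif unit == "4-prefix":
            terms.append(k_prefix(word, 4))
        else:
            raise ValueError(f"unknown indexing unit: {unit}")
    return terms


if __name__ == "__main__":
    sample = "information retrieval"
    for unit in ("word", "3-gram", "4-prefix"):
        print(unit, index_terms(sample, unit))
```

    With n-grams, a single word form produces several shorter index terms, which matches the abstract's remark that sub-word identification increases the number of index terms while decreasing their average length. Prefixes, by contrast, yield one shortened term per word.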

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.