
    Learning from Analysis of Japanese EFL Texts

    Japan has a long tradition of teaching English as a foreign language (EFL). A common feature of EFL courses is reliance on specific textbooks as a basis for graded teaching, and periods in Japanese EFL history are marked by the introduction of different textbook series. These sets of textbooks share the common goal of taking students from beginners through to able English language users, so one would expect to find common characteristics across such series. As part of an ongoing research programme in which Japanese EFL textbooks from different historical periods are compared and contrasted, we have recently focussed our efforts on using textual analysis tools to highlight distinctive characteristics of such textbooks. The present paper introduces one such analysis tool and describes some of the results from its application to three textbook series from distinct periods in Japanese EFL history. In so doing, we aim to encourage the use of textual analysis and to expose salient features of EFL texts which would likely remain hidden without such analytical techniques.
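    The abstract does not name the analysis tool or its metrics, so the sketch below is only a rough Python illustration of the kind of corpus statistics (token counts, type-token ratio, mean sentence length, top word frequencies) on which such a comparison of textbook series might rest; the file names are hypothetical placeholders.

```python
# Rough sketch only: simple lexical statistics for comparing textbook series.
# The metrics and file names (series_A.txt etc.) are assumptions, not the
# authors' actual tool or data.
import re
from collections import Counter

def lexical_profile(text: str) -> dict:
    """Return basic corpus statistics for one textbook's running text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "mean_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        "top_words": Counter(tokens).most_common(10),
    }

if __name__ == "__main__":
    # Hypothetical plain-text files, one per textbook series.
    for name in ["series_A.txt", "series_B.txt", "series_C.txt"]:
        with open(name, encoding="utf-8") as f:
            print(name, lexical_profile(f.read()))
```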

    A role for the developing lexicon in phonetic category acquisition

    Infants segment words from fluent speech during the same period when they are learning phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. We use a Bayesian model to illustrate how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations demonstrate that word-level information can successfully disambiguate overlapping English vowel categories. Learning patterns in the model are shown to parallel human behavior in artificial language learning tasks. These findings point to a central role for the developing lexicon in phonetic category acquisition and provide a framework for incorporating top-down constraints into models of category learning.
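    As a toy illustration of the claim (not the paper's actual Bayesian model), the Python sketch below generates two heavily overlapping "vowel" categories along a single acoustic dimension; pooled together they show no clear bimodality, but averaging tokens within the hypothetical word frames they occur in, a crude stand-in for feedback from the segmented lexicon, separates the two categories again.

```python
# Toy illustration, not the paper's model: word-level grouping disambiguates
# two overlapping phonetic categories that are not separable from the pooled
# acoustic distribution alone. All words and category means are made up.
import numpy as np

rng = np.random.default_rng(0)

# Two overlapping "vowel" categories on one acoustic dimension (e.g., a formant).
mu = {"V1": 0.0, "V2": 1.0}
sigma = 1.0  # category means are only one standard deviation apart

# Each hypothetical word frame contains exactly one of the two vowels.
word_vowel = {"w1": "V1", "w2": "V1", "w3": "V2", "w4": "V2"}

# Generate acoustic tokens tagged with the word they appeared in.
tokens = []
for _ in range(2000):
    word = rng.choice(list(word_vowel))
    tokens.append((word, rng.normal(mu[word_vowel[word]], sigma)))

values = np.array([v for _, v in tokens])
# With this much overlap, the pooled distribution is effectively unimodal.
print("pooled mean/std:", values.mean().round(2), values.std().round(2))

# "Lexical feedback": average tokens within each word, then split the word
# means into two groups; the underlying categories reappear cleanly.
word_means = {w: np.mean([v for ww, v in tokens if ww == w]) for w in word_vowel}
threshold = np.mean(list(word_means.values()))
for w, m in sorted(word_means.items()):
    print(w, round(m, 2), "-> category", "B" if m > threshold else "A")
```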

    Committee-Based Sample Selection for Probabilistic Classifiers

    In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by 'sample selection'. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work builds on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two-member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger.
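    The following Python sketch is a minimal, schematic version of committee-based selective sampling in this spirit, not the paper's tagging experiment: two model variants are drawn from Dirichlet posteriors over the parameters of a simple naive-Bayes-style classifier, and an unlabeled example is queued for labeling only when the variants disagree on its label. All counts and the example pool are synthetic.

```python
# Minimal sketch of committee-based sample selection (two-member committee),
# not the paper's part-of-speech tagging setup. Counts, priors and the pool
# below are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(1)

def sample_model(counts, alpha=1.0):
    """Draw class-conditional feature probabilities from Dirichlet posteriors."""
    return np.stack([rng.dirichlet(c + alpha) for c in counts])

def predict(model, priors, x):
    """Naive-Bayes-style prediction for a bag-of-features count vector x."""
    log_post = np.log(priors) + x @ np.log(model + 1e-12).T
    return int(np.argmax(log_post))

def select_for_labeling(counts, priors, pool, committee_size=2):
    """Return indices of pool examples on which the sampled committee disagrees."""
    committee = [sample_model(counts) for _ in range(committee_size)]
    selected = []
    for i, x in enumerate(pool):
        votes = {predict(m, priors, x) for m in committee}
        if len(votes) > 1:          # disagreement => example is informative
            selected.append(i)
    return selected

# Tiny synthetic demo: 2 classes, 5 features, sparse labeled counts so far.
counts = np.array([[3., 1., 0., 0., 1.],
                   [0., 0., 2., 3., 1.]])
priors = np.array([0.5, 0.5])
pool = rng.poisson(1.0, size=(10, 5)).astype(float)
print("query these pool items:", select_for_labeling(counts, priors, pool))
```

    A binary two-member vote is just the simplest disagreement criterion; larger committees could instead be scored with vote entropy or a similar measure, as in the committee-based selection literature.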

    A Foundational Study on the Effects of a Foreign Language Teaching Method That Engages Learners in Reciting Conversational Dialogues on the Use of Formulaic Expressions in Speaking and on Rote Learning

    PDF/A format. Access: via World Wide Web. Doctoral thesis (Ph.D.), Graduate School of Global Studies, Tokyo University of Foreign Studies, April 2016. Thesis number 博甲第214号. Bibliography: p. 183-195. Summary in English and Japanese.

    Formulaic Sequences as Fluency Devices in the Oral Production of Native Speakers of Polish

    In this paper we attempt to determine the nature and strength of the relationship between the use of formulaic sequences and the productive fluency of native speakers of Polish. In particular, we seek to validate the claim that speech characterized by a higher incidence of formulaic sequences is produced more rapidly and with fewer hesitation phenomena. The analysis is based on monologic speeches delivered by 45 speakers of L1 Polish. The data include both the recordings and their transcriptions annotated for a number of objective fluency measures. In the first part of the study the total number of formulaic sequences is established for each sample. This is followed by determining a set of temporal measures of the speakers’ output (speech rate, articulation rate, mean length of runs, mean length of pauses, phonation time ratio). The study provides some preliminary evidence of the fluency-enhancing role of formulaic language. Our results show that the use of formulaic sequences is positively and significantly correlated with speech rate, mean length of runs and phonation time ratio. This suggests that a higher concentration of formulaic material in output is associated with a faster speed of speech, longer stretches of speech between pauses and an increased amount of time filled with speech.
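    For concreteness, the sketch below shows how the temporal measures listed above could be computed from a toy annotation of one sample as alternating speech runs and silent pauses with syllable counts. It only illustrates the standard definitions; it is not the authors' annotation pipeline, and the numbers are invented.

```python
# Illustrative computation of the temporal fluency measures named in the
# abstract, from a toy annotation. Durations and syllable counts are invented;
# the actual study used annotated recordings of 45 speakers.

# Each tuple: (kind, duration_in_seconds, syllables_in_interval)
sample = [
    ("speech", 2.4, 11), ("pause", 0.6, 0),
    ("speech", 3.1, 14), ("pause", 1.0, 0),
    ("speech", 1.8, 7),
]

speech_runs = [(d, s) for kind, d, s in sample if kind == "speech"]
pauses = [d for kind, d, _ in sample if kind == "pause"]

total_time = sum(d for _, d, _ in sample)
phonation_time = sum(d for d, _ in speech_runs)
syllables = sum(s for _, s in speech_runs)

measures = {
    "speech_rate": syllables / total_time,            # syllables per second overall
    "articulation_rate": syllables / phonation_time,  # syllables per second speaking
    "mean_length_of_runs": syllables / len(speech_runs),
    "mean_length_of_pauses": sum(pauses) / len(pauses),
    "phonation_time_ratio": phonation_time / total_time,
}
print(measures)
```

    Correlating such per-speaker measures with counts of formulaic sequences (for example with a Pearson or rank correlation) would reproduce the shape of the analysis described, though the paper's exact statistical procedure is not given in the abstract.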

    A comparison of parsing technologies for the biomedical domain

    This paper reports on a number of experiments which are designed to investigate the extent to which current NLP resources are able to syntactically and semantically analyse biomedical text. We address two tasks: parsing a real corpus with a hand-built wide-coverage grammar, producing both syntactic analyses and logical forms; and automatically computing the interpretation of compound nouns where the head is a nominalisation (e.g., hospital arrival means an arrival at hospital, while patient arrival means an arrival of a patient). For the former task we demonstrate that flexible and yet constrained 'preprocessing' techniques are crucial to success: these enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to 'package up' complex technical expressions prior to parsing so that they are blocked from creating misleading amounts of syntactic complexity. We argue that the XML-processing paradigm is ideally suited for automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which in turn are extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser vs. a shallow one, exploring the trade-off between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers.
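    As a rough sketch of the 'package up' idea (not the paper's XML-based pipeline), the Python fragment below collapses maximal runs of nominal and adjectival part-of-speech tags into single multiword tokens before parsing, so that long technical compounds cannot multiply attachment ambiguities inside the parser; the tagset and the grouping rule are simplifying assumptions.

```python
# Rough sketch of "packaging up" technical expressions before parsing, not the
# paper's actual XML pipeline. Assumes Penn-Treebank-style tags and a very
# crude rule: merge any maximal run of adjectives/nouns into one noun token.
PACKAGE_TAGS = {"JJ", "NN", "NNS", "NNP"}

def package(tagged):
    """tagged: list of (word, pos) pairs -> same list with nominal runs merged."""
    out, run = [], []
    for word, pos in tagged + [("", "EOS")]:        # sentinel flushes the last run
        if pos in PACKAGE_TAGS:
            run.append(word)
        else:
            if run:
                out.append(("_".join(run), "NN"))   # treat the whole run as one noun
                run = []
            if pos != "EOS":
                out.append((word, pos))
    return out

tagged = [("acute", "JJ"), ("myeloid", "JJ"), ("leukaemia", "NN"),
          ("patient", "NN"), ("arrived", "VBD"), ("at", "IN"),
          ("the", "DT"), ("hospital", "NN")]
print(package(tagged))
# -> [('acute_myeloid_leukaemia_patient', 'NN'), ('arrived', 'VBD'),
#     ('at', 'IN'), ('the', 'DT'), ('hospital', 'NN')]
```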