160 research outputs found

    Corpus-Based Research on Chinese Language and Linguistics

    This volume collects papers presenting corpus-based research on Chinese language and linguistics, from both a synchronic and a diachronic perspective. The contributions cover different fields of linguistics, including syntax and pragmatics, semantics, morphology and the lexicon, sociolinguistics, and corpus building. There is now considerable emphasis on the reliability of linguistic data: the studies presented here are all grounded in the tenet that corpora, understood as collections of naturally occurring texts produced by a variety of speakers and writers, provide a more robust, statistically significant foundation for linguistic analysis. The volume explores not only the potential of corpora as tools that give access to authentic language material, but also the challenges involved in corpus interrogation, analysis, and building.

    Predicate Normalization and Synonymy Judgment Using Linguistic Features (言語学的特徴を用いた述部の正規化と同義性判定)

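    Kyoto University doctoral dissertation (new system, course doctorate), Doctor of Informatics, degree no. Kō 17991 (Jōhaku no. 513); call no. 新制||情||91 (University Library). Graduate School of Informatics, Department of Intelligence Science and Technology, Kyoto University. Examiners: Professor Sadao Kurohashi (chief examiner), Professor Toru Ishida, Professor Tatsuya Kawahara. Conferred under Article 4, Paragraph 1 of the Degree Regulations.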

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available

    Can human association norm evaluate latent semantic analysis?

    This paper compares a word association norm, created in a psycholinguistic experiment, with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. We then discuss how well those automatically generated lists reflect real semantic relations.

    Phraseology in corpus-based translation studies: a stylistic study of two contemporary Chinese translations of Cervantes's Don Quijote

    The present work sets out to investigate the stylistic profiles of two modern Chinese versions of Cervantes's Don Quijote (I): by Yang Jiang (1978), the first direct translation from Castilian to Chinese, and by Liu Jingsheng (1995), which is one of the most commercially successful versions of the Castilian literary classic. This thesis focuses on a detailed linguistic analysis carried out with the help of the latest textual analytical tools, natural language processing applications and statistical packages. The type of linguistic phenomenon singled out for study is four-character expressions (FCEXs), which are a very typical category of Chinese phraseology. The work opens with the creation of a descriptive framework for the annotation of linguistic data extracted from the parallel corpus of Don Quijote. Subsequently, the classified and extracted data are put through several statistical tests. The results of these tests prove to be very revealing regarding the different use of FCEXs in the two Chinese translations. The computational modelling of the linguistic data would seem to indicate that, among other findings, while Liu's use of archaic idioms has followed the general patterns of the original and also of Yang's work in the first half of Don Quijote I, noticeable variations begin to emerge in the second half of Liu's more recent version. Such an idiosyncratic use of archaisms by Liu, which may be defined as style shifting or style variation, is then analyzed in quantitative terms through the application of the proposed context-motivated theory (CMT). The results of applying the CMT-derived statistical models show that the detected stylistic variation may well point to the internal consistency of the translator in rendering the second half of Part I of the novel, which reflects his freer, more creative and experimental style of translation.
    Through the introduction and testing of quantitative research methods adapted from corpus linguistics and textual statistics, this thesis has made a major contribution to methodological innovation in the study of style within the context of corpus-based translation studies.