46 research outputs found

    New Perspectives in Sinographic Language Processing Through the Use of Character Structure

    Chinese characters have a complex and hierarchical graphical structure carrying both semantic and phonetic information. We use this structure to enhance the text model and obtain better results in standard NLP operations. First, to tackle the problem of graphical variation, we define allographic classes of characters. Next, the relation of inclusion of a subcharacter in a character provides us with a directed graph of allographic classes. We provide this graph with two weights, semanticity (semantic relation between subcharacter and character) and phoneticity (phonetic relation), and calculate "most semantic subcharacter paths" for each character. Finally, by adding the information contained in these paths to unigrams, we claim to increase the efficiency of text mining methods. We evaluate our method on a text classification task on two corpora (Chinese and Japanese) totalling 18 million characters and get an improvement of 3% over an already high baseline of 89.6% precision, obtained by a linear SVM classifier. Other possible applications and perspectives of the system are discussed. Comment: 17 pages, 5 figures, presented at CICLing 201
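The path-based feature idea in this abstract can be pictured with a small sketch. Everything here is an assumption for illustration: the toy graph, the semanticity values, the function names, and the greedy rule (follow the subcharacter with the highest semanticity at each step) are hypothetical and are not taken from the paper, which may define its paths differently.

```python
from typing import Dict, List

# Hypothetical toy graph: character -> {subcharacter: semanticity in [0, 1]}.
# The weights are invented for illustration only.
SUBCHARACTERS: Dict[str, Dict[str, float]] = {
    "媽": {"女": 0.9, "馬": 0.1},
    "湖": {"氵": 0.8, "胡": 0.2},
    "胡": {"古": 0.3, "月": 0.4},
    "女": {}, "馬": {}, "氵": {}, "古": {}, "月": {},
}


def most_semantic_path(char: str, graph: Dict[str, Dict[str, float]]) -> List[str]:
    """Greedily descend from a character to its most semantic subcharacters."""
    path = [char]
    seen = {char}
    current = char
    # Stop when the current node is unknown or has no subcharacters.
    while graph.get(current):
        children = graph[current]
        best = max(children, key=children.get)
        if best in seen:  # guard against cycles in malformed data
            break
        path.append(best)
        seen.add(best)
        current = best
    return path


def augment_unigrams(text: str, graph: Dict[str, Dict[str, float]]) -> List[str]:
    """Return character unigrams extended with the nodes on each character's path."""
    features: List[str] = []
    for ch in text:
        features.extend(most_semantic_path(ch, graph))
    return features
```

With this toy data, `augment_unigrams("媽湖", SUBCHARACTERS)` emits each character followed by its most semantic component, so two characters that share a semantic component also share a feature that a linear classifier can use.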

    Language variation: Papers on variation and change in the Sinosphere and in the Indosphere in honour of James A. Matisoff


    Všichni jsme metamoderní: Úvod do metamodernismu [We Are All Metamodern: An Introduction to Metamodernism]

    This work will provide a genealogy of the metamodern condition. The postmodern sentiment is by no means gone, but it has transformed so much at this point that the vague postmodern label has become obsolete. Indeed, the cultural sphere is now dealing with a whole set of urgent social, political and economic realities, spurred on by climatic, financial and geopolitical crises, that postmodernism could never seriously handle. In the course of this paper, I will examine the modernist and postmodernist streams of thought that have resulted in the emergence of metamodernism, which, perhaps counter-intuitively, combines modernist sincerity with postmodernist irony. Moreover, I will assess this stream of thought through the lens of Actor-network theory, as outlined by Bruno Latour. In the end, I hope to demonstrate that metamodernism offers something that postmodernism, disappointed by the failure of modernist projects, never could: hope. Key Words: Metamodernism, Actor-network theory, Postmodernism, Nordic School, Dutch School, Irony. Department of Anglophone Literatures and Cultures, Faculty of Arts

    Indo-European vocabulary in Old Chinese : a new thesis on the emergence of Chinese language and civilization in the late Neolithic age

    This study is a much expanded version of the paper I read at the XXXII International Congress for Asian and North African Studies on August 28, 1986 in Hamburg (Germany).
    Contents
    1. Recent developments in the field of historical linguistics
    2. Monosyllabic structure of Chinese words and Indo-European stems
    3. Tonal accents of Middle Chinese
    4. Preliminaries on the comparison of consonants and vowels
    5. Some IE stems corresponding to Chinese words of entering tone
    6. Middle Chinese tones and final consonants of IE stems
    7. Some IE stems corresponding to Chinese words of rising tone
    8. Some IE stems corresponding to Chinese words of vanishing tone
    9. Some IE stems corresponding to Chinese words of level tone
    10. Reconstruction of Middle Chinese vocalism according to Yün-ching
    11. Old Chinese vocalism
    12. Vocalic correspondences between Chinese and IE
    13. Initials of Old Chinese
    14. Initial consonant clusters in Old Chinese as seen from IE-stems
    15. Proximity of Chinese to Germanic
    16. Relation of Old Chinese to neighboring languages
    17. Emergence of Chinese Empire and language in the middle of the third millennium B.C.
    Appendix
    * Abbreviations
    * Bibliography
    * Rhyme Tables of Early Middle Chinese (600)
    * Rhyme Tables of Early Mandarin (1300)
    * Word Index
      o English
      o Pinyin
    In 1786, just over two hundred years ago, comparative historical linguistics was born, when Sir William Jones (1746-1794) discovered the relationship between Old-Indian Sanskrit, Greek, and Latin. Since then, the emerging Indo-European philology has thrown much light on the early history of mankind in Eurasia. During the past two hundred years, many suggestions were also made in regard to relationships of Indo-European to other languages such as Semitic, Altaic, Austronesian, Korean etc., but Indo-Europeanists commonly rejected such attempts for want of convincing evidence. As to Chinese, Joseph Edkins was the first to advance the thesis of its proximity to Indo-European.
In his work China's Place in Philology. An Attempt to show that the Language of Europe and Asia have a Common Origin (1871) he presented a number of Chinese words similar to those of Indo-European. In his time, Edkins' thesis seemed bold and extravagant. But today, more than a hundred years later, we are in a much better position to carry out a comprehensive and well-founded comparative study. Since the end of the nineteenth century, many Sinologists have been engaged in reconstruction of the mediaeval and archaic readings of Chinese characters. Among them, Karlgren (1889-1978) was the most successful, and in 1940 he published a comprehensive phonological and etymological dictionary entitled Grammata Serica. In the meantime, the Indo-Europeanists Alois Walde (1869-1924) and Julius Pokorny (1887-1970) were devoting themselves to the compilation of a useful etymological dictionary. The result was the Indogermanisches Etymologisches Wörterbuch by Pokorny (1959), which provides a solid basis for our lexical comparisons. Soon thereafter, some Sinologists made use of the two dictionaries by Karlgren and Pokorny to compare Chinese and Indo-European words. In 1967, an unaffiliated German scholar, Jan Ulenbrook, published an article, "Einige Übereinstimmungen zwischen dem Chinesischen und dem Indogermanischen", in which he claimed that 57 words are related. Shortly afterwards, Tor Ulving of the University of Göteborg, Sweden, wrote a review of this article, framing the title as a question: "Indo-European elements in Chinese?" While working on his thesis on word families in Chinese, Ulving compiled for his own use two dictionaries, "Archaic Chinese - English" and "English - Archaic Chinese", and thereby discovered 238 Chinese words similar to Indo-European roots. In spite of this considerable number of word equivalents, however, Mr. Ulving became discouraged and, as he told me in his letter of April, 1986, has given up his researches in this field.
The skepticism, common among Indo-Europeanists in regard to comparative studies with other languages, is largely based on the dogmatic opinion that only morphology is relevant but not vocabulary. Since the typology of Chinese seems to preclude a cognate relation to Indo-European, they are inclined to discard any lexical correspondences as merely accidental or onomatopoetic. Besides, prehistorical contacts and mixtures between these languages seem not conceivable, as the Indo-Europeans are supposed to have originated in Northern Europe or at best in the Central Asian steppe, thousands of miles away from East Asia. Hence, any research into a relationship between Old Chinese and Indo-European languages would be but futile from the outset. Yet there are also opposing views among Indo-Europeanists. Investigations into Germanic languages and the oldest Indo-European language, Hittite, led some of them to a critical revision of the prevailing conception about a Proto-Indo-European. Hermann Hirt (1934) for instance states: "Inflexion of Indo-European languages is due to a relatively late development, and its correct comprehension can be achieved only by proceeding from the time of non-inflexion." And Carl Karstien (1936) holds the opinion that "Chinese corresponds most ideally to the hypothetic prototype of Indo-European." Regarding vocabulary, there are striking similarities in the monosyllabic structure of the basic words. In modern German and English, all the words of everyday speech are monosyllabic and their stereotypical structure is: initial consonant(s) + vowel(s) + final consonant(s). The same word structure is valid for Chinese as well. It is fundamentally different from the disyllabic structure of Altaic words and from the triconsonantal-disyllabic structure of Semitic words. 
Characteristic of the monosyllabic word structure is, besides, the complexity of the syllable nucleus, which consists of different vowels and vowel clusters in contrast to the monophthongal vocalism of polysyllabic words. Another objection raised to comparisons between Chinese and Indo-European is the existence of tonal accents in Chinese. Since most modern Indo-European languages have only expiratory accents, Chinese is considered to be a highly exotic language. Yet, even in Chinese, the use of tonal accents as a means of lexical differentiation is a result of comparatively recent development in the long history of the Chinese language, the earliest monuments of which date back to 1300 B.C. (cf. Chang 1970, p.21). Unknown to Old Chinese, the existence of tonal accents was for the first time mentioned in the 5th century by Shen Yüeh (441-513). In Middle Chinese (Mch.) there were four tone categories:
A. P'ing-sheng 平, a level tone (which developed into Mandarin tone 1 or 2).
B. Shang-sheng 上, a rising tone (Mandarin tone 3).
C. Ch'u-sheng 去, a vanishing, i.e. falling, tone (Mandarin tone 4).
D. Ju-sheng 入, an entering tone with a staccato effect, the word being abruptly stopped by a final consonant -p, -t, -k. (In Early Mandarin the words of this tone lost their final consonant and were distributed among the tones 2, 3 and 4, respectively according to the phonation of initials).
In Middle Chinese, words of the entering tone were the only group which still preserved the final stops and therefore a closed syllable structure. So they are most appropriate for convincing comparisons with monosyllabic Indo-European word stems. The final stops -p, -t, -k of the entering tone are nowadays still extant in daily speech of several dialects in South China as well as in Chinese borrowings in Japanese, Vietnamese and Korean. As a speaker of a Taiwan dialect of Minnan origin, I could immediately identify some Indo-European stems with corresponding Chinese words.
Besides, the command of Japanese and German was also a great help for this study. In the following lists I have chosen a number of Indo-European stems which are phonetically and semantically equivalent to Chinese words. Correspondences in initial and final consonants refer to the points of articulation, thus we have equations: IE labials = Old Chinese labials, IE dentals = dentals, IE l, r = dentals (cf. p. 31); Ø, i (final and medial) IE velars = velars and laryngeals, and occasionally (the so-called "satem"-forms) IE velars = dental sibilants and affricates. Regarding the manner of articulation, there are no regular correspondences between Indo-European and Chinese consonants like Grimm's law, which is valid among Indo-European dialects to a certain extent. But this is not astonishing, since in Old Chinese the alternation of initials in voicing was a conventional means of creating new words from one basic form. The rules of vocalic correspondences among Indo-European dialects are quite complex. Vowels permanently change their qualities from one language to another, and from time to time within one language also, as is well known from the history of English pronunciation. Generally, the vocalism of Old Greek is taken as the standard for Proto-Indo-European. Old Chinese vowels correspond to it closely (cf. p. 30), but the details about the reconstruction of Middle and Old Chinese vocalism will be treated later (pp. 26-30). For the moment, it is necessary to notice in advance that the stem of ablauting Germanic verbs is the form of the preterite or noun, rather than that of the infinitive as assumed hitherto. Therefore, in some cases I must slightly modify the basic vowel of verbal stems given in Pokorny, in order to get a better basis for comparison. As Old Chinese verbs were non-flexional, they might probably have preserved the original vowel the best.

    Tocharské výpůjčky v čínštině [Tocharian Loanwords in Chinese]

    This work was created to review the evidence for lexical borrowing from the Tocharian languages into the Chinese languages. The methodology relies on lexical lists, previous etymological findings, linguistic typology and anthropological input. For preparatory data manipulation, a set of semi-automatic scripts has been created. The work presents qualitative research based on previous findings and assisted by raw data. The outcome of this work should be testable findings which could be extracted into a computer-processable form. Institute of Linguistics, Faculty of Arts

    Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Ressources MLR2004

    In an ever expanding information society, most information systems are now facing the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building harmonised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, word-nets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 90's, most efforts in scaling up these resources remain the responsibility of the local authorities, usually with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many of the resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop and to other workshops and conferences dedicated to similar topics shows that dealing with multilingual linguistic resources has become a very hot topic in the Natural Language Processing community.
To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on Multilingual Language Resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. The papers also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange, compare their approaches and strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the program committee who managed to provide accurate reviews on time, on a rather tight schedule. We would also like to thank the Coling 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants

    Investigating Semantic Alignment in Character Learning of Chinese as a Foreign Language: The Use and Effect of the Imagery Based Encoding Strategy

    For learners of Chinese as a foreign language (CFL), character learning is frustrating. This research postulated that this difficulty may mainly come from a lack of semantic understanding of character-denoted meanings. Language theories hold that as a learner's semantic understanding increases, command of the orthographic structures that represent the underlying meanings also improves. This study aimed to reveal CFL learners' cognitive abilities and processes in visual-semantic learning of Chinese characters. In particular, this study investigated the process by which English-speaking adolescent CFL learners, at the beginning to intermediate level, made mental images of character-denoted meanings to visually encode and retrieve character forms. Quantitative and qualitative data were gathered from image-making questionnaires and from writing and reading tests, after learning characters through three commonly used teaching methods (i.e., English, pictorial, and verbal). The data were analyzed based on a triangulation of the literature from Neuro-Semantic Language Learning Theory, scientific findings in cognitive psychology, and neuroscience. The study found that participants' semantic abilities to understand character-denoted meanings emerged, but were still restricted to familiar orthographic forms. The use of the imagery strategy as a semantic ability predicted better performance, most evidently in writing; however, the ability to use the imagery strategy to learn characters was still underdeveloped and needed to be supported with sufficient contextual information. Implications and further research in visual-semantic learning and teaching of characters were suggested.

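    A aplicação do jogo Jukugeemu : é possível aprender vocabulário e ideogramas de língua japonesa jogando? [The application of the game Jukugeemu: is it possible to learn Japanese vocabulary and ideograms by playing?]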

    Dissertation (master's), Universidade de Brasília, Instituto de Letras, Departamento de Línguas Estrangeiras e Tradução, Programa de Pós-Graduação em Linguística Aplicada, 2019. This dissertation is a quali-quantitative study of the applicability of the game jukugeemu as a tool for the acquisition of Japanese vocabulary and ideograms. To that end, it collects quantitative data on the experience of playing the game and qualitative data on players' experience, perceptions and beliefs regarding the game's applicability. A total of 41 participants volunteered at different stages of the data collection, resulting in a total of 199 observed matches, as well as the qualitative insights of 31 players. The goals of this investigation were to identify whether there is a correlation between in-game performance and the time spent studying Japanese, or familiarity with electronic games. The correlation between perceived enjoyment of the game and the time spent studying the language was also examined, as were the beliefs and perceptions regarding the game and its applicability.
The theoretical framework was divided into two major groups: one for the acquisition of a new writing system (BASSETTI; COOK, 2005; BASSETTI, 2006) and the acquisition of the Japanese ideogram-based writing system (FILHO, 2006; FRELLESVIG, 2010; IWAKAMI, 1992; KESS; MIYAMOTO, 1999; OLIVEIRA, 2013; OGASSAWARA, 2006), and another for learning mediated through electronic games (GEE, 2003; SALEN; ZIMMERMAN, 2003; PETRY, 2016; PRENSKY, 2001), its possible analogy to social and human factors (HUIZINGA, 1999), and its applicability to the acquisition of the Japanese language (DEHAAN, 2005a; 2005b; 2013; DEHAAN; REED; KUWADA, 2010). The data analysis drew on both quantitative (FONSECA, 2002; GATTI, 2004) and qualitative (MINAYO, DESLANDES, et al., 2002) frameworks, using the following data collection tools: a) the game jukugeemu itself, through its creation of match histories, b) non-participant observation, c) an online questionnaire and d) a semi-structured interview. The results indicate a weak correlation between in-game performance and the time spent studying Japanese or familiarity with electronic games. The analyses also indicate a weak correlation between perceived enjoyment and the time spent studying Japanese. In addition, the research shows that the game appears to be beneficial to the acquisition of Japanese ideograms and words, especially ideograms, because it provides an environment for hypothesis testing with immediate feedback, which can enable the development of critical thinking. Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).