244 research outputs found

    Japanese word prediction

    This report describes the implementation of a Japanese word prediction engine written by the author. As this type of software does not seem to exist for Japanese at the time of writing, it could prove useful in Japanese augmentative and alternative communication (AAC) as a tool for improving typing speed and reducing the number of keystrokes needed to produce text. Word prediction, in contrast to the word completion commonly found in mobile phones and word-processor IntelliSense-style engines, is a technique for suggesting a follow-up word after a word has just been completed. This is usually done by presenting the user with a list of the most probable words, sorted by commonality (general and user-specific frequency). Combined with good word completion software and a responsive user interface, word prediction is one of the most powerful assistive tools available to movement-impaired users today. The main goals of the thesis are to: 1. Answer as many of the questions raised by the language differences as possible. 2. Investigate further avenues of research in the subject. 3. Make a functional word prediction prototype for Japanese. All project code is in the public domain and is currently hosted at: http://www.mediafire.com/?rrhqtqsgp6ei6m
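The ranking idea in this abstract (suggest follow-up words sorted by general and user-specific frequency) can be illustrated with a minimal bigram sketch. This is not the author's engine: the class name `BigramPredictor`, the `user_weight` parameter, and the toy sentences are all assumptions made for illustration, and the input is assumed to be already segmented into words, since Japanese text has no spaces.

```python
from collections import Counter, defaultdict


class BigramPredictor:
    """Hypothetical next-word predictor based on bigram counts."""

    def __init__(self, user_weight=2.0):
        # Separate tables for general and user-specific counts (assumed design).
        self.general = defaultdict(Counter)
        self.user = defaultdict(Counter)
        self.user_weight = user_weight

    def train(self, sentences, user=False):
        """Count (previous word, next word) pairs in pre-segmented sentences."""
        table = self.user if user else self.general
        for words in sentences:
            for prev, nxt in zip(words, words[1:]):
                table[prev][nxt] += 1

    def predict(self, prev, k=3):
        """Return up to k follow-up words, most probable first."""
        scores = Counter()
        for word, n in self.general.get(prev, {}).items():
            scores[word] += n
        for word, n in self.user.get(prev, {}).items():
            scores[word] += self.user_weight * n
        # Sort by descending score, then by word for a stable order on ties.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [word for word, _ in ranked[:k]]


predictor = BigramPredictor(user_weight=2.0)
predictor.train([
    ["私", "は", "学生", "です"],
    ["私", "は", "先生", "です"],
    ["私", "は", "学生", "でした"],
])
# The user's own text counts more heavily than the general corpus.
predictor.train([["私", "は", "先生", "です"]], user=True)
print(predictor.predict("は"))
```

Keeping the two count tables apart lets the user-specific weight be tuned without retraining on the general corpus.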

    Acquired dyslexia in Japanese : implications for reading theory

    Acquired dyslexia research has been conducted mainly on English-speaking neurological patients, and only a limited number of dyslexia studies on non-alphabetic orthographies are available. Classical case studies of acquired dyslexia in Japanese, which has two distinctive scripts (morphographic Kanji and phonographic Kana), reported 'script-dependent' dyslexia patterns. Although recent case studies showed 'script-independent' dyslexia patterns for surface and phonological dyslexia, a 'script-independent' deep dyslexia pattern in Japanese has not yet been reported. This study examined four Japanese aphasic patients, using psycholinguistically well-manipulated reading stimuli for both Kanji and Kana strings. YT, with phonological impairment, demonstrated the same effects of psycholinguistic variables as observed in English deep dyslexia, but semantic errors rarely occurred in Kana word reading. YT's concomitant deep dyslexia for Kanji and phonological dyslexia for Kana fit the phonological impairment hypothesis, and this can be treated as a unique characteristic of Japanese deep dyslexia. HW, with semantic impairment, demonstrated a 'script-independent' surface dyslexia pattern. SO, with severe semantic impairment, demonstrated a surface dyslexia pattern in Kanji word reading, but showed substantial difficulty with Kanji nonword reading. ME, with phonological impairment and a visuo-spatial deficit, showed both lexicality and length effects on reading aloud Kana strings, thus suggesting phonological dyslexia for Kana. That is, a double dissociation between Kanji and Kana nonword reading was observed in SO and ME. These results suggest that Japanese acquired dyslexia patterns are not dependent on script type, but are also not totally independent of it. The outcomes of this study are discussed in terms of universality and orthographic specificity in acquired dyslexia.
Moreover, possible workings of the Japanese version of the DRC model (Coltheart et al., 2001) and the triangle model (Plaut et al., 1996; Harm & Seidenberg, 2004) are presented in order to explain acquired dyslexia patterns in Japanese.

    Automatic Scaling of Text for Training Second Language Reading Comprehension

    For children learning their first language, reading is one of the most effective ways to acquire new vocabulary. Studies link students who read more with larger and more complex vocabularies. For second language learners, however, there is a substantial barrier to reading: even books written for early first language readers assume a base vocabulary of nearly 7000 word families and a nuanced understanding of grammar. This project looks at ways that technology can help second language learners overcome this high barrier to entry, and at the effectiveness of learning through reading for adults acquiring a foreign language. Through the implementation of Dokusha, an automatic graded reader generator for Japanese, this project explores how advancements in natural language processing can be used to automatically simplify text for extensive reading in Japanese as a foreign language.

    Pronunciation Ambiguities in Japanese Kanji

    Japanese writing is a complex system, and a large part of the complexity resides in the use of kanji. A single kanji character in modern Japanese may have multiple pronunciations, either as native vocabulary or as words borrowed from Chinese. This causes a problem for text-to-speech synthesis (TTS) because the system has to predict which pronunciation of each kanji character is appropriate in context. The problem is called homograph disambiguation. For Japanese TTS, the challenge in every case is to determine which reading is the right one. To address the problem, this research provides a new annotated data set of Japanese single kanji character pronunciations and describes an experiment using a logistic regression (LR) classifier. A baseline is computed for comparison with the LR classifier's accuracy. The LR classifier improves the modeling performance by 16%. This experiment provides the first experimental research in Japanese single kanji homograph disambiguation. The annotated Japanese data is freely released to the public to support further work.
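To make the task concrete, the following is a minimal sketch of how single-kanji homograph disambiguation might be set up. The abstract does not say what the baseline is or which features the LR classifier uses, so both are assumptions here: the baseline is taken to be "always predict the most frequent reading of the kanji", and the features are the neighbouring characters. The function names and the three toy examples are hypothetical and are not drawn from the released data set.

```python
from collections import Counter, defaultdict


def context_features(text, i, window=1):
    """Features for the kanji at text[i]: the kanji and its neighbours (assumed feature set)."""
    feats = {"kanji": text[i]}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        feats[f"char[{off:+d}]"] = text[j] if 0 <= j < len(text) else "<pad>"
    return feats


def train_majority_baseline(examples):
    """Map each kanji to its most frequent reading in (text, index, reading) examples."""
    counts = defaultdict(Counter)
    for text, i, reading in examples:
        counts[text[i]][reading] += 1
    return {kanji: c.most_common(1)[0][0] for kanji, c in counts.items()}


# Toy annotated examples: (text, index of the target kanji, reading of that kanji).
examples = [
    ("学生", 1, "せい"),
    ("先生", 1, "せい"),
    ("生きる", 0, "い"),
]

baseline = train_majority_baseline(examples)
correct = sum(baseline[text[i]] == reading for text, i, reading in examples)
print(baseline, correct, "of", len(examples))
print(context_features("生きる", 0))
```

The baseline ignores context entirely, which is why it misreads 生 in 生きる; feature dictionaries like the one returned by `context_features` are the kind of input a context-aware classifier such as logistic regression could use to do better.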

    Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

    Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable text-to-speech (TTS) systems. However, previous approaches require substantial annotated training data and additional effort from language experts, making it difficult to extend high-quality neural TTS systems to out-of-domain daily conversations and countless languages worldwide. This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary (the existing prior information in the natural language). Specifically, we design a semantics-to-pronunciation attention (S2PA) module to match the semantic patterns between the input text sequence and the prior semantics in the dictionary and obtain the corresponding pronunciations; the S2PA module can be easily trained with the end-to-end TTS model without any annotated phoneme labels. Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy and improves the prosody modeling of TTS systems. Further extensive analyses demonstrate that each design in Dict-TTS is effective. The code is available at https://github.com/Zain-Jiang/Dict-TTS. Comment: Accepted by NeurIPS 202
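The general idea of matching input semantics against dictionary senses can be illustrated with plain dot-product attention. This is only a toy sketch and not the S2PA module or any code from the Dict-TTS repository: the function names, the two-dimensional vectors, and the pronunciation labels are all hypothetical, and a real system would learn its embeddings jointly with the TTS model.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend_pronunciation(context_vec, senses):
    """Score each dictionary sense against the context and pick a pronunciation.

    senses is a list of (sense_embedding, pronunciation) pairs (assumed layout).
    Returns the pronunciation of the highest-weighted sense and all weights.
    """
    scores = [sum(c * s for c, s in zip(context_vec, vec)) for vec, _ in senses]
    weights = softmax(scores)
    best = max(range(len(senses)), key=weights.__getitem__)
    return senses[best][1], weights


# Toy polyphone with two dictionary senses; vectors are made up for illustration.
senses = [
    ([0.9, 0.0], "reading_A"),
    ([0.1, 0.8], "reading_B"),
]
context_vec = [1.0, 0.1]  # hypothetical semantic vector of the surrounding text

pronunciation, weights = attend_pronunciation(context_vec, senses)
print(pronunciation, weights)
```

Because the attention weights are a differentiable function of the embeddings, a mechanism of this kind can in principle be trained from the downstream objective alone, which is consistent with the abstract's claim that no annotated phoneme labels are needed.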