
    Pronunciation Ambiguities in Japanese Kanji

    Japanese writing is a complex system, and a large part of the complexity resides in the use of kanji. A single kanji character in modern Japanese may have multiple pronunciations, either as native vocabulary or as words borrowed from Chinese. This causes a problem for text-to-speech synthesis (TTS) because the system has to predict which pronunciation of each kanji character is appropriate in the context. The problem is called homograph disambiguation. In Japanese TTS, the crux is knowing which reading is correct in each case, which makes reading Japanese text a challenge. To address the problem, this research provides a new annotated data set of Japanese single-kanji pronunciations and describes an experiment using a logistic regression (LR) classifier. A baseline is computed for comparison with the LR classifier's accuracy. The LR classifier improves the modeling performance by 16%. This is the first experimental research on Japanese single-kanji homograph disambiguation. The annotated Japanese data is freely released to the public to support further work.

    Natural language software registry (second edition)


    Combined ERP/fMRI Evidence for Early Word Recognition Effects in the Posterior Inferior Temporal Gyrus

    Two brain regions with established roles in reading are the posterior middle temporal gyrus and the posterior fusiform gyrus. Lesion studies have also suggested that the region located between them, the posterior inferior temporal gyrus (pITG), plays a central role in word recognition. However, these lesion results could reflect disconnection effects since neuroimaging studies have not reported consistent lexicality effects in pITG. Here we tested whether these reported pITG lesion effects are due to disconnection effects or not using parallel ERP/fMRI studies. We predicted that the Recognition Potential (RP), a left-lateralized ERP negativity that peaks at about 200–250 ms, might be the electrophysiological correlate of pITG activity and that conditions that evoke the RP (perceptual degradation) might therefore also evoke pITG activity. In Experiment 1, twenty-three participants performed a lexical decision task (temporally flanked by supraliminal masks) while having high-density 129-channel ERP data collected. In Experiment 2, a separate group of fifteen participants underwent the same task while having fMRI data collected in a 3T scanner. Examination of the ERP data suggested that a canonical Recognition Potential effect was produced. The strongest corresponding effect in the fMRI data was in the vicinity of the pITG. In addition, results indicated stimulus-dependent functional connectivity between pITG and a region of the posterior fusiform gyrus near the visual word form area (VWFA) during word compared to nonword processing. These results provide convergent spatiotemporal evidence that the pITG contributes to early lexical access through interaction with the VWFA

    Max Planck Institute for Psycholinguistics: Annual report 1996


    A communicative approach to computer-assisted-learning in teaching Japanese as a foreign language

    This study looks at the use of CAL (Computer-Assisted Learning) for TJFL (Teaching Japanese as a Foreign Language). An appropriate model of CAL is sought based on language teaching and learning theories. The model consists of teachers' and students' aspects. Core ideas of language teaching, factors of learning, and an educational aspect are blended into a theoretically ideal CAL syllabus. Existing course(soft)ware systems are classified and examined based on this model. Suggestions for improvements and ideas for CAL in TJFL are presented.

    Description and Analysis of the Structural Symbolism of a Buddhist Ritual


    Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Ressources MLR2004

    In an ever-expanding information society, most information systems are now facing the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building harmonised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, word-nets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 1990s, most efforts in scaling up these resources have remained the responsibility of the local authorities, usually with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many of the resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop and to other workshops and conferences dedicated to similar topics shows that dealing with multilingual linguistic resources has become a very active topic in the Natural Language Processing community.
To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on Multilingual Language Resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. The papers also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange, compare their approaches and strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the program committee who managed to provide accurate reviews on time, on a rather tight schedule. We would also like to thank the Coling 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants

    JTEC panel report on machine translation in Japan

    The goal of this report is to provide an overview of the state of the art of machine translation (MT) in Japan and to provide a comparison between Japanese and Western technology in this area. The term 'machine translation', as used here, includes both the science and technology required for automating the translation of text from one human language to another. Machine translation is viewed in Japan as an important strategic technology that is expected to play a key role in Japan's increasing participation in the world economy. MT is seen in Japan as important both for assimilating information into Japanese and for disseminating Japanese information throughout the world. Most of the MT systems now available in Japan are transfer-based systems. The majority of them exploit a case-frame representation of the source text as the basis of the transfer process. There is a gradual movement toward the use of deeper semantic representations, and some groups are beginning to look at interlingua-based systems.

    Learner Modelling for Individualised Reading in a Second Language

    Extensive reading is an effective language learning technique that involves fast reading of large quantities of easy and interesting second language (L2) text. However, graded readers used by beginner learners are expensive and often dull. The alternative is text written for native speakers (authentic text), which is generally too difficult for beginners. The aim of this research is to overcome this problem by developing a computer-assisted approach that enables learners of all abilities to perform effective extensive reading using freely available text on the web. This thesis describes the research, development and evaluation of a complex software system called FERN that combines learner modelling and iCALL with narrow reading of electronic text. The system incorporates four key components: (1) automatic glossing of difficult words in texts, (2) an individualised search engine for locating interesting texts of appropriate difficulty, (3) supplementary exercises for introducing key vocabulary and reviewing difficult words and (4) reliable monitoring of reading and reporting of progress. FERN was optimised for English speakers learning Spanish, but is easily adapted for learners of other languages. The suitability of the FERN system was evaluated through corpus analysis, machine translation analysis and a year-long study with a second-year university Spanish class. The machine translation analysis combined with the classroom study demonstrated that the word and phrase error rate generated in FERN is low enough to validate the use of machine translation to automatically generate glosses, but is high enough that a translation dictionary is required as a backup. The classroom study demonstrated that when aided by glosses students can read at over 100 words per minute if they know 95% of the words, compared to the 98% word knowledge required for effective unaided extensive reading.
A corpus analysis demonstrated that beginner learners of Spanish can do effective narrow reading of news articles using FERN after learning only 200–300 high-frequency word families, in addition to familiarity with English-Spanish cognates and proper nouns. FERN also reliably monitors reading speeds and word counts, and provides motivating progress reports, which enable teachers to set concrete reading goals that dramatically increase the quantity that students read, as demonstrated in the user study

    Cross-sensory correspondences and symbolism in spoken and written language

    Lexical sound symbolism in language appears to exploit the feature associations embedded in cross-sensory correspondences. For example, words incorporating relatively high acoustic frequencies (i.e., front/close rather than back/open vowels) are deemed more appropriate as names for concepts associated with brightness, lightness in weight, sharpness, smallness, speed and thinness, because higher pitched sounds appear to have these cross-sensory features. Correspondences also support prosodic sound symbolism. For example, speakers might raise the fundamental frequency of their voice to emphasise the smallness of the concept they are naming. The conceptual nature of correspondences and their functional bi-directionality indicate they should also support other types of symbolism, including a visual equivalent of prosodic sound symbolism. For example, the correspondence between auditory pitch and visual thinness predicts that a typeface with relatively thin letter strokes will reinforce a word's reference to a relatively high pitch sound (e.g., squeal). An initial rating study confirms that the thinness-thickness of a typeface's letter strokes accesses the same cross-sensory correspondences observed elsewhere. A series of speeded word classification experiments then confirms that the thinness-thickness of letter strokes can facilitate a reader's comprehension of the pitch of a sound named by a word (thinner letter strokes being appropriate for higher pitch sounds), as can the brightness of the text (e.g., white-on-grey text being appropriate for the names of higher pitch sounds). It is proposed that the elementary visual features of text are represented in the same conceptual system as word meaning, allowing cross-sensory correspondences to support visual symbolism in language