5 research outputs found

An infrastructure for Turkish prosody generation in text-to-speech synthesis

Author: Külekçi M. Oğuzhan
Oflazer Kemal
Publication date: 01/06/2006

Text-to-speech engines benefit from natural language processing when generating appropriate prosody. In this study, we investigate the natural language processing infrastructure for Turkish prosody generation in three steps: pronunciation disambiguation, phonological phrase detection, and intonation level assignment. We focus on phrase boundary detection and intonation assignment. We propose a phonological phrase detection scheme based on syntactic analysis for Turkish and assign one of three intonation levels to the words in the detected phrases. Empirical observations on 100 sentences show that the proposed scheme works with approximately 85% accuracy.

Sabanci University Research Database

Ling browser: a NLP based browser for linguistic information

Author: Armağan Önsel
Publication date: 01/01/2008

Linguistics students and researchers need practical tools that provide information about the elements of a language in order to understand its properties and conduct research on it. Many computer-assisted language learning tools have been developed since the emergence of computers. However, none of these tools aims to satisfy the needs of advanced learners. In this thesis, we introduce a tool, LingBrowser, an intelligent hypertext browser that employs natural language processing technology to provide an interactive environment in which advanced language learners can access all kinds of linguistic information about the words in a Turkish text. LingBrowser provides immediate information about the morphological, segmental, pronunciation, and semantic properties of the words in any text. In addition, through a search interface, LingBrowser can locate examples of many linguistic phenomena in the source text.

Sabanci University Research Database

Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish

Author: Külekci Oğuzhan M.
Publication date: 01/01/2006

The statistical morphological disambiguation of agglutinative languages suffers from data sparseness. In this study, we introduce the notion of distinguishing tag sets (DTS) to overcome the problem. The morphological analyses of words are modeled with DTS and the root major part-of-speech tags. The disambiguator based on these representations performs the statistical morphological disambiguation of Turkish with a recall as high as 95.69 percent. In text-to-speech systems and in developing transcriptions for acoustic speech data, a related problem arises in disambiguating the pronunciation of a token in context, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. We apply the morphological disambiguator to this problem of pronunciation disambiguation and achieve 99.54 percent recall with 97.95 percent precision. Most text-to-speech systems perform phrase-level accentuation based on the content word/function word distinction. This approach seems easy and adequate for some right-headed languages such as English but is not suitable for languages such as Turkish. We then use a heuristic approach based on dependency parsing to mark up the phrase boundaries, as a basis for phrase-level accentuation in Turkish TTS synthesizers.

Sabanci University Research Database

The architecture and the implementation of a finite state pronunciation lexicon for Turkish

Author: Inkelas Sharon
Oflazer Kemal
Publication venue: Elsevier
Publication date: 01/01/2006

This paper describes the architecture and the implementation of a full-scale pronunciation lexicon for Turkish using finite state technology. As output, the system produces a parallel representation of the pronunciation and the morphological analysis of the word form, so that further processing can be used to disambiguate the pronunciation. The pronunciation representation is based on the SAMPA standard and also encodes the position of the primary stress. Computing the position of the primary stress depends on an interplay between any exceptional stress in root words and the stress properties of certain morphemes, and requires a full morphological analysis. The system has been implemented using the XRCE Finite State Toolkit.

Sabanci University Research Database