13 research outputs found

An infrastructure for Turkish prosody generation in text-to-speech synthesis

Author: Kulekci M. Oguzhan
Külekçi M. Oğuzhan
Oflazer Kemal
Publication date: 01/06/2006

Text-to-speech engines benefit from natural language processing when generating appropriate prosody. In this study, we investigate the natural language processing infrastructure for Turkish prosody generation in three steps: pronunciation disambiguation, phonological phrase detection, and intonation level assignment. We focus on phrase boundary detection and intonation assignment. We propose a phonological phrase detection scheme based on syntactic analysis for Turkish and assign one of three intonation levels to words in detected phrases. Empirical observations on 100 sentences show that the proposed scheme works with approximately 85% accuracy.

Sabanci University Research Database

Statistical morphological disambiguation with application to disambiguation of pronunciations in Turkish

Author: Kulekci Oguzhan M.
Külekci Oğuzhan M.
Publication date: 01/01/2006

The statistical morphological disambiguation of agglutinative languages suffers from data sparseness. In this study, we introduce the notion of distinguishing tag sets (DTS) to overcome the problem. The morphological analyses of words are modeled with DTS and the root major part-of-speech tags. The disambiguator based on the introduced representations performs the statistical morphological disambiguation of Turkish with recall as high as 95.69 percent. In text-to-speech systems and in developing transcriptions for acoustic speech data, the problem occurs in disambiguating the pronunciation of a token in context, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. We apply the morphological disambiguator to this problem of pronunciation disambiguation and achieve 99.54 percent recall with 97.95 percent precision. Most text-to-speech systems perform phrase-level accentuation based on the content word/function word distinction. This approach seems easy and adequate for some right-headed languages such as English but is not suitable for languages such as Turkish. We then use a heuristic approach to mark up phrase boundaries based on dependency parsing, as a basis for phrase-level accentuation in Turkish TTS synthesizers.

Sabanci University Research Database

Robustness of Massively Parallel Sequencing Platforms

Author: Aksu Soner
Alkan Can
Güngör Tunga
Hach Faraz
Kavak Pınar
Kulekci M. Oguzhan
Sağıroğlu Mahmut Şamil
Turkish Human Genome Project
Yüksel Bayram
Şahinalp S. Cenk
Publication date: 01/01/2015

The improvements in high-throughput sequencing (HTS) technologies made clinical sequencing projects such as ClinSeq and Genomics England feasible. Although there are significant improvements in accuracy and reproducibility of HTS-based analyses, the usability of these types of data for diagnostic and prognostic applications necessitates near-perfect data generation. To assess the usability of a widely used HTS platform for accurate and reproducible clinical applications in terms of robustness, we generated whole genome shotgun (WGS) sequence data from the genomes of two human individuals in two different genome sequencing centers. After analyzing the data to characterize SNPs and indels using the same tools (BWA, SAMtools, and GATK), we observed a significant number of discrepancies in the call sets. As expected, most of the disagreements between the call sets were found within genomic regions containing common repeats and segmental duplications, albeit only a small fraction of the discordant variants were within the exons and other functionally relevant regions such as promoters. We conclude that although HTS platforms are sufficiently powerful for providing data for first-pass clinical tests, the variant predictions still need to be confirmed using orthogonal methods before being used in clinical applications.

Directory of Open Access Journals

Simon Fraser University Institutional Repository

PubMed Central

FigShare

Pronunciation disambiguation in Turkish

Author: Kulekci M. Oguzhan
Külekçi M. Oğuzhan
Oflazer Kemal
Publication venue: Springer Berlin / Heidelberg
Publication date: 01/10/2005

In text-to-speech systems and in developing transcriptions for acoustic speech data, one is faced with the problem of disambiguating the pronunciation of a token in the context in which it is used, so that the correct pronunciation can be produced or the transcription uses the correct set of phonemes. In this paper, we investigate the problem of pronunciation disambiguation in Turkish as a natural language processing problem and present preliminary results using a morphological disambiguation technique based on the notion of distinguishing tag sets.

Sabanci University Research Database

Comparisons of total and novel SNP and indel intersections of B1 vs. T1 and B2 vs. T2. B1, T1: pooled S1 calls from BGI and TÜBİTAK datasets using HaplotypeCaller; B2, T2: pooled S2 calls from BGI and TÜBİTAK datasets, respectively.

Author: Bayram Yüksel (797766)
Can Alkan (9984)
Faraz Hach (797770)
M. Oguzhan Kulekci (797768)
Mahmut Şamil Sağıroğlu (797771)
Pınar Kavak (797765)
S. Cenk Şahinalp (5665246)
Soner Aksu (797767)
Tunga Güngör (797769)

Comparisons of total and novel SNP and indel intersections of B1 vs. T1 and B2 vs. T2. B1, T1: pooled S1 calls from BGI and TÜBİTAK datasets using HaplotypeCaller; B2, T2: pooled S2 calls from BGI and TÜBİTAK datasets, respectively.

FigShare

Detailed view of novel SNP and indel distributions of S2 that map to common repeats.

Author: Bayram Yüksel (797766)
Can Alkan (9984)
Faraz Hach (797770)
M. Oguzhan Kulekci (797768)
Mahmut Şamil Sağıroğlu (797771)
Pınar Kavak (797765)
S. Cenk Şahinalp (5665246)
Soner Aksu (797767)
Tunga Güngör (797769)

Detailed view of novel SNP and indel distributions of S2 that map to common repeats.

FigShare

Summary of the sequence datasets.

Author: Bayram Yüksel (797766)
Can Alkan (9984)
Faraz Hach (797770)
M. Oguzhan Kulekci (797768)
Mahmut Şamil Sağıroğlu (797771)
Pınar Kavak (797765)
S. Cenk Şahinalp (5665246)
Soner Aksu (797767)
Tunga Güngör (797769)

Basic statistics of the two samples (S1, S2) sequenced at two different centers. S1T refers to sample S1 sequenced at TÜBİTAK, whereas the dataset S1B was generated from the same sample at BGI. Similarly, datasets from sample S2 are denoted as S2T and S2B.

FigShare

Detailed view of novel SNP and indel distributions of S1 that map to common repeats.

Author: Bayram Yüksel (797766)
Can Alkan (9984)
Faraz Hach (797770)
M. Oguzhan Kulekci (797768)
Mahmut Şamil Sağıroğlu (797771)
Pınar Kavak (797765)
S. Cenk Şahinalp (5665246)
Soner Aksu (797767)
Tunga Güngör (797769)

Detailed view of novel SNP and indel distributions of S1 that map to common repeats.

FigShare

Underlying sequence content of novel SNP and indel calls.

Author: Bayram Yüksel (797766)
Can Alkan (9984)
Faraz Hach (797770)
M. Oguzhan Kulekci (797768)
Mahmut Şamil Sağıroğlu (797771)
Pınar Kavak (797765)
S. Cenk Şahinalp (5665246)
Soner Aksu (797767)
Tunga Güngör (797769)

A) SNPs and B) indels in the genome of S1. C) SNPs and D) indels in the genome of S2.

FigShare
