    Phonetics of segmental F0 and machine recognition of Korean speech

    Onsetsu hyoki no kyotsusei ni motozuita Ajia moji nyuryoku intafesu ni kansuru kenkyu (A study of an Asian script input interface based on commonalities in syllabic notation)

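    System: new ; Report number: Kō No. 3450 ; Degree type: Doctor (Global Information and Telecommunication Studies) ; Date conferred: 2011/10/26 ; Waseda University degree number: Shin 577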

    Applying dynamic Bayesian networks in transliteration detection and generation

    Peter Nabende is receiving his doctorate for work on methods that can improve machine translation programs. He investigated two systems for generating and comparing transliterations: a DBN (dynamic Bayesian network) model in which pair hidden Markov models are implemented, and a DBN model based on transduction. Nabende examined the effect of various DBN parameters on the quality of the transliterations produced. To evaluate the DBN models he used standard datasets for eleven language pairs: English-Arabic, English-Bengali, English-Chinese, English-German, English-French, English-Hindi, English-Kannada, English-Dutch, English-Russian, English-Tamil and English-Thai. During the research he also tried combining different models, which turned out to give good results.

    Unsupervised learning for text-to-speech synthesis

    This thesis introduces a general method for incorporating the distributional analysis of textual and linguistic objects into text-to-speech (TTS) conversion systems. Conventional TTS conversion uses intermediate layers of representation to bridge the gap between text and speech. Collecting the annotated data needed to produce these intermediate layers is a far from trivial task, possibly prohibitively so for languages in which no such resources exist. Distributional analysis, in contrast, proceeds in an unsupervised manner, and so enables the creation of systems using textual data that are not annotated. The method therefore aids the building of systems for languages in which conventional linguistic resources are scarce, but is not restricted to these languages. The distributional analysis proposed here places the textual objects analysed in a continuous-valued space, rather than specifying a hard categorisation of those objects. This space is then partitioned during the training of acoustic models for synthesis, so that the models generalise over objects' surface forms in a way that is acoustically relevant. The method is applied to three levels of textual analysis: to the characterisation of sub-syllabic units, word units and utterances. Entire systems for three languages (English, Finnish and Romanian) are built with no reliance on manually labelled data or language-specific expertise. Results of a subjective evaluation are presented.

    Statistical and explicit learning of graphotactic patterns with no phonological counterpart: Evidence from artificial lexicon studies with 6- to 7-year-olds and adults

    Children are powerful statistical spellers: They can learn novel written patterns with phonological counterparts under experimental conditions, via implicit learning processes, akin to “statistical learning” processes established for spoken language acquisition. Can these mechanisms fully account for children’s knowledge of written patterns? How does this ability relate to literacy measures? How does it compare to explicit learning? This thesis addresses these questions in a series of artificial lexicon experiments, inducing graphotactic learning under incidental and explicit conditions, and comparing it with measures of literacy. The first experiment adapted an existing design (Samara & Caravolas, 2014), with the goal of searching for stronger effects. Subsequent experiments address a further limitation: Previous studies assessed learning of spelling rules which have counterparts in spoken language; however, while this is also the case for some naturalistic spelling rules (e.g., English phonotactics prohibit word-initial /ŋ/ and, accordingly, written words cannot begin with ng), there are also purely visual constraints (graphotactics) (e.g., gz is an illegal spelling of a frequent word-final sound combination in English: *bagz). Can children learn patterns unconfounded by correlated phonotactics? In further experiments, developing and skilled spellers were exposed to patterns devoid of phonotactic cues. In post-tests, participants generalized over both positional constraints embedded in semiartificial strings, and contextual constraints created using homophonic non-word stimuli. This was demonstrated following passive exposure and even under meaningful (word learning) conditions, and success in learning graphotactics was not hindered by learning word meanings. However, the effect sizes across this thesis remained small, and the hypothesized positive associations between learning performance under incidental conditions and literacy measures were never observed. This relationship was only found under explicit conditions, when pattern generalization benefited. Investigation of age effects revealed that adults and children show similar patterns of learning but adults learn faster from matched text.

    Lexical segmentation and word recognition in fluent aphasia

    The current thesis reports a psycholinguistic study of lexical segmentation and word recognition in fluent aphasia. When listening to normal running speech we must identify individual words from a continuous stream before we can extract a linguistic message from it. Normal listeners are able to resolve the segmentation problem without any noticeable difficulty. In this thesis I consider how fluent aphasic listeners perform the process of lexical segmentation and whether any of their impaired comprehension of spoken language has its provenance in the failure to segment speech normally. The investigation was composed of a series of 5 experiments which examined the processing of both explicit acoustic and prosodic cues to word juncture and features which affect listeners' segmentation of the speech stream implicitly, through inter-lexical competition of potential word matches. The data collected show that lexical segmentation of continuous speech is compromised in fluent aphasia. Word hypotheses do not always accrue appropriate activational information from all of the available sources within the time frame in which the segmentation problem is normally resolved. The fluent aphasic performance, although quantitatively impaired compared to normal, reflects an underlying normal competence; their processing seldom displays a totally qualitatively different processing profile to normal. They are able to engage frequency, morphological structure, and imageability as modulators of activation. Word class, a feature found to be influential in the normal resolution of segmentation, is not used by the fluent aphasics studied. In those cases of occasional failure to adequately resolve segmentation by automatic frequency-mediated activation, fluent aphasics invoke the metalinguistic influence of real-world plausibility of alternative parses.