10 research outputs found
Onsetsu hyoki no kyotsusei ni motozuita Ajia moji nyuryoku intafesu ni kansuru kenkyu [Research on an Asian character input interface based on the commonality of syllabic notation]
Degree system: new; Report number: Kou 3450; Degree type: Ph.D. (Global Information and Telecommunication Studies); Date conferred: 2011/10/26; Waseda University degree number: Shin 577
Applying dynamic Bayesian networks in transliteration detection and generation
Peter Nabende's doctoral research concerns methods that can improve machine translation programs. He investigated two systems for generating and comparing transliterations: a DBN (Dynamic Bayesian Network) model in which Pair Hidden Markov Models are implemented, and a DBN model based on transduction. Nabende examined the effect of different DBN parameters on the quality of the transliterations produced. To evaluate the DBN models he used standard data sets for eleven language pairs: English-Arabic, English-Bengali, English-Chinese, English-German, English-French, English-Hindi, English-Kannada, English-Dutch, English-Russian, English-Tamil and English-Thai. During the research he also tried combining different models, which turned out to yield good results.
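As background to the abstract above: a pair HMM scores a source/target string pair by summing over alignments of Match, source-gap and target-gap states. The sketch below is illustrative only; the transition and emission parameters are invented for the example and are not those of Nabende's DBN models.

```python
def pair_hmm_score(src, tgt, q=1.0 / 30,
                   t_mm=0.8, t_gap=0.1,   # Match->Match, Match->gap
                   t_gm=0.7, t_gg=0.3):   # gap->Match, gap->same gap
    """Forward probability of (src, tgt) under a tiny pair HMM with
    Match (M), source-gap (X) and target-gap (Y) states."""
    # Crude Match-state emission model: reward identical characters.
    def p_pair(a, b):
        return 0.05 if a == b else 0.001

    n, m = len(src), len(tgt)
    fM = [[0.0] * (m + 1) for _ in range(n + 1)]
    fX = [[0.0] * (m + 1) for _ in range(n + 1)]
    fY = [[0.0] * (m + 1) for _ in range(n + 1)]
    fM[0][0] = 1.0  # start in the Match state by convention
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                fM[i][j] = p_pair(src[i - 1], tgt[j - 1]) * (
                    t_mm * fM[i - 1][j - 1]
                    + t_gm * (fX[i - 1][j - 1] + fY[i - 1][j - 1]))
            if i > 0:
                fX[i][j] = q * (t_gap * fM[i - 1][j] + t_gg * fX[i - 1][j])
            if j > 0:
                fY[i][j] = q * (t_gap * fM[i][j - 1] + t_gg * fY[i][j - 1])
    return fM[n][m] + fX[n][m] + fY[n][m]
```

A name pair that is a plausible transliteration of itself scores higher than an unrelated pair, which is the property transliteration detection exploits.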
Deep Learning for Automatic Assessment and Feedback of Spoken English
Growing global demand for learning a second language (L2), particularly English, has led to
considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications.
This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to being able to provide meaningful feedback to learners. One
of the challenges in automatic spoken language assessment is giving candidates feedback on
particular aspects, or views, of their spoken language proficiency, in addition to the overall
holistic score normally provided. Another is detecting pronunciation and other types of errors
at the word or utterance level and feeding them back to the learner in a useful way.
It is usually difficult to obtain accurate training data with separate scores for different
views and, as examiners are often trained to give holistic grades, single-view scores can
suffer issues of consistency. Conversely, holistic scores are available for various standard
assessment tasks such as Linguaskill. An investigation is thus conducted into whether
assessment scores linked to particular views of the speaker’s ability can be obtained from
systems trained using only holistic scores.
End-to-end neural systems are designed with structures and forms of input tuned to single
views, specifically each of pronunciation, rhythm, intonation and text. By training each
system on large quantities of candidate data, individual-view information should be possible
to extract. The relationships between the predictions of each system are evaluated to examine
whether they are, in fact, extracting different information about the speaker. Three methods
of combining the systems to predict holistic score are investigated, namely averaging their
predictions and concatenating and attending over their intermediate representations. The
combined graders are compared to each other and to baseline approaches.
The tasks of error detection and error tendency diagnosis become particularly challenging
when the speech in question is spontaneous and particularly given the challenges posed by
the inconsistency of human annotation of pronunciation errors. An approach to these tasks is
presented by distinguishing between lexical errors, wherein the speaker does not know how a
particular word is pronounced, and accent errors, wherein the candidate’s speech exhibits
consistent patterns of phone substitution, deletion and insertion. Three annotated corpora
of non-native English speech by speakers of multiple L1s are analysed, the consistency of
human annotation investigated and a method presented for detecting individual accent and
lexical errors and diagnosing accent error tendencies at the speaker level.
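The distinction drawn above between recurring accent errors and one-off lexical errors can be caricatured as a speaker-level tally over phone alignments. This is an illustrative sketch only: the Levenshtein alignment, the toy phone inventory and the `min_count` threshold are assumptions, not the thesis's actual method.

```python
from collections import Counter

def align(canon, observed):
    """Levenshtein alignment of canonical vs. observed phone sequences.
    Returns (canonical_phone_or_None, observed_phone_or_None) pairs."""
    n, m = len(canon), len(observed)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0:
                cost[i][j] = j
            elif j == 0:
                cost[i][j] = i
            else:
                sub = cost[i - 1][j - 1] + (canon[i - 1] != observed[j - 1])
                cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (canon[i - 1] != observed[j - 1])):
            pairs.append((canon[i - 1], observed[j - 1])); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((canon[i - 1], None)); i -= 1   # deletion
        else:
            pairs.append((None, observed[j - 1])); j -= 1  # insertion
    return pairs[::-1]

def accent_tendencies(utterances, min_count=2):
    """Count phone-level mismatches across one speaker's utterances;
    mismatches recurring at least `min_count` times are flagged as
    accent-like patterns, isolated ones as lexical-error candidates."""
    counts = Counter()
    for canon, observed in utterances:
        for c, o in align(canon, observed):
            if c != o:
                counts[(c, o)] += 1
    return {pair: k for pair, k in counts.items() if k >= min_count}
```

For a speaker who says /s/ for /th/ in several words, the (th, s) substitution is flagged as an accent tendency, while a single stray mismatch is not.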
Unsupervised learning for text-to-speech synthesis
This thesis introduces a general method for incorporating the distributional analysis
of textual and linguistic objects into text-to-speech (TTS) conversion systems.
Conventional TTS conversion uses intermediate layers of representation to bridge
the gap between text and speech. Collecting the annotated data needed to produce
these intermediate layers is a far from trivial task, possibly prohibitively so
for languages in which no such resources are in existence. Distributional analysis,
in contrast, proceeds in an unsupervised manner, and so enables the creation of
systems using textual data that are not annotated. The method therefore aids
the building of systems for languages in which conventional linguistic resources
are scarce, but is not restricted to these languages.
The distributional analysis proposed here places the textual objects analysed
in a continuous-valued space, rather than specifying a hard categorisation of those
objects. This space is then partitioned during the training of acoustic models for
synthesis, so that the models generalise over objects' surface forms in a way that
is acoustically relevant.
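As a toy illustration of this idea (not the systems actually built in the thesis), the sketch below embeds characters in a continuous space via neighbour co-occurrence counts, then applies one binary partitioning question of the kind decision-tree clustering of acoustic models could ask of that space instead of a hard linguistic category. The window size and thresholding scheme are assumptions.

```python
from collections import defaultdict

def cooccurrence_vectors(corpus, window=1):
    """Place each character in a continuous space via neighbour counts,
    normalised to relative frequencies (a crude distributional analysis)."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in corpus:
        for i, ch in enumerate(text):
            for j in range(max(0, i - window), min(len(text), i + window + 1)):
                if j != i:
                    counts[ch][text[j]] += 1
    vocab = sorted({c for text in corpus for c in text})
    vectors = {}
    for ch in vocab:
        total = sum(counts[ch].values()) or 1
        vectors[ch] = [counts[ch][c] / total for c in vocab]
    return vocab, vectors

def partition(vectors, dim, threshold):
    """One decision-tree style question ('is coordinate `dim` above
    `threshold`?') splitting the continuous space into two classes."""
    left = {ch for ch, v in vectors.items() if v[dim] <= threshold}
    right = set(vectors) - left
    return left, right
```

No annotation is needed at any point: the space comes entirely from raw text, and the partition can be chosen during acoustic-model training by whichever split is acoustically most informative.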
The method is applied to three levels of textual analysis: to the characterisation
of sub-syllabic units, word units and utterances. Entire systems for three
languages (English, Finnish and Romanian) are built with no reliance on manually
labelled data or language-specific expertise. Results of a subjective evaluation
are presented.
Statistical and explicit learning of graphotactic patterns with no phonological counterpart: Evidence from artificial lexicon studies with 6– to 7-year-olds and adults
Children are powerful statistical spellers: They can learn novel written patterns with phonological counterparts under experimental conditions, via implicit learning processes, akin to “statistical learning” processes established for spoken language acquisition. Can these mechanisms fully account for children’s knowledge of written patterns? How does this ability relate to literacy measures? How does it compare to explicit learning? This thesis addresses these questions in a series of artificial lexicon experiments, inducing graphotactic learning under incidental and explicit conditions, and comparing it with measures of literacy. The first experiment adapted an existing design (Samara & Caravolas, 2014), with the goal of searching for stronger effects. Subsequent experiments address a further limitation: Previous studies assessed learning of spelling rules which have counterparts in spoken language; however, while this is also the case for some naturalistic spelling rules (e.g., English phonotactics prohibit word-initial /ŋ/ and accordingly, written words cannot begin with ng), there are also purely visual constraints (graphotactics) (e.g., gz is an illegal spelling of a frequent word-final sound combination in English: *bagz). Can children learn patterns unconfounded by correlated phonotactics? In further experiments, developing and skilled spellers were exposed to patterns devoid of phonotactic cues. In post-tests, participants generalized over both positional constraints embedded in semiartificial strings, and contextual constraints created using homophonic non-word stimuli. This was demonstrated following passive exposure and even under meaningful (word learning) conditions, and success in learning graphotactics was not hindered by learning word meanings. However, the effect sizes across this thesis remained small, and the hypothesized positive associations between learning performance under incidental conditions and literacy measures were never observed.
This relationship was only found under explicit conditions, when pattern generalization benefited. Investigation of age effects revealed that adults and children show similar patterns of learning, but adults learn faster from matched text.
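The kind of positional graphotactic regularity these experiments probe can be illustrated with a toy "statistical speller" that simply tallies which letters an artificial lexicon permits at word edges. The lexicon and scoring rule below are invented for illustration and are not the thesis's stimuli.

```python
from collections import Counter

def learn_positional_constraints(lexicon):
    """Tally which letters the artificial lexicon allows word-initially
    and word-finally -- the statistic an implicit learner could pick up."""
    initial = Counter(w[0] for w in lexicon)
    final = Counter(w[-1] for w in lexicon)
    return initial, final

def legality(word, initial, final):
    """Score a novel string by how often its edge letters appeared in the
    trained positions (0 means it violates a learned positional constraint)."""
    return initial[word[0]] * final[word[-1]]
```

Note the constraint is purely visual: nothing about pronunciation enters the tallies, which is exactly the confound the later experiments remove.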
Lexical segmentation and word recognition in fluent aphasia
The current thesis reports a psycholinguistic study of lexical segmentation and word recognition in fluent aphasia. When listening to normal running speech we must identify individual words from a continuous stream before we can extract a linguistic message from it. Normal listeners are able to resolve the segmentation problem without any noticeable difficulty. In this thesis I consider how fluent aphasic listeners perform the process of lexical segmentation and whether any of their impaired comprehension of spoken language has its provenance in a failure to segment speech normally. The investigation comprised a series of five experiments which examined the processing of both explicit acoustic and prosodic cues to word juncture and features which affect listeners' segmentation of the speech stream implicitly, through inter-lexical competition of potential word matches. The data collected show that lexical segmentation of continuous speech is compromised in fluent aphasia. Word hypotheses do not always accrue appropriate activational information from all of the available sources within the time frame in which the segmentation problem is normally resolved. The fluent aphasic performance, although quantitatively impaired compared to normal, reflects an underlying normal competence; their processing seldom displays a qualitatively different profile to normal. They are able to engage frequency, morphological structure and imageability as modulators of activation. Word class, a feature found to be influential in the normal resolution of segmentation, is not used by the fluent aphasics studied. In those occasional cases of failure to resolve segmentation adequately by automatic, frequency-mediated activation, fluent aphasics invoke the metalinguistic influence of the real-world plausibility of alternative parses.
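The inter-lexical competition examined in these experiments can be caricatured as a search in which candidate word matches compete and the parse with the strongest frequency support wins. The sketch below is a crude stand-in for activation-based competition, not a model of aphasic processing; the lexicon and frequency values are invented.

```python
def segment(utterance, lexicon_freq):
    """Best segmentation of an unspaced string by lexical competition:
    every substring that matches a lexicon entry is a candidate word, and
    the parse maximising the product of word frequencies wins (frequency
    acting as a crude proxy for activation strength)."""
    n = len(utterance)
    best = [None] * (n + 1)        # best[(k)]: (score, words) for prefix of length k
    best[0] = (1.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            word = utterance[start:end]
            if best[start] is not None and word in lexicon_freq:
                score = best[start][0] * lexicon_freq[word]
                if best[end] is None or score > best[end][0]:
                    best[end] = (score, best[start][1] + [word])
    return best[n]                  # None if no full parse exists
```

In this caricature, degrading a modulator of activation (e.g. flattening the frequencies) makes competing parses harder to separate, which mirrors the kind of under-resolved competition the thesis reports in fluent aphasia.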