485 research outputs found
An investigation of speaker independent phrase break models in End-to-End TTS systems
This paper presents our work on phrase break prediction in the context of
end-to-end TTS systems, motivated by the following questions: (i) Is there any
utility in incorporating an explicit phrasing model in an end-to-end TTS
system? and (ii) How do you evaluate the effectiveness of a phrasing model in
an end-to-end TTS system? In particular, the utility and effectiveness of
phrase break prediction models are evaluated in the context of children's
story synthesis, using listener comprehension. We show by means of perceptual
listening evaluations that there is a clear preference for stories synthesized
after predicting the location of phrase breaks using a trained phrasing model,
over stories directly synthesized without predicting the location of phrase
breaks. Comment: Submitted for review to IEEE Access
Metalinguistic awareness in literate and illiterate children and adults: a psycholinguistic study
One of the major goals of psycholinguistic research is to be
able to account for those mental operations which enable
native speakers not only to perform the basic linguistic
capacities such as comprehending and producing an unlimited
number of utterances, but also to exercise such
metalinguistic abilities as to judge utterances, segment
words, identify sounds and detect ambiguities.
The primary concern of this thesis was to elucidate the
processes underlying certain aspects of metalinguistic
awareness and to trace their relationship to advances in
maturation and acquisition of literacy. The guiding
principle has been to determine how much of what has been
considered normal cognitive development is in fact an
age-bound developmental phenomenon, or to what extent it
reflects the result of experiences associated with the
degree and extent of literacy. The need for this is
apparent on examining previous research which, as we
demonstrate, has confounded such theoretically important
variables as Age, Literacy and peculiarities of the native
language.
The aim of the methodology employed here was to deconfound
such variables and add more insight as to the nature of
metalinguistic abilities. First, by employing literate and
illiterate children and adults, the design optimizes the
likelihood of tapping a precise relationship between
maturation, literacy and metalinguistic awareness. Second,
by using native speakers of Arabic, the general design
offers the opportunity to add insight from yet another language, one typologically different from English, the language in which most
previous research was conducted. Third, by employing more
than one type of linguistic measure for the same population,
the design also hopes to answer one empirical question,
namely, whether metalinguistic awareness can be
conceptualised as either multidimensional or unitary in
nature.
The Subjects who participated in the study were 120 Moroccan
Arabic speaking literate and illiterate children and adults
drawn from a relatively homogeneous socio-economic
background. A total of seven experiments -- some with
subtasks -- were used.
Six chapters make up the study. In Chapter 1 we have tried
to provide an introduction to the theoretical issues which
we think are of central importance to the topic under
investigation. Because our approach is essentially
psycholinguistic, Chapter 2 describes and discusses the
methodology employed to gather the necessary data for the
study. It is also concerned with the procedures used to
evaluate these data.
Chapters 3, 4, and 5 form the main bulk of the research.
Using various experiments, they examine the extent to which
Ss deploy their metalinguistic knowledge in the process of
attending to and manipulating the following linguistic
units: (i) words (Chapter 3); (ii) syllables (Chapter 4);
(iii) segments (Chapter 5). Typically, each one of these
chapters considers various hypotheses and research questions
which concern the specific linguistic unit.
Finally, Chapter 6 draws general conclusions from the
general study and addresses some implications for linguistic
theory, psycholinguistic research and, although not
extensively, education research.
Unsupervised learning for text-to-speech synthesis
This thesis introduces a general method for incorporating the distributional analysis
of textual and linguistic objects into text-to-speech (TTS) conversion systems.
Conventional TTS conversion uses intermediate layers of representation to bridge
the gap between text and speech. Collecting the annotated data needed to produce
these intermediate layers is a far from trivial task, possibly prohibitively so
for languages in which no such resources are in existence. Distributional analysis,
in contrast, proceeds in an unsupervised manner, and so enables the creation of
systems using textual data that are not annotated. The method therefore aids
the building of systems for languages in which conventional linguistic resources
are scarce, but is not restricted to these languages.
The distributional analysis proposed here places the textual objects analysed
in a continuous-valued space, rather than specifying a hard categorisation of those
objects. This space is then partitioned during the training of acoustic models for
synthesis, so that the models generalise over objects' surface forms in a way that
is acoustically relevant.
The method is applied to three levels of textual analysis: to the characterisation
of sub-syllabic units, word units and utterances. Entire systems for three
languages (English, Finnish and Romanian) are built with no reliance on manually
labelled data or language-specific expertise. Results of a subjective evaluation
are presented.
Function of intonation in task-oriented dialogue
This thesis addresses the question of how intonation functions in conversation.
It examines the intonation and discourse function of single-word utterances in
spontaneous and read-aloud task-oriented dialogue (HCRC Map Task Corpus
containing Scottish English; see Anderson et al., 1991). To avoid some of the
pitfalls of previous studies in which such comparisons of intonation and discourse
structure tend to lack balance and focus more heavily on one analysis at
the expense of the other, it employs independently developed analyses. They
are the Conversational Games Analysis (as introduced in Kowtko, Isard and
Doherty, 1992) and a simple target level representation of intonation. Correlations
between categories of intonation and of discourse function in spontaneous
dialogue suggest that intonation reflects the function of an utterance. Contrary
to what one might expect from reading the literature, these categories are in
some cases categories of exclusion rather than inclusion.
Similar patterns result from the study of read-aloud dialogue. Discourse
function and intonation categories show a measure of correlation. One difference
that does appear between patterns across speech modes is that in many
instances of discourse function intonation categories shift toward tunes ending
low in the speaker's pitch range (e.g. a falling tune) for the read-aloud version.
This result is in accord with other contemporary studies (e.g. Blaauw, 1995).
The difference between spontaneous and read results suggests that read-aloud
dialogue - even that based on scripts which include hesitations and false starts
- is not a substitute for eliciting the same intonation strategies that are found
in spontaneous dialogue.
Prosody and speech perception
The major concern of this thesis is with
models of speech perception. Following Gibson's
(1966) work on visual perception, it seeks to establish
whether there are sources of information in the speech
signal which can be responded to directly and which
specify the units of information of speech. The
treatment of intonation follows that of Halliday (1967)
and rhythm that of Abercrombie (1967). "Prosody"
is taken to mean both the intonational and the
rhythmic aspects of speech.
Experiments one to four show the
interdependence of prosody and grammar in the
perception of speech, although they leave open the
question of which sort of information is responded
to first. Experiments five and six, employing a
short-term memory paradigm and Morton's (1970)
"suffix effect" explanation, demonstrate that prosody
could well be responded to before grammar. Since
the previous experiments suggested a close connection
between the two, these results suggest that information
about grammatical structures may well be given
directly by prosody. In the final two experiments
the amount of prosodic information in fluent speech
that can be perceived independently of grammar and
meaning is investigated. Although tone-group
division seems to be given clearly enough by acoustic
cues, there are problems of interpretation with the
data on syllable stress assignments.
In the concluding chapter, a three-stage
model of speech perception is proposed, following
Bever (1970), but incorporating prosodic analysis as
an integral part of the processing. The obtained
experimental results are integrated within this
model.