5,823 research outputs found
Native and Non-Native Speaker Judgements on the Quality of Synthesized Speech
The difference between native speakers' and non-native speakers' naturalness judgements of synthetic speech is investigated. Similar/difference judgements are analysed via a multidimensional scaling analysis and compared to mean opinion scores. It is shown that although the two groups generally behave in a similar manner, the variance of non-native speaker judgements is generally higher. While both groups of subjects can clearly distinguish natural speech from the best synthetic examples, the groups' responses to different artefacts present in the synthetic speech can vary.
Automatic labeling of contrastive word pairs from spontaneous spoken English
This paper addresses the problem of automatically labeling contrast in spontaneous speech, where contrast is understood as a relation that ties two words that explicitly contrast with each other. Detection of contrast is relevant to the analysis of discourse and information structure and, because of the prosodic correlates of contrast, could also play an important role in speech applications, such as text-to-speech synthesis, that need accurate, discourse-context-aware modeling of prosody. With this prospect, we investigate the feasibility of automatic contrast labeling by training and evaluating on the Switchboard corpus a novel contrast tagger, based on Support Vector Machines (SVM), that combines lexical features, syntactic dependencies and WordNet semantic relations.
Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech
Multidimensional scaling (MDS) has been suggested as a useful tool for the evaluation of the quality of synthesized speech. However, it has not yet been extensively tested for its application in this specific area of evaluation. In a series of experiments based on data from the Blizzard Challenge 2008, the relation between Weighted Euclidean Distance Scaling and Simple Euclidean Distance Scaling is investigated to understand how aggregating data affects the MDS configuration. These results are compared to those collected as mean opinion scores (MOS). The ranks correspond, and MOS can be predicted from an object's position in the MDS-generated stimulus space. The big advantage of MDS over MOS is its diagnostic value; dimensions along which stimuli vary are not correlated, as is the case in modular evaluation using MOS. Finally, an attempt is made to generalize from the MDS representations of the thoroughly tested subset to the aggregated data of the larger-scale Blizzard Challenge.
A Multi-Level Representation of f0 using the Continuous Wavelet Transform and the Discrete Cosine Transform
We propose a representation of f0 using the Continuous Wavelet Transform (CWT) and the Discrete Cosine Transform (DCT). The CWT decomposes the signal into various scales of selected frequencies, while the DCT compactly represents complex contours as a weighted sum of cosine functions. The proposed approach has the advantage of combining signal decomposition and higher-level representations, thus modeling low frequencies at higher levels and high frequencies at lower levels. Objective results indicate that this representation improves f0 prediction over traditional short-term approaches. Subjective results show that improvements are seen over the typical MSD-HMM and are comparable to the recently proposed CWT-HMM, while using fewer parameters. These results are discussed and future lines of research are proposed. Index Terms: prosody, HMM-based synthesis, f0 modeling, continuous wavelet transform, discrete cosine transform
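The idea that a DCT represents a contour compactly as a weighted sum of cosines can be illustrated with a minimal sketch. This is not the authors' implementation: the contour below is synthetic, the CWT stage is omitted entirely, and the helper names `dct2`, `idct2` and `rmse` are hypothetical. It only shows that a handful of DCT coefficients can approximate a smooth f0-like curve.

```python
import math

def dct2(x):
    """Orthonormal DCT-II: weights of the cosine basis functions."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c, n):
    """Rebuild n samples from the first len(c) orthonormal DCT-II coefficients."""
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, len(c)):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))

# Synthetic f0-like contour in Hz (an assumption, not data from the paper):
# a rise-fall accent shape on top of a gradual declination.
n = 50
contour = [120 + 15 * math.sin(math.pi * t / (n - 1)) - 10 * t / (n - 1)
           for t in range(n)]

coeffs = dct2(contour)
for keep in (1, 3, 5, n):
    approx = idct2(coeffs[:keep], n)
    print(keep, "coefficients, RMSE (Hz):", round(rmse(contour, approx), 4))
```

Because the basis is orthonormal, the reconstruction error cannot grow as more coefficients are kept, and keeping all `n` coefficients recovers the contour up to floating-point error.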
In-situ measurements of the optical absorption of dioxythiophene-based conjugated polymers
Conjugated polymers can be reversibly doped by electrochemical means. This
doping introduces new sub-bandgap optical absorption bands in the polymer while
decreasing the bandgap absorption. To study this behavior, we have prepared an
electrochemical cell allowing measurements of the optical properties of the
polymer. The cell consists of a thin polymer film deposited on gold-coated
Mylar behind which is another polymer that serves as a counterelectrode. An
infrared transparent window protects the upper polymer from ambient air. By
adding a gel electrolyte and making electrical connections to the
polymer-on-gold films, one may study electrochromism in a wide spectral range.
As the cell voltage (the potential difference between the two electrodes)
changes, the doping level of the conjugated polymer films is changed
reversibly. Our experiments address electrochromism in
poly(3,4-ethylene-dioxy-thiophene) (PEDOT) and
poly(3,4-dimethyl-propylene-dioxy-thiophene) (PProDOT-Me). This closed
electrochemical cell allows the study of the doping-induced sub-bandgap
features (polaronic and bipolaronic modes) in these easily oxidized and highly
redox switchable polymers. We also study the changes in cell spectra as a
function of polymer thickness and investigate strategies to obtain cleaner
spectra, minimizing the contributions of water and gel electrolyte features.
Hybrid photonic circuit for multiplexed heralded single photons
A key resource for quantum optics experiments is an on-demand source of
single and multiple photon states at telecommunication wavelengths. This letter
presents a heralded single photon source based on a hybrid technology approach,
combining high efficiency periodically poled lithium niobate waveguides,
low-loss laser inscribed circuits, and fast (>1 MHz) fibre coupled
electro-optic switches. Hybrid interfacing of different platforms is a promising
route to exploiting the advantages of existing technology and has permitted the
demonstration of the multiplexing of four identical sources of single photons
to one output. Since this is an integrated technology, it provides scalability
and can immediately leverage any improvements in transmission, detection and
photon production efficiencies. Comment: 5 pages, double column, 3 figures
Language acquisition and implication for language change: A computational model.
Computer modeling techniques, when applied to language
acquisition problems, give an often unrealized
insight into the diachronic change that occurs in language
over successive generations. This paper shows
that using assumptions about language acquisition to
model successive generations of learners in a computer
simulation can have a drastic effect on the long-term
changes that occur in a language. More importantly,
it shows that slight changes in the acquisition
model can have drastic effects on language change.
Generating Synthetic Pitch Contours Using Prosodic Structure.
This thesis addresses the problem of generating a range of natural-sounding pitch
contours for speech synthesis to convey the specific meanings of different intonation
patterns.
Where other models can synthesise intonation adequately for short sentences,
longer sentences often sound unnatural as phrasing is only really considered at
the sentence level. We build models within a framework of prosodic structure
derived from the linguistic analysis of a corpus of speech. We show that the use
of appropriate prosodic structure allows us to produce better contours for longer
sentences and allows us to capture the original style of the corpus. The resulting
model is also sufficiently flexible to be adapted to suitable styles for use in other
domains.
To convey specific meanings we need to be able to generate different accent
types. We find that the infrequency of some accent and boundary types makes
them hard to model from the corpus alone. We address this issue by developing
a model which allows us to isolate the parameters which control specific accent
type shapes, so that we can reestimate these parameters based on other data.
Using prosodic structure to improve pitch range variation in text to speech synthesis.
The intonation produced by current text-to-speech systems is often
either flat or artificial sounding. Pitch range is one of the contributing
factors which could be improved by more detailed linguistic
knowledge.
In this study, a corpus of read speech is analysed to provide
information about prosodic structure and pitch range, which can be
used to improve the intonation models for speech synthesis.
The results show how the pitch range variation is most apparent
at a tone group level of prosodic structure, and how phrase-initial
and phrase-final tone groups have significantly different pitch
ranges from tone groups which are phrase-medial.