
5,823 research outputs found

Native and Non-Native Speaker Judgements on the Quality of Synthesized Speech

Authors: Clark Robert A. J.; Janska Anna C.
Publication date: 01/01/2010

The difference between native speakers' and non-native speakers' naturalness judgements of synthetic speech is investigated. Similarity/difference judgements are analysed via a multidimensional scaling analysis and compared to mean opinion scores. It is shown that although the two groups generally behave in a similar manner, the variance of non-native speaker judgements is generally higher. While both groups of subjects can clearly distinguish natural speech from the best synthetic examples, the groups' responses to different artefacts present in the synthetic speech can vary.

CiteSeerX

Edinburgh Research Explorer

Edinburgh Research Archive

Automatic labeling of contrastive word pairs from spontaneous spoken English

Authors: Clark Robert A J; Badino Leonardo
Publication date: 01/01/2008

This paper addresses the problem of automatically labeling contrast in spontaneous spoken speech, where contrast is understood as a relation that ties two words that explicitly contrast with each other. Detection of contrast is relevant to the analysis of discourse and information structure and, because of the prosodic correlates of contrast, could also play an important role in speech applications, such as text-to-speech synthesis, that need accurate modeling of prosody in relation to discourse context. With this prospect in mind, we investigate the feasibility of automatic contrast labeling by training and evaluating on the Switchboard corpus a novel contrast tagger, based on Support Vector Machines (SVM), that combines lexical features, syntactic dependencies and WordNet semantic relations.

CiteSeerX

Edinburgh Research Archive

Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech

Authors: Clark Robert A J; Janska Anna C.
Publication date: 01/01/2010

Multidimensional scaling (MDS) has been suggested as a useful tool for the evaluation of the quality of synthesized speech. However, it has not yet been extensively tested for its application in this specific area of evaluation. In a series of experiments based on data from the Blizzard Challenge 2008, the relation between Weighted Euclidean Distance Scaling and Simple Euclidean Distance Scaling is investigated to understand how aggregating data affects the MDS configuration. These results are compared to those collected as mean opinion scores (MOS). The ranks correspond, and MOS can be predicted from an object's position in the MDS-generated stimulus space. The main advantage of MDS over MOS is its diagnostic value; dimensions along which stimuli vary are not correlated, as is the case in modular evaluation using MOS. Finally, an attempt is made to generalize from the MDS representations of the thoroughly tested subset to the aggregated data of the larger-scale Blizzard Challenge.

Edinburgh Research Archive

Edinburgh Research Explorer

A Multi-Level Representation of f0 using the Continuous Wavelet Transform and the Discrete Cosine Transform

Authors: Clark Robert A. J.; Ribeiro Manuel Sam
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/08/2015

We propose a representation of f0 using the Continuous Wavelet Transform (CWT) and the Discrete Cosine Transform (DCT). The CWT decomposes the signal into various scales of selected frequencies, while the DCT compactly represents complex contours as a weighted sum of cosine functions. The proposed approach has the advantage of combining signal decomposition and higher-level representations, thus modeling low frequencies at higher levels and high frequencies at lower levels. Objective results indicate that this representation improves f0 prediction over traditional short-term approaches. Subjective results show that improvements are seen over the typical MSD-HMM and are comparable to the recently proposed CWT-HMM, while using fewer parameters. These results are discussed and future lines of research are proposed. Index Terms: prosody, HMM-based synthesis, f0 modeling, continuous wavelet transform, discrete cosine transform.

CiteSeerX

Edinburgh Research Explorer
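
The abstract above describes the DCT as representing a contour compactly as a weighted sum of cosine functions. A minimal sketch of that idea follows, assuming an orthonormal DCT-II and a made-up 20-frame f0 contour in Hz; the function names `dct2`, `idct2` and `rms_error` and the toy contour are illustrative assumptions, not the authors' implementation, and the CWT stage is not shown.

```python
import math


def dct2(x):
    """Orthonormal DCT-II: one weight per cosine basis function."""
    n = len(x)
    coeffs = []
    for k in range(n):
        total = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs


def idct2(coeffs, keep=None):
    """Rebuild the contour from the first `keep` coefficients (all by default)."""
    n = len(coeffs)
    keep = n if keep is None else min(keep, n)
    contour = []
    for t in range(n):
        value = 0.0
        for k in range(keep):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            value += scale * coeffs[k] * math.cos(math.pi * (t + 0.5) * k / n)
        contour.append(value)
    return contour


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


# Hypothetical f0 contour (Hz): a single rise-fall over 20 frames.
contour = [120.0 + 20.0 * math.sin(math.pi * t / 19) for t in range(20)]
coeffs = dct2(contour)

# Keeping more cosine terms can only reduce the reconstruction error.
for keep in (1, 2, 5, 20):
    approx = idct2(coeffs, keep)
    print(keep, round(rms_error(contour, approx), 4))
```

Because the basis is orthonormal, the squared error of a truncated reconstruction equals the sum of the squared dropped coefficients, which is why a smooth contour can be summarised by a handful of low-order weights.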

In-situ measurements of the optical absorption of dioxythiophene-based conjugated polymers

Authors: Argun A.; Clark Robert J.; Cornick Matthew T.; Hwang J.; Ihas B. C.; Nikolou M.; Reynolds J. R.; Schwendeman I.; Tanner D. B.
Publication venue: 'American Physical Society (APS)'
Publication date: 01/02/2011

Conjugated polymers can be reversibly doped by electrochemical means. This doping introduces new sub-bandgap optical absorption bands in the polymer while decreasing the bandgap absorption. To study this behavior, we have prepared an electrochemical cell allowing measurements of the optical properties of the polymer. The cell consists of a thin polymer film deposited on gold-coated Mylar behind which is another polymer that serves as a counterelectrode. An infrared transparent window protects the upper polymer from ambient air. By adding a gel electrolyte and making electrical connections to the polymer-on-gold films, one may study electrochromism in a wide spectral range. As the cell voltage (the potential difference between the two electrodes) changes, the doping level of the conjugated polymer films is changed reversibly. Our experiments address electrochromism in poly(3,4-ethylene-dioxy-thiophene) (PEDOT) and poly(3,4-dimethyl-propylene-dioxy-thiophene) (PProDOT-Me₂). This closed electrochemical cell allows the study of the doping induced sub-bandgap features (polaronic and bipolaronic modes) in these easily oxidized and highly redox switchable polymers. We also study the changes in cell spectra as a function of polymer thickness and investigate strategies to obtain cleaner spectra, minimizing the contributions of water and gel electrolyte features.

arXiv.org e-Print Archive

DSpace@MIT

Hybrid photonic circuit for multiplexed heralded single photons

Authors: Alibart Olivier; Clark Alex S.; Collins Matthew J.; Eggleton Benjamin J.; Meany Thomas; Ngah Lutfi A.; Steel M. J.; Tanzilli Sébastien; Williams Robert J.; Withford Michael J.
Publication venue: 'Wiley'
Publication date: 01/01/2014

A key resource for quantum optics experiments is an on-demand source of single and multiple photon states at telecommunication wavelengths. This letter presents a heralded single photon source based on a hybrid technology approach, combining high efficiency periodically poled lithium niobate waveguides, low-loss laser inscribed circuits, and fast (>1 MHz) fibre coupled electro-optic switches. Hybrid interfacing of different platforms is a promising route to exploiting the advantages of existing technology and has permitted the demonstration of the multiplexing of four identical sources of single photons to one output. Since this is an integrated technology, it provides scalability and can immediately leverage any improvements in transmission, detection and photon production efficiencies. Comment: 5 pages, double column, 3 figures

arXiv.org e-Print Archive

HAL-UNICE

Macquarie University ResearchOnline

Language acquisition and implication for language change: A computational model.

Author: Clark Robert A J
Publication date: 01/01/1997

Computer modeling techniques, when applied to language acquisition problems, give an often unrealized insight into the diachronic change that occurs in language over successive generations. This paper shows that using assumptions about language acquisition to model successive generations of learners in a computer simulation can have a drastic effect on the long-term changes that occur in a language. More importantly, it shows that slight changes in the acquisition model can have drastic effects on language change.

Edinburgh Research Archive

Generating Synthetic Pitch Contours Using Prosodic Structure.

Author: Clark Robert A J
Publication venue: The University of Edinburgh: College of Humanities and Social Science: School of Philosophy, Psychology and Language Sciences
Publication date: 01/06/2003

This thesis addresses the problem of generating a range of natural-sounding pitch contours for speech synthesis to convey the specific meanings of different intonation patterns. Where other models can synthesise intonation adequately for short sentences, longer sentences often sound unnatural as phrasing is only really considered at the sentence level. We build models within a framework of prosodic structure derived from the linguistic analysis of a corpus of speech. We show that the use of appropriate prosodic structure allows us to produce better contours for longer sentences and allows us to capture the original style of the corpus. The resulting model is also sufficiently flexible to be adapted to suitable styles for use in other domains. To convey specific meanings we need to be able to generate different accent types. We find that the infrequency of some accent and boundary types makes them hard to model from the corpus alone. We address this issue by developing a model which allows us to isolate the parameters which control specific accent type shapes, so that we can reestimate these parameters based on other data.

Edinburgh Research Archive

Using prosodic structure to improve pitch range variation in text to speech synthesis.

Author: Clark Robert A J
Publication venue: International Congress of Phonetic Sciences
Publication date: 01/01/1999

The intonation produced by current text-to-speech systems is often either flat or artificial-sounding. Pitch range is one of the contributing factors which could be improved by more detailed linguistic knowledge. In this study, a corpus of read speech is analysed to provide information about prosodic structure and pitch range, which can be used to improve the intonation models for speech synthesis. The results show that pitch range variation is most apparent at the tone group level of prosodic structure, and that phrase-initial and phrase-final tone groups have significantly different pitch ranges from tone groups which are phrase-medial.

Edinburgh Research Archive