
    Native and Non-Native Speaker Judgements on the Quality of Synthesized Speech

    The difference between native speakers' and non-native speakers' naturalness judgements of synthetic speech is investigated. Similarity/difference judgements are analysed via a multidimensional scaling analysis and compared to mean opinion scores. It is shown that although the two groups generally behave in a similar manner, the variance of non-native speaker judgements is generally higher. While both groups of subjects can clearly distinguish natural speech from the best synthetic examples, the groups' responses to different artefacts present in the synthetic speech can vary.

    Automatic labeling of contrastive word pairs from spontaneous spoken English

    This paper addresses the problem of automatically labeling contrast in spontaneous speech, where contrast is understood as a relation that ties together two words that explicitly contrast with each other. Detection of contrast is relevant to the analysis of discourse and information structure and, because of the prosodic correlates of contrast, could also play an important role in speech applications, such as text-to-speech synthesis, that need accurate modeling of prosody in relation to discourse context. With this prospect we investigate the feasibility of automatic contrast labeling by training and evaluating on the Switchboard corpus a novel contrast tagger, based on Support Vector Machines (SVM), that combines lexical features, syntactic dependencies and WordNet semantic relations.

    Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech

    Multidimensional scaling (MDS) has been suggested as a useful tool for the evaluation of the quality of synthesized speech. However, it has not yet been extensively tested for its application in this specific area of evaluation. In a series of experiments based on data from the Blizzard Challenge 2008, the relation between Weighted Euclidean Distance Scaling and Simple Euclidean Distance Scaling is investigated to understand how aggregating data affects the MDS configuration. These results are compared to those collected as mean opinion scores (MOS). The ranks correspond, and MOS can be predicted from an object's position in the MDS-generated stimulus space. The big advantage of MDS over MOS is its diagnostic value; dimensions along which stimuli vary are not correlated, as is the case in modular evaluation using MOS. Finally, an attempt is made to generalize from the MDS representations of the thoroughly tested subset to the aggregated data of the larger-scale Blizzard Challenge.
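
As a rough illustration of how MDS turns pairwise dissimilarity judgements into a stimulus space, the sketch below implements classical (Torgerson) metric MDS with NumPy. This is an assumption made for illustration only: the study above uses Weighted and Simple Euclidean Distance Scaling, which are different procedures, and the function name `classical_mds` and the toy coordinates are hypothetical.

```python
import numpy as np

def classical_mds(d, dims=2):
    """Classical (Torgerson) MDS: embed n items from an n x n distance matrix.

    Illustrative stand-in only; not the weighted scaling used in the study.
    """
    n = d.shape[0]
    # Double-centre the squared distances to obtain an inner-product matrix.
    centre = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centre @ (d ** 2) @ centre
    # Keep the eigenvectors belonging to the largest eigenvalues.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:dims]
    top_vals = np.clip(vals[order], 0.0, None)  # guard against tiny negatives
    return vecs[:, order] * np.sqrt(top_vals)

# Toy example: four hypothetical stimuli whose dissimilarities are the
# Euclidean distances between made-up 2-D points.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
config = classical_mds(dist, dims=2)
```

The recovered configuration matches the original points only up to rotation, reflection and translation, which is why MDS dimensions have to be interpreted after the fact.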

    A Multi-Level Representation of f0 using the Continuous Wavelet Transform and the Discrete Cosine Transform

    We propose a representation of f0 using the Continuous Wavelet Transform (CWT) and the Discrete Cosine Transform (DCT). The CWT decomposes the signal into various scales of selected frequencies, while the DCT compactly represents complex contours as a weighted sum of cosine functions. The proposed approach has the advantage of combining signal decomposition and higher-level representations, thus modeling low frequencies at higher levels and high frequencies at lower levels. Objective results indicate that this representation improves f0 prediction over traditional short-term approaches. Subjective results show that improvements are seen over the typical MSD-HMM and are comparable to the recently proposed CWT-HMM, while using fewer parameters. These results are discussed and future lines of research are proposed. Index Terms: prosody, HMM-based synthesis, f0 modeling, continuous wavelet transform, discrete cosine transform.
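
The idea that a DCT represents a contour compactly as a weighted sum of cosines can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it covers only the DCT step (no wavelet decomposition), the helper names `dct2` and `idct2` are hypothetical, and the contour is synthetic.

```python
import numpy as np

def dct2(x):
    """Orthonormal DCT-II of a 1-D signal (hypothetical helper)."""
    n = len(x)
    k = np.arange(n)[:, None]  # coefficient index
    t = np.arange(n)[None, :]  # sample index
    basis = np.cos(np.pi * (t + 0.5) * k / n)
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def idct2(coeffs, n):
    """Rebuild n samples from the first len(coeffs) DCT-II coefficients."""
    m = len(coeffs)
    k = np.arange(m)[None, :]
    t = np.arange(n)[:, None]
    basis = np.cos(np.pi * (t + 0.5) * k / n)
    scale = np.full(m, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return basis @ (scale * coeffs)

# Synthetic f0 contour in Hz (made-up values): a mean level plus two cosines.
n = 50
idx = np.arange(n)
f0 = (120.0
      + 15.0 * np.cos(np.pi * (idx + 0.5) * 1 / n)
      - 5.0 * np.cos(np.pi * (idx + 0.5) * 3 / n))

coeffs = dct2(f0)
approx = idct2(coeffs[:4], n)  # keep only the four lowest-order coefficients
```

Because this toy contour is built from cosines of order 0, 1 and 3, four coefficients reproduce it; real f0 contours would need more coefficients and would be approximated rather than recovered exactly.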

    In-situ measurements of the optical absorption of dioxythiophene-based conjugated polymers

    Conjugated polymers can be reversibly doped by electrochemical means. This doping introduces new sub-bandgap optical absorption bands in the polymer while decreasing the bandgap absorption. To study this behavior, we have prepared an electrochemical cell allowing measurements of the optical properties of the polymer. The cell consists of a thin polymer film deposited on gold-coated Mylar behind which is another polymer that serves as a counterelectrode. An infrared transparent window protects the upper polymer from ambient air. By adding a gel electrolyte and making electrical connections to the polymer-on-gold films, one may study electrochromism in a wide spectral range. As the cell voltage (the potential difference between the two electrodes) changes, the doping level of the conjugated polymer films changes reversibly. Our experiments address electrochromism in poly(3,4-ethylene-dioxy-thiophene) (PEDOT) and poly(3,4-dimethyl-propylene-dioxy-thiophene) (PProDOT-Me2). This closed electrochemical cell allows the study of the doping-induced sub-bandgap features (polaronic and bipolaronic modes) in these easily oxidized and highly redox-switchable polymers. We also study the changes in cell spectra as a function of polymer thickness and investigate strategies to obtain cleaner spectra, minimizing the contributions of water and gel electrolyte features.

    Hybrid photonic circuit for multiplexed heralded single photons

    A key resource for quantum optics experiments is an on-demand source of single and multiple photon states at telecommunication wavelengths. This letter presents a heralded single photon source based on a hybrid technology approach, combining high efficiency periodically poled lithium niobate waveguides, low-loss laser inscribed circuits, and fast (>1 MHz) fibre coupled electro-optic switches. Hybrid interfacing of different platforms is a promising route to exploiting the advantages of existing technology and has permitted the demonstration of the multiplexing of four identical sources of single photons to one output. Since this is an integrated technology, it provides scalability and can immediately leverage any improvements in transmission, detection and photon production efficiencies. Comment: 5 pages, double column, 3 figures.

    Language acquisition and implication for language change: A computational model.

    Computer modeling techniques, when applied to language acquisition problems, give an often unrealized insight into the diachronic change that occurs in language over successive generations. This paper shows that using assumptions about language acquisition to model successive generations of learners in a computer simulation can have a drastic effect on the long-term changes that occur in a language. More importantly, it shows that slight changes in the acquisition model can have drastic effects on language change.

    Generating Synthetic Pitch Contours Using Prosodic Structure.

    This thesis addresses the problem of generating a range of natural-sounding pitch contours for speech synthesis to convey the specific meanings of different intonation patterns. Where other models can synthesise intonation adequately for short sentences, longer sentences often sound unnatural as phrasing is only really considered at the sentence level. We build models within a framework of prosodic structure derived from the linguistic analysis of a corpus of speech. We show that the use of appropriate prosodic structure allows us to produce better contours for longer sentences and allows us to capture the original style of the corpus. The resulting model is also sufficiently flexible to be adapted to suitable styles for use in other domains. To convey specific meanings we need to be able to generate different accent types. We find that the infrequency of some accent and boundary types makes them hard to model from the corpus alone. We address this issue by developing a model which allows us to isolate the parameters which control specific accent type shapes, so that we can reestimate these parameters based on other data.

    Using prosodic structure to improve pitch range variation in text to speech synthesis.

    The intonation produced by current text-to-speech systems is often either flat or artificial-sounding. Pitch range is one of the contributing factors which could be improved by more detailed linguistic knowledge. In this study, a corpus of read speech is analysed to provide information about prosodic structure and pitch range, which can be used to improve the intonation models for speech synthesis. The results show that pitch range variation is most apparent at the tone group level of prosodic structure, and that phrase-initial and phrase-final tone groups have significantly different pitch ranges from tone groups which are phrase-medial.