101 research outputs found

    Expressive speech synthesis using sentiment embeddings

    In this paper we present a DNN-based speech synthesis system trained on an audiobook, with sentiment features predicted by the Stanford sentiment parser. The baseline system uses a DNN to predict acoustic parameters from conventional linguistic features, as used in statistical parametric speech synthesis. The predicted parameters are transformed into speech using a conventional high-quality vocoder. In this paper, the conventional linguistic features are enriched with sentiment features. Different sentiment representations have been considered, combining sentiment probabilities with hierarchical distance and context. After a preliminary analysis, a listening experiment is conducted in which participants evaluate the different systems. The results show the usefulness of the proposed features and reveal differences between expert and non-expert TTS users. Peer reviewed. Postprint (published version).

    Reconstructing Coastal Sediment Budgets From Beach‐ and Foredune‐Ridge Morphology: A Coupled Field and Modeling Approach

    Preserved beach and foredune ridges may serve as proxies for coastal change, reflecting alterations in sea level, wave energy, or past sediment fluxes. In particular, time‐varying shoreface sediment budgets have been inferred from the relative size of foredune ridges through application of radiocarbon and optically stimulated luminescence dating to these systems over the last decades. However, geochronological control requires extensive field investigation and analysis. Purely field‐based studies might also overlook relationships between the mechanics of sediment delivery to the shoreface and foredune ridges, missing insights about sensitivity to changes in sediment budget. We therefore propose a simple geomorphic model of beach/foredune‐ridge and swale morphology to quantify the magnitude of changes in cross‐shore sediment budget, employing field measurements of ridge volume, ridge spacing, elevation, and shoreline progradation. Model behaviors are constrained by the partitioning of sediment fluxes to the shoreface and foredune ridge and can be used to reproduce several cross‐shore patterns observed in nature. These include regularly spaced ridges (“washboards”), large singular ridges, and wide swales with poorly developed ridges. We evaluate our model against well‐preserved ridge and swale systems at two sites along the Virginia Eastern Shore (USA): Fishing Point, for which historical records provide a detailed history of shoreline progradation and ridge growth, and Parramore Island, for which a relatively more complex morphology developed over a poorly constrained period of prehistoric growth. Our results suggest this new model could be used to infer the sensitivity of field sites across the globe to variations in sediment delivery

    Evaluation of a transplantation algorithm for expressive speech synthesis

    When designing human-machine interfaces it is important to consider not only the bare-bones functionality but also the ease of use and accessibility they provide. For voice-based interfaces, it has been shown that imbuing synthetic voices with expressiveness significantly increases their perceived naturalness, which in the end is very helpful when building user-friendly interfaces. This paper proposes an adaptation-based expressiveness transplantation system capable of copying the emotions of a source speaker into any desired target speaker with just a few minutes of read speech and without requiring the recording of additional expressive data. The system was evaluated through a perceptual test with 3 speakers, showing average emotion recognition rates of up to 52% relative to the recognition rates for natural voices, while keeping good scores in similarity and naturalness.

    Towards speaking style transplantation in speech synthesis

    One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. This means that the resulting voice must not only be of high enough quality but must also capture the natural expressiveness imbued in human speech. This paper focuses on solving the expressiveness problem by proposing a set of different techniques that could be used for extrapolating the expressiveness of proven high-quality speaking style models into neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, which means that they can be used with little training data (around 15 minutes of training data are used for each style in this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the implementations of the 5 techniques were tested through a perceptual evaluation that proves that the deviations between neutral and speaking style average models can be learned and used to imbue expressiveness into target neutral speakers as intended.

    Anthropogenic controls on overwash deposition: Evidence and consequences

    Accelerated sea level rise and the potential for an increase in frequency of the most intense hurricanes due to climate change threaten the vitality and habitability of barrier islands by lowering their relative elevation and altering frequency of overwash. High-density development may further increase island vulnerability by restricting delivery of overwash to the subaerial island. We analyzed pre-Hurricane Sandy and post-Hurricane Sandy (2012) lidar surveys of the New Jersey coast to assess human influence on barrier overwash, comparing natural environments to two developed environments (commercial and residential) using shore-perpendicular topographic profiles. The volumes of overwash delivered to residential and commercial environments are reduced by 40% and 90%, respectively, of that delivered to natural environments. We use this analysis and an exploratory barrier island evolution model to assess long-term impacts of anthropogenic structures. Simulations suggest that natural barrier islands may persist under a range of likely future sea level rise scenarios (7-13 mm/yr), whereas developed barrier islands will have a long-term tendency toward drowning.

    Towards glottal source controllability in expressive speech synthesis

    In order to obtain more human-sounding human-machine interfaces we must first be able to give them expressive capabilities, in the form of emotional and stylistic features, so as to adapt them closely to the intended task. If we want to replicate those features it is not enough to merely replicate the prosodic information of fundamental frequency and speaking rhythm. The proposed additional layer is the modification of the glottal model, for which we make use of the GlottHMM parameters. This paper analyzes the viability of such an approach by verifying that the expressive nuances are captured by the aforementioned features, obtaining 95% recognition rates on speaking styles and 82% on emotional speech. We then evaluate the effect of speaker bias and recording environment on the source modeling in order to quantify possible problems when analyzing multi-speaker databases. Finally, we propose a speaking-style separation for Spanish based on prosodic features and check its perceptual significance.

    Towards Speaking Style Transplantation in Speech Synthesis

    One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. This means that the resulting voice must not only be of high enough quality but must also capture the natural expressiveness imbued in human speech. This paper focuses on solving the expressiveness problem by proposing a set of different techniques that could be used for extrapolating the expressiveness of proven high-quality speaking style models into neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, which means that they can be used with little training data (around 15 minutes of training data are used for each style in this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the implementations of the 5 techniques were tested through a perceptual evaluation that proves that the deviations between neutral and speaking style average models can be learned and used to imbue expressiveness into target neutral speakers as intended. Index Terms: expressive speech synthesis, speaking styles, adaptation, expressiveness transplantation

    Towards an unsupervised speaking style voice building framework: multi-style speaker diarization

    Current text-to-speech systems are developed using studio-recorded speech in a neutral style or based on acted emotions. However, the proliferation of media sharing sites could enable a new generation of speech-based systems able to cope with spontaneous and styled speech. This paper proposes an architecture to deal with realistic recordings and carries out some experiments on unsupervised speaker diarization. In order to maximize the speaker purity of the clusters while keeping a high speaker coverage, the paper evaluates the F-measure of a diarization module, achieving high scores (>85%), especially when the clusters are longer than 30 seconds, even for the more spontaneous and expressive styles (such as talk shows or sports).
