    Notes on the use of variational autoencoders for speech and audio spectrogram modeling

    Variational autoencoders (VAEs) are powerful (deep) generative artificial neural networks. They have recently been used in several papers for speech and audio processing, in particular for the modeling of speech/audio spectrograms. In these papers, little theoretical support is given to justify the chosen data representation and decoder likelihood function, or the corresponding cost function used for training the VAE. Yet a sound statistical framework exists and has been extensively presented and discussed in papers dealing with nonnegative matrix factorization (NMF) of audio spectrograms and its application to audio source separation. In the present paper, we show how this statistical framework applies to VAE-based speech/audio spectrogram modeling. This provides insights into the choice and interpretability of the data representation and model parameterization in VAE-based modeling.

    What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS

    In incremental text-to-speech synthesis (iTTS), the synthesizer produces an audio output before it has access to the entire input sentence. In this paper, we study the behavior of a neural sequence-to-sequence TTS system when used in an incremental mode, i.e., when generating speech output for token n, the system has access to n + k tokens from the text sequence. We first analyze the impact of this incremental policy on the evolution of the encoder representations of token n for different values of k (the lookahead parameter). The results show that, on average, tokens travel 88% of the way to their full context representation with a one-word lookahead and 94% after two words. We then investigate which text features are the most influential on the evolution towards the final representation using a random forest analysis. The results show that the most salient factors are related to token length. We finally evaluate the effects of lookahead k at the decoder level, using a MUSHRA listening test. This test shows results that contrast with the above high figures: speech synthesis quality obtained with a two-word lookahead is significantly lower than that obtained with the full sentence.
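
    The incremental policy described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function name `visible_tokens`, the 1-based indexing of n, and the whitespace-tokenized example sentence are assumptions made here purely for illustration.

```python
from typing import List, Sequence


def visible_tokens(tokens: Sequence[str], n: int, k: int) -> List[str]:
    """Return the tokens readable when producing output for token n.

    n is 1-based; k is the lookahead. Both conventions are assumptions
    of this sketch, chosen to match the "n + k tokens" wording.
    """
    if not 1 <= n <= len(tokens):
        raise ValueError("n must index a token in the sequence (1-based)")
    if k < 0:
        raise ValueError("lookahead k must be non-negative")
    # The system sees the first n + k tokens, capped at the sentence length.
    return list(tokens[: min(n + k, len(tokens))])


# Hypothetical example sentence, split on whitespace.
sentence = ["the", "cat", "sat", "down"]
for n in range(1, len(sentence) + 1):
    print(n, visible_tokens(sentence, n, k=1))
```

    With k = 0 the synthesizer sees nothing beyond the current token, and a large enough k recovers the full-sentence condition used as the reference in the listening test.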

    Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

    We propose a computational model of speech production combining a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters, a DNN-based internal forward model predicting the sensory consequences of articulatory commands, and an internal inverse model based on a recurrent neural network recovering articulatory commands from the acoustic speech input. Both forward and inverse models are jointly trained in a self-supervised way from raw acoustic-only speech data from different speakers. The imitation simulations are evaluated objectively and subjectively and show encouraging performance.

    Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping

    This article addresses the adaptation of an acoustic-articulatory inversion model of a reference speaker to the voice of another (source) speaker, using a limited amount of audio-only data. In this study, the articulatory-acoustic relationship of the reference speaker is modeled by a Gaussian mixture model, and inference of articulatory data from acoustic data is made by the associated Gaussian mixture regression (GMR). To address speaker adaptation, we previously proposed a general framework called Cascaded-GMR (C-GMR), which decomposes the adaptation process into two consecutive steps: spectral conversion between source and reference speaker, and acoustic-articulatory inversion of the converted spectral trajectories. In particular, we proposed the Integrated C-GMR technique (IC-GMR), in which both steps are tied together in the same probabilistic model. In this article, we extend the C-GMR framework with another model called Joint-GMR (J-GMR). Contrary to the IC-GMR, this model aims at exploiting all potential acoustic-articulatory relationships, including those between the source speaker's acoustics and the reference speaker's articulation. We present the full derivation of the exact Expectation-Maximization (EM) training algorithm for the J-GMR. It exploits the missing data methodology of machine learning to deal with limited adaptation data. We provide an extensive evaluation of the J-GMR on both synthetic acoustic-articulatory data and the multi-speaker MOCHA EMA database. We compare the J-GMR performance to other models of the C-GMR framework, notably the IC-GMR, and discuss their respective merits.
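
    For readers unfamiliar with the regression step mentioned in this abstract, the standard single-speaker GMR estimator can be written out as follows. This is the generic textbook form only, not the C-GMR, IC-GMR or J-GMR models of the article; the symbols (x for an acoustic vector, y for an articulatory vector, M components with weights \pi_m) are notation assumed here.

```latex
% Joint GMM over acoustic (x) and articulatory (y) vectors
p(x, y) = \sum_{m=1}^{M} \pi_m \,
  \mathcal{N}\!\left(
    \begin{bmatrix} x \\ y \end{bmatrix};
    \begin{bmatrix} \mu_m^{x} \\ \mu_m^{y} \end{bmatrix},
    \begin{bmatrix} \Sigma_m^{xx} & \Sigma_m^{xy} \\
                    \Sigma_m^{yx} & \Sigma_m^{yy} \end{bmatrix}
  \right)

% GMR estimate: posterior-weighted sum of per-component linear regressions
\hat{y} = \mathbb{E}[y \mid x]
  = \sum_{m=1}^{M} p(m \mid x)
    \left( \mu_m^{y} + \Sigma_m^{yx} \left(\Sigma_m^{xx}\right)^{-1}
           \left(x - \mu_m^{x}\right) \right)

% Component responsibilities given the acoustic observation only
p(m \mid x) = \frac{\pi_m \, \mathcal{N}\!\left(x; \mu_m^{x}, \Sigma_m^{xx}\right)}
                   {\sum_{j=1}^{M} \pi_j \, \mathcal{N}\!\left(x; \mu_j^{x}, \Sigma_j^{xx}\right)}
```

    Each component contributes a linear regression from x to y, and the responsibilities p(m | x) blend these regressions according to how well each component explains the observed acoustics.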

    A rich TILLING resource for studying gene function in Brassica rapa

    Background: The Brassicaceae family includes the model plant Arabidopsis thaliana as well as a number of agronomically important species such as oilseed crops (in particular Brassica napus, B. juncea and B. rapa) and vegetables (e.g. B. rapa and B. oleracea). Separated by only 10-20 million years, Brassica species and Arabidopsis thaliana are closely related, and it is expected that knowledge obtained relating to Arabidopsis growth and development can be translated into Brassicas for crop improvement. Moreover, certain aspects of plant development are sufficiently different between Brassica and Arabidopsis to warrant studies carried out directly in the crop species. However, mutating individual genes in the amphidiploid Brassicas such as B. napus and B. juncea may not give rise to the expected phenotypes, as the genomes of these species can contain up to six orthologues per single-copy Arabidopsis gene. In order to elucidate and possibly exploit the function of redundant genes for oilseed rape crop improvement, it may therefore be more efficient to study the effects in one of the diploid Brassica species such as B. rapa. Moreover, the ongoing sequencing of the B. rapa genome makes this species a highly attractive model for Brassica research and genetic resource development. Results: Seeds from the diploid Brassica A genome species, B. rapa, were treated with ethyl methane sulfonate (EMS) to produce a TILLING (Targeting Induced Local Lesions In Genomes) population for reverse genetics studies. We used the B. rapa genotype R-o-18, which has a similar developmental ontogeny to an oilseed rape crop. Hence this resource is expected to be well suited for studying traits with relevance to yield and quality of oilseed rape. DNA was isolated from a total of 9,216 M2 plants and pooled to form the basis of the TILLING platform. Analysis of six genes revealed a high level of mutations, with a density of about one per 60 kb. This analysis also demonstrated that screening a 1 kb amplicon in just one third of the population (3,072 M2 plants) will provide an average of 68 mutations and a 97% probability of obtaining a stop-codon mutation resulting in a truncated protein. We furthermore calculated that each plant contains on average ~10,000 mutations and, due to the large number of plants, it is predicted that mutations in approximately half of the GC base pairs in the genome exist within this population. Conclusions: We have developed the first EMS TILLING resource in the diploid Brassica species B. rapa. The mutation density in this population is ~1 per 60 kb, which makes it the most densely mutated diploid organism for which a TILLING population has been published. This resource is publicly available through the RevGenUK reverse genetics platform, http://revgenuk.jic.ac.uk.

    Make That Sound More 'Metallic': Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder

    In this article, we propose a new method of sound transformation based on control parameters that are intuitive and relevant for musicians. This method uses a variational autoencoder (VAE) model that is first trained in an unsupervised manner on a large dataset of synthesizer sounds. Then, a perceptual regularization term is added to the loss function to be optimized, and a supervised fine-tuning of the model is carried out using a small subset of perceptually labeled sounds. The labels were obtained from a perceptual test of Verbal Attribute Magnitude Estimation in which listeners rated this training sound dataset along eight perceptual dimensions (French equivalents of 'metallic, warm, breathy, vibrating, percussive, resonating, evolving, aggressive'). These dimensions were identified as relevant for the description of synthesizer sounds in a first Free Verbalization test. The resulting VAE model was evaluated by objective reconstruction measures and a perceptual test. Both showed that the model was able, to a certain extent, to capture the acoustic properties of most of the perceptual dimensions and to transform sound timbre along at least two of them ('aggressive' and 'vibrating') in a perceptually relevant manner. Moreover, it was able to generalize to unseen samples even though only a small set of labeled sounds was used.

    Genotypic variability enhances the reproducibility of an ecological study

    Many scientific disciplines are currently experiencing a “reproducibility crisis” because numerous scientific findings cannot be repeated consistently. A novel but controversial hypothesis postulates that stringent levels of environmental and biotic standardization in experimental studies reduce reproducibility by amplifying impacts of lab-specific environmental factors not accounted for in study designs. A corollary to this hypothesis is that a deliberate introduction of controlled systematic variability (CSV) in experimental designs may lead to increased reproducibility. We tested this hypothesis using a multi-laboratory microcosm study in which the same ecological experiment was repeated in 14 laboratories across Europe. Each laboratory introduced environmental and genotypic CSV within and among replicated microcosms established in either growth chambers (with stringent control of environmental conditions) or glasshouses (with more variable environmental conditions). The introduction of genotypic CSV led to lower among-laboratory variability in growth chambers, indicating increased reproducibility, but had no significant effect in glasshouses, where reproducibility was generally lower. Environmental CSV had little effect on reproducibility. Although there are multiple causes for the “reproducibility crisis”, deliberately including genetic variation may be a simple solution for increasing the reproducibility of ecological studies performed in controlled environments.

    Heritable symbionts in a world of varying temperature

    Heritable microbes represent an important component of the biology, ecology and evolution of many plants, animals and fungi, acting as both parasites and partners. In this review, we examine how heritable symbiont–host interactions may alter host thermal tolerance, and how the dynamics of these interactions may more generally be altered by thermal environment. Obligate symbionts, those required by their host, are considered to represent a thermally sensitive weak point for their host, associated with accumulation of deleterious mutations. As such, these symbionts may represent an important determinant of host thermal envelope and spatial distribution. We then examine the varied relationship between thermal environment and the frequency of facultative symbionts that provide ecologically contingent benefits or act as parasites. We note that some facultative symbionts directly alter host thermotolerance. We outline how thermal environment will alter the benefits/costs of infection more widely, and additionally modulate vertical transmission efficiency. Multiple patterns are observed, with symbionts being cold sensitive in some species and heat sensitive in others, with varying and non-coincident thresholds at which phenotype and transmission are ablated. Nevertheless, it is clear that studies aiming to predict ecological and evolutionary dynamics of symbiont–host interactions need to examine the interaction across a range of thermal environments. Finally, we discuss the importance of thermal sensitivity in predicting the success/failure of symbionts to spread into novel species following natural/engineered introduction.

    Breeding for increased nitrogen-use efficiency: a review for wheat (T. aestivum L.)

    Nitrogen fertilizer is the most used nutrient source in modern agriculture and represents significant environmental and production costs. Meanwhile, the demand for grain is increasing, and production per area has to increase because new cultivated areas are scarce. In this context, breeding for efficient use of nitrogen has become a major objective. In wheat, nitrogen is required to maintain a photosynthetically active canopy ensuring grain yield, and to produce grain storage proteins that are generally needed to maintain a high end-use quality. This review presents current knowledge of physiological, metabolic and genetic factors influencing nitrogen uptake and utilization in the context of different nitrogen management systems. This includes the role of the root system and its interactions with microorganisms, nitrate assimilation and its relationship with photosynthesis, as well as post-anthesis remobilization and nitrogen partitioning. Given the complexity of nitrogen-use efficiency, several physiological avenues for increasing it are discussed and their phenotyping methods reviewed. Phenotypic and molecular breeding strategies are also reviewed and discussed with regard to nitrogen regimes and genetic diversity.