
    Investigating Perceptual Congruence Between Data and Display Dimensions in Sonification

    The relationships between sounds and their perceived meaning and connotations are complex, making auditory perception an important factor to consider when designing sonification systems. Listeners often have a mental model of how a data variable should sound during sonification, and this model is not considered in most data:sound mappings. This can lead to mappings that are difficult to use and cause confusion. To investigate this issue, we conducted a magnitude estimation experiment to map how roughness, noise and pitch relate to the perceived magnitude of stress, error and danger. These parameters were chosen due to previous findings that suggest perceptual congruency between these auditory sensations and conceptual variables. Results from this experiment show that polarity and scaling preferences depend on the data:sound mapping. This work provides polarity and scaling values that may be directly utilised by sonification designers to improve auditory displays in areas such as accessible and mobile computing, process monitoring and biofeedback.
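
    As a rough illustration of how a designer might apply such polarity and scaling values, here is a minimal Python sketch of a data:sound mapping. The power-law exponent, polarity convention, and frequency range are hypothetical placeholders, not values reported by the study.

```python
import numpy as np

def map_data_to_pitch(value, polarity=+1, exponent=1.0,
                      f_low=220.0, f_high=880.0):
    """Map a normalised data value in [0, 1] to a pitch in Hz.

    polarity=+1 sends larger data values to higher pitch; polarity=-1
    inverts the mapping. The exponent applies a power-law scaling of
    the kind magnitude-estimation experiments fit. All defaults here
    are hypothetical placeholders, not values reported by the study.
    """
    v = np.clip(value, 0.0, 1.0) ** exponent
    if polarity < 0:
        v = 1.0 - v
    return f_low + v * (f_high - f_low)

# Example: sonify a "danger" level of 0.7 with positive polarity.
print(map_data_to_pitch(0.7, exponent=0.8))
```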

    Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches for inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and only a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses speaker-adapted articulatory models derived from acoustically derived weights. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset consisting of speakers with strong individual speaker-dependent inversion performance, the PRSW method is able to attain kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data.
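
    A minimal sketch of the weighting idea described above: weights are derived purely from acoustic similarity and used to combine reference speakers' articulatory models, so no kinematic data from the target speaker is needed. The exponential-over-distance weighting and all array shapes are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def prsw_estimate(target_acoustic, ref_acoustics, ref_articulatory_models):
    """Sketch of Parallel Reference Speaker Weighting (PRSW).

    target_acoustic:         (d,)   acoustic feature summary of the target speaker
    ref_acoustics:           (N, d) the same summary for N reference speakers
    ref_articulatory_models: (N, p) parameters of each reference speaker's
                             articulatory (inversion) model

    Weights come only from acoustic similarity, so no target-speaker
    kinematics are required. This particular weighting scheme is an
    illustrative choice, not the paper's exact formulation.
    """
    dists = np.linalg.norm(ref_acoustics - target_acoustic, axis=1)
    weights = np.exp(-dists)
    weights /= weights.sum()
    # Speaker-adapted model = acoustic-similarity-weighted combination.
    return weights @ ref_articulatory_models

# Toy example: 20 reference speakers, 13-dim acoustics, 50 model parameters.
rng = np.random.default_rng(0)
model = prsw_estimate(rng.normal(size=13),
                      rng.normal(size=(20, 13)),
                      rng.normal(size=(20, 50)))
print(model.shape)  # (50,)
```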

    Acoustic Space Movement Planning in a Neural Model of Motor Equivalent Vowel Production

    Recent evidence suggests that speakers utilize an acoustic-like reference frame for the planning of speech movements. DIVA, a computational model of speech acquisition and motor equivalent speech production, has previously been shown to provide explanations for a wide range of speech production data using a constriction-based reference frame for movement planning. This paper extends the previous work by investigating an acoustic-like planning frame within the DIVA modeling framework. During a babbling phase, the model self-organizes targets in the planning space for each of ten vowels and learns a mapping from desired movement directions in this planning space into appropriate articulator velocities. Simulation results verify that, after babbling, the model is capable of producing easily recognizable vowel sounds using an acoustic planning space consisting of the formants F1 and F2. The model successfully reaches all vowel targets from any initial vocal tract configuration, even in the presence of constraints such as a blocked jaw. Funding: Office of Naval Research (N00014-91-J-4100, N00014-92-J-4015); Air Force Office of Scientific Research (F49620-92-J-0499).
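
    A minimal sketch of the planning step the abstract describes: a desired movement direction in (F1, F2) space is mapped to articulator velocities. Here the mapping is a fixed pseudoinverse of a local Jacobian, which is only illustrative; DIVA learns this directional mapping during babbling rather than using a fixed matrix.

```python
import numpy as np

def articulator_velocities(jacobian, formant_error, gain=1.0):
    """Map a desired movement direction in (F1, F2) space to
    articulator velocities via a pseudoinverse of a local Jacobian
    (d formants / d articulators).

    jacobian:      (2, n_artic) sensitivity of (F1, F2) to each articulator
    formant_error: (2,)         target formants minus current formants

    A pseudoinverse spreads the correction over redundant articulators,
    which is one way motor-equivalent compensation (e.g. for a blocked
    jaw) can emerge. Illustrative only: DIVA acquires this mapping
    through babbling instead of an explicit matrix inverse.
    """
    return gain * np.linalg.pinv(jacobian) @ formant_error

# Toy example: 7 articulators, moving toward a vowel target in formant space.
J = np.random.default_rng(1).normal(size=(2, 7))
print(articulator_velocities(J, np.array([50.0, -120.0])))
```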

    Acoustic Space Learning for Sound Source Separation and Localization on Binaural Manifolds

    In this paper, we address the problems of modeling the acoustic space generated by a full-spectrum sound source and of using the learned model for the localization and separation of multiple sources that simultaneously emit sparse-spectrum sounds. We lay theoretical and methodological grounds in order to introduce the binaural manifold paradigm. We perform an in-depth study of the latent low-dimensional structure of the high-dimensional interaural spectral data, based on a corpus recorded with a human-like audiomotor robot head. A non-linear dimensionality reduction technique is used to show that these data lie on a two-dimensional (2D) smooth manifold parameterized by the motor states of the listener, or equivalently, the sound source directions. We propose a probabilistic piecewise affine mapping model (PPAM) specifically designed to deal with high-dimensional data exhibiting an intrinsic piecewise linear structure. We derive a closed-form expectation-maximization (EM) procedure for estimating the model parameters, followed by Bayes inversion for obtaining the full posterior density function of a sound source direction. We extend this solution to deal with missing data and redundancy in real-world spectrograms, and hence for 2D localization of natural sound sources such as speech. We further generalize the model to the challenging case of multiple sound sources and propose a variational EM framework. The associated algorithm, referred to as variational EM for source separation and localization (VESSL), yields a Bayesian estimation of the 2D locations and time-frequency masks of all the sources. Comparisons of the proposed approach with several existing methods reveal that the combination of acoustic-space learning with Bayesian inference enables our method to outperform state-of-the-art methods. Comment: 19 pages, 9 figures, 3 tables.
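
    A minimal sketch of the forward direction of a piecewise affine mapping, the core structure PPAM assumes: low-dimensional inputs (e.g. source directions) are sent through one affine transform per region. The nearest-center region assignment below is an illustrative simplification; PPAM fits regions and transforms jointly with closed-form EM and inverts the mapping with Bayes' rule, both of which are elided here.

```python
import numpy as np

def ppam_forward(x, centers, A, b):
    """Evaluate a piecewise affine mapping at a low-dimensional point x.

    centers: (K, d)    region centers partitioning the input space
    A:       (K, D, d) one affine transform per region
    b:       (K, D)    one offset per region

    The region is chosen by nearest center here for simplicity; the
    actual PPAM model uses soft (probabilistic) assignments estimated
    by EM, and localization works through the inverse mapping.
    """
    k = np.argmin(np.linalg.norm(centers - x, axis=1))
    return A[k] @ x + b[k]

# Toy example: map 2D source directions to a 10-dim interaural spectrum.
rng = np.random.default_rng(2)
K, d, D = 5, 2, 10
y = ppam_forward(rng.normal(size=d),
                 rng.normal(size=(K, d)),
                 rng.normal(size=(K, D, d)),
                 rng.normal(size=(K, D)))
print(y.shape)  # (10,)
```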

    Semi-Supervised Sound Source Localization Based on Manifold Regularization

    Conventional speaker localization algorithms, based merely on the received microphone signals, are often sensitive to adverse conditions such as high reverberation or a low signal-to-noise ratio (SNR). In some scenarios, e.g. in meeting rooms or cars, it can be assumed that the source position is confined to a predefined area and that the acoustic parameters of the environment are approximately fixed. Such scenarios give rise to the assumption that the acoustic samples from the region of interest have a distinct geometrical structure. In this paper, we show that the high-dimensional acoustic samples indeed lie on a low-dimensional manifold and can be embedded into a low-dimensional space. Motivated by this result, we propose a semi-supervised source localization algorithm which recovers the inverse mapping between the acoustic samples and their corresponding locations. The idea is to use an optimization framework based on manifold regularization, which imposes smoothness constraints on possible solutions with respect to the manifold. The proposed algorithm, termed Manifold Regularization for Localization (MRL), is implemented in an adaptive manner. The initialization is conducted with only a few labelled samples, each attached to its respective source location, and the system is then gradually adapted as new unlabelled samples (with unknown source locations) are received. Experimental results show superior localization performance when compared with a recently presented algorithm based on a manifold learning approach and with the generalized cross-correlation (GCC) algorithm as a baseline.
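
    A minimal sketch of manifold-regularized semi-supervised regression in the spirit described above, assuming an RBF affinity graph and a one-shot batch solve; the paper's MRL algorithm is adaptive and differs in detail.

```python
import numpy as np

def manifold_regularized_localization(X, y_labelled, labelled_idx,
                                      sigma=1.0, lam=0.1):
    """Semi-supervised regression with a graph-Laplacian smoothness term.

    X:            (n, d) acoustic samples, labelled and unlabelled
    y_labelled:   (m,)   known source coordinates of the labelled samples
    labelled_idx: (m,)   indices of the labelled samples within X

    Solves min_f sum_labelled (f_i - y_i)^2 + lam * f^T L f, whose
    closed form is f = (J + lam L)^{-1} J y, where J selects labelled
    entries. The RBF affinity and batch solve are illustrative choices.
    """
    n = X.shape[0]
    # RBF affinity graph over all samples; Laplacian L = D - W.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    L = np.diag(W.sum(axis=1)) - W
    J = np.zeros((n, n))
    J[labelled_idx, labelled_idx] = 1.0  # diagonal label indicator
    y = np.zeros(n)
    y[labelled_idx] = y_labelled
    return np.linalg.solve(J + lam * L, J @ y)

# Toy example: 30 samples, the first 5 labelled with one coordinate each.
rng = np.random.default_rng(3)
f = manifold_regularized_localization(rng.normal(size=(30, 4)),
                                      rng.uniform(size=5),
                                      np.arange(5))
print(f.shape)  # (30,) estimated coordinate for every sample
```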

    Exploring efficient neural architectures for linguistic-acoustic mapping in text-to-speech

    Conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models such as recurrent neural networks. Despite the good performance of such models (in terms of low distortion in the generated speech), their recursive structure with intermediate affine transformations tends to make them slow to train and to sample from. In this work, we explore two different mechanisms that enhance the operational efficiency of recurrent neural networks, and study their performance–speed trade-off. The first mechanism is based on the quasi-recurrent neural network, where expensive affine transformations are removed from temporal connections and placed only on feed-forward computational directions. The second mechanism includes a module based on the transformer decoder network, designed without recurrent connections but emulating them with attention and positional codes. Our results show that the proposed decoder networks are competitive in terms of distortion when compared to a recurrent baseline, whilst being significantly faster in terms of CPU and GPU inference time. The best-performing model is the one based on the quasi-recurrent mechanism, reaching the same level of naturalness as the recurrent neural network based model with a speedup of 11.2× on CPU and 3.3× on GPU.
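
    For concreteness, the quasi-recurrent mechanism replaces the matrix multiply on the temporal connection with elementwise pooling; the gate sequences themselves come from causal convolutions computed in parallel (elided below). A minimal sketch of the elementwise "fo-pooling" recurrence:

```python
import numpy as np

def qrnn_fo_pool(Z, F, O):
    """Elementwise 'fo-pooling' of a quasi-recurrent layer.

    Z, F, O: (T, d) candidate, forget-gate, and output-gate sequences,
    all produced in parallel by convolutions over the input. The
    temporal connection below involves no matrix multiply, which is
    what makes the recurrence cheap:

        c_t = f_t * c_{t-1} + (1 - f_t) * z_t
        h_t = o_t * c_t
    """
    T, d = Z.shape
    c = np.zeros(d)
    H = np.empty((T, d))
    for t in range(T):
        c = F[t] * c + (1.0 - F[t]) * Z[t]
        H[t] = O[t] * c
    return H

# Toy example: 8 time steps, 16 channels, gates squashed into (0, 1).
rng = np.random.default_rng(4)
sig = lambda x: 1.0 / (1.0 + np.exp(-x))
H = qrnn_fo_pool(rng.normal(size=(8, 16)),
                 sig(rng.normal(size=(8, 16))),
                 sig(rng.normal(size=(8, 16))))
print(H.shape)  # (8, 16)
```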

    The value of remote sensing techniques in supporting effective extrapolation across multiple marine spatial scales

    The reporting of ecological phenomena and environmental status routinely requires point observations, collected with traditional sampling approaches, to be extrapolated to larger reporting scales. This process encompasses difficulties that can quickly entrain significant errors. Remote sensing techniques offer insights and exceptional spatial coverage for observing the marine environment. This review provides guidance on (i) the structures and discontinuities inherent within the extrapolative process, (ii) how to extrapolate effectively across multiple spatial scales, and (iii) remote sensing techniques and data sets that can facilitate this process. This evaluation illustrates that remote sensing techniques are a critical component in extrapolation and are likely to underpin the production of high-quality assessments of ecological phenomena and the regional reporting of environmental status. Ultimately, it is hoped that this guidance will aid the production of robust and consistent extrapolations that also make full use of the techniques and data sets that expedite this process.