868 research outputs found

Lip Synchronization by Acoustic Inversion

Author: Berger Michael
Hofer Gregor
Richmond Korin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

Crossref

Edinburgh Research Explorer

Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

Author: Berry Jeffrey J.
Ji An
Johnson Michael T.
Publication venue: e-Publications@Marquette
Publication date: 01/10/2016

Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches for inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses speaker-adapted articulatory models derived from acoustically derived weights. The system was assessed using a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset consisting of speakers with strong individual speaker-dependent inversion performance, the PRSW method is able to attain kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data
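
The weighting idea in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each reference speaker is summarized by a single acoustic mean vector and a single articulatory mean vector, and that the weights are fit by ordinary least squares. The function names, array shapes and synthetic data are all hypothetical.

```python
import numpy as np

def estimate_weights(ref_acoustic, target_acoustic):
    """Fit weights so that a weighted sum of the reference speakers'
    acoustic vectors approximates the target speaker's acoustic vector.

    ref_acoustic: (n_speakers, d_acoustic), target_acoustic: (d_acoustic,)
    Ordinary least squares is an assumption made for this sketch.
    """
    w, *_ = np.linalg.lstsq(ref_acoustic.T, target_acoustic, rcond=None)
    return w

def adapt_articulatory(ref_articulatory, weights):
    """Apply the acoustically derived weights to the reference speakers'
    articulatory vectors; no target kinematic data is used."""
    return weights @ ref_articulatory

# Synthetic data: 4 reference speakers, 12-dim acoustic, 6-dim articulatory.
rng = np.random.default_rng(0)
ref_ac = rng.normal(size=(4, 12))
ref_art = rng.normal(size=(4, 6))

# Hypothetical target whose acoustics are an exact mix of the references.
true_w = np.array([0.5, 0.2, 0.2, 0.1])
target_ac = true_w @ ref_ac

w = estimate_weights(ref_ac, target_ac)
target_art = adapt_articulatory(ref_art, w)
print("weights:", np.round(w, 3))
```

The sketch relies on the same hypothesis the abstract states: speakers who are acoustically similar to the target are assumed to be kinematically similar as well, so weights fitted in the acoustic space are reused in the articulatory space.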

epublications@Marquette

Speaker Independent Acoustic-to-Articulatory Inversion

Author: Ji An
Publication venue: e-Publications@Marquette
Publication date: 01/10/2014

Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data is available. In order to address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMM). This approach uses a robust normalized articulatory space and palate referenced articulatory features combined with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography - Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker independent inversion performance even without kinematic training data

epublications@Marquette

Advances in the Direct Spectral Estimation of Acoustic Sources Using Continuous-Scan Phased Arrays

Author: Morata Carranza David
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

The present study is related to the field of imaging of aeroacoustic noise sources. Traditional techniques include the use of phased microphone arrays and acoustic beamforming of the signals using algorithms such as Delay-And-Sum (DAS). In recent years, there has been increasing interest in methods in which some of the sensors move along prescribed paths. One of the challenges of this approach is the treatment of the non-stationarity of the signal due to the motion of the microphone(s). An objective of this work is to review the methodology presented by D. Papamoschou, P. Shah and myself in the AIAA Journal article "Inverse Acoustic Methodology for Continuous-Scan Phased Arrays", since it provides the foundation for the thesis. The methodology accounts for the direct estimation of the spatio-spectral distribution of an acoustic source from microphone measurements that include fixed and continuously scanning sensors. The non-stationarity of the signal is addressed by means of the Wigner-Ville spectrum. Suppression of the non-stationary effects involves the division of the signal into blocks and the application of a frequency-dependent window within each block. The direct estimation approach involves the inversion of an integral that relates the modeled pressure field, the measured pressure field and the response of the array. A Bayesian estimation approach that allows for efficient inversion of the integrals and performs similarly to the conjugate gradient method is reviewed. The coherence-based noise source distribution is studied in this work and the influence of the signal segmentation on its spatial resolution is analyzed. This thesis provides specific guidelines related to the signal processing. The signal is divided into blocks meeting a desired mathematical condition. A minimum and maximum size for the resulting blocks is proposed in this work, as well as a minimum and maximum block overlap.
A safe region for the signal segmentation is presented as well. This work presents a methodology to synchronize the signals from the microphones (scanning or not) with the position of the scanning sensor. It also shows methods to check the accuracy of the position of the scanning sensor. The methodology is applied to acoustic fields emitted by impinging jets approximating a point source and by an overexpanded supersonic jet. Noise source maps that include the scanning sensor and a dense block distribution have increased spatial resolution and reduced sidelobes. The ability of the continuous-scan paradigm to provide high-definition noise source maps with a lower sensor count is confirmed in this work as well. The effect of the proposed signal segmentation on sparse arrays is discussed

eScholarship - University of California

Towards Ultrasound Tongue Image prediction from EEG during speech production

Author: Arthur Frigyes Viktor
Boncz Ádám
Csapó Tamás Gábor
Nagy Péter
Publication date: 01/01/2023

Initial research has already been carried out on speech-based BCI using brain signals (e.g., non-invasive EEG and invasive sEEG / ECoG), but there is a lack of combined methods that investigate non-invasive brain, articulation, and speech signals together and analyze the cognitive processes in the brain, the kinematics of the articulatory movement and the resulting speech signal. In this paper, we describe our multimodal (electroencephalography, ultrasound tongue imaging, and speech) analysis and synthesis experiments, as a feasibility study. We extend the analysis of brain signals recorded during speech production with ultrasound-based articulation data. From the brain signal measured with EEG, we predict ultrasound images of the tongue with a fully connected deep neural network. The results show that there is a weak but noticeable relationship between EEG and ultrasound tongue images, i.e. the network can differentiate articulated speech and neutral tongue position. Comment: accepted at Interspeech 202

arXiv.org e-Print Archive

Repository of the Academy's Library

Articulatory-WaveNet: Deep Autoregressive Model for Acoustic-to-Articulatory Inversion

Author: Agha Seyed Mirza Bozorg Narjes Alsadat
Publication venue: UKnowledge
Publication date: 01/01/2020

Acoustic-to-Articulatory Inversion, the estimation of articulatory kinematics from speech, is an important problem which has received significant attention in recent years. Estimated articulatory movements from such models can be used for many applications, including speech synthesis, automatic speech recognition, and facial kinematics for talking-head animation devices. Knowledge about the position of the articulators can also be extremely useful in speech therapy systems and in Computer-Aided Language Learning (CALL) and Computer-Aided Pronunciation Training (CAPT) systems for second language learners. Acoustic-to-Articulatory Inversion is a challenging problem due to the complexity of articulation patterns and significant inter-speaker differences. This is even more challenging when applied to non-native speakers without any kinematic training data. This dissertation attempts to address these problems through the development of upgraded architectures for Articulatory Inversion. The proposed Articulatory-WaveNet architecture is based on a dilated causal convolutional layer structure that improves Acoustic-to-Articulatory Inversion estimates for both speaker-dependent and speaker-independent scenarios. The system has been evaluated on the Electromagnetic Articulography Mandarin Accented English (EMA-MAE) corpus, consisting of 39 speakers including both native English speakers and Mandarin accented English speakers. Results show that Articulatory-WaveNet improves the performance of the speaker-dependent and speaker-independent Acoustic-to-Articulatory Inversion systems significantly compared to the previously reported results
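
The dilated causal convolution mentioned in this abstract can be sketched in a few lines of NumPy. This is a generic illustration of the operation, not the dissertation's architecture: the single-channel setup, the function names and the example dilation schedule are assumptions made for the sketch.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation):
    """Single-channel dilated causal convolution.

    y[t] = sum_k kernel[k] * x[t - k * dilation], with samples before
    t = 0 treated as zero, so y[t] never depends on future inputs.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = (len(kernel) - 1) * dilation          # left padding only (causal)
    padded = np.concatenate([np.zeros(pad), x])
    y = np.zeros(len(x))
    for k, w in enumerate(kernel):
        start = pad - k * dilation              # shift back by k * dilation
        y += w * padded[start:start + len(x)]
    return y

def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of layers."""
    return 1 + (kernel_size - 1) * sum(dilations)

y = dilated_causal_conv1d([1, 2, 3, 4, 5], kernel=[1, 1], dilation=2)
print(y)                                 # [1. 2. 4. 6. 8.]
print(receptive_field(2, [1, 2, 4, 8]))  # 16
```

Doubling the dilation at each layer, as in the hypothetical schedule above, lets the receptive field grow exponentially with depth while each layer keeps a small kernel.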

University of Kentucky

Face Active Appearance Modeling and Speech Acoustic Information to Recover Articulation

Author: A. Katsamanis
G. Papandreou
P. Maragos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref