85 research outputs found

    Representation Learning for Spoken Term Detection

    Spoken Term Detection (STD) is the task of searching for a given spoken query word in a large speech database. Applications of STD include speech data indexing, voice dialling, telephone monitoring and data mining. The performance of STD depends mainly on the representation of the speech signal and on the matching of the represented signals. This work investigates methods for a robust representation of the speech signal, one that is invariant to speaker variability, in the context of the STD task. Here the representation takes the form of templates, i.e. sequences of feature vectors. The typical representation in the speech community, Mel-Frequency Cepstral Coefficients (MFCC), carries both speech-specific and speaker-specific information, hence the need for a better representation. Searching is done by matching the feature-vector sequences of the query and reference utterances using Subsequence Dynamic Time Warping (DTW). The performance of the proposed representation is evaluated on Telugu broadcast news data. In the absence of labelled data, i.e. in an unsupervised setting, we propose to capture the joint density of the acoustic space spanned by MFCCs using Gaussian Mixture Models (GMM) and Gaussian-Bernoulli Restricted Boltzmann Machines (GBRBM). Posterior features extracted from the trained models are used to search for the query word. GMM and GBRBM posterior features improve STD performance over MFCC by 8% and 12% respectively. As transcribed data is not required, this approach is an optimal solution for low-resource languages. However, because of its intermediate performance, this method cannot be an immediate solution for high-resource languages
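
The matching step described in the abstract above can be illustrated with a minimal sketch of subsequence DTW. This is not the authors' implementation: the function names, the Euclidean frame distance, the three-way step pattern and the absence of path-length normalisation are all assumptions made for illustration. The query may start and end anywhere in the reference, and the function returns the lowest alignment cost together with the matched span.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors (assumed frame distance)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def subsequence_dtw(query, reference, dist=euclidean):
    """Find the span of `reference` that best matches `query`.

    Returns (cost, start, end), with `end` inclusive. Hypothetical helper,
    not taken from the work summarised above.
    """
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        raise ValueError("query and reference must be non-empty")
    inf = float("inf")
    # cost[i][j]: best accumulated cost of aligning query[0..i]
    # with some reference span that ends at frame j.
    cost = [[inf] * m for _ in range(n)]
    # start[i][j]: reference frame where that best path began.
    start = [[0] * m for _ in range(n)]
    # The first query frame may align with any reference frame at no extra
    # cost, which is what makes this a subsequence search.
    for j in range(m):
        cost[0][j] = dist(query[0], reference[j])
        start[0][j] = j
    for i in range(1, n):
        cost[i][0] = cost[i - 1][0] + dist(query[i], reference[0])
        start[i][0] = 0
        for j in range(1, m):
            best_cost, best_start = min(
                (cost[i - 1][j], start[i - 1][j]),          # query advances
                (cost[i][j - 1], start[i][j - 1]),          # reference advances
                (cost[i - 1][j - 1], start[i - 1][j - 1]),  # both advance
            )
            cost[i][j] = best_cost + dist(query[i], reference[j])
            start[i][j] = best_start
    # The path may end at any reference frame: pick the cheapest end point.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end
```

In a setting like the one described, `query` and `reference` would be sequences of MFCC or posterior vectors, and a distance better suited to posteriors could be passed as `dist`; that choice is left open here.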

    Evaluation Of Unsupervised Models Using Minimal Pair ABX Measure

    The Minimal-Pair ABX (MP-ABX) task has been proposed as a method for evaluating speech features for zero-resource (i.e. only a limited amount of labelled data) unsupervised speech technologies. The MP-ABX task is an alternative to the phoneme or word error rate; it requires discriminating between the two words of a minimal pair from a language. We compared Mel Frequency Cepstral Coefficients (MFCC) with the modelling parameters of these MFCCs obtained using unsupervised generative models such as the Gaussian Mixture Model (GMM) and the Gaussian-Bernoulli Restricted Boltzmann Machine (GBRBM). In an MP-ABX task, the features (MFCC) a, b and x associated with three speech sounds A, B and X are computed, where A and B are chosen to be minimally different words (e.g. dog vs. doll) and X is linguistically identical to either A or B, although it can be indexically different (different talker or added noise). Then, one determines whether x is closer to a or to b by computing the Dynamic Time Warping (DTW) distance between the evaluated features. By repeating this on a representative set of A, B, X triplets, a measure of the discriminability of minimal pairs when coded with the tested featural representation is obtained. This evaluation metric is especially suitable for the zero-resource setting
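
The ABX decision described above can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the frame distance is Euclidean, the DTW cost is normalised by the summed sequence lengths, ties count as half correct, and each triplet is arranged so that X belongs to the same word as A. None of these choices is specified in the abstract.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two feature vectors (assumed)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


def dtw_distance(s, t):
    """Full-sequence DTW cost between two feature sequences,
    divided by len(s) + len(t) (assumed normalisation)."""
    n, m = len(s), len(t)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
            acc[i][j] = frame_distance(s[i - 1], t[j - 1]) + step
    return acc[n][m] / (n + m)


def abx_score(triplets):
    """Fraction of (a, b, x) triplets in which x is closer to a than to b.

    Assumes every triplet is ordered so that X matches A; a score of 0.5
    corresponds to chance and 1.0 to perfect discrimination.
    """
    correct = 0.0
    for a, b, x in triplets:
        d_ax = dtw_distance(a, x)
        d_bx = dtw_distance(b, x)
        if d_ax < d_bx:
            correct += 1.0
        elif d_ax == d_bx:
            correct += 0.5  # ties counted as half correct (assumption)
    return correct / len(triplets)
```

Running `abx_score` once per feature type (for example MFCC, GMM posteriors, GBRBM features) on the same set of triplets would give directly comparable discriminability scores.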

    Perceptual Learning of German Sounds: Evidence from Functional Load (FL) and High-Variability Phonetic Training (HVPT)

    The objective of this thesis is to empirically test the practical implications of the functional load (FL) principle in German. The findings informed the selection of German phonemic contrasts for perceptual training of L2 German learners in a follow-up study. Previous research has suggested that sound contrasts carrying a high FL play a central role in conveying meaning, which closely links to the notions of intelligibility and comprehensibility of spoken utterances. Recent attention to FL in second language (L2) English pronunciation pedagogy highlights its role in selecting appropriate L2 sounds to train. In Study 1, the FL hierarchy of German and the impact of high vs. low FL segments on intelligibility and comprehensibility is tested among L1 German listeners. Results show that high FL errors have a more detrimental effect than low FL errors, but two errors are more severe than one, regardless of FL classification. Study 2 explores two types (i.e., audio and audiovisual) of high-variability phonetic training (HVPT) for challenging German sound contrasts among beginner L2 learners. HVPT employs multiple talkers and variable phonetic environments, thereby enhancing discrimination of sound contrasts. Results showed that especially audiovisual HVPT led to reduced discrimination accuracy, suggesting a need to investigate its use for training beginner learners. These findings shed light upon FL’s applicability in conjunction with word recognition models, thereby guiding future work on FL in L2 pronunciation pedagogy. They also provide insights into the theoretical implications of the HVPT technique in fostering perceptual abilities among beginner L2 learners

    Design and evaluation of mobile computer-assisted pronunciation training tools for second language learning

    The quality of speech technology (automatic speech recognition, ASR, and text-to-speech, TTS) has improved considerably and, consequently, an increasing number of computer-assisted pronunciation training (CAPT) tools have included it. However, pronunciation is one area of teaching that has not been developed enough, since there is scarce empirical evidence assessing the effectiveness of tools and games that include speech technology in the field of pronunciation training and teaching. This PhD thesis addresses the design and validation of an innovative CAPT system for smart devices for training second language (L2) pronunciation. In particular, it aims to improve learners' L2 pronunciation at the segmental level with a specific set of methodological choices, such as the connection between the learner's first and second languages (L1–L2), minimal pairs, a training cycle of exposure–perception–production, individualistic and social approaches, and the inclusion of ASR and TTS technology. The experimental research conducted with real users applying these methodological choices validates the efficiency of the CAPT prototypes developed for the four main experiments of this dissertation. Data is automatically gathered by the CAPT systems to give immediate, specific feedback to users and to analyze all results. The protocols, metrics, algorithms, and methods necessary to statistically analyze and discuss the results are also detailed. The two main L2s tested during the experimental procedure are American English and Spanish. The different CAPT prototypes designed and validated in this thesis, and the methodological choices that they implement, allow the relative pronunciation improvement of the individuals who trained with them to be measured accurately. Raters' subjective scores and the CAPT systems' objective scores show a strong correlation, which will be useful in the future for assessing large amounts of data and reducing human costs. Results also show intensive practice, supported by a significant number of activities carried out. In the controlled experiments, students who worked with the CAPT tool achieved better pronunciation improvement values than their peers in the traditional in-classroom instruction group. In the challenge-based CAPT learning game proposed, the most active players in the competition kept on playing until the end and achieved significant pronunciation improvement results. Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos). Doctorado en Informática

    Effects of errorless learning on the acquisition of velopharyngeal movement control

    Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session). The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal-speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). The nasality level of the participants' speech was measured with a nasometer and reflected by nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners but in reverse order. Errors were defined by the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (50.7% vs. 17.7%) and a higher mean nasalance score (31.3% vs. 46.7%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America

    Inebriated Immunity: Alcohol Affects Innate Immune Signaling in the Gut-Liver-Brain Axis

    Alcohol is a commonly consumed beverage, a drug of abuse and an important molecule affecting nearly every organ system in the body. This project seeks to investigate the interplay between alcohol's effects on the critical organ systems making up the gut-liver-brain axis. Alcohol initially interacts with the gastrointestinal tract. Our research describes the alterations seen in intestinal microbiota following alcohol consumption in an acute-on-chronic model of alcoholic hepatitis and indicates that reducing intestinal bacteria using antibiotics protects from alcohol-induced intestinal cytokine expression, alcoholic liver disease and inflammation in the brain. Alcohol-induced liver injury can occur due to direct hepatocyte metabolic dysregulation and from leakage of bacterial products from the intestine that initiates an immune response. Here, we highlight the importance of this immune response, focusing on the role of infiltrating immune cells in human patients with alcoholic hepatitis and alcoholic cirrhosis. Using a small molecule inhibitor of CCR2/CCR5 chemokine receptor signaling in mice, we can protect the liver from damage and alcohol-induced inflammation. In the brain, we observe that chronic alcohol leads to the infiltration of macrophages in a region-specific manner. CCR2/CCR5 inhibition reduced macrophage infiltration, alcohol-induced inflammation and microglial changes. We also report that chronic alcohol shifts excitatory/inhibitory synapses in the hippocampus, possibly through complement-mediated remodeling. Finally, we show that anti-inflammasome inhibitors altered behavior by reducing alcohol consumption in female mice. Together, these data advance our understanding of the gut-liver-brain axis in alcoholism and suggest novel avenues of therapeutic intervention to inhibit organ pathology associated with alcohol consumption and reduce drinking

    A multi-modal approach to functional neuroimaging

    The work undertaken involves the use of functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) as separate but complementary non-invasive functional brain imaging modalities. The aim in combining fMRI and MEG is centred around exploitation of the high temporal resolution available in MEG, and the high spatial resolution available in fMRI. However, whilst MEG represents a direct measure of neuronal activity, BOLD fMRI is an indirect measure and this makes the two modalities truly complementary. In both cases, the imaging signals measured are relatively poorly understood and so the fundamental question asked here is: How are the neuromagnetic effects detectable using MEG related to the metabolic effects reflected in the fMRI BOLD response? Initially, a novel technique is introduced for the detection and spatial localisation of neuromagnetic effects in MEG. This technique, based on a beamforming approach to the MEG inverse problem, is shown to yield accurate results both in simulation and using experimental data. The technique introduced is applied to MEG data from a simple experiment involving stimulation of the visual cortex. A number of heterogeneous neuromagnetic effects are shown to be detectable, and furthermore, these effects are shown to be spatially and temporally correlated with the fMRI BOLD response. The limitations to comparing only two measures of brain activity are discussed, and the use of arterial spin labelling (ASL) to make quantitative measurements of physiological parameters supplementing these two initial metrics is introduced. Finally, a novel technique for accurate quantification of arterial cerebral blood volume using ASL is described and shown to produce accurate results. A concluding chapter then speculates on how these aCBV measurements might be combined with those from MEG in order to better understand the fMRI BOLD response

    Proceedings of the EAA Joint Symposium on Auralization and Ambisonics 2014

    In consideration of the remarkable intensity of research in the field of Virtual Acoustics, including different areas such as sound field analysis and synthesis, spatial audio technologies, and room acoustical modeling and auralization, it seemed about time to organize a second international symposium following the model of the first EAA Auralization Symposium initiated in 2009 by the acoustics group of the former Helsinki University of Technology (now Aalto University). Additionally, research communities which are focused on different approaches to sound field synthesis such as Ambisonics or Wave Field Synthesis have, in the meantime, moved closer together by using increasingly consistent theoretical frameworks. Finally, the quality of virtual acoustic environments is often considered as a result of all processing stages mentioned above, increasing the need for discussions on consistent strategies for evaluation. Thus, it seemed appropriate to integrate two of the most relevant communities, i.e. to combine the 2nd International Auralization Symposium with the 5th International Symposium on Ambisonics and Spherical Acoustics. The Symposia on Ambisonics, initiated in 2009 by the Institute of Electronic Music and Acoustics of the University of Music and Performing Arts in Graz, were traditionally dedicated to problems of spherical sound field analysis and re-synthesis, strategies for the exchange of ambisonics-encoded audio material, and – more than other conferences in this area – the artistic application of spatial audio systems. This publication contains the official conference proceedings. It includes 29 manuscripts which have passed a 3-stage peer-review with a board of about 70 international reviewers involved in the process. Each contribution has already been published individually with a unique DOI on the DepositOnce digital repository of TU Berlin. 
Some conference contributions have been recommended for resubmission to Acta Acustica united with Acustica, to possibly appear in a Special Issue on Virtual Acoustics in late 2014. These are not published in this collection. European Acoustics Association

    Sonic interactions in virtual environments

    This book tackles the design of 3D spatial interactions from an audio-centered and audio-first perspective, providing the fundamental notions related to the creation and evaluation of immersive sonic experiences. The key elements that enhance the sensation of place in a virtual environment (VE) are: immersive audio (the computational aspects of the acoustical-space properties of Virtual Reality (VR) technologies); sonic interaction (the human-computer interplay through auditory feedback in VE); and VR systems (which naturally support multimodal integration, impacting different application domains). Sonic Interactions in Virtual Environments will feature state-of-the-art research on real-time auralization, sonic interaction design in VR, quality of the experience in multimodal scenarios, and applications. Contributors and editors include interdisciplinary experts from the fields of computer science, engineering, acoustics, psychology, design, humanities, and beyond. Their mission is to shape an emerging new field of study at the intersection of sonic interaction design and immersive media, embracing an archipelago of existing research spread across different audio communities, and to increase awareness among VR communities, researchers, and practitioners of the importance of sonic elements when designing immersive environments