1,099 research outputs found

    Sub-Banded Reconstructed Phase Spaces for Speech Recognition

    A novel method combining filter banks and reconstructed phase spaces is proposed for the modeling and classification of speech. Reconstructed phase spaces, which are based on dynamical systems theory, have advantages over spectral-based analysis methods in that they can capture nonlinear or higher-order statistics. Recent work has shown that the natural measure of a reconstructed phase space can be used for modeling and classification of phonemes. In this work, sub-banding of speech, which has been examined for recognition of noise-corrupted speech, is studied in combination with phase space reconstruction. This sub-banding, which is motivated by empirical psychoacoustical studies, is shown to dramatically improve the phoneme classification accuracy of reconstructed phase space-based approaches. Experiments that examine the performance of fused sub-banded reconstructed phase spaces for phoneme classification are presented. Comparisons against a cepstral-based classifier show that the proposed approach is competitive with state-of-the-art methods for modeling and classification of phonemes. Combination of cepstral-based features and the sub-band RPS features shows improvement over a cepstral-only baseline
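A reconstructed phase space is commonly built by time-delay embedding, where each point collects lagged samples of the signal. The following is a minimal sketch of that general idea, not the authors' implementation: the function name `delay_embed` and the parameters `dim` and `lag` are illustrative assumptions, and the abstract states no embedding dimension, lag, or filter bank design.

```python
import numpy as np


def delay_embed(x, dim, lag):
    """Return the time-delay embedding of a 1-D signal.

    Row n is [x[n], x[n + lag], ..., x[n + (dim - 1) * lag]].
    `dim` and `lag` are hypothetical parameters chosen by the caller.
    """
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for this dim and lag")
    # Each column is the signal shifted by a multiple of the lag.
    columns = [x[i * lag : i * lag + n_points] for i in range(dim)]
    return np.stack(columns, axis=1)


if __name__ == "__main__":
    signal = np.sin(2 * np.pi * 0.05 * np.arange(200))
    points = delay_embed(signal, dim=3, lag=5)
    print(points.shape)
```

In a sub-banded variant, one would presumably apply the same embedding to each band-pass filtered copy of the speech signal and model each resulting point cloud separately.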

    A Subband-Based SVM Front-End for Robust ASR

    This work proposes a novel support vector machine (SVM) based robust automatic speech recognition (ASR) front-end that operates on an ensemble of the subband components of high-dimensional acoustic waveforms. The key issues of selecting the appropriate SVM kernels for classification in frequency subbands and the combination of individual subband classifiers using ensemble methods are addressed. The proposed front-end is compared with state-of-the-art ASR front-ends in terms of robustness to additive noise and linear filtering. Experiments performed on the TIMIT phoneme classification task demonstrate the benefits of the proposed subband based SVM front-end: it outperforms the standard cepstral front-end in the presence of noise and linear filtering for signal-to-noise ratio (SNR) below 12-dB. A combination of the proposed front-end with a conventional front-end such as MFCC yields further improvements over the individual front ends across the full range of noise levels

    High Fidelity Bioelectric Modelling of the Implanted Cochlea

    Cochlear implants are medical devices that can restore sound perception in individuals with sensorineural hearing loss (SHL). Since their inception, improvements in performance have largely been driven by advances in signal processing, but progress has plateaued for almost a decade. This suggests that there is a bottleneck at the electrode-tissue interface, which is responsible for enacting the biophysical changes that govern neuronal recruitment. Understanding this interface is difficult because the cochlea is small, intricate, and difficult to access. As such, researchers have turned to modelling techniques to provide new insights. The state-of-the-art involves calculating the electric field using a volume conduction model of the implanted cochlea and coupling it with a neural excitation model to predict the response. However, many models are unable to predict patient outcomes consistently. This thesis aims to improve the reliability of these models by creating high fidelity reconstructions of the inner ear and critically assessing the validity of the underlying and hitherto untested assumptions. Regarding boundary conditions, the evidence suggests that the unmodelled monopolar return path should be accounted for, perhaps by applying a voltage offset at a boundary surface. Regarding vasculature, the models show that large modiolar vessels like the vein of the scala tympani have a strong local effect near the stimulating electrode. Finally, it appears that the oft-cited quasi-static assumption is not valid due to the high permittivity of neural tissue. It is hoped that the study improves the trustworthiness of all bioelectric models of the cochlea, either by validating the claims of existing models, or by prompting improvements in future work. Developing our understanding of the underlying physics will pave the way for advancing future electrode array designs as well as patient-specific simulations, ultimately improving the quality of life for those with SHL

    Sparse Nonstationary Gabor Expansions - with Applications to Music Signals


    Effects of errorless learning on the acquisition of velopharyngeal movement control

    Session 1pSC - Speech Communication: Cross-Linguistic Studies of Speech Sound Learning of the Languages of Hong Kong (Poster Session). The implicit motor learning literature suggests a benefit for learning if errors are minimized during practice. This study investigated whether the same principle holds for learning velopharyngeal movement control. Normal-speaking participants learned to produce hypernasal speech in either an errorless learning condition (in which the possibility for errors was limited) or an errorful learning condition (in which the possibility for errors was not limited). The nasality level of the participants' speech was measured with a nasometer and expressed as nasalance scores (in %). Errorless learners practiced producing hypernasal speech with a threshold nasalance score of 10% at the beginning, which gradually increased to a threshold of 50% at the end. The same set of threshold targets was presented to errorful learners but in reversed order. Errors were defined as the proportion of speech with a nasalance score below the threshold. The results showed that, relative to errorful learners, errorless learners displayed fewer errors (17.7% vs. 50.7%) and a higher mean nasalance score (46.7% vs. 31.3%) during the acquisition phase. Furthermore, errorless learners outperformed errorful learners in both retention and novel transfer tests. Acknowledgment: Supported by The University of Hong Kong Strategic Research Theme for Sciences of Learning. © 2012 Acoustical Society of America

    Quantum Computing Assisted Speech Processing

    Human-machine interaction in general, and speech processing in particular, are key disciplines in today's consumer electronics. Although the computing power of mobile devices has increased considerably in recent years, tasks such as speech recognition still rely mainly on cloud-based solutions. In such architectures, not only high accuracy but also a fast response time is essential for a practical, user-friendly application. Modern approaches use machine learning for speech recognition, which requires high-performance hardware and extensive datasets. Besides the actual training and inference of such machine learning models, speech recognition requires the extraction of acoustic features from the recorded speech. Spectrograms have proven to be a well-suited feature space and have become established in today's systems. An application of quantum computers to speech recognition was previously proposed in [YQC+20b], where a neural network trained on spectrograms manipulated by a quantum computer exceeded the validation accuracy of the classical approach. Quantum computers, however, are best known for their superiority over classical computers in computing certain algorithms. Since the quantum Fourier transform, the quantum-computer equivalent of the classical Fourier transform, is one such algorithm, the natural question, and hence the topic of this thesis, is whether there are possibilities or even advantages in using the quantum Fourier transform for spectrogram generation. Investigating this question requires building a suitable framework in which a short-time quantum Fourier transform is developed, optimized and, where appropriate, combined with noise suppression. The accuracy of a neural network trained on the features generated with the short-time quantum Fourier transform is then evaluated and discussed. Because speech synthesis, another subcategory of speech processing, requires a completely different framework and brings a whole set of further challenges, even though many insights gained from speech recognition can be transferred to it, this thesis concentrates exclusively on speech recognition. A modular approach allows different signal types and transforms to be exchanged quickly and tested either in simulation or on real quantum computers. To assess the accuracy of the neural network given features from different configurations of the short-time quantum Fourier transform, the architecture proposed in [YQC+20b] is used as a starting point, and its accuracy of 95.12% serves as the reference value. Experiments show that quantum computers of the "Noisy Intermediate-Scale Quantum" era are able to process the quantum Fourier transform of strongly band-limited harmonic oscillations. However, limited access to the more complex quantum computers needed to meet the sampling-rate requirements of speech signals, in terms of time and frequency resolution, prevents an application in practical speech recognition scenarios. Using a simulation environment with the noise model of a quantum computer, in combination with the approaches developed in this thesis, the spectrogram generated with the short-time quantum Fourier transform enables the neural network to reach a test accuracy of 89.92%, although the speed-up potentially available on real devices is lost. Although the accuracy does not exceed the reference, and the noise and capacity of "Noisy Intermediate-Scale Quantum" devices limit the applicability of speech recognition with quantum advantage, the results motivate further investigation of practical applications of the quantum Fourier transform for speech processing

    A comparative evaluation for liver segmentation from spir images and a novel level set method using signed pressure force function

    Thesis (Doctoral)--Izmir Institute of Technology, Electronics and Communication Engineering, Izmir, 2013. Includes bibliographical references (leaves: 118-135). Text in English; Abstract: Turkish and English. xv, 145 leaves. Developing a robust method for liver segmentation from magnetic resonance images is a challenging task due to similar intensity values between adjacent organs, the geometrically complex structure of the liver, and the injection of contrast media, which causes all tissues to have different gray level values. Pulsation and motion artifacts and partial volume effects further increase the difficulty of automatic liver segmentation from magnetic resonance images. In this thesis, we present an overview of liver segmentation methods for magnetic resonance images and show comparative results of seven different liver segmentation approaches chosen from deterministic (K-means based), probabilistic (Gaussian model based), supervised neural network (multilayer perceptron based) and deformable model based (level set) segmentation methods. The results of qualitative and quantitative analysis using sensitivity, specificity and accuracy metrics show that the multilayer perceptron based approach and a level set based approach that uses a distance regularization term and a signed pressure force function are reasonable methods for liver segmentation from spectral pre-saturation inversion recovery images. However, the multilayer perceptron based segmentation method has a higher computational cost, and the distance regularization term based automatic level set method is very sensitive to the chosen variance of the Gaussian function. Our proposed level set based method uses a novel signed pressure force function, which can control the direction and velocity of the evolving active contour. It is faster and, without using any regularization term, solves several problems of the other applied methods, such as sensitivity to the initial contour or to the variance parameter of the Gaussian kernel in edge stopping functions
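The sensitivity, specificity and accuracy metrics mentioned in the abstract have standard definitions over true and false positives and negatives. A minimal sketch, assuming binary masks in which nonzero marks liver voxels, is given below. The function name `segmentation_metrics` is an illustrative assumption and is not taken from the thesis.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """Compare a predicted binary mask against a reference mask.

    Assumes both masks have the same shape and that the reference
    contains at least one foreground and one background element,
    otherwise a division by zero occurs.
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)      # liver correctly labeled as liver
    tn = np.sum(~pred & ~truth)    # background correctly labeled
    fp = np.sum(pred & ~truth)     # background labeled as liver
    fn = np.sum(~pred & truth)     # liver labeled as background
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / pred.size,
    }


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(segmentation_metrics(predicted, reference))
```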