48 research outputs found

    A Literature Review on the Application of Acoustic Emission to Machine Condition Monitoring

    Get PDF
    Acoustic emission (AE) is a common physical phenomenon, in which the strain energy is released in the form of elastic wave when a material is deformed or cracked during the stress process. The condition monitoring based on AE is a relatively new method that aims to use noise/vibration anomalies to detect machine failures. However, some challenges lie ahead of its application. This thesis aims to analyze the literature in the field of AE applications to machine condition monitoring. The principles of AE technology, relevant instruments, machine monitoring and AE signal analysis, and practical examples of AE monitoring applications will be presented. More specifically, challenges, solutions and future direction in solving signal noise and attenuation challenges will be discussed. Through the example of rotating machinery, the characteristics of AE will be explained in detail. This thesis lays the foundation for the actual use of AE to monitor and analyze the state of machinery and provides guideline for future data collection and analysis of AE signals

    Speech Recognition

    Get PDF
    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

    Neural architecture for echo suppression during sound source localization based on spiking neural cell models

    Get PDF
    Zusammenfassung Diese Arbeit untersucht die biologischen Ursachen des psycho-akustischen Präzedenz Effektes, der Menschen in die Lage versetzt, akustische Echos während der Lokalisation von Schallquellen zu unterdrücken. Sie enthält ein Modell zur Echo-Unterdrückung während der Schallquellenlokalisation, welches in technischen Systemen zur Mensch-Maschine Interaktion eingesetzt werden kann. Die Grundlagen dieses Modells wurden aus eigenen elektrophysiologischen Experimenten an der Mongolischen Wüstenrennmaus gewonnen. Die dabei erstmalig an der Wüstenrennmaus erzielten Ergebnisse, zeigen ein besonderes Verhalten spezifischer Zellen im Dorsalen Kern des Lateral Lemniscus, einer dedizierten Region des auditorischen Hirnstammes. Die dort sichtbare Langzeithemmung scheint die Grundlage für die Echounterdrückung in höheren auditorischen Zentren zu sein. Das entwickelte Model war in der Lage dieses Verhalten nachzubilden, und legt die Vermutung nahe, dass eine starke und zeitlich präzise Hyperpolarisation der zugrundeliegende physiologische Mechanismus dieses Verhaltens ist. Die entwickelte Neuronale Modellarchitektur modelliert das Innenohr und fünf wesentliche Kerne des auditorischen Hirnstammes in ihrer Verbindungsstruktur und internen Dynamik. Sie stellt einen neuen Typus neuronaler Modellierung dar, der als Spike-Interaktionsmodell (SIM) bezeichnet wird. SIM nutzen die präzise räumlich-zeitliche Interaktion einzelner Aktionspotentiale (Spikes) für die Kodierung und Verarbeitung neuronaler Informationen. Die Basis dafür bilden Integrate-and-Fire Neuronenmodelle sowie Hebb'sche Synapsen, welche um speziell entwickelte dynamische Kernfunktionen erweitert wurden. Das Modell ist in der Lage, Zeitdifferenzen von 10 mykrosekunden zu detektieren und basiert auf den Prinzipien der zeitlichen und räumlichen Koinzidenz sowie der präzisen lokalen Inhibition. Es besteht ausschließlich aus Elementen einer eigens entwickelten Neuronalen Basisbibliothek (NBL) die speziell für die Modellierung verschiedenster Spike- Interaktionsmodelle entworfen wurde. Diese Bibliothek erweitert die kommerziell verfügbare dynamische Simulationsumgebung von MATLAB/SIMULINK um verschiedene Modelle von Neuronen und Synapsen, welche die intrinsischen dynamischen Eigenschaften von Nervenzellen nachbilden. Die Nutzung dieser Bibliothek versetzt sowohl den Ingenieur als auch den Biologen in die Lage, eigene, biologisch plausible, Modelle der neuronalen Informationsverarbeitung ohne detaillierte Programmierkenntnisse zu entwickeln. Die grafische Oberfläche ermöglicht strukturelle sowie parametrische Modifikationen und ist in der Lage, den Zeitverlauf mikroskopischer Zellpotentiale aber auch makroskopischer Spikemuster während und nach der Simulation darzustellen. Zwei grundlegende Elemente der Neuronalen Basisbibliothek wurden zur Implementierung als spezielle analog-digitale Schaltungen vorbereitet. Erste Silizium Implementierungen durch das Team des DFG Graduiertenkollegs GRK 164 konnten die Möglichkeit einer vollparallelen on line Verarbeitung von Schallsignalen nachweisen. Durch Zuhilfenahme des im GRK entwickelten automatisierten Layout Generators wird es möglich, spezielle Prozessoren zur Anwendung biologischer Verarbeitungsprinzipien in technischen Systemen zu entwickeln. Diese Prozessoren unterscheiden sich grundlegend von den klassischen von Neumann Prozessoren indem sie räumlich und zeitlich verteilte Spikemuster, anstatt sequentieller binärer Werte zur Informationsrepräsentation nutzen. Sie erweitern das digitale Kodierungsprinzip durch die Dimensionen des Raumes (2 dimensionale Nachbarschaft) der Zeit (Frequenz, Phase und Amplitude) sowie der zeitlichen Dynamik analoger Potentialverläufe. Diese Dissertation besteht aus sieben Kapiteln, welche den verschiedenen Bereichen der Computational Neuroscience gewidmet sind. Kapitel 1 beschreibt die Motivation dieser Arbeit welche aus der Absicht rühren, biologische Prinzipien der Schallverarbeitung zu erforschen und für technische Systeme während der Interaktion mit dem Menschen nutzbar zu machen. Zusätzlich werden fünf Gründe für die Nutzung von Spike-Interaktionsmodellen angeführt sowie deren neuartiger Charakter beschrieben. Kapitel 2 führt die biologischen Prinzipien der Schallquellenlokalisation und den psychoakustischen Präzedenz Effekt ein. Aktuelle Hypothesen zur Entstehung dieses Effektes werden anhand ausgewählter experimenteller Ergebnisse verschiedener Forschungsgruppen diskutiert. Kapitel 3 beschreibt die entwickelte Neuronale Basisbibliothek und führt die einzelnen neuronalen Simulationselemente ein. Es erklärt die zugrundeliegenden mathematischen Funktionen der dynamischen Komponenten und beschreibt deren generelle Einsetzbarkeit zur dynamischen Simulation spikebasierter Neuronaler Netzwerke. Kapitel 4 enthält ein speziell entworfenes Modell des auditorischen Hirnstammes beginnend mit den Filterkaskaden zur Simulation des Innenohres, sich fortsetzend über mehr als 200 Zellen und 400 Synapsen in 5 auditorischen Kernen bis zum Richtungssensor im Bereich des auditorischen Mittelhirns. Es stellt die verwendeten Strukturen und Parameter vor und enthält grundlegende Hinweise zur Nutzung der Simulationsumgebung. Kapitel 5 besteht aus drei Abschnitten, wobei der erste Abschnitt die Experimentalbedingungen und Ergebnisse der eigens durchgeführten Tierversuche beschreibt. Der zweite Abschnitt stellt die Ergebnisse von 104 Modellversuchen zur Simulationen psycho-akustischer Effekte dar, welche u.a. die Fähigkeit des Modells zur Nachbildung des Präzedenz Effektes testen. Schließlich beschreibt der letzte Abschnitt die Ergebnisse der 54 unter realen Umweltbedingungen durchgeführten Experimente. Dabei kamen Signale zur Anwendung, welche in normalen sowie besonders stark verhallten Räumen aufgezeichnet wurden. Kapitel 6 vergleicht diese Ergebnisse mit anderen biologisch motivierten und technischen Verfahren zur Echounterdrückung und Schallquellenlokalisation und führt den aktuellen Status der Hardwareimplementierung ein. Kapitel 7 enthält schließlich eine kurze Zusammenfassung und einen Ausblick auf weitere Forschungsobjekte und geplante Aktivitäten. Diese Arbeit möchte zur Entwicklung der Computational Neuroscience beitragen, indem sie versucht, in einem speziellen Anwendungsfeld die Lücke zwischen biologischen Erkenntnissen, rechentechnischen Modellen und Hardware Engineering zu schließen. Sie empfiehlt ein neues räumlich-zeitliches Paradigma der dynamischen Informationsverarbeitung zur Erschließung biologischer Prinzipien der Informationsverarbeitung für technische Anwendungen.This thesis investigates the biological background of the psycho-acoustical precedence effect, enabling humans to suppress echoes during the localization of sound sources. It provides a technically feasible and biologically plausible model for sound source localization under echoic conditions, ready to be used by technical systems during man-machine interactions. The model is based upon own electro-physiological experiments in the mongolian gerbil. The first time in gerbils obtained results reveal a special behavior of specific cells of the dorsal nucleus of the lateral lemniscus (DNLL) - a distinct region in the auditory brainstem. The explored persistent inhibition effect of these cells seems to account for the base of echo suppression at higher auditory centers. The developed model proved capable to duplicate this behavior and suggests, that a strong and timely precise hyperpolarization is the basic mechanism behind this cell behavior. The developed neural architecture models the inner ear as well as five major nuclei of the auditory brainstem in their connectivity and intrinsic dynamics. It represents a new type of neural modeling described as Spike Interaction Models (SIM). SIM use the precise spatio-temporal interaction of single spike events for coding and processing of neural information. Their basic elements are Integrate-and-Fire Neurons and Hebbian synapses, which have been extended by specially designed dynamic transfer functions. The model is capable to detect time differences as small as 10 mircrosecondes and employs the principles of coincidence detection and precise local inhibition for auditory processing. It consists exclusively of elements of a specifically designed Neural Base Library (NBL), which has been developed for multi purpose modeling of Spike Interaction Models. This library extends the commercially available dynamic simulation environment of MATLAB/SIMULINK by different models of neurons and synapses simulating the intrinsic dynamic properties of neural cells. The usage of this library enables engineers as well as biologists to design their own, biologically plausible models of neural information processing without the need for detailed programming skills. Its graphical interface provides access to structural as well as parametric changes and is capable to display the time course of microscopic cell parameters as well as macroscopic firing pattern during simulations and thereafter. Two basic elements of the Neural Base Library have been prepared for implementation by specialized mixed analog-digital circuitry. First silicon implementations were realized by the team of the DFG Graduiertenkolleg GRK 164 and proved the possibility of fully parallel on line processing of sounds. By using the automated layout processor under development in the Graduiertenkolleg, it will be possible to design specific processors in order to apply theprinciples of distributed biological information processing to technical systems. These processors differ from classical von Neumann processors by the use of spatio temporal spike pattern instead of sequential binary values. They will extend the digital coding principle by the dimensions of space (spatial neighborhood), time (frequency, phase and amplitude) as well as the dynamics of analog potentials and introduce a new type of information processing. This thesis consists of seven chapters, dedicated to the different areas of computational neuroscience. Chapter 1: provides the motivation of this study arising from the attempt to investigate the biological principles of sound processing and make them available to technical systems interacting with humans under real world conditions. Furthermore, five reasons to use spike interaction models are given and their novel characteristics are discussed. Chapter 2: introduces the biological principles of sound source localization and the precedence effect. Current hypothesis on echo suppression and the underlying principles of the precedence effect are discussed by reference to a small selection of physiological and psycho-acoustical experiments. Chapter 3: describes the developed neural base library and introduces each of the designed neural simulation elements. It also explains the developed mathematical functions of the dynamic compartments and describes their general usage for dynamic simulation of spiking neural networks. Chapter 4: introduces the developed specific model of the auditory brainstem, starting from the filtering cascade in the inner ear via more than 200 cells and 400 synapses in five auditory regions up to the directional sensor at the level of the auditory midbrain. It displays the employed parameter sets and contains basic hints for the set up and configuration of the simulation environment. Chapter 5: consists of three sections, whereas the first one describes the set up and results of the own electro-physiological experiments. The second describes the results of 104 model simulations, performed to test the models ability to duplicate psycho-acoustical effects like the precedence effect. Finally, the last section of this chapter contains the results of 54 real world experiments using natural sound signals, recorded under normal as well as highly reverberating conditions. Chapter 6: compares the achieved results to other biologically motivated and technical models for echo suppression and sound source localization and introduces the current status of silicon implementation. Chapter 7: finally provides a short summary and an outlook toward future research subjects and areas of investigation. This thesis aims to contribute to the field of computational neuroscience by bridging the gap between biological investigation, computational modeling and silicon engineering in a specific field of application. It suggests a new spatio-temporal paradigm of information processing in order to access the capabilities of biological systems for technical applications

    Efficient and Robust Methods for Audio and Video Signal Analysis

    Get PDF
    This thesis presents my research concerning audio and video signal processing and machine learning. Specifically, the topics of my research include computationally efficient classifier compounds, automatic speech recognition (ASR), music dereverberation, video cut point detection and video classification.Computational efficacy of information retrieval based on multiple measurement modalities has been considered in this thesis. Specifically, a cascade processing framework, including a training algorithm to set its parameters has been developed for combining multiple detectors or binary classifiers in computationally efficient way. The developed cascade processing framework has been applied on video information retrieval tasks of video cut point detection and video classification. The results in video classification, compared to others found in the literature, indicate that the developed framework is capable of both accurate and computationally efficient classification. The idea of cascade processing has been additionally adapted for the ASR task. A procedure for combining multiple speech state likelihood estimation methods within an ASR framework in cascaded manner has been developed. The results obtained clearly show that without impairing the transcription accuracy the computational load of ASR can be reduced using the cascaded speech state likelihood estimation process.Additionally, this thesis presents my work on noise robustness of ASR using a nonnegative matrix factorization (NMF) -based approach. Specifically, methods for transformation of sparse NMF-features into speech state likelihoods has been explored. The results reveal that learned transformations from NMF activations to speech state likelihoods provide better ASR transcription accuracy than dictionary label -based transformations. The results, compared to others in a noisy speech recognition -challenge show that NMF-based processing is an efficient strategy for noise robustness in ASR.The thesis also presents my work on audio signal enhancement, specifically, on removing the detrimental effect of reverberation from music audio. In the work, a linear prediction -based dereverberation algorithm, which has originally been developed for speech signal enhancement, was applied for music. The results obtained show that the algorithm performs well in conjunction with music signals and indicate that dynamic compression of music does not impair the dereverberation performance

    Learning-Based Reference-Free Speech Quality Assessment for Normal Hearing and Hearing Impaired Applications

    Get PDF
    Accurate speech quality measures are highly attractive and beneficial in the design, fine-tuning, and benchmarking of speech processing algorithms, devices, and communication systems. Switching from narrowband telecommunication to wideband telephony is a change within the telecommunication industry which provides users with better speech quality experience but introduces a number of challenges in speech processing. Noise is the most common distortion on audio signals and as a result there have been a lot of studies on developing high performance noise reduction algorithms. Assistive hearing devices are designed to decrease communication difficulties for people with loss of hearing. As the algorithms within these devices become more advanced, it becomes increasingly crucial to develop accurate and robust quality metrics to assess their performance. Objective speech quality measurements are more attractive compared to subjective assessments as they are cost-effective and subjective variability is eliminated. Although there has been extensive research on objective speech quality evaluation for narrowband speech, those methods are unsuitable for wideband telephony. In the case of hearing-impaired applications, objective quality assessment is challenging as it has to be capable of distinguishing between desired modifications which make signals audible and undesired artifacts. In this thesis a model is proposed that allows extracting two sets of features from the distorted signal only. This approach which is called reference-free (nonintrusive) assessment is attractive as it does not need access to the reference signal. Although this benefit makes nonintrusive assessments suitable for real-time applications, more features need to be extracted and smartly combined to provide comparable accuracy as intrusive metrics. Two feature vectors are proposed to extract information from distorted signals and their performance is examined in three studies. In the first study, both feature vectors are trained on various portions of a noise reduction database for normal hearing applications. In the second study, the same investigation is performed on two sets of databases acquired through several hearing aids. Third study examined the generalizability of the proposed metrics on benchmarking four wireless remote microphones in a variety of environmental conditions. Machine learning techniques are deployed for training the models in the three studies. The studies show that one of the feature sets is robust when trained on different portions of the data from different databases and it also provides good quality prediction accuracy for both normal hearing and hearing-impaired applications

    Modelling, Simulation and Data Analysis in Acoustical Problems

    Get PDF
    Modelling and simulation in acoustics is currently gaining importance. In fact, with the development and improvement of innovative computational techniques and with the growing need for predictive models, an impressive boost has been observed in several research and application areas, such as noise control, indoor acoustics, and industrial applications. This led us to the proposal of a special issue about “Modelling, Simulation and Data Analysis in Acoustical Problems”, as we believe in the importance of these topics in modern acoustics’ studies. In total, 81 papers were submitted and 33 of them were published, with an acceptance rate of 37.5%. According to the number of papers submitted, it can be affirmed that this is a trending topic in the scientific and academic community and this special issue will try to provide a future reference for the research that will be developed in coming years

    Design of reservoir computing systems for the recognition of noise corrupted speech and handwriting

    Get PDF

    Automatic speaker recognition: modelling, feature extraction and effects of clinical environment

    Get PDF
    Speaker recognition is the task of establishing identity of an individual based on his/her voice. It has a significant potential as a convenient biometric method for telephony applications and does not require sophisticated or dedicated hardware. The Speaker Recognition task is typically achieved by two-stage signal processing: training and testing. The training process calculates speaker-specific feature parameters from the speech. The features are used to generate statistical models of different speakers. In the testing phase, speech samples from unknown speakers are compared with the models and classified. Current state of the art speaker recognition systems use the Gaussian mixture model (GMM) technique in combination with the Expectation Maximization (EM) algorithm to build the speaker models. The most frequently used features are the Mel Frequency Cepstral Coefficients (MFCC). This thesis investigated areas of possible improvements in the field of speaker recognition. The identified drawbacks of the current speaker recognition systems included: slow convergence rates of the modelling techniques and feature’s sensitivity to changes due aging of speakers, use of alcohol and drugs, changing health conditions and mental state. The thesis proposed a new method of deriving the Gaussian mixture model (GMM) parameters called the EM-ITVQ algorithm. The EM-ITVQ showed a significant improvement of the equal error rates and higher convergence rates when compared to the classical GMM based on the expectation maximization (EM) method. It was demonstrated that features based on the nonlinear model of speech production (TEO based features) provided better performance compare to the conventional MFCCs features. For the first time the effect of clinical depression on the speaker verification rates was tested. It was demonstrated that the speaker verification results deteriorate if the speakers are clinically depressed. The deterioration process was demonstrated using conventional (MFCC) features. The thesis also showed that when replacing the MFCC features with features based on the nonlinear model of speech production (TEO based features), the detrimental effect of the clinical depression on speaker verification rates can be reduced

    A Statistical Perspective of the Empirical Mode Decomposition

    Get PDF
    This research focuses on non-stationary basis decompositions methods in time-frequency analysis. Classical methodologies in this field such as Fourier Analysis and Wavelet Transforms rely on strong assumptions of the underlying moment generating process, which, may not be valid in real data scenarios or modern applications of machine learning. The literature on non-stationary methods is still in its infancy, and the research contained in this thesis aims to address challenges arising in this area. Among several alternatives, this work is based on the method known as the Empirical Mode Decomposition (EMD). The EMD is a non-parametric time-series decomposition technique that produces a set of time-series functions denoted as Intrinsic Mode Functions (IMFs), which carry specific statistical properties. The main focus is providing a general and flexible family of basis extraction methods with minimal requirements compared to those within the Fourier or Wavelet techniques. This is highly important for two main reasons: first, more universal applications can be taken into account; secondly, the EMD has very little a priori knowledge of the process required to apply it, and as such, it can have greater generalisation properties in statistical applications across a wide array of applications and data types. The contributions of this work deal with several aspects of the decomposition. The first set regards the construction of an IMF from several perspectives: (1) achieving a semi-parametric representation of each basis; (2) extracting such semi-parametric functional forms in a computationally efficient and statistically robust framework. The EMD belongs to the class of path-based decompositions and, therefore, they are often not treated as a stochastic representation. (3) A major contribution involves the embedding of the deterministic pathwise decomposition framework into a formal stochastic process setting. One of the assumptions proper of the EMD construction is the requirement for a continuous function to apply the decomposition. In general, this may not be the case within many applications. (4) Various multi-kernel Gaussian Process formulations of the EMD will be proposed through the introduced stochastic embedding. Particularly, two different models will be proposed: one modelling the temporal mode of oscillations of the EMD and the other one capturing instantaneous frequencies location in specific frequency regions or bandwidths. (5) The construction of the second stochastic embedding will be achieved with an optimisation method called the cross-entropy method. Two formulations will be provided and explored in this regard. Application on speech time-series are explored to study such methodological extensions given that they are non-stationary
    corecore