11 research outputs found

    New method for mathematical modelling of human visual speech

    Audio-visual speech recognition and visual speech synthesisers are used as interfaces between humans and machines. Such interactions specifically rely on the analysis and synthesis of both audio and visual information, which humans use for face-to-face communication. Currently, there is no global standard to describe these interactions, nor is there a standard mathematical tool to describe lip movements. Furthermore, the visual lip movement for each phoneme is considered in isolation rather than as a continuation from one to another. Consequently, there is no globally accepted standard method for representing lip movement during articulation. This thesis addresses these issues by describing a designed group of words with mathematical formulas, thereby introducing the concept of a visual word, allocating signatures to visual words and, finally, building a visual speech vocabulary database. In addition, visual speech information has been analysed in a novel way by considering both lip movements and the phonemic structure of the English language. In order to extract the visual data, three visual features on the lip have been chosen; these are located on the outer upper lip, the outer lower lip and the lip corner. The visual data extracted during articulation is called the visual speech sample set. The final visual data is obtained after processing the visual speech sample sets to correct experimental artefacts, such as head tilting, that occurred during articulation and visual data extraction. ‘Barycentric Lagrange Interpolation’ (BLI) formulates the visual speech sample sets into visual speech signals. The visual word, as defined in this work, consists of the variation of the three visual features. Further processing, relating the visual speech signals to the uttered word, leads to the allocation of signatures that represent the visual word. This work suggests the visual word signature can be used either as a ‘visual word barcode’, a ‘digital visual word’ or a ‘2D/3D representation’. The 2D version of the visual word provides a unique signature that allows the identification of the word being uttered. In addition, identification of visual words has also been performed using a technique called ‘volumetric representations of the visual words’. Furthermore, the effect of altering the amplitudes and sampling rate for BLI has been evaluated, and the performance of BLI in reconstructing the visual speech sample sets has been considered. Finally, BLI has been compared to a signal reconstruction approach using RMSE and correlation coefficients. The results, reported in Section 7.7, show that BLI is the more reliable method for the purpose of this work.
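
    As a rough sketch of the interpolation step named in the abstract, the following Python snippet implements the standard (second) barycentric form of Lagrange interpolation and applies it to a hypothetical single-feature visual speech sample set, scoring the reconstruction with RMSE and a correlation coefficient. The data and the evaluation protocol are illustrative assumptions, not the thesis's own pipeline.

```python
import numpy as np

def barycentric_weights(x):
    """Weights w_j = 1 / prod_{k != j} (x_j - x_k) for distinct nodes x."""
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x)
    for j in range(len(x)):
        w[j] = 1.0 / np.prod(np.delete(x[j] - x, j))
    return w

def barycentric_interpolate(x, y, x_new):
    """Second barycentric form: p(t) = sum(w_j y_j / (t - x_j)) / sum(w_j / (t - x_j))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = barycentric_weights(x)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    p = np.empty_like(x_new)
    for i, t in enumerate(x_new):
        diff = t - x
        exact = np.isclose(diff, 0.0)
        if exact.any():
            p[i] = y[np.argmax(exact)]          # query point coincides with a node
        else:
            c = w / diff
            p[i] = np.dot(c, y) / np.sum(c)
    return p

# Hypothetical visual speech sample set: one lip feature sampled at 10 video frames.
frames = np.arange(10.0)
feature = np.array([0.0, 0.4, 1.1, 1.8, 2.0, 1.7, 1.0, 0.5, 0.2, 0.0])

# Formulate a continuous visual speech signal from the sample set.
dense_t = np.linspace(0.0, 9.0, 91)
signal = barycentric_interpolate(frames, feature, dense_t)

# Interpolate from the even frames only and reconstruct the full sample set,
# then score the reconstruction with RMSE and the correlation coefficient,
# mirroring the kind of comparison described in the abstract.
recon = barycentric_interpolate(frames[::2], feature[::2], frames)
rmse = np.sqrt(np.mean((recon - feature) ** 2))
corr = np.corrcoef(recon, feature)[0, 1]
print(f"RMSE = {rmse:.3f}, correlation = {corr:.3f}")
```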

    Robust automatic transcription of lectures

    Automatic transcription of lectures is becoming an important task. Possible applications can be found in the fields of automatic translation or summarization, information retrieval, digital libraries, education and communication research. Ideally, those systems would operate on distant recordings, freeing the presenter from wearing body-mounted microphones. This task, however, is surpassingly difficult, given that the speech signal is severely degraded by background noise and reverberation.

    Robust Automatic Transcription of Lectures

    The automatic transcription of talks, lectures and presentations is becoming increasingly important; it is a prerequisite for applications such as automatic speech translation, automatic speech summarization, targeted information retrieval in audio data and, thus, easier access to digital libraries. Ideally, such a system works with a microphone placement that frees the presenter from wearing a microphone, which is the focus of this work.

    Information theoretic perspectives on en- and decoding in audition and vision

    In cognitive neuroscience, encoding and decoding models mathematically relate stimuli in the outside world to neuronal or behavioural responses. While both stimuli and responses can be multidimensional variables, these models are on their own limited to bivariate descriptions of correspondences. In order to assess the cognitive or neuroscientific significance of such correspondences, a key challenge is to set them in relation to other variables. This thesis uses information theory to contextualise encoding and decoding models in example cases of audition and vision. In the first example, encoding models based on a certain operationalisation of the stimulus are relativised by models based on other operationalisations of the same stimulus material that are conceptually simpler and shown to predict the same neuronal response variance. This highlights the ambiguity inherent in an individual model. In the second example, a methodological contribution is made to the problem of relating the bivariate dependency of stimuli and responses to the history of response components with high degrees of predictability. This perspective demonstrates that only a subset of all stimulus-correlated response variance can be expected to be genuinely caused by the stimulus, while another subset is the consequence of the response’s own dynamics. In the third and final example, complex models are used to predict behavioural responses. Their predictions are grounded in experimentally controlled stimulus variance, so that it is easier to interpret which stimulus properties the models used to predict responses. Together, these three perspectives underscore the need to go beyond bivariate descriptions of correspondences in order to understand the process of perception.
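
    To make the notion of a bivariate information-theoretic correspondence concrete, the sketch below estimates mutual information under a Gaussian assumption between a simulated response and two correlated operationalisations of the same stimulus. The feature names, the simulated data and the simple Gaussian estimator are hypothetical stand-ins, not the methods used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_mi(x, y):
    """Mutual information (bits) under a bivariate-Gaussian assumption:
    I(X;Y) = -0.5 * log2(1 - r^2), with r the Pearson correlation."""
    r = np.corrcoef(x, y)[0, 1]
    return -0.5 * np.log2(1.0 - r ** 2)

# Hypothetical example: two operationalisations of the same stimulus material.
# 'envelope' is a simple acoustic feature; 'spectrotemporal' is a richer feature
# that is itself strongly correlated with the envelope.
n = 2000
envelope = rng.standard_normal(n)
spectrotemporal = 0.9 * envelope + 0.1 * rng.standard_normal(n)

# Simulated neuronal response driven by the stimulus plus intrinsic noise.
response = 0.6 * envelope + 0.8 * rng.standard_normal(n)

# Both encoding models capture nearly the same response variance, so each
# bivariate dependency is ambiguous on its own; relating the two MI values
# (and the redundancy between the features) contextualises the models.
print(f"I(envelope; response)        = {gaussian_mi(envelope, response):.3f} bits")
print(f"I(spectrotemporal; response) = {gaussian_mi(spectrotemporal, response):.3f} bits")
print(f"I(envelope; spectrotemporal) = {gaussian_mi(envelope, spectrotemporal):.3f} bits")
```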

    Decoding Electrophysiological Correlates of Selective Attention by Means of Circular Data

    Sustaining our attention to a relevant sensory input in a complex listening environment is of great importance for successful auditory communication. To avoid overloading the auditory system, the importance of stimuli is estimated in the higher levels of the auditory system, and based on this information, attention is shifted away from irrelevant and unimportant stimuli. Long-term habituation, a gradual process independent of sensory adaptation, plays a major role in shifting our attention away from irrelevant stimuli. A better understanding of attention-modulated neural activity is important for shedding light on the encoding process of auditory streams. For instance, this information can have a direct impact on developing smarter hearing aid devices, in which more accurate objective measures can be used to reflect the hearing capabilities of patients with hearing pathologies. As an example, an objective measure of long-term habituation with respect to different levels of sound stimuli can be used to adjust hearing aid devices more accurately than verbal reports. The main goal of this thesis is to analyze the neural decoding signatures of long-term habituation and the neural modulations of selective attention by exploiting circular regularities in electrophysiological (EEG) data, with which we can objectively measure the level of attentional binding to different stimuli. We study, in particular, the modulations of the instantaneous phase (IP) in event-related potentials (ERPs) over trials for different experimental settings. This is in contrast to the common approach, where the ERP component of interest is computed by averaging a sufficiently large number of ERP trials. It is hypothesized that a high attentional binding to a stimulus is related to a high level of IP clustering; as the attentional binding reduces, the IP is spread more uniformly on the unit circle. This work is divided into three main parts. In the first part, we investigate the dynamics of long-term habituation with different acoustic stimuli (soft vs. loud) over ERP trials. The underlying temporal dynamics of the IP and the level of phase clustering of the ERPs are assessed by fitting circular probability density functions (pdfs) over data segments. To increase the temporal resolution of detecting the times at which a significant change in IP occurs, an abrupt change-point model at different pure-tone stimulations is used. In a second study, we improve upon the results and methodology by relaxing some of the constraints in order to integrate the gradual process of long-term habituation into the model. To this end, a Bayesian state-space model is proposed. In all of the aforementioned studies, we successfully classified between different stimulation levels using solely the IP of ERPs over trials. In the second part of the thesis, the experimental setting is expanded to contain longer and more complex auditory stimuli, as in real-world scenarios. We thereby study the neural correlates of attention in spontaneous modulations of the EEG (ongoing activity), which uses the complete temporal resolution of the signal. We show a mapping between the ERP results and the ongoing EEG activity based on the IP. A Markov-based model is developed for removing spurious variations that can occur in ongoing signals. We believe the proposed method can be incorporated as an important preprocessing step for a more reliable estimation of objective measures of the level of selective attention. The proposed model is used to pre-process and classify between attending and non-attending states in a seminal dichotic tone-detection experiment. In the last part of this thesis, we investigate the possibility of measuring a mapping between the neural activities of the cortical laminae and auditory evoked potentials (AEPs) in vitro. Using mutual information, we show a strong correlation between the IP of the AEPs and the neural activities at the granular layer.
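
    As a rough illustration of the inter-trial phase analysis described above, the following Python sketch computes the instantaneous phase of simulated ERP trials via the Hilbert transform and quantifies phase clustering with the mean resultant length. The simulated data, sampling rate and jitter values are hypothetical, and the thesis's circular pdf fits, change-point and state-space models are not reproduced here.

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(1)

def instantaneous_phase(trials):
    """Instantaneous phase of each ERP trial via the analytic (Hilbert) signal.
    trials: array of shape (n_trials, n_samples)."""
    return np.angle(hilbert(trials, axis=1))

def phase_clustering(phases):
    """Mean resultant length R in [0, 1]: R near 1 for tightly clustered phases,
    R near 0 for phases spread uniformly on the unit circle."""
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

# Hypothetical ERP data: 100 trials, 1 s at 250 Hz, a 10 Hz component whose
# phase is locked to stimulus onset in the "attended" condition and nearly
# random in the "unattended" condition, plus background noise.
fs, n_trials = 250, 100
t = np.arange(0, 1.0, 1.0 / fs)

def simulate(phase_jitter):
    onsets = rng.uniform(-phase_jitter, phase_jitter, size=n_trials)
    return np.array([np.cos(2 * np.pi * 10 * t + p) for p in onsets]) \
           + 0.5 * rng.standard_normal((n_trials, len(t)))

attended = simulate(phase_jitter=0.3)        # small jitter: strong phase locking
unattended = simulate(phase_jitter=np.pi)    # uniform phase: weak locking

for name, erp in [("attended", attended), ("unattended", unattended)]:
    ip = instantaneous_phase(erp)
    R = phase_clustering(ip)                 # clustering per time point, over trials
    print(f"{name:10s}: mean resultant length around 400 ms = {R[100]:.2f}")
```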

    Modeling diversity by strange attractors with application to temporal pattern recognition

    This thesis belongs to the general discipline of establishing black-box models from real-world data, more precisely, from measured time series. This is an old subject, and a large number of papers and books have been written about it. The main difficulty is to express the diversity of data that has essentially the same origin without creating confusion with data that has a different origin. Normally, the diversity of time series is modeled by a stochastic process, such as filtered white noise. Often, it is reasonable to assume that the time series is generated by a deterministic dynamical system rather than a stochastic process. In this case, the diversity of the data is expressed by the variability of the parameters of the dynamical system. The parameter variability itself is then, once again, modeled by a stochastic process. In both cases the diversity is generated by some form of exogenous noise. In this thesis a further step has been taken: a single chaotic dynamical system is used to model the data and their diversity. Indeed, a chaotic system produces a whole family of trajectories that are different but nonetheless very similar. It is believed that chaotic dynamics are not only a convenient means to represent diversity but that in many cases the diversity actually stems from chaotic dynamics. Since the approach of this thesis explores completely new ground, the most suitable kind of data is considered, namely approximately periodic signals. In nature such time series are rather common, in particular the physiological signals of living beings, such as electrocardiograms (ECG), parts of speech signals, electroencephalograms (EEG), etc. Since there are strong arguments in favor of the chaotic nature of these signals, they appear to be the best candidates for modeling diversity by chaos. It should be stressed, however, that the modeling approach pursued in this thesis is thought to be quite general and not limited to signals produced by chaotic dynamics in nature. The intended application of the modeling effort in this thesis is temporal signal classification. The reason for this is twofold. Firstly, classification is one of the basic building blocks of any cognitive system. Secondly, the recently studied phenomenon of synchronization of chaotic systems suggests a way to test a signal against its chaotic model. The essential content of this work can now be formulated as follows. Thesis: The diversity of approximately periodic signals found in nature can be modeled by means of chaotic dynamics. This kind of modeling technique, together with selective properties of the synchronization of chaotic systems, can be exploited for pattern recognition purposes. The thesis is advocated by means of the following five points. Models of randomness (Chapter 2): it is argued that the randomness observed in nature is not necessarily the result of exogenous noise but could be generated endogenously by deterministic chaotic dynamics. The diversity of real signals is compared with signals produced by the most common chaotic systems. Qualitative resonance (Chapter 3): the behavior of chaotic systems forced by periodic or approximately periodic input signals is studied theoretically and by numerical simulation. It is observed that the chaotic system "locks" approximately onto an input signal that is related to its internal chaotic dynamics; in contrast, its chaotic behavior is reinforced when the input signal has nothing to do with its internal dynamics. This new phenomenon is called "qualitative resonance". Modeling and recognizing (Chapter 4): here qualitative resonance is used for pattern recognition. The core of the method is a chaotic dynamical system that is able to reproduce the class of time series to be recognized. This model is excited in a suitable way by an input signal such that qualitative resonance is realized. This means that if the input signal belongs to the modeled class of time series, the system approximately "locks" onto it; if not, the trajectory of the system and the input signal remain unrelated. Automated design of the recognizer (Chapters 5 and 6): for the kind of signals considered in this thesis, a systematic design method for the recognizer is presented. The model used is a system of Lur'e type, i.e. a model in which the linear dynamic part and the nonlinear static part are separated. The identification of the model parameters from the given data proceeds iteratively, adapting in turn the linear and the nonlinear part. Thus, the difficult nonlinear dynamical system identification task is decomposed into the easier problems of linear dynamical and nonlinear static system identification. The way to apply the approximately periodic input signal in order to realize qualitative resonance is chosen with the help of periodic control theory. Validation (Chapter 7): the pattern recognition method has been validated on a synthetic example, laboratory measurements from a Colpitts oscillator, ECG, EEG, and vowels of speech signals. In the first four cases a binary classification was performed, and in the last example a classification with five classes. To the best of the author's knowledge, the recognition method is original. Chaotic systems have already been used to produce pseudo-noise and to model signal diversity, and parameter identification of chaotic systems has also been carried out. However, the direct establishment of the model from the given data and its subsequent use for classification based on the phenomenon of qualitative resonance is entirely new.
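
    A minimal, self-contained sketch of the synchronization-based test described above, using a Lorenz system as a hypothetical stand-in for the chaotic class model (the thesis itself uses a Lur'e-type model identified from data): a copy of the model is driven by an input signal through diffusive coupling, and the residual synchronization error separates in-class from out-of-class inputs. All parameters, initial states and signals are illustrative assumptions.

```python
import numpy as np

# Lorenz parameters used as the hypothetical "class model" of a chaotic signal family.
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0
DT, N = 0.002, 20000                       # simple Euler integration, ~40 time units

def lorenz_trajectory(state, n=N):
    """Free-running Lorenz system; returns the x-component as the 'class' signal."""
    x, y, z = state
    xs = np.empty(n)
    for i in range(n):
        dx = SIGMA * (y - x)
        dy = x * (RHO - z) - y
        dz = x * y - BETA * z
        x, y, z = x + DT * dx, y + DT * dy, z + DT * dz
        xs[i] = x
    return xs

def sync_error(drive, coupling=20.0, state=(1.0, 1.0, 1.0)):
    """Drive a copy of the model through diffusive coupling on x and return the
    RMS mismatch between its x-state and the input over the second half of the
    run (after the synchronization transient)."""
    x, y, z = state
    err = np.empty(len(drive))
    for i, s in enumerate(drive):
        dx = SIGMA * (y - x) + coupling * (s - x)
        dy = x * (RHO - z) - y
        dz = x * y - BETA * z
        x, y, z = x + DT * dx, y + DT * dy, z + DT * dz
        err[i] = x - s
    return np.sqrt(np.mean(err[len(drive) // 2:] ** 2))

# In-class input: another trajectory of the same chaotic system (different initial state).
in_class = lorenz_trajectory((8.0, -9.0, 30.0))
# Out-of-class input: an approximately periodic signal unrelated to the model.
t = np.arange(N) * DT
out_of_class = 10.0 * np.sin(8.0 * t)

print(f"sync error, in-class input:     {sync_error(in_class):.3f}")
print(f"sync error, out-of-class input: {sync_error(out_of_class):.3f}")
```

    A low error indicates approximate locking (the input belongs to the modeled class), while a persistently large error indicates an unrelated signal, which is the decision rule a recognizer based on qualitative resonance would threshold.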

    Connected Attribute Filtering Based on Contour Smoothness
