    Exploring temporal information in neonatal seizures using a dynamic time warping based SVM kernel

    Seizure events in newborns change in frequency, morphology, and propagation. This contextual information is explored at the classifier level in the proposed patient-independent neonatal seizure detection system. The system is based on the combination of a static and a sequential SVM classifier. A Gaussian kernel based on dynamic time warping is used in the sequential classifier. The system is validated on a large dataset of EEG recordings from 17 neonates. The results show an increase in the detection rate at very low rates of false detections per hour, in particular a 12% improvement in the detection of short seizure events over the static RBF-kernel-based system.
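As a rough illustration of a Gaussian kernel built on dynamic time warping (DTW), the sketch below computes a classic DTW distance between two 1-D sequences and plugs it into a Gaussian form, K(x, y) = exp(-DTW(x, y)^2 / (2 sigma^2)). The local cost (absolute difference), the squared-distance Gaussian form, the parameter `sigma` and the function names are assumptions made for this example; the paper's exact kernel definition and input features are not given here. Gaussian DTW kernels are also not guaranteed to be positive semi-definite.

```python
import math
from typing import List, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic O(n*m) DTW distance using |x_i - y_j| as the local cost."""
    if not x or not y:
        raise ValueError("sequences must be non-empty")
    n, m = len(x), len(y)
    # D[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def gaussian_dtw_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Hypothetical Gaussian kernel on the DTW distance (not guaranteed PSD)."""
    d = dtw_distance(x, y)
    return math.exp(-d * d / (2.0 * sigma ** 2))


def gram_matrix(seqs: Sequence[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Pairwise kernel values, e.g. for an SVM that accepts precomputed kernels."""
    return [[gaussian_dtw_kernel(a, b, sigma) for b in seqs] for a in seqs]
```

Because DTW aligns sequences that are locally shifted or stretched, `dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])` is 0.0 and the kernel value is 1.0, even though the sequences differ in length and a fixed-length RBF kernel could not compare them directly. A matrix from `gram_matrix` could be passed to an SVM that supports precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.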

    Pattern Classification of Signals Using Fisher Kernels

    The aim of this study is to assess the performance of Fisher kernels for dimensionality reduction and classification of time-series signals. Our research indicates that Fisher kernels substantially improve signal classification by enabling clearer pattern visualization in three-dimensional space. In this paper, we demonstrate the performance of Fisher kernels in two domains: financial and biomedical. The financial study involves assessing whether a company trading on the stock market is likely to collapse or survive. To assess the fate of each company, we collected financial time series of weekly closing stock prices over a common time frame using Thomson Datastream software. The biomedical study involves knee-joint signals collected using the vibration arthrometry technique, and uses the severity of cartilage degeneration to classify knee joints as normal or abnormal. In both studies, we apply Fisher kernels combined with a Gaussian mixture model (GMM) to transform the signals into a feature space, which is rendered as a three-dimensional plot for visualization and used for further classification with support vector machines. Our experiments show that Fisher kernels work well for both kinds of signals, yielding low classification error rates.
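A Fisher kernel compares two signals through the gradient of their log-likelihood under a generative model, here a GMM. The sketch below assumes a GMM on scalar samples (for example, weekly closing prices or vibration-signal samples) that has already been fitted, takes the gradient only with respect to the component means, averages it over the series, and approximates the Fisher information matrix by the identity, a common simplification. With three components the Fisher score is a 3-D vector, which is one way to obtain points that can be plotted in three dimensions; the abstract does not say which GMM parameters or normalization the authors used, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def _log_gaussian(x: float, mean: float, var: float) -> float:
    # log N(x; mean, var) for a univariate Gaussian
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fisher_score(series: Sequence[float], weights: Sequence[float],
                 means: Sequence[float], variances: Sequence[float]) -> List[float]:
    """Average over the series of d/d(mu_k) log p(x_t | GMM), one entry per component."""
    if not series:
        raise ValueError("series must be non-empty")
    score = [0.0] * len(weights)
    for x in series:
        # log(w_k * N(x; mu_k, var_k)), shifted by the max for numerically stable posteriors
        logs = [math.log(w) + _log_gaussian(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        unnorm = [math.exp(l - top) for l in logs]
        total = sum(unnorm)
        for k, (m, v) in enumerate(zip(means, variances)):
            gamma = unnorm[k] / total  # posterior responsibility of component k
            score[k] += gamma * (x - m) / v  # gradient of log p(x) w.r.t. mu_k
    return [s / len(series) for s in score]


def fisher_kernel(a: Sequence[float], b: Sequence[float], weights: Sequence[float],
                  means: Sequence[float], variances: Sequence[float]) -> float:
    """K(a, b) = U_a . U_b: Fisher kernel with the information matrix replaced by the identity."""
    ua = fisher_score(a, weights, means, variances)
    ub = fisher_score(b, weights, means, variances)
    return sum(p * q for p, q in zip(ua, ub))
```

Each series maps to a fixed-length vector regardless of its length, so recordings of different durations can be compared, plotted, or fed to an SVM.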

    Correlation features and a structured SVM family for phoneme classification and automatic speech recognition

    The foremost aim of this thesis is to introduce concepts aimed at improving phoneme classification and, in line with this, automatic speech recognition. The most distinctive feature of the new approach presented here is that the different stages of the analysis, from feature vector creation to classification, are all developed on a common basis. This foundation lies in the interaction between correlation and the formal structure of a tristate phoneme model, which reflects the short-time weakly stationary segments within phonemes and the transitions between them. The tristate layout is a topology that partitions a phoneme, or more generally an observed frame, into three main sections: start, middle and end. In combination with the well-known Hidden Markov Model (HMM), it aims to model these states of stationarity and transition. On the basis of weak stationarity and the tristate structure, our approach evolves as follows. A stochastic process such as a speech signal that is short-time weakly stationary has first- and second-order moments that are independent of time t; they depend only on the time span between observations. This property is reflected in the (auto)covariance of the process and carries over to (auto)correlation and, to some degree, to cross-correlation. In this light, starting from common MFCC feature vectors, we first analyze potential improvements from using autocorrelation data and, motivated by the results, introduce new MFCC autocorrelation features and later specific cross-correlation features. In this context we note that, in contrast to the different components of a single MFCC vector (which roughly represent different frequency bands), identical components across different MFCC vectors are in general not decorrelated. In a subsequent step, the cross-correlation transform is integrated into the support vector classifiers used for phoneme classification, such that the specialized reproducing kernel used by the classifiers is derived directly from the transform. The theoretical prerequisites for establishing the new kernel are derived, and its required properties are proven.
Along with the new reproducing kernel, a family of support vector classifiers is introduced. Its structure follows aspects inherent in the representation of phonemes and their acoustic progression, namely the tristate model described above. Based on the topology of this model and the construction of the features, a specifically structured collection of classes and associated support vector classifiers is designed, again incorporating correlation. All this aims at a framework that represents and models both stationarity and transitions within acoustic events more adequately than previous recognition and classification systems. Experiments demonstrate the improved recognition rates resulting from the new topology. The framework is then integrated into a common automatic speech recognition system and evaluated in this context, where experiments comparing the new approach with a standard recognition system again reveal its potential. Finally, prospects and suggestions for further improvements conclude the thesis.
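The observation that identical MFCC components are correlated across neighbouring frames can be made concrete with a small sketch: split a phoneme's sequence of MFCC vectors into start, middle and end thirds, compute the temporal autocorrelation of each coefficient within each segment, and concatenate the results. The equal-length split, the biased autocorrelation estimate, `max_lag` and the function names are assumptions for illustration, not the segmentation or features used in the thesis. A linear kernel on these explicit features is positive semi-definite by construction, since it is an inner product of feature vectors; it is only a hypothetical stand-in for the thesis's own reproducing kernel, which is derived from a cross-correlation transform not specified in this abstract.

```python
from typing import List, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # one MFCC vector per frame


def tristate_split(frames: Frames) -> Tuple[Frames, Frames, Frames]:
    """Split frames into start, middle and end thirds (hypothetical equal-length split)."""
    n = len(frames)
    a, b = n // 3, 2 * n // 3
    return frames[:a], frames[a:b], frames[b:]


def autocorrelation(values: Sequence[float], max_lag: int) -> List[float]:
    """Biased estimate r(tau) = (1/N) * sum_t (x_t - mean)(x_{t+tau} - mean), tau = 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    c = [v - mean for v in values]
    return [sum(c[t] * c[t + lag] for t in range(n - lag)) / n for lag in range(max_lag + 1)]


def tristate_autocorr_features(frames: Frames, max_lag: int = 1) -> List[float]:
    """Per-segment, per-coefficient temporal autocorrelations, concatenated."""
    if len(frames) < 3 * (max_lag + 1):
        raise ValueError("need at least max_lag + 1 frames in every segment")
    features: List[float] = []
    for segment in tristate_split(frames):
        for coeff in range(len(segment[0])):
            # the same MFCC component tracked across consecutive frames of this segment
            features.extend(autocorrelation([frame[coeff] for frame in segment], max_lag))
    return features


def correlation_kernel(a: Frames, b: Frames, max_lag: int = 1) -> float:
    """Linear kernel on the explicit correlation features (PSD as an inner product)."""
    fa = tristate_autocorr_features(a, max_lag)
    fb = tristate_autocorr_features(b, max_lag)
    if len(fa) != len(fb):
        raise ValueError("both inputs need the same number of MFCC coefficients")
    return sum(x * y for x, y in zip(fa, fb))
```

A coefficient that alternates in sign from frame to frame yields a negative lag-1 value, while a coefficient that stays constant within a segment contributes zeros, so the features capture how each component evolves over time rather than its value in any single frame.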

    Biological and biomimetic machine learning for automatic classification of human gait

    Machine learning (ML) research has benefited from a deep understanding of biological mechanisms that have evolved to perform comparable tasks. Recent successes of ML models, which have surpassed human performance on human perception-based tasks, have generated interest in improving them further. However, the approach to improving ML models tends to be unstructured, particularly for models that aim to mimic biology. This thesis proposes and applies a bidirectional learning paradigm to streamline the process of improving ML models' performance on a classification task at which humans are already adept. The approach is validated using human gait classification as the exemplar task. The paradigm has the additional benefit of allowing the underlying mechanisms of human perception (HP) to be investigated using the ML models. An assessment of several biomimetic (BM) and non-biomimetic (NBM) machine learning models on an intrinsic feature of gait, namely the gender of the walker, establishes a functional overlap in the perception of gait between HP and BM, and selects the Long Short-Term Memory (LSTM) architecture as the BM of choice for this study over other models such as support vector machines, decision trees and multi-layer perceptrons. Psychophysics and computational experiments are conducted to understand the overlap between human and machine models. The BM and the HP derived from psychophysics experiments share qualitatively similar profiles of gender classification accuracy across varying stimulus exposure durations. They also share a preference for motion-based cues over structural cues (BM=H>NBM). Further evaluation reveals a human-like expression of the inversion effect in the BM: this well-studied cognitive bias in HP reduces the BM's gender classification accuracy to 37% (p<0.05, chance at 50%) when it is exposed to inverted stimuli. Its expression in the BM supports the argument for learned rather than hard-wired mechanisms in HP, particularly given that the effect emerged in every BM after training multiple randomly initialised BM models without prior anthropomorphic expectations of gait.
The above aspects of HP, namely the preference for motion cues over structural cues and the lack of prior anthropomorphic expectations, were selected to improve BM performance. Representing gait explicitly as motion-based cues of a non-anthropomorphic, gender-neutral skeleton not only mitigates the inversion effect in the BM but also significantly improves classification accuracy. For gender classification of upright stimuli, mean accuracy improved by 6 percentage points, from 76% to 82% (F(1,18) = 16, p<0.05). For inverted stimuli, mean accuracy improved by 45 percentage points, from 37% to 82% (F(1,18) = 20, p<0.05). The model was further tested on a more challenging, extrinsic feature task: classifying the emotional state of a walker. Emotions were visually induced in subjects through exposure to emotive or neutral images from the International Affective Picture System (IAPS) database. The BM's classification accuracy was 43%, significantly above chance (p<0.05, chance at 33.3%). Applying the proposed paradigm in further binary emotive-state classification experiments improved mean accuracy by a further 23 percentage points, from 43% to 65% (F(1,18) = 7.4, p<0.05), for the positive vs. neutral task. The results validate the proposed paradigm of concurrent bidirectional investigation of HP and BM for the classification of human gait, and suggest future applications in automating perceptual tasks for which the human brain and body have evolved.
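The gait results are reported as F(1,18) statistics. Assuming they come from one-way ANOVAs comparing two conditions (the abstract does not name the test), the degrees of freedom are k - 1 = 1 between groups and N - k = 18 within groups, which implies N = 20 observations in total; ten per condition, for example ten randomly initialised models each, is one plausible reading but is an assumption. A minimal sketch of how such an F value is computed, with a hypothetical function name:

```python
from typing import Sequence, Tuple


def one_way_anova(*groups: Sequence[float]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    if df_within == 0 or ss_within == 0:
        raise ValueError("within-group variance must be positive")
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For two groups, this F equals the square of the pooled-variance two-sample t statistic, so F(1,18) = 16 corresponds to |t| = 4 with 18 degrees of freedom.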