14 research outputs found

    Automatic Speech Emotion Recognition- Feature Space Dimensionality and Classification Challenges

    Get PDF
    In the last decade, research in Speech Emotion Recognition (SER) has become a major endeavour in Human Computer Interaction (HCI), and speech processing. Accurate SER is essential for many applications, like assessing customer satisfaction with quality of services, and detecting/assessing emotional state of children in care. The large number of studies published on SER reflects the demand for its use. The main concern of this thesis is the investigation of SER from a pattern recognition and machine learning points of view. In particular, we aim to identify appropriate mathematical models of SER and examine the process of designing automatic emotion recognition schemes. There are major challenges to automatic SER including ambiguity about the list/definition of emotions, the lack of agreement on a manageable set of uncorrelated speech-based emotion relevant features, and the difficulty of collected emotion-related datasets under natural circumstances. We shall initiate our work by dealing with the identification of appropriate sets of emotion related features/attributes extractible from speech signals as considered from psychological and computational points of views. We shall investigate the use of pattern-recognition approaches to remove redundancies and achieve compactification of digital representation of the extracted data with minimal loss of information. The thesis will include the design of new or complement existing SER schemes and conduct large sets of experiments to empirically test their performances on different databases, identify advantages, and shortcomings of using speech alone for emotion recognition. Existing SER studies seem to deal with the ambiguity/dis-agreement on a “limited” number of emotion-related features by expanding the list from the same speech signal source/sites and apply various feature selection procedures as a mean of reducing redundancies. Attempts are made to discover more relevant features to emotion from speech. One of our investigations focuses on proposing a newly sets of features for SER, extracted from Linear Predictive (LP)-residual speech. We shall demonstrate the usefulness of the proposed relatively small set of features by testing the performance of an SER scheme that is based on fusing our set of features with the existing set of thousands of features using common machine learning schemes of Support Vector Machine (SVM) and Artificial Neural Network (ANN). The challenge of growing dimensionality of SER feature space and its impact on increased model complexity is another major focus of our research project. By studying the pros and cons of the commonly used feature selection approaches, we argued in favour of meta-feature selection and developed various methods in this direction, not only to reduce dimension, but also to adapt and de-correlate emotional feature spaces for improved SER model recognition accuracy. We used rincipal Component Analysis (PCA) and proposed Data Independent PCA (DIPCA) by training on independent emotional and non-emotional datasets. The DIPCA projections, especially when extracted from speech data coloured with different emotions or from Neutral speech data, had comparable capability to the PCA in terms of SER performance. Another adopted approach in this thesis for dimension reduction is the Random Projection (RP) matrices, independent of training data. We have shown that some versions of RP with SVM classifier can offer an adaptation space for Speaker Independent SER that avoid over-fitting and hence improves recognition accuracy. Using PCA trained on a set of data, while testing on emotional data features, has significant implication for machine learning in general. The thesis other major contribution focuses on the classification aspects of SER. We investigate the drawbacks of the well-known SVM classifier when applied to a preprocessed data by PCA and RP. We shall demonstrate the advantages of using the Linear Discriminant Classifier (LDC) instead especially for PCA de-correlated metafeatures. We initiated a variety of LDC-based ensembles classification, to test performance of scheme using a new form of bagging different subsets of metafeature subsets extracted by PCA with encouraging results. The experiments conducted were applied on two benchmark datasets (Emo-Berlin and FAU-Aibo), and an in-house dataset in the Kurdish language. Recognition accuracy achieved by are significantly higher than the state of art results on all datasets. The results, however, revealed a difficult challenge in the form of persisting wide gap in accuracy over different datasets, which cannot be explained entirely by the differences between the natures of the datasets. We conducted various pilot studies that were based on various visualizations of the confusion matrices for the “difficult” databases to build multi-level SER schemes. These studies provide initial evidences to the presence of more than one “emotion” in the same portion of speech. A possible solution may be through presenting recognition accuracy in a score-based measurement like the spider chart. Such an approach may also reveal the presence of Doddington zoo phenomena in SER

    Efficient algorithms and data structures for compressive sensing

    Get PDF
    Wegen der kontinuierlich anwachsenden Anzahl von Sensoren, und den stetig wachsenden Datenmengen, die jene produzieren, stößt die konventielle Art Signale zu verarbeiten, beruhend auf dem Nyquist-Kriterium, auf immer mehr Hindernisse und Probleme. Die kürzlich entwickelte Theorie des Compressive Sensing (CS) formuliert das Versprechen einige dieser Hindernisse zu beseitigen, indem hier allgemeinere Signalaufnahme und -rekonstruktionsverfahren zum Einsatz kommen können. Dies erlaubt, dass hierbei einzelne Abtastwerte komplexer strukturierte Informationen über das Signal enthalten können als dies bei konventiellem Nyquistsampling der Fall ist. Gleichzeitig verändert sich die Signalrekonstruktion notwendigerweise zu einem nicht-linearen Vorgang und ebenso müssen viele Hardwarekonzepte für praktische Anwendungen neu überdacht werden. Das heißt, dass man zwischen der Menge an Information, die man über Signale gewinnen kann, und dem Aufwand für das Design und Betreiben eines Signalverarbeitungssystems abwägen kann und muss. Die hier vorgestellte Arbeit trägt dazu bei, dass bei diesem Abwägen CS mehr begünstigt werden kann, indem neue Resultate vorgestellt werden, die es erlauben, dass CS einfacher in der Praxis Anwendung finden kann, wobei die zu erwartende Leistungsfähigkeit des Systems theoretisch fundiert ist. Beispielsweise spielt das Konzept der Sparsity eine zentrale Rolle, weshalb diese Arbeit eine Methode präsentiert, womit der Grad der Sparsity eines Vektors mittels einer einzelnen Beobachtung geschätzt werden kann. Wir zeigen auf, dass dieser Ansatz für Sparsity Order Estimation zu einem niedrigeren Rekonstruktionsfehler führt, wenn man diesen mit einer Rekonstruktion vergleicht, welcher die Sparsity des Vektors unbekannt ist. Um die Modellierung von Signalen und deren Rekonstruktion effizienter zu gestalten, stellen wir das Konzept von der matrixfreien Darstellung linearer Operatoren vor. Für die einfachere Anwendung dieser Darstellung präsentieren wir eine freie Softwarearchitektur und demonstrieren deren Vorzüge, wenn sie für die Rekonstruktion in einem CS-System genutzt wird. Konkret wird der Nutzen dieser Bibliothek, einerseits für das Ermitteln von Defektpositionen in Prüfkörpern mittels Ultraschall, und andererseits für das Schätzen von Streuern in einem Funkkanal aus Ultrabreitbanddaten, demonstriert. Darüber hinaus stellen wir für die Verarbeitung der Ultraschalldaten eine Rekonstruktionspipeline vor, welche Daten verarbeitet, die im Frequenzbereich Unterabtastung erfahren haben. Wir beschreiben effiziente Algorithmen, die bei der Modellierung und der Rekonstruktion zum Einsatz kommen und wir leiten asymptotische Resultate für die benötigte Anzahl von Messwerten, sowie die zu erwartenden Lokalisierungsgenauigkeiten der Defekte her. Wir zeigen auf, dass das vorgestellte System starke Kompression zulässt, ohne die Bildgebung und Defektlokalisierung maßgeblich zu beeinträchtigen. Für die Lokalisierung von Streuern mittels Ultrabreitbandradaren stellen wir ein CS-System vor, welches auf einem Random Demodulators basiert. Im Vergleich zu existierenden Messverfahren ist die hieraus resultierende Schätzung der Kanalimpulsantwort robuster gegen die Effekte von zeitvarianten Funkkanälen. Um den inhärenten Modellfehler, den gitterbasiertes CS begehen muss, zu beseitigen, zeigen wir auf wie Atomic Norm Minimierung es erlaubt ohne die Einschränkung auf ein endliches und diskretes Gitter R-dimensionale spektrale Komponenten aus komprimierten Beobachtungen zu schätzen. Hierzu leiten wir eine R-dimensionale Variante des ADMM her, welcher dazu in der Lage ist die Signalkovarianz in diesem allgemeinen Szenario zu schätzen. Weiterhin zeigen wir, wie dieser Ansatz zur Richtungsschätzung mit realistischen Antennenarraygeometrien genutzt werden kann. In diesem Zusammenhang präsentieren wir auch eine Methode, welche mittels Stochastic gradient descent Messmatrizen ermitteln kann, die sich gut für Parameterschätzung eignen. Die hieraus resultierenden Kompressionsverfahren haben die Eigenschaft, dass die Schätzgenauigkeit über den gesamten Parameterraum ein möglichst uniformes Verhalten zeigt. Zuletzt zeigen wir auf, dass die Kombination des ADMM und des Stochastic Gradient descent das Design eines CS-Systems ermöglicht, welches in diesem gitterfreien Szenario wünschenswerte Eigenschaften hat.Along with the ever increasing number of sensors, which are also generating rapidly growing amounts of data, the traditional paradigm of sampling adhering the Nyquist criterion is facing an equally increasing number of obstacles. The rather recent theory of Compressive Sensing (CS) promises to alleviate some of these drawbacks by proposing to generalize the sampling and reconstruction schemes such that the acquired samples can contain more complex information about the signal than Nyquist samples. The proposed measurement process is more complex and the reconstruction algorithms necessarily need to be nonlinear. Additionally, the hardware design process needs to be revisited as well in order to account for this new acquisition scheme. Hence, one can identify a trade-off between information that is contained in individual samples of a signal and effort during development and operation of the sensing system. This thesis addresses the necessary steps to shift the mentioned trade-off more to the favor of CS. We do so by providing new results that make CS easier to deploy in practice while also maintaining the performance indicated by theoretical results. The sparsity order of a signal plays a central role in any CS system. Hence, we present a method to estimate this crucial quantity prior to recovery from a single snapshot. As we show, this proposed Sparsity Order Estimation method allows to improve the reconstruction error compared to an unguided reconstruction. During the development of the theory we notice that the matrix-free view on the involved linear mappings offers a lot of possibilities to render the reconstruction and modeling stage much more efficient. Hence, we present an open source software architecture to construct these matrix-free representations and showcase its ease of use and performance when used for sparse recovery to detect defects from ultrasound data as well as estimating scatterers in a radio channel using ultra-wideband impulse responses. For the former of these two applications, we present a complete reconstruction pipeline when the ultrasound data is compressed by means of sub-sampling in the frequency domain. Here, we present the algorithms for the forward model, the reconstruction stage and we give asymptotic bounds for the number of measurements and the expected reconstruction error. We show that our proposed system allows significant compression levels without substantially deteriorating the imaging quality. For the second application, we develop a sampling scheme to acquire the channel Impulse Response (IR) based on a Random Demodulator that allows to capture enough information in the recorded samples to reliably estimate the IR when exploiting sparsity. Compared to the state of the art, this in turn allows to improve the robustness to the effects of time-variant radar channels while also outperforming state of the art methods based on Nyquist sampling in terms of reconstruction error. In order to circumvent the inherent model mismatch of early grid-based compressive sensing theory, we make use of the Atomic Norm Minimization framework and show how it can be used for the estimation of the signal covariance with R-dimensional parameters from multiple compressive snapshots. To this end, we derive a variant of the ADMM that can estimate this covariance in a very general setting and we show how to use this for direction finding with realistic antenna geometries. In this context we also present a method based on a Stochastic gradient descent iteration scheme to find compression schemes that are well suited for parameter estimation, since the resulting sub-sampling has a uniform effect on the whole parameter space. Finally, we show numerically that the combination of these two approaches yields a well performing grid-free CS pipeline

    Intelligent Circuits and Systems

    Get PDF
    ICICS-2020 is the third conference initiated by the School of Electronics and Electrical Engineering at Lovely Professional University that explored recent innovations of researchers working for the development of smart and green technologies in the fields of Energy, Electronics, Communications, Computers, and Control. ICICS provides innovators to identify new opportunities for the social and economic benefits of society.  This conference bridges the gap between academics and R&D institutions, social visionaries, and experts from all strata of society to present their ongoing research activities and foster research relations between them. It provides opportunities for the exchange of new ideas, applications, and experiences in the field of smart technologies and finding global partners for future collaboration. The ICICS-2020 was conducted in two broad categories, Intelligent Circuits & Intelligent Systems and Emerging Technologies in Electrical Engineering

    The perceptual flow of phonetic feature processing

    Get PDF

    Cross-spectral synergy and consonant identification (A)

    Get PDF
    corecore