
    Neural Networks for Textual Emotion Recognition and Analysis


    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning, which takes the raw audio signal as input, models relying on hand-crafted acoustic features remain superior in some domains, especially for tasks where data is scarce. One major issue, however, is that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, compressing the information of the LLDs over a temporal block by summarising them can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data-independence. Furthermore, it is shown that both representations are complementary and that their fusion improves the performance of a machine listening system.
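
    To make the BoAW idea above concrete, the sketch below quantises frame-level LLDs against a k-means codebook and turns each recording into a fixed-length, normalised histogram of codeword counts. The function names, the use of k-means, and all parameter values are illustrative assumptions; this is not the openXBOW implementation.

```python
# Minimal bag-of-audio-words (BoAW) sketch: LLD frames are quantised against a learned
# codebook, and each recording becomes a histogram of codeword counts.
# Names and parameters are illustrative, not the openXBOW implementation.
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(lld_frames, n_codewords=64, seed=0):
    """Learn an audio-word codebook from a matrix of LLD frames (n_frames x n_llds)."""
    return KMeans(n_clusters=n_codewords, random_state=seed, n_init=10).fit(lld_frames)

def boaw_histogram(codebook, lld_frames):
    """Quantise each frame to its nearest codeword and return a normalised histogram."""
    assignments = codebook.predict(lld_frames)
    hist = np.bincount(assignments, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)  # term-frequency normalisation

# Toy usage: random "LLD frames" mapped to a static, fixed-length feature vector.
rng = np.random.default_rng(0)
train_frames = rng.normal(size=(1000, 23))     # e.g. 23 LLDs per frame
codebook = learn_codebook(train_frames)
recording = rng.normal(size=(300, 23))
features = boaw_histogram(codebook, recording) # usable as input for any static classifier
```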

    WiFi-Based Human Activity Recognition Using Attention-Based BiLSTM

    Recently, significant efforts have been made to explore human activity recognition (HAR) techniques that use information gathered by existing indoor wireless infrastructures through WiFi signals, without requiring the monitored subject to carry a dedicated device. The key intuition is that different activities introduce different multi-path effects in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose and evaluate a full pipeline for a CSI-based human activity recognition framework covering 12 activities in three different spatial environments, using two deep learning models: an attention-based bidirectional LSTM (ABiLSTM) and a CNN-ABiLSTM. Evaluation experiments demonstrate that the proposed models outperform state-of-the-art models. The experiments also show that the proposed models can be applied to other environments with different configurations, albeit with some caveats. The proposed ABiLSTM model achieves overall accuracies of 94.03%, 91.96%, and 92.59% across the three target environments, while the proposed CNN-ABiLSTM model reaches accuracies of 98.54%, 94.25%, and 95.09% across the same environments.
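
    As a rough, hedged sketch of the kind of attention-based BiLSTM named above, the PyTorch model below scores each CSI time step with an additive attention weight and classifies the attention-weighted summary. The layer sizes, input dimensions, and attention form are assumptions, not the exact architecture from the paper.

```python
# Hedged sketch of an attention-based BiLSTM classifier for CSI sequences
# (batch x time x subcarriers). Sizes and the attention form are assumptions.
import torch
import torch.nn as nn

class ABiLSTM(nn.Module):
    def __init__(self, n_subcarriers=90, hidden=128, n_classes=12):
        super().__init__()
        self.bilstm = nn.LSTM(n_subcarriers, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)          # attention score per time step
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                              # x: (batch, time, n_subcarriers)
        h, _ = self.bilstm(x)                          # (batch, time, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # (batch, time, 1)
        context = (weights * h).sum(dim=1)             # attention-weighted summary
        return self.classifier(context)                # activity logits

# Toy forward pass on random CSI-like data.
model = ABiLSTM()
logits = model(torch.randn(4, 200, 90))                # 4 windows, 200 steps, 90 subcarriers
print(logits.shape)                                    # torch.Size([4, 12])
```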

    Explaining Deep Learning Models for Tabular Data Using Layer-Wise Relevance Propagation

    Trust and credibility in machine learning models are bolstered by the ability of a model to explain its decisions. While explainability of deep learning models is a well-known challenge, a further challenge is the clarity of the explanation itself for the relevant stakeholders of the model. Layer-wise Relevance Propagation (LRP), an established explainability technique developed for deep models in computer vision, provides intuitive, human-readable heat maps of input images. We present a novel application of LRP to tabular datasets containing mixed data (categorical and numerical), using a deep neural network (a 1D-CNN), for credit card fraud detection and telecom customer churn prediction use cases. We show that LRP is more effective than the traditional explainability techniques Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP). This effectiveness holds both locally at the sample level and holistically over the whole test set. We also discuss the significant computational time advantage of LRP (1–2 s) over LIME (22 s) and SHAP (108 s) on the same laptop, and thus its potential for real-time application scenarios. In addition, our validation of LRP has highlighted features for enhancing model performance, thus opening up a new research direction: using XAI as an approach to feature subset selection.
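
    To illustrate how LRP redistributes relevance from a model's output back to its tabular input features, the sketch below applies the epsilon rule to a single dense layer; a full pipeline would chain this step backwards through every layer of the 1D-CNN. Shapes, the epsilon value, and the choice of output relevance are illustrative assumptions.

```python
# Minimal sketch of the LRP epsilon rule for one dense layer; relevance flowing into the
# layer's outputs is redistributed onto its inputs in proportion to their contributions.
import numpy as np

def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """Propagate relevance R_out through z = W @ a + b back to the input vector a."""
    z = W @ a + b                                       # forward pre-activations
    s = R_out / (z + eps * np.where(z >= 0, 1.0, -1.0)) # stabilised relevance per output unit
    return a * (W.T @ s)                                # relevance attributed to each input feature

# Toy example: 4 input features, 3 output units; output relevance taken from the activations.
rng = np.random.default_rng(0)
a = rng.normal(size=4)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
R_in = lrp_epsilon_dense(a, W, b, R_out=np.maximum(W @ a + b, 0.0))
print(R_in)                                             # per-feature relevance ("heat map" for tabular input)
```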

    Multimodal sentiment analysis in real-life videos

    This thesis extends the emerging field of multimodal sentiment analysis of real-life videos, taking two components into consideration: the emotion and the emotion's target. The emotion component of media is traditionally represented as a segment-based intensity model of emotion classes. This representation is replaced here by a value- and time-continuous view. Adjacent research fields, such as affective computing, have largely neglected the linguistic information available from automatic transcripts of audio-video material. As is demonstrated here, this text modality is well suited for time- and value-continuous prediction. Moreover, source-specific problems, such as trustworthiness, have been largely unexplored so far. This work examines the perceived trustworthiness of the source, and its quantification, in user-generated video data and presents a possible modelling path. Furthermore, the transfer between the continuous and discrete emotion representations is explored in order to summarise the emotional context at a segment level. The other component deals with the target of the emotion, for example, the topic the speaker is addressing. Emotion targets in a video dataset can, as is shown here, be coherently extracted based on automatic transcripts without limiting a priori parameters, such as the expected number of targets. Furthermore, alternatives to a purely linguistic approach to predicting targets, such as knowledge bases and multimodal systems, are investigated. A new dataset is designed for this investigation, and, in conjunction with proposed novel deep neural networks, extensive experiments are conducted to explore the components described above. The developed systems show robust prediction results and demonstrate the strengths of the respective modalities, feature sets, and modelling techniques. Finally, foundations are laid for cross-modal information prediction systems, with applications to the correction of corrupted in-the-wild signals from real-life videos.

    Sentiment Analysis for micro-blogging platforms in Arabic

    Sentiment Analysis (SA) concerns the automatic extraction and classification of sentiments conveyed in a given text, i.e. labelling a text instance as positive, negative or neutral. SA research has attracted increasing interest in the past few years due to its numerous real-world applications. The recent interest in SA is also fuelled by the growing popularity of social media platforms (e.g. Twitter), as they provide large amounts of freely available and highly subjective content that can be readily crawled. Most previous SA work has focused on English, with considerable success. In this work, we focus on studying SA in Arabic as a less-resourced language. This work reports on a wide set of investigations for SA in Arabic tweets, systematically comparing three existing approaches that have been shown to be successful for English. Specifically, we report experiments evaluating fully-supervised (SL), distant-supervision-based (DS), and machine-translation-based (MT) approaches for SA. The investigations cover training SA models on manually-labelled (i.e. in SL methods) and automatically-labelled (i.e. in DS methods) datasets. In addition, we explore an MT-based approach that utilises existing off-the-shelf SA systems for English with no need for training data, assessing the impact of translation errors on the performance of SA models, which has not been previously addressed for Arabic tweets. Unlike previous work, we benchmark the trained models against an independent test set of >3.5k instances collected at different points in time to account for topic-shift issues in the Twitter stream. Despite the challenging, noisy medium of Twitter and the mixed use of Dialectal and Standard forms of Arabic, we show that our SA systems are able to attain performance scores on Arabic tweets that are comparable to those of state-of-the-art SA systems for English tweets. The thesis also investigates the role of a wide set of features, including syntactic, semantic, morphological, language-style and Twitter-specific features. We introduce a set of affective-cues/social-signals features that capture information about the presence of contextual cues (e.g. prayers, laughter, etc.) and correlate them with the sentiment conveyed in an instance. Our investigations reveal a generally positive impact of utilising these features for SA in Arabic. Specifically, we show that a rich set of morphological features, which has not been used previously and is extracted using a publicly-available morphological analyser for Arabic, can significantly improve the performance of SA classifiers. We also demonstrate the usefulness of language-independent features (e.g. Twitter-specific ones) for SA. Our feature sets outperform results reported in previous work on a previously built dataset.
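
    As a hedged illustration of the distant-supervision (DS) approach mentioned above, the sketch below auto-labels tweets from emoticon cues, removes the cues, and trains a simple classifier on the resulting noisy labels. The cue lists, the English example tweets, and the classifier choice are illustrative assumptions rather than the thesis's exact setup.

```python
# Hedged sketch of distant supervision for sentiment: tweets are auto-labelled from noisy
# emoticon cues, and the cues are stripped before training so the model cannot cheat.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

POSITIVE_CUES = {":)", ":-)", ":D"}
NEGATIVE_CUES = {":(", ":-(", ":'("}

def distant_label(tweet):
    """Return (cleaned_text, label) from emoticon cues, or None if no cue is found."""
    if any(c in tweet for c in POSITIVE_CUES):
        label = "positive"
    elif any(c in tweet for c in NEGATIVE_CUES):
        label = "negative"
    else:
        return None
    for c in POSITIVE_CUES | NEGATIVE_CUES:
        tweet = tweet.replace(c, " ")               # remove the cue from the training text
    return tweet.strip(), label

raw_tweets = ["great match today :)", "service was awful :(", "heading to work"]
labelled = [d for d in map(distant_label, raw_tweets) if d is not None]
texts, labels = zip(*labelled)
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)                             # training on automatically-labelled data
```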

    Privileged Learning using Unselected Features

    This thesis proposes a novel machine learning paradigm called Learning using Unselected Features (LUFe), which front-loads computation to training time in order to improve classifier performance, without additional cost at deployment. This is achieved by repurposing and combining techniques from feature selection and Learning Using Privileged Information (LUPI). Feature selection is a means of reducing model complexity, which enables deployment on devices with limited computational power, but it can waste additional resources which may be available at training time. LUPI is a paradigm that allows extra information about the training data to be harnessed by the learner, but this requires an additional set of highly informative attributes. In the LUFe setting, feature selection is used to partition datasets into primary and secondary subsets, instead of discarding the features which are unselected. Both feature sets are then passed to a LUPI algorithm, enabling the secondary feature set to provide additional guidance at training time only, in place of 'privileged' information. Only the selected features are needed at deployment time, maintaining low-cost deployment while exploiting train-time resources. Experimental results on a large number of datasets demonstrate that LUFe yields an improvement in classification accuracy over standard feature selection approaches in a majority of cases. This performance boost is consistent across a range of feature selection approaches, and is largest when the SVM+ algorithm is used for implementation. This effect is shown to be partially dependent on the usage of information in the unselected features, as well as resulting from the presence of additional constraints on the function space searched for the model. The enhancement by LUFe is shown to be inversely correlated with the performance of standard feature selection and mediated by a further reduction in model variance, beyond that provided by standard feature selection. Aside from demonstrating the direct practical benefit of LUFe, this work makes the contribution of broadening the scope of applications for the LUPI framework.
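
    The sketch below illustrates the LUFe data-preparation step described above: a feature-selection score splits the features into a primary (selected) and a secondary (unselected) set, with the secondary set intended as train-time-only guidance for a LUPI learner such as SVM+. Since SVM+ is not part of scikit-learn, a plain SVC fitted on the primary features stands in for that final step; the dataset, scoring function, and split size are illustrative assumptions.

```python
# Hedged sketch of the LUFe partitioning step: keep the top-scoring features as the primary
# set and retain (rather than discard) the rest as a secondary, train-time-only set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=30, n_informative=10, random_state=0)

scores = mutual_info_classif(X, y, random_state=0)
order = np.argsort(scores)[::-1]
primary_idx, secondary_idx = order[:10], order[10:]   # 10 selected features, 20 "unselected"
X_primary, X_secondary = X[:, primary_idx], X[:, secondary_idx]

# In LUFe, both X_primary and X_secondary would be given to a LUPI learner (e.g. SVM+) at
# training time, while only X_primary is required at deployment. As a runnable stand-in,
# fit an ordinary SVC on the primary features.
clf = SVC(kernel="linear").fit(X_primary, y)
print(clf.score(X_primary, y))
```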

    Predicting Forex Currency Fluctuations Using a Novel Bio-inspired Modular Neural Network

    This thesis explores the intricate interplay of rational choice theory (RCT), brain modularity, and artificial neural networks (ANNs) for modelling and forecasting hourly rate fluctuations in the foreign exchange (Forex) market. While RCT traditionally models human decision-making by emphasising self-interest and rational choices, this study extends its scope to encompass emotions, recognising their significant impact on investor decisions. Recent advances in neuroscience, particularly in understanding the cognitive and emotional processes associated with decision-making, have inspired computational methods to emulate these processes. ANNs, in particular, have shown promise in simulating neuroscience findings and translating them into effective models for financial market dynamics. However, the monolithic architectures of ANNs, characterised by fixed structures, pose challenges in adaptability and flexibility when faced with data perturbations, limiting overall performance. To address these limitations, this thesis proposes a Modular Convolutional orthogonal Recurrent Neural Network with Monte Carlo dropout ANN (MCoRNNMCD-ANN), inspired by recent neuroscience findings. A comprehensive literature review contextualises the challenges associated with monolithic architectures, leading to the identification of neural network structures that could enhance predictions of Forex price fluctuations, such as for the prominently traded EUR/GBP currency pair. The proposed MCoRNNMCD-ANN is thoroughly evaluated through a detailed comparative analysis against state-of-the-art techniques such as BiCuDNNLSTM, CNN–LSTM, LSTM–GRU, CLSTM, ensemble models, and single monolithic CNN and RNN models. Results indicate that the MCoRNNMCD-ANN outperforms its competitors, with reported improvements over the competing models of 19.70% to 195.51% in test-set prediction error, as measured by objective evaluation metrics such as mean square error. This neurobiologically-inspired model not only capitalises on modularity but also integrates partial transfer learning to improve forecasting accuracy when less data is available for the EUR/USD currency pair. The proposed bio-inspired modular approach, incorporating transfer learning from a similar task, brings advantages such as robust forecasts and enhanced generalisation performance, which are especially valuable in domains where prior knowledge guides modular learning processes. The proposed model presents a promising avenue for advancing predictive modelling in Forex forecasting by incorporating transfer learning principles.
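
    To illustrate the Monte Carlo dropout component named in the proposed model, the sketch below keeps dropout active at prediction time and averages several stochastic forward passes, using the spread across passes as an uncertainty estimate. The tiny feed-forward network and all parameter values are illustrative assumptions, not the MCoRNNMCD-ANN architecture.

```python
# Hedged sketch of Monte Carlo dropout: sample multiple stochastic forward passes with
# dropout left on, then report their mean as the forecast and their spread as uncertainty.
import torch
import torch.nn as nn

class SmallForecaster(nn.Module):
    def __init__(self, n_inputs=16, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.ReLU(),
            nn.Dropout(p=0.2),                  # stays active during MC sampling
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_samples=100):
    """Average n_samples stochastic passes; the standard deviation estimates uncertainty."""
    model.train()                               # keep dropout layers in sampling mode
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)

model = SmallForecaster()
x = torch.randn(8, 16)                          # e.g. 8 windows of 16 lagged returns
mean, std = mc_dropout_predict(model, x)
print(mean.shape, std.shape)                    # point forecast and uncertainty per window
```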

    Topical subcategory structure in text classification

    Data sets with rich topical structure are common in many real-world text classification tasks. A single data set often contains a wide variety of topics and, in a typical task, documents belonging to each class are dispersed across many of the topics. Often, a complex relationship exists between the topic a document discusses and the class label: positive or negative sentiment is expressed in documents from many different topics, but knowing the topic does not necessarily help in determining the sentiment label. We know from tasks such as Domain Adaptation that sentiment is expressed in different ways under different topics. Topical context can in some cases even reverse the sentiment polarity of words: to be sharp is a good quality for knives but bad for singers. This property can be found in many different document classification tasks. Standard document classification algorithms do not account for or take advantage of topical diversity; instead, classifiers are usually trained with the tacit assumption that topical diversity does not play a role. This thesis focuses on the interplay between the topical structure of corpora, how the target labels in a classification task distribute over the topics, and how the topical structure can be utilised in building ensemble models for text classification. We show empirically that a dataset with rich topical structure can be problematic for single classifiers, and we develop two novel ensemble models to address the issues. We focus on two document classification tasks: document-level sentiment analysis of product reviews and hierarchical categorisation of news text. For each task we develop a novel ensemble method that utilises topic models to address the shortcomings of traditional text classification algorithms. Our contribution is in showing empirically that the class association of document features is topic-dependent. We show that using the topical context of documents for building ensembles is beneficial for some tasks, and we present two new ensemble models for document classification. We also provide a fresh viewpoint for reasoning about the relationship of class labels, topical categories, and document features.
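
    As a hedged sketch of a topic-aware ensemble of the kind described above, the code below fits an LDA topic model, assigns each document to its dominant topic, and trains one classifier per topic with a global fallback. The vectoriser, topic count, and classifiers are illustrative assumptions, not the thesis's ensemble models.

```python
# Hedged sketch of a topic-based ensemble: route each document to a per-topic classifier
# chosen by its dominant LDA topic, falling back to a global model for sparse topics.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

def fit_topic_ensemble(texts, labels, n_topics=2):
    vec = CountVectorizer(min_df=1)
    X = vec.fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)
    dominant = lda.transform(X).argmax(axis=1)       # dominant topic per document
    labels = np.asarray(labels)
    experts = {}
    for t in range(n_topics):
        mask = dominant == t
        if mask.sum() and len(set(labels[mask])) > 1:
            experts[t] = LogisticRegression(max_iter=1000).fit(X[mask], labels[mask])
    fallback = LogisticRegression(max_iter=1000).fit(X, labels)
    return vec, lda, experts, fallback

def predict(texts, vec, lda, experts, fallback):
    X = vec.transform(texts)
    topics = lda.transform(X).argmax(axis=1)
    return [experts.get(t, fallback).predict(X[i])[0] for i, t in enumerate(topics)]

# Toy usage illustrating topic-dependent polarity of "sharp".
texts = ["sharp knife cuts well", "singer sounded sharp and off key",
         "blunt knife ruined dinner", "beautiful voice and great pitch"]
labels = ["positive", "negative", "negative", "positive"]
vec, lda, experts, fallback = fit_topic_ensemble(texts, labels)
print(predict(["sharp blade, great buy"], vec, lda, experts, fallback))
```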