66 research outputs found

    Interpreting Black-Box Models: A Review on Explainable Artificial Intelligence

    Recent years have seen tremendous growth in Artificial Intelligence (AI)-based methodological development across a broad range of domains. In this rapidly evolving field, a large number of methods are being reported that use machine learning (ML) and deep learning (DL) models. The majority of these models are inherently complex and lack explanations of their decision-making process, leading them to be termed 'black boxes'. One of the major bottlenecks to adopting such models in mission-critical application domains, such as banking, e-commerce, healthcare, and public services and safety, is the difficulty of interpreting them. Due to the rapid proliferation of these AI models, explaining their learning and decision-making processes is becoming harder, which calls for transparency and easy predictability. Aiming to collate the current state of the art in interpreting black-box models, this study provides a comprehensive analysis of explainable AI (XAI) models. Finding flaws in these black-box models, so as to reduce their false negative and false positive outcomes, remains difficult and inefficient. In this paper, the development of XAI is reviewed meticulously through careful selection and analysis of the current state of the art in XAI research. The paper also provides a comprehensive and in-depth evaluation of XAI frameworks and their efficacy, to serve as a starting point for applied and theoretical XAI researchers. Towards the end, it highlights emerging and critical issues in XAI research, showcasing major model-specific trends for better explanation, enhanced transparency, and improved prediction accuracy.

    Human Activity Recognition and Fall Detection Using Unobtrusive Technologies

    As the population ages, health issues like injurious falls demand more attention. Wearable devices can be used to detect falls. However, despite their commercial success, most wearable devices are obtrusive, and patients generally do not like to wear them or may forget to. In this thesis, a monitoring system consisting of two 24×32 thermal array sensors and a millimetre-wave (mmWave) radar sensor was developed to unobtrusively detect locations and recognise human activities such as sitting, standing, walking, lying, and falling. Data were collected by observing healthy young volunteers simulate ten different scenarios. The optimal installation position of the sensors was initially unknown, so the sensors were mounted on a side wall, in a corner, and on the ceiling of the experimental room to allow performance comparison between these placements. Every thermal frame was converted into an image, and features were either manually extracted or automatically extracted using convolutional neural networks (CNNs). Applying a CNN model to the infrared stereo dataset to recognise five activities (falling plus lying on the floor, lying in bed, sitting on a chair, sitting in bed, standing plus walking), the overall average accuracy and F1-score were 97.6% and 0.935, respectively. The scores for distinguishing falling plus lying on the floor from the remaining activities were 97.9% and 0.945, respectively. When using radar technology, the generated point clouds were converted into an occupancy grid, and either a CNN model automatically extracted features or a set of features was manually extracted. Applying several classifiers to the manually extracted features to detect falling plus lying on the floor among the remaining activities, the Random Forest (RF) classifier achieved the best results in the overhead position (an accuracy of 92.2%, a recall of 0.881, a precision of 0.805, and an F1-score of 0.841). The CNN model, also in the overhead position, achieved an accuracy of 92.3%, a recall of 0.891, a precision of 0.801, and an F1-score of 0.844, slightly outperforming the RF method. Data fusion combining the infrared and radar modalities was performed at the feature level; however, the benefit was not significant. The proposed system was cost-, processing-time-, and space-efficient. With further development, the system could be used as a real-time fall detection system in aged-care facilities or in the homes of older people.
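
    To make the thermal-frame pipeline concrete, here is a minimal sketch (in PyTorch) of a small CNN that maps one 24×32 thermal frame to the five activity classes listed above; the architecture, layer sizes, and class names are illustrative assumptions rather than the model used in the thesis.

    # Minimal sketch: CNN classifying a 24x32 thermal frame into five activities.
    # Architecture and hyperparameters are illustrative assumptions, not the
    # thesis model.
    import torch
    import torch.nn as nn

    ACTIVITIES = ["fall_or_lying_floor", "lying_bed", "sitting_chair",
                  "sitting_bed", "standing_or_walking"]

    class ThermalCNN(nn.Module):
        def __init__(self, num_classes: int = len(ACTIVITIES)):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),  # 1-channel thermal image
                nn.ReLU(),
                nn.MaxPool2d(2),                             # 24x32 -> 12x16
                nn.Conv2d(16, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool2d(2),                             # 12x16 -> 6x8
            )
            self.classifier = nn.Linear(32 * 6 * 8, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, 1, 24, 32) normalised thermal frames
            z = self.features(x).flatten(1)
            return self.classifier(z)  # raw logits; apply softmax for probabilities

    model = ThermalCNN()
    frame = torch.rand(1, 1, 24, 32)          # one synthetic frame
    pred = model(frame).argmax(dim=1).item()
    print(ACTIVITIES[pred])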

    Modern automatic recognition technologies for visual communication tools

    Communication comprises a broad spectrum of actions related to receiving and transmitting information. The communication process consists of verbal, paraverbal, and non-verbal components, which carry the informational content of the transmitted message and its emotional colouring, respectively. A comprehensive analysis of all components of communication makes it possible to assess not only the content but also the situational context of what is said, as well as to identify additional factors relating to the speaker's mental and somatic state. There are several methods of conveying a verbal message, including spoken and sign language. The speech and para-speech components of communication can be carried in different data channels, such as audio or video channels. This review considers systems for analysing video data, since the audio channel cannot convey a number of para-speech components of communication that add information to the transmitted message. The review analyses existing databases of static and dynamic images, systems developed for recognising the verbal component of spoken and sign language, and systems that assess the paraverbal and non-verbal components of communication. The difficulties faced by developers of such databases and systems are outlined. Promising directions for development are also formulated, including the comprehensive analysis of all components of communication for the fullest possible assessment of the transmitted message. This work was supported by State Programme 47 GP "Scientific and Technological Development of the Russian Federation" (2019-2030), topic 0134-2019-0006.

    Multimodal Assessment of Cognitive Decline: Applications in Alzheimer’s Disease and Depression

    The initial diagnosis and assessment of cognitive decline are generally based on the judgement of clinicians and on commonly used semi-structured interviews, guided by pre-determined sets of topics, in a clinical setting. Publicly available multimodal datasets have provided an opportunity to explore a range of experiments in the automatic detection of cognitive decline. Drawing on the latest developments in representation learning, machine learning, and natural language processing, we seek to develop models capable of identifying cognitive decline, with an eye to discovering the differences and commonalities that should be considered in the computational treatment of mental health disorders. We present models that learn the indicators of cognitive decline from audio and visual modalities as well as from lexical, syntactic, disfluency, and pause information. Our study is carried out in two parts: moderation analysis and predictive modelling. We experiment with different fusion techniques. Our approaches are motivated by recent efforts in multimodal fusion for classifying cognitive states, aiming to capture the interaction between modalities and to maximise the use and combination of each modality. We create tools for detecting cognitive decline and use them to analyse three major datasets containing speech produced by people with and without cognitive decline. These findings are being used to develop multimodal models for the detection of depression and Alzheimer's dementia.
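
    As a rough illustration of the simplest fusion strategy touched on above, the following sketch performs early (feature-level) fusion by concatenating per-modality feature vectors before a single classifier; the feature dimensions, classifier choice, and synthetic data are assumptions for illustration, not the models or datasets from the study.

    # Sketch of early (feature-level) multimodal fusion; dimensions and data
    # are illustrative assumptions, not those used in the study.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 200                                 # number of (synthetic) participants
    audio = rng.normal(size=(n, 64))        # e.g. acoustic embeddings
    visual = rng.normal(size=(n, 32))       # e.g. facial-feature statistics
    lexical = rng.normal(size=(n, 48))      # e.g. lexical/syntactic/pause features
    labels = rng.integers(0, 2, size=n)     # 1 = cognitive decline, 0 = control

    # Early fusion: concatenate per-modality vectors into one representation.
    fused = np.concatenate([audio, visual, lexical], axis=1)

    clf = LogisticRegression(max_iter=1000).fit(fused, labels)
    print("training accuracy:", clf.score(fused, labels))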

    Comprehensive Study of Automatic Speech Emotion Recognition Systems

    Speech emotion recognition (SER) is the technology that recognizes psychological characteristics and feelings from speech signals through dedicated techniques and methodologies. SER is challenging because of considerable variation in arousal and valence levels across different languages. Various technical developments in artificial intelligence and signal processing methods have encouraged and made it possible to interpret emotions. SER plays a vital role in remote communication. This paper offers a recent survey of SER using machine learning (ML) and deep learning (DL)-based techniques. It focuses on the various feature representation and classification techniques used for SER. Further, it describes the databases and evaluation metrics used for speech emotion recognition.
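
    To make the surveyed feature-representation-plus-classifier pipeline concrete, a minimal sketch follows using MFCC features and an SVM, one common pairing among those such surveys cover; the synthetic waveforms and labels stand in for a real labelled emotion corpus.

    # Minimal SER pipeline sketch: MFCC features + SVM classifier.
    # Synthetic waveforms stand in for a real labelled emotion corpus.
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    SR = 16000  # sample rate (Hz)

    def mfcc_features(waveform: np.ndarray) -> np.ndarray:
        # 13 MFCCs per frame, averaged over time into one fixed-length vector.
        mfcc = librosa.feature.mfcc(y=waveform, sr=SR, n_mfcc=13)
        return mfcc.mean(axis=1)

    rng = np.random.default_rng(0)
    # Two synthetic "emotion" classes; real data would be labelled utterances.
    X = np.stack([mfcc_features(rng.normal(size=SR)) for _ in range(40)])
    y = rng.integers(0, 2, size=40)

    clf = SVC().fit(X, y)
    print("predicted emotion label:", clf.predict(X[:1])[0])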

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    LIPIcs, Volume 277, GIScience 2023, Complete Volume

    12th International Conference on Geographic Information Science: GIScience 2023, September 12–15, 2023, Leeds, UK

    No abstract available

    Analysis and automatic identification of spontaneous emotions in speech from human-human and human-machine communication

    383 p. This research mainly focuses on improving our understanding of human-human and human-machine interactions by analysing participants' emotional status. For this purpose, we have developed and enhanced Speech Emotion Recognition (SER) systems for both kinds of interaction in real-life scenarios, with explicit emphasis on the Spanish language. In this framework, we have conducted an in-depth analysis of how humans express emotions using speech when communicating with other persons or machines in actual situations. Thus, we have analysed and studied the way in which emotional information is expressed in a variety of true-to-life environments, which is a crucial aspect for the development of SER systems. This study aimed to comprehensively understand the challenge we wanted to address: identifying emotional information in speech using machine learning technologies. Neural networks have been demonstrated to be adequate tools for identifying events in speech and language. Most of these experiments aimed to make local comparisons between specific aspects; thus, the experimental conditions were tailored to each particular analysis. The experiments across different articles (from P1 to P19) are hardly comparable, owing to our continuous learning about the difficult task of identifying emotions in speech. In order to make a fair comparison, additional unpublished results are presented in the Appendix. These experiments were carried out under identical and rigorous conditions. This general comparison offers an overview of the advantages and disadvantages of the different methodologies for the automatic recognition of emotions in speech.

    Multi-Label ECG Classification using Temporal Convolutional Neural Network

    Automated analysis of the 12-lead electrocardiogram (ECG) plays a crucial role in the early screening and management of cardiovascular diseases (CVDs). In practice, it is common to see multiple co-occurring cardiac disorders, i.e., multi-label or multimorbidity cases in patients with CVDs, which increases the risk of mortality. Most current research focuses on single-label ECG classification, i.e., each ECG record corresponds to one cardiac disorder, ignoring ECG records with the multi-label phenomenon. In this paper, we propose an ensemble of attention-based temporal convolutional neural network (ATCNN) models for the multi-label classification of 12-lead ECG records. Specifically, a set of ATCNN-based single-lead binary classifiers is trained, one for each cardiac disorder, and the predictions from these classifiers, with simple thresholding, generate the final multi-label decisions. The ATCNN contains a stack of TCNN layers followed by temporal and spatial attention layers. The TCNN layers operate at different dilation rates to represent the multi-scale dynamics of pathological ECG features, and the attention layers emphasize diagnostically relevant 12-lead ECG information while suppressing redundant information. The proposed framework was evaluated on the PTBXL-2020 dataset and achieved an average F1-score of 76.51%. Moreover, visualization of the spatial and temporal attention weights provides an optimal ECG-lead subset for each disease and model interpretability results, respectively. The improved performance and interpretability of the proposed approach demonstrate its ability to screen patients with multimorbidity and help clinicians initiate timely treatment.
    Comment: Under review for publication in an IEEE journal (8 pages, 6 figures)
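
    The per-disorder binary classifiers combined by simple thresholding amount to a one-vs-rest multi-label decision rule; the sketch below illustrates that final step with random placeholder scores standing in for the per-lead ATCNN outputs, and the label set and 0.5 threshold are assumptions, not values from the paper.

    # Sketch of the one-vs-rest multi-label decision step: one binary score per
    # cardiac disorder, thresholded into the final label set. The scores here
    # are random placeholders for the per-disorder ATCNN outputs.
    import numpy as np

    DISORDERS = ["AFIB", "LBBB", "RBBB", "STTC", "MI"]  # illustrative label set
    THRESHOLD = 0.5                                      # assumed cut-off

    def multilabel_decision(scores: np.ndarray, threshold: float = THRESHOLD):
        # scores: (num_disorders,) sigmoid outputs in [0, 1], one per classifier.
        return [d for d, s in zip(DISORDERS, scores) if s >= threshold]

    rng = np.random.default_rng(1)
    scores = rng.uniform(size=len(DISORDERS))   # placeholder classifier outputs
    print(dict(zip(DISORDERS, scores.round(2))))
    print("predicted disorders:", multilabel_decision(scores))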