
    RankTrace: relative and unbounded affect annotation

    How should annotation data be processed so that it can best characterize the ground truth of affect? This paper attempts to address this critical question by testing various methods of processing annotation data on their ability to capture phasic elements of skin conductance. Towards this goal, the paper introduces a new affect annotation tool, RankTrace, which allows affect to be annotated in a continuous yet unbounded fashion. RankTrace is tested on first-person annotation lines (traces) of tension elicited by a horror video game. The key findings of the paper suggest that the relative processing of traces via their mean gradient yields the best and most robust predictors of phasic manifestations of skin conductance.
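    The mean-gradient processing described above can be sketched as follows; the function name and sample values are illustrative, not from the paper. The idea is that summarizing a trace by the mean of its first differences keeps only the direction and rate of change, so the unbounded absolute level of the annotation drops out:

    ```python
    import statistics

    def mean_gradient(trace):
        """Relative processing of an unbounded annotation trace:
        summarize it by the mean of its first differences, so only
        the rate of change matters, not the absolute level."""
        if len(trace) < 2:
            raise ValueError("trace needs at least two samples")
        diffs = [b - a for a, b in zip(trace, trace[1:])]
        return statistics.mean(diffs)

    # A rising tension trace has the same mean gradient regardless of offset.
    rising = [0.0, 0.5, 1.5, 3.0]
    shifted = [t + 100.0 for t in rising]  # same shape, different level
    print(mean_gradient(rising))   # 1.0
    print(mean_gradient(shifted))  # 1.0
    ```

    Because the statistic is invariant to the trace's offset, two annotators using very different regions of an unbounded scale still produce comparable predictors.
    
    
    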

    Deep recurrent neural networks with attention mechanisms for respiratory anomaly classification.

    In recent years, a variety of deep learning techniques and methods have been adopted to provide AI solutions within the medical field, one specific area being audio-based classification of medical datasets. This research aims to create a novel deep learning architecture for this purpose, with a variety of layer structures implemented for audio classification. Specifically, bidirectional Long Short-Term Memory (BiLSTM) and Gated Recurrent Unit (GRU) networks, in conjunction with an attention mechanism, are implemented for chronic and non-chronic lung disease and COVID-19 diagnosis. We employ two audio datasets, the Respiratory Sound and Coswara datasets, to evaluate the proposed model architectures for lung disease classification. The Respiratory Sound Database contains audio data on lung conditions such as Chronic Obstructive Pulmonary Disease (COPD) and asthma, while the Coswara dataset contains coughing audio samples associated with COVID-19. After a comprehensive evaluation and experimentation process, the proposed attention BiLSTM network (A-BiLSTM) emerges as the most performant architecture, achieving accuracy rates of 96.2% and 96.8% on the Respiratory Sound and Coswara datasets, respectively. Our research indicates that the BiLSTM and attention mechanism were effective in improving audio classification performance across various lung condition diagnoses.
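    The attention mechanism layered on top of the recurrent network can be sketched in miniature: score each recurrent hidden state, softmax the scores, and return the weighted sum as a pooled representation. This is a generic soft-attention pooling sketch, not the paper's exact A-BiLSTM layer; the all-ones query vector stands in for a learned parameter:

    ```python
    import math

    def attention_pool(hidden_states):
        """Soft attention over a sequence of hidden-state vectors:
        score each step, softmax the scores, return the weighted sum.
        The score is a dot product with a query vector fixed to ones
        purely for illustration (in a real model it is learned)."""
        dim = len(hidden_states[0])
        query = [1.0] * dim
        scores = [sum(h[i] * query[i] for i in range(dim))
                  for h in hidden_states]
        m = max(scores)                      # stabilised softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        pooled = [sum(w * h[i] for w, h in zip(weights, hidden_states))
                  for i in range(dim)]
        return weights, pooled

    weights, pooled = attention_pool([[0.1, 0.2], [0.4, 0.3], [0.9, 0.8]])
    print(weights)  # the highest-scoring time step dominates the pooling
    ```

    In the full model the pooled vector would feed a dense classification head; the attention weights also make the model's focus over the audio sequence inspectable.
    
    
    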

    A Multi-Population FA for Automatic Facial Emotion Recognition

    Automatic facial emotion recognition systems are popular in various domains such as health care, surveillance and human-robot interaction. In this paper, we present a novel multi-population firefly algorithm (FA) for automatic facial emotion recognition. The overall system is equipped with horizontal vertical neighborhood local binary patterns (hvnLBP) for feature extraction, a novel multi-population FA for feature selection, and diverse classifiers for emotion recognition. First, we extract features using hvnLBP, which are robust to illumination changes, scaling and rotation variations. Then, a novel FA variant is proposed to select the most important and emotion-specific features. These selected features are used as input to the classifiers to recognize seven basic emotions. The proposed system is evaluated on multiple facial expression datasets and compared with other state-of-the-art models.
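    The general firefly-algorithm scheme for feature selection can be sketched as a toy binary search: each firefly is a 0/1 feature mask, dimmer fireflies copy bits from brighter ones (playing the role of attractiveness-driven movement), and random bit flips act as the randomisation term. This is a minimal single-population sketch under those assumptions, not the paper's multi-population variant, and the fitness function below is a toy stand-in for a classifier-based wrapper:

    ```python
    import random

    def firefly_feature_selection(fitness, n_features, n_fireflies=8,
                                  n_iters=30, seed=0):
        """Toy binary firefly search: masks move toward the brightest
        mask bit-by-bit, with occasional random flips; moves are kept
        only if they do not reduce fitness."""
        rng = random.Random(seed)
        swarm = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_fireflies)]
        best = max(swarm, key=fitness)
        for _ in range(n_iters):
            brighter = max(swarm, key=fitness)
            for i, fly in enumerate(swarm):
                new = [b if rng.random() < 0.5 else f      # attraction step
                       for f, b in zip(fly, brighter)]
                new = [1 - bit if rng.random() < 0.1 else bit  # random flip
                       for bit in new]
                if fitness(new) >= fitness(fly):
                    swarm[i] = new
            best = max(best, max(swarm, key=fitness), key=fitness)
        return best

    # Toy fitness: reward masks close to selecting the first three features.
    target = [1, 1, 1, 0, 0, 0]
    fit = lambda mask: -sum(abs(a - b) for a, b in zip(mask, target))
    print(firefly_feature_selection(fit, 6))
    ```

    In the paper's wrapper setting, `fitness` would instead train and score a classifier on the masked feature subset, which is where emotion-specific features are rewarded.
    
    
    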

    Failure Mode Identification of Elastomer for Well Completion Systems using Mask R-CNN


    Mask R-CNN Transfer Learning Variants for Multi-Organ Medical Image Segmentation

    Medical abdomen image segmentation is a challenging task owing to the difficulty of discerning the characteristics of a tumour against other organs. As an effective image segmenter, Mask R-CNN has been employed in many medical imaging applications, e.g. for segmenting the nucleus from the cytoplasm for leukaemia diagnosis, and for skin lesion segmentation. Motivated by such existing studies, this research takes advantage of the strengths of Mask R-CNN in leveraging pre-trained CNN architectures such as ResNet, and proposes three variants of Mask R-CNN for multi-organ medical image segmentation. The three variants are built successively, each with a set of configurations modified from the one preceding: (1) traditional transfer learning with customized loss functions that place comparatively more weight on segmentation performance, (2) transfer learning with deepened re-trained layers, instead of only the last two or three layers as in traditional transfer learning, and (3) fine-tuning of Mask R-CNN with expanded Region of Interest (RoI) pooling sizes. Evaluated on the Beyond-the-Cranial-Vault (BTCV) abdominal dataset, a well-established benchmark for multi-organ medical image segmentation, the three proposed variants obtain promising performances. In particular, the empirical results indicate the effectiveness of the adapted loss functions, the deepened transfer learning process, and the expansion of the RoI pooling sizes. These variations account for the efficiency of the proposed transfer learning schemes for multi-organ image segmentation tasks.
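    Variant (1), re-weighting the loss toward segmentation, reduces to combining Mask R-CNN's three task losses with a larger coefficient on the mask term. The weights and function name below are illustrative assumptions, not values from the paper:

    ```python
    def weighted_mask_rcnn_loss(cls_loss, box_loss, mask_loss,
                                w_cls=1.0, w_box=1.0, w_mask=2.0):
        """Combine the classification, box-regression and mask losses
        of Mask R-CNN, with extra weight on the mask (segmentation)
        term. The 2x mask weight is a hypothetical example."""
        return w_cls * cls_loss + w_box * box_loss + w_mask * mask_loss

    # With the default weights, a mask loss of 0.4 contributes 0.8.
    print(weighted_mask_rcnn_loss(0.5, 0.3, 0.4))  # 1.6
    ```

    During training, the combined scalar would be the quantity backpropagated, so gradients from the mask head dominate parameter updates relative to the detection heads.
    
    
    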

    A Deep Learning Based Wearable Healthcare IoT Device for AI-Enabled Hearing Assistance Automation

    With the recent boom of artificial intelligence (AI), particularly deep learning techniques, digital healthcare is one of the prevalent areas that could benefit from AI-enabled functionality. This research presents a novel AI-enabled Internet of Things (IoT) device, built on the ESP-8266 platform, capable of assisting those who suffer from hearing impairment or deafness to communicate with others in conversation. In the proposed solution, a server application leverages Google's online speech recognition service to convert received conversations into text, which is then shown on a micro-display attached to glasses, enabling deaf users to follow the conversation content and converse as normal with the general population. Furthermore, in order to raise alerts in traffic or other dangerous scenarios, an 'urban-emergency' classifier is developed using a deep learning model, Inception-v4, with transfer learning to detect and recognize alerting or alarming sounds, such as a car horn or a fire alarm, with texts generated to alert the prospective user. The training of Inception-v4 was carried out on a consumer desktop PC and then implemented into the AI-based IoT application. The empirical results indicate that the developed prototype system achieves an accuracy rate of 92% for sound recognition and classification with real-time performance.
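    The routing step of such a pipeline, turning a classifier prediction into display text, can be sketched as below. The label names, alert strings and confidence threshold are hypothetical, not taken from the paper:

    ```python
    # Hypothetical mapping from urban-emergency labels to alert messages.
    ALERTS = {
        "car_horn": "Warning: car horn nearby",
        "fire_alarm": "Warning: fire alarm sounding",
    }

    def route_event(label, confidence, threshold=0.8):
        """Turn a sound-classifier prediction into display text:
        urban-emergency labels above the confidence threshold become
        alert messages; anything else yields no alert."""
        if label in ALERTS and confidence >= threshold:
            return ALERTS[label]
        return None

    print(route_event("fire_alarm", 0.95))  # alert shown on the display
    print(route_event("fire_alarm", 0.40))  # below threshold -> None
    ```

    Thresholding on confidence keeps spurious low-certainty detections from interrupting the conversation text on the micro-display.
    
    
    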

    Intelligent facial emotion recognition using moth-firefly optimization

    In this research, we propose a facial expression recognition system with a variant of the evolutionary firefly algorithm for feature optimization. First, a modified Local Binary Pattern descriptor is proposed to produce an initial discriminative face representation. A variant of the firefly algorithm is then proposed to perform feature optimization. The proposed evolutionary firefly algorithm exploits the spiral search behaviour of moths and the attractiveness search actions of fireflies to mitigate the premature convergence of the Levy-flight firefly algorithm (LFA) and the moth-flame optimization (MFO) algorithm. Specifically, it employs the logarithmic spiral search capability of the moths to increase local exploitation of the fireflies, whereas, in comparison with the flames in MFO, the fireflies not only represent the best solutions identified by the moths but also act as search agents guided by the attractiveness function to increase global exploration. Simulated Annealing embedded with Levy flights is also used to increase exploitation of the most promising solution. Diverse single and ensemble classifiers are implemented for the recognition of seven expressions. Evaluated with frontal-view images extracted from CK+, JAFFE and MMI, as well as 45-degree multi-view and 90-degree side-view images from BU-3DFE and MMI, respectively, our system achieves superior performance and outperforms other state-of-the-art feature optimization methods and related facial expression recognition models by a significant margin.
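    The logarithmic spiral step that the hybrid borrows from moth-flame optimization can be sketched as follows, using the standard MFO update D * e^(b*t) * cos(2*pi*t) + F, where D is the distance to the brighter solution F. This is a generic single-step sketch of that update, not the full moth-firefly algorithm:

    ```python
    import math
    import random

    def spiral_move(position, flame, b=1.0, rng=random.random):
        """One logarithmic spiral step toward a brighter solution
        (the 'flame'): per coordinate, the standard MFO update
        D * e^(b*t) * cos(2*pi*t) + flame, with t drawn from [-1, 1]."""
        t = rng() * 2 - 1
        return [abs(f - x) * math.exp(b * t) * math.cos(2 * math.pi * t) + f
                for x, f in zip(position, flame)]

    random.seed(1)
    print(spiral_move([0.0, 0.0], [1.0, 1.0]))  # spirals around (1, 1)
    ```

    As `t` varies, the step alternates between orbiting and closing in on the flame, which is the local-exploitation behaviour grafted onto the fireflies.
    
    
    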

    Intelligent affect regression for bodily expressions using hybrid particle swarm optimization and adaptive ensembles

    This research focuses on continuous dimensional affect recognition from bodily expressions using feature optimization and adaptive regression. Both static posture and dynamic motion bodily features are extracted. A hybrid particle swarm optimization (PSO) algorithm is proposed for feature selection, which overcomes the premature convergence and local optimum traps encountered by conventional PSO. It integrates diverse jump-out mechanisms, namely the genetic algorithm (GA) and mutation techniques based on Gaussian, Cauchy and Levy distributions, to strike a balance between convergence speed and swarm diversity, and is hence called GM-PSO. The proposed PSO variant employs a subswarm concept and a cooperative strategy to enable the mutation mechanisms of each subswarm, i.e. the GA and the probability distributions, to work collaboratively to enhance the exploration and exploitation capability of the swarm leader, sustain population diversity, and guide the search toward an ultimate global optimum. An adaptive ensemble regression model is subsequently proposed to robustly map subjects' emotional states onto a continuous arousal-valence affective space using the identified optimized feature subsets. This regression model also adapts well to newly arrived bodily expression patterns for data stream regression. Empirical findings indicate that the proposed hybrid PSO algorithm significantly outperforms other state-of-the-art PSO variants, conventional PSO and the classic GA in terms of reaching the global optimum and selecting discriminative features.
The system achieves the best performance for the regression of arousal and valence when the ensemble regression model is applied, in terms of both mean squared error (arousal: 0.054, valence: 0.08) and Pearson correlation coefficient (arousal: 0.97, valence: 0.91), and outperforms other state-of-the-art PSO-based optimization combined with ensemble regression, as well as related bodily expression perception research, by a significant margin.
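    The jump-out idea, mutating the swarm leader so it can escape a local optimum, can be sketched with a single-swarm PSO step plus a Gaussian mutation of the best particle. This is a toy sketch under that assumption; the full GM-PSO additionally uses GA crossover and Cauchy/Levy mutations across cooperating subswarms:

    ```python
    import random

    def pso_step(positions, velocities, pbests, gbest, fitness,
                 w=0.7, c1=1.5, c2=1.5, sigma=0.1, rng=random.Random(0)):
        """One standard PSO iteration, then a Gaussian mutation of the
        swarm leader kept only if it improves fitness (the jump-out)."""
        for i in range(len(positions)):
            velocities[i] = [w * vj
                             + c1 * rng.random() * (pb - xj)
                             + c2 * rng.random() * (gb - xj)
                             for vj, xj, pb, gb in zip(velocities[i],
                                                       positions[i],
                                                       pbests[i], gbest)]
            positions[i] = [xj + vj
                            for xj, vj in zip(positions[i], velocities[i])]
            if fitness(positions[i]) < fitness(pbests[i]):
                pbests[i] = list(positions[i])
        leader = min(pbests, key=fitness)
        mutant = [g + rng.gauss(0, sigma) for g in leader]  # Gaussian jump-out
        if fitness(mutant) < fitness(leader):
            leader = mutant
        return min(gbest, leader, key=fitness)

    # Minimise the sphere function with 10 particles in 2-D.
    sphere = lambda x: sum(xi * xi for xi in x)
    init = random.Random(42)
    positions = [[init.uniform(-5, 5) for _ in range(2)] for _ in range(10)]
    velocities = [[0.0, 0.0] for _ in range(10)]
    pbests = [list(p) for p in positions]
    gbest = min(pbests, key=sphere)
    start = sphere(gbest)
    for _ in range(50):
        gbest = pso_step(positions, velocities, pbests, gbest, sphere)
    print(start, "->", sphere(gbest))
    ```

    Because the returned best is always the minimum of the old best and the (possibly mutated) leader, the global best is monotone non-increasing; the Gaussian perturbation gives the leader a chance to step out of a shallow basin without discarding it on failure.
    
    
    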