943 research outputs found

    Affective social anthropomorphic intelligent system

    Human conversational style is characterized by sense of humor, personality, and tone of voice. These characteristics have become essential for conversational intelligent virtual assistants. However, most state-of-the-art intelligent virtual assistants (IVAs) fail to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that can hold a proper human-like conversation with emotion and personality. A voice style transfer method is also proposed to map the attributes of a specific emotion. Initially, frequency-domain data (a Mel-spectrogram) is created by converting the temporal audio waveform; it comprises discrete patterns for audio features such as notes, pitch, rhythm, and melody. A collateral CNN-Transformer-Encoder is used to predict seven different affective states from the voice. The voice is also fed in parallel to DeepSpeech, an RNN model that generates the text transcription from the spectrogram. The transcribed text is then passed to a multi-domain conversational agent that uses blended skill talk, a transformer-based retrieve-and-generate strategy, and beam-search decoding to produce an appropriate textual response. The system learns an invertible mapping of data to a latent space that can be manipulated, and generates each Mel-spectrogram frame from the previous frames for voice synthesis and style transfer. Finally, the waveform is generated from the spectrogram using WaveGlow. The outcomes of the studies we conducted on the individual models were promising. Furthermore, users who interacted with the system provided positive feedback, demonstrating the system's effectiveness. (Comment: Multimedia Tools and Applications, 2023)
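    The first step the abstract describes, converting the temporal audio waveform into frequency-domain Mel-spectrogram frames, can be sketched in plain numpy. This is an illustrative reconstruction, not the paper's implementation; the window size, hop length, and number of Mel bands below are assumptions, and a real pipeline would typically use a library such as librosa:

    ```python
    import numpy as np

    def stft_mag(wave, n_fft=512, hop=128):
        # slice the waveform into overlapping frames and apply a Hann window
        frames = [wave[i:i + n_fft] * np.hanning(n_fft)
                  for i in range(0, len(wave) - n_fft + 1, hop)]
        # magnitude of the real FFT of each frame -> (num_frames, n_fft//2 + 1)
        return np.abs(np.fft.rfft(np.stack(frames), axis=1))

    def mel_filterbank(n_mels=40, n_fft=512, sr=16000):
        # triangular filters mapping linear-frequency FFT bins to Mel bands
        def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
        def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
        mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
        fb = np.zeros((n_mels, n_fft // 2 + 1))
        for m in range(1, n_mels + 1):
            l, c, r = bins[m - 1], bins[m], bins[m + 1]
            fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
            fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
        return fb

    sr = 16000
    t = np.arange(sr) / sr                       # one second of audio
    wave = np.sin(2 * np.pi * 440 * t)           # synthetic stand-in for a voice
    spec = stft_mag(wave)                        # linear-frequency magnitude spectrogram
    mel = np.log(spec @ mel_filterbank().T + 1e-6)  # log-Mel spectrogram frames
    print(mel.shape)                             # (num_frames, n_mels)
    ```

    Each row of `mel` is one Mel-spectrogram frame of the kind the system conditions on when synthesizing the next frame.
    
    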

    Solving the imbalanced data issue: automatic urgency detection for instructor assistance in MOOC discussion forums

    In MOOCs, identifying urgent comments on discussion forums is an ongoing challenge. Urgent comments require immediate reactions from instructors, to improve interaction with their learners and potentially reduce drop-out rates; yet the task is difficult, as truly urgent comments are rare. From a data-analytics perspective, this represents a highly unbalanced (sparse) dataset. Here, we aim to automate the urgent-comment identification process, based on fine-grained learner modelling, to be used for automatic recommendations to instructors. To showcase and compare these models, we apply them to the first gold-standard dataset for Urgent iNstructor InTErvention (UNITE), which we created by labelling FutureLearn MOOC data. We implement both benchmark shallow classifiers and deep learning. Importantly, we not only compare, for the first time for this unbalanced problem, several data-balancing techniques, comprising text augmentation, text augmentation with undersampling, and undersampling alone, but also propose several new pipelines for combining different augmenters for text augmentation. Results show that models with undersampling can predict most urgent cases, and that 3X augmentation + undersampling usually attains the best performance. We additionally validate the best models on a generic benchmark dataset (Stanford). As a case study, we showcase how naïve Bayes with count vectors can adaptively support instructors in answering learner questions and comments, potentially saving time or increasing efficiency in supporting learners. Finally, we show that the errors from the classifier mirror the disagreements between annotators. Thus, our proposed algorithms perform at least as well as a 'super-diligent' human instructor (one with the time to consider all comments)
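    Of the data-balancing techniques compared above, random undersampling of the majority class is the simplest. A minimal sketch, using an invented toy urgent/non-urgent example (the actual UNITE pipelines also add text augmentation and stronger classifiers):

    ```python
    import random
    from collections import Counter

    def undersample(texts, labels, seed=0):
        # keep every minority-class example; sample each other class down to match
        by_label = {}
        for t, y in zip(texts, labels):
            by_label.setdefault(y, []).append(t)
        n_min = min(len(v) for v in by_label.values())
        rng = random.Random(seed)
        out = []
        for y, items in by_label.items():
            for t in rng.sample(items, n_min):
                out.append((t, y))
        rng.shuffle(out)  # avoid blocks of a single class
        return out

    # invented toy forum comments: urgent posts are the rare class
    texts = ["deadline broken, need help now", "cannot submit, please respond"] + \
            ["nice course"] * 8
    labels = ["urgent"] * 2 + ["non-urgent"] * 8
    balanced = undersample(texts, labels)
    print(Counter(y for _, y in balanced))  # equal counts per class
    ```

    The balanced pairs would then be vectorized (e.g. with word counts, as in the naïve Bayes case study) before training.
    
    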

    Emotion recognition: recognition of emotions through voice

    Master's dissertation in Informatics Engineering. As the years go by, the interaction between humans and machines gains ever more importance, for many different reasons, whether for personal or commercial use. At a time when technology is reaching many parts of our lives, it is important to keep striving for healthy progress and to help not only improve but also maintain the benefits that everyone gets from it. This relationship can be approached from many angles; here the focus is on the mind. Emotions are still a mystery. The concept itself raises serious questions because of its complex nature. To date, scientists still struggle to understand it, so it is crucial to pave the right path for the growth of technology in aid of this topic. There is some consensus on a few indicators that provide important insights into mental state, such as the words used, facial expressions, and voice. This work concerns the use of voice and, building on the field of Automatic Speech Emotion Recognition, it proposes a full pipeline of work with a wide scope: sound capture and signal-processing software, learning and classification through algorithms from the semi-supervised learning paradigm, and visualization techniques for interpreting results. For the classification of the samples, a semi-supervised approach with neural networks represents an important setting for alleviating the dependency on human labelling of emotions, a task that has proven challenging and, in many cases, highly subjective, not to mention expensive.
    The intention is to rely mostly on empirical results rather than theoretical concepts, owing to the complexity of the concept of human emotion and its inherent uncertainty, while never disregarding prior knowledge on the matter
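    One common semi-supervised setting of the kind the dissertation proposes is self-training: a model fitted on the few labelled samples pseudo-labels the unlabelled ones it is confident about, reducing the dependence on human emotion annotation. A minimal numpy sketch, assuming two classes and a nearest-centroid base model (both assumptions; the dissertation itself uses neural networks):

    ```python
    import numpy as np

    def self_train(X_lab, y_lab, X_unlab, threshold=0.9, rounds=3):
        # self-training loop: fit a nearest-centroid model on the labelled data,
        # pseudo-label unlabelled points the model is confident about,
        # absorb them into the labelled set, and repeat
        for _ in range(rounds):
            centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in (0, 1)])
            dist = np.linalg.norm(X_unlab[:, None, :] - centroids[None], axis=2)
            scores = np.exp(-dist)
            probs = scores / scores.sum(axis=1, keepdims=True)  # softmax over -distance
            conf, pred = probs.max(axis=1), probs.argmax(axis=1)
            keep = conf >= threshold
            if not keep.any():
                break
            X_lab = np.vstack([X_lab, X_unlab[keep]])
            y_lab = np.concatenate([y_lab, pred[keep]])
            X_unlab = X_unlab[~keep]
        return X_lab, y_lab

    # two labelled seed samples and three unlabelled ones (invented 2-D features,
    # standing in for acoustic feature vectors)
    X_lab = np.array([[0.0, 0.0], [10.0, 10.0]])
    y_lab = np.array([0, 1])
    X_unlab = np.array([[0.5, 0.5], [9.5, 9.5], [0.2, 0.8]])
    X_new, y_new = self_train(X_lab, y_lab, X_unlab)
    print(len(y_new))  # labelled set grows as confident pseudo-labels are added
    ```

    The confidence threshold controls the trade-off between label coverage and the risk of propagating wrong pseudo-labels.
    
    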

    Alzheimer Disease Detection Techniques and Methods: A Review

    Brain pathological changes linked with Alzheimer's disease (AD) can be measured with neuroimaging. In the past few years, these measures have been rapidly integrated into signatures of AD with the help of classification frameworks, which offer tools for diagnosis and prognosis. This is a review of AD research based on neuroimaging and cognitive-impairment classification; it systematically surveys published work in the field of AD, especially computer-aided diagnosis. The imaging modalities include 1) magnetic resonance imaging (MRI), 2) functional MRI (fMRI), 3) diffusion tensor imaging, 4) positron emission tomography (PET), and 5) amyloid-PET. The review found that feature-based classification shows promising results for diagnosing the disease and helps track clinical progression. The most widely used machine-learning classifiers for AD diagnosis include Support Vector Machines, Bayesian classifiers, Linear Discriminant Analysis, and K-Nearest Neighbour, along with deep learning; deep-learning techniques and Support Vector Machines were found to give the highest accuracies in identifying Alzheimer's disease. Possible challenges, along with future directions, are also discussed in the paper
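    For illustration, the Support Vector Machine, one of the classifier families the review found most widely used, reduces in the linear case to subgradient descent on a regularized hinge loss. The sketch below uses invented two-dimensional stand-ins for imaging-derived feature vectors and is illustrative only, not taken from any reviewed study:

    ```python
    import numpy as np

    def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=500):
        # subgradient descent on the regularized hinge loss; labels y in {-1, +1}
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                if yi * (xi @ w + b) < 1:     # point inside the margin: push it out
                    w += lr * (yi * xi - lam * w)
                    b += lr * yi
                else:                          # correct with margin: only regularize
                    w -= lr * lam * w
        return w, b

    # toy stand-in for per-subject imaging features (e.g. regional volumes)
    X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
    y = np.array([-1, -1, 1, 1])               # -1: control, +1: AD (illustrative)
    w, b = train_linear_svm(X, y)
    print(np.sign(X @ w + b))                  # signed predictions per subject
    ```

    In practice, studies in the review use library implementations with kernels and cross-validated hyperparameters; this linearly separable toy only shows the margin-maximizing mechanic.
    
    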

    Emotion-aware cross-modal domain adaptation in video sequences
