Deep neural networks and data augmentation for semantic labelling in a dialogue corpus

Abstract

This project studies and applies deep neural network and data augmentation techniques for semantic labelling in a dialogue corpus, within the field of sentiment analysis. The main objective is to address a topic classification problem using architectures based on both Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). The project also provides a comparison of the performance of each model. As part of the project, the hyperparameter optimization tools needed to obtain satisfactory results have also been developed. All of this serves to classify the data from the dataset of the European EMPATHIC project. More information about the EMPATHIC project is available at www.empathic-project.eu. The project report is written in English.

Sentiment analysis, also known as opinion mining, refers to the use of Natural Language Processing (NLP), among other techniques, to extract and analyze subjective information from text, such as emotions or the topic of a text. These techniques are normally applied to reviews or data from social media but, in this project, we apply them to the analysis of coaching dialogues involving senior adults. These dialogues have been collected as part of the EMPATHIC project.

EMPATHIC is a European project whose goal is to implement a virtual agent designed to help elderly people live a healthy and independent life as they age [1][2]. Within this implementation, a Natural Language Understanding (NLU) component plays the role of classifying the utterance (spoken words) of the user into semantic components. This is a machine learning classification problem where there are multiple classes and a model has to be taught to classify the text into these classes.

Currently, the NLU model implementation is based on seq2seq models (a variant of Recurrent Neural Networks (RNN)).
However, convolutional neural networks have also been proposed for text classification in different contexts [3][4][5].

The main objective of this project is to address a topic classification problem using Convolutional Neural Network (CNN) based architectures in order to classify the data from the EMPATHIC project's dataset. In addition, we propose and test a number of RNN-based architectures in order to compare the performance of each model.

    Similar works