
    Pragmatics and Prosody

    Most of the papers collected in this book resulted from presentations and discussions at the V Lablita Workshop, held at the Federal University of Minas Gerais, Brazil, on August 23-25, 2011. The workshop was held in conjunction with the II Brazilian Seminar on Pragmatics and Prosody. The guiding themes for the joint event were illocution, modality, attitude, information patterning and speech annotation. Thus, all papers presented here are concerned with theoretical and methodological issues related to the study of speech. The papers in this volume take different theoretical orientations, which are mirrored in the methodological designs of the studies pursued. However, all papers are based on the analysis of actual speech, be it from corpora or from experimental contexts that try to emulate natural speech. Prosody is the keyword that emerges from all the papers in this publication, which indicates the high standing of this category in studies geared towards understanding the major elements that structure speech.

    SemClinBr -- a multi-institutional and multi-specialty semantically annotated corpus for Portuguese clinical NLP tasks

    The high volume of research on extracting patient information from electronic health records (EHR) has increased the demand for annotated corpora, which are a very valuable resource for both the development and the evaluation of natural language processing (NLP) algorithms. The absence of a multi-purpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field. In this study, we developed a semantically annotated corpus using clinical texts from multiple medical specialties, document types, and institutions. We present the following: (1) a survey listing common aspects and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated and can guide other annotation initiatives, (3) a web-based annotation tool focusing on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluation of the annotations. The result of this work is SemClinBr, a corpus of 1,000 clinical notes labeled with 65,117 entities and 11,263 relations, which can support a variety of clinical NLP tasks and boost the secondary use of EHRs for the Portuguese language.

    Recognizing Emotions in Short Texts

    Master's thesis, Cognitive Science, Universidade de Lisboa, Faculdade de Ciências, 2022.
    Automatic emotion recognition from text is a task that draws on natural language processing and affective computing, with particular contributions from Cognitive Science disciplines such as Artificial Intelligence and Computer Science, Linguistics, and Psychology. It aims at the detection and interpretation, by computational systems, of human emotions expressed in written form.
    The interaction of affective and cognitive processes, the essential role that emotions play in interpersonal interactions, and the increasing use of written communication online make automatic emotion recognition progressively more important, particularly in areas such as mental healthcare, human-computer interaction, political science, and marketing. English has been the main target of studies on emotion recognition in text, and work on Portuguese is still scarce. Thus, there is a need to extend the work developed for English to Portuguese. The goal of this dissertation is to present and compare two distinct deep learning methods for automatically detecting and classifying discrete emotional states in texts written in Portuguese. To this end, the classification approach of Polignano et al. (2019), based on deep learning networks such as bidirectional Long Short-Term Memory and convolutional networks mediated by a self-attention level, will be replicated for English and reproduced for Portuguese. For English, the SemEval-2018 task 1 dataset (Mohammad et al., 2018) will be used, as in the original experiment; it considers four discrete emotions: anger, fear, joy, and sadness. For Portuguese, given the lack of available emotion-annotated datasets, data will be collected from the social network Twitter using hashtags associated with specific emotional content to determine the underlying emotion of each text, chosen from the same four emotions present in the English dataset. According to experiments carried out by Mohammad & Kiritchenko (2015), this method of data collection is consistent with the annotation of trained human judges.
    Considering the fast and continuous evolution of deep learning methods for natural language processing and the state-of-the-art results achieved in this area by recent methods such as the pre-trained language model BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2019), this approach will also be applied to emotion recognition in both languages, using the same datasets as in the previous experiments. The dissertation is expected to draw conclusions about the adequacy of these two approaches to emotion recognition and to contribute to the state of the art in this task for Portuguese. While the approach of Polignano et al. performed better in the experiments with English data, with a difference in F1 scores of 0.02, for Portuguese the best result was obtained with BERT, with a maximum F1 score of 0.6124.
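The hashtag-based data collection described in this abstract can be illustrated with a minimal sketch. This is not the thesis's actual pipeline: the hashtag list, the function name `label_tweet`, and the rule of discarding tweets with no or conflicting emotion hashtags are all assumptions made here for illustration.

```python
import re

# Hypothetical hashtag-to-emotion map (an assumption; the abstract does not
# list the hashtags actually used). Covers the four emotions it names.
HASHTAG_TO_EMOTION = {
    "#raiva": "anger",
    "#medo": "fear",
    "#alegria": "joy",
    "#tristeza": "sadness",
}

HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text):
    """Return (cleaned_text, emotion), or None if the tweet cannot be labeled.

    A tweet is labeled only when its emotion hashtags point to exactly one
    emotion; tweets with none or with conflicting hashtags are discarded.
    """
    tags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    emotions = {HASHTAG_TO_EMOTION[tag] for tag in tags if tag in HASHTAG_TO_EMOTION}
    if len(emotions) != 1:
        return None

    # Strip the labeling hashtags so a classifier cannot simply read the label.
    def drop_label(match):
        return "" if match.group().lower() in HASHTAG_TO_EMOTION else match.group()

    cleaned = " ".join(HASHTAG_RE.sub(drop_label, text).split())
    return cleaned, emotions.pop()
```

Under these assumptions, `label_tweet("Que dia lindo #alegria")` yields the text without the hashtag paired with the label `joy`, while a tweet carrying both `#medo` and `#raiva` is dropped as ambiguous.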

    Illocution, Modality, Attitude, Information Patterning and Speech Annotation


    Social software for music

    Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 200

    Prosodic boundary phenomena

    Synopsis: In spoken language comprehension, the hearer is faced with a more or less continuous stream of auditory information. Prosodic cues, such as pitch movement, pre-boundary lengthening, and pauses, incrementally help to organize the incoming stream of information into prosodic phrases, which often coincide with syntactic units. Prosody is hence central to spoken language comprehension, and some models assume that the speaker produces prosody in a consistent and hierarchical fashion. While there is ample empirical evidence that prosodic boundary cues are reliably and robustly produced and effectively guide spoken sentence comprehension across different populations and languages, the underlying mechanisms and the nature of the prosody-syntax interface have not yet been sufficiently identified. This is also reflected in the fact that most models of sentence processing lack prosodic information entirely. This edited volume is grounded in a workshop held in 2021 at the annual conference of the Deutsche Gesellschaft für Sprachwissenschaft (DGfS). The five chapters cover selected topics on the production and comprehension of prosodic cues in various populations and languages, all focusing in particular on the processing of prosody at structurally relevant prosodic boundaries. Specifically, the book comprises cross-linguistic evidence as well as evidence from non-native listeners, infants, adults, and elderly speakers, highlighting the important role of prosody in both language production and comprehension.