236 research outputs found

    Recent Trends in Deep Learning Based Personality Detection

    Recently, the automatic prediction of personality traits has received considerable attention. In particular, personality trait prediction from multimodal data has emerged as a hot topic within affective computing. In this paper, we review significant machine learning models that have been employed for personality detection, with an emphasis on deep learning-based methods. The review provides an overview of the most popular approaches to automated personality detection, the available computational datasets, industrial applications, and state-of-the-art machine learning models, with a specific focus on multimodal approaches. Personality detection is a broad and diverse topic: this survey covers only computational approaches and leaves out psychological studies of personality detection.
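    The abstract above describes deep learning-based multimodal personality detection only in general terms. As a purely illustrative sketch (not taken from the paper), the following Python/PyTorch snippet shows one common pattern such models follow: encode each modality separately, concatenate the encodings, and regress the Big Five trait scores. The input dimensions, layer sizes, and concatenation-based fusion are assumptions chosen for brevity.

        # Illustrative sketch only; architecture and dimensions are assumptions,
        # not the method of any specific paper in this review.
        import torch
        import torch.nn as nn

        class MultimodalPersonalityNet(nn.Module):
            def __init__(self, text_dim=768, audio_dim=128, hidden=256, n_traits=5):
                super().__init__()
                # Separate encoders project each modality into a shared hidden space.
                self.text_enc = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
                self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
                # Feature-level fusion: concatenate the encoded modalities.
                self.head = nn.Sequential(
                    nn.Linear(2 * hidden, hidden),
                    nn.ReLU(),
                    nn.Linear(hidden, n_traits),  # one score per Big Five trait
                )

            def forward(self, text_feats, audio_feats):
                fused = torch.cat(
                    [self.text_enc(text_feats), self.audio_enc(audio_feats)], dim=-1
                )
                return self.head(fused)

        # Usage with random tensors standing in for real text/audio embeddings.
        model = MultimodalPersonalityNet()
        scores = model(torch.randn(4, 768), torch.randn(4, 128))
        print(scores.shape)  # torch.Size([4, 5])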

    Computational modeling of turn-taking dynamics in spoken conversations

    The study of human interaction dynamics has been at the center of multiple research disciplines, including computer and social sciences, conversation analysis, and psychology, for decades. Recent interest has focused on designing computational models to improve human-machine interaction systems and to support humans in their decision-making processes. Turn-taking is one of the key aspects of conversational dynamics in dyadic conversations and is an integral part of human-human and human-machine interaction systems. It is used for the discourse organization of a conversation by means of explicit phrasing, intonation, and pausing, and it involves intricate timing. In verbal (e.g., telephone) conversation, turn transitions are facilitated by inter- and intra-speaker silences and overlaps. Early turn-taking research in the speech community studied the durational aspects of turns, cues for turn-yielding intention, and turn-transition modeling for spoken dialog agents. Compared to the study of turn transitions, little work has been done on classifying overlap discourse, especially competitive overlaps and the function of silences. Given the limitations of the current state of the art, this dissertation focuses on two aspects of conversational dynamics: 1) designing automated computational models for analyzing turn-taking behavior in a dyadic conversation, and 2) predicting the outcome of the conversation, i.e., observed user satisfaction, from turn-taking descriptors. These two aspects are then combined to build a conversational profile for each speaker from turn-taking behavior and conversation outcome. The analysis, experiments, and evaluation were conducted on a large dataset of Italian call-center spoken conversations in which customers and agents are engaged in real problem-solving tasks. The challenges include automatically segmenting and aligning the speakers' channels from the speech signal and identifying and labeling turn types and their functional aspects. The task becomes more challenging due to the presence of overlapping speech: to model turn-taking behavior, the intention behind these overlapping turns must be considered. The most critical question, however, is how to model observed user satisfaction in a dyadic conversation and which properties of turn-taking behavior can be used to represent and predict the outcome. The computational models for analyzing turn-taking dynamics in this dissertation therefore include automatic segmentation and labeling of turn types, the categorization of competitive vs. non-competitive overlaps, silences (e.g., lapses, pauses), and the functions of turns in terms of dialog acts. The novel contributions of this work are to:
    1. design a fully automated turn segmentation and labeling system (e.g., agent vs. customer turn, lapse within a speaker, and overlap);
    2. design annotation guidelines for segmenting and annotating speech overlaps with competitive and non-competitive labels;
    3. demonstrate how different channels of information, such as acoustic, linguistic, and psycholinguistic feature sets, perform in the classification of competitive vs. non-competitive overlaps;
    4. study the role of speakers and context (i.e., agents' and customers' speech) in conveying competitiveness, for each individual feature set and their combinations;
    5. investigate the function of long silences in the information flow of a dyadic conversation.
    The extracted turn-taking cues are then used to automatically predict the outcome of the conversation, which is modeled from continuous manifestations of emotion. The contributions here include:
    1. modeling the state of observed user satisfaction in terms of the final emotional manifestation of the customer (i.e., the user);
    2. analyzing and modeling turn-taking properties to show how each turn type influences user satisfaction;
    3. studying how turn-taking behavior changes within each emotional state.
    Based on the studies conducted in this work, it is demonstrated that turn-taking behavior, especially the competitiveness of overlaps, is more than just an organizational tool in daily human interactions: it carries useful information and can predict the outcome of the conversation in terms of satisfaction vs. non-satisfaction. Combining turn-taking behavior with the outcome of the conversation, the final goal is to design a conversational profile for each speaker. Such profile information would be useful not only to domain experts but also to call-center agents in real time. These systems are fully automated and require no human intervention. The findings are potentially relevant to research on overlapping speech and the automatic analysis of human-human and human-machine interactions.
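    Since the dissertation abstract stays at a high level, here is a self-contained Python sketch (illustrative only, not the dissertation's actual system) of the kind of low-level turn-taking descriptor such models build on: given per-channel voice-activity intervals for an agent and a customer, it derives overlap regions and silence gaps. The interval format and the 0.5-second minimum-gap threshold are assumptions.

        # Illustrative sketch only; interval format and thresholds are assumptions.
        from typing import List, Tuple

        Interval = Tuple[float, float]  # (start_sec, end_sec) of voiced speech

        def overlaps(agent: List[Interval], customer: List[Interval]) -> List[Interval]:
            """Return the time regions where both speakers are active at once."""
            out = []
            for a_start, a_end in agent:
                for c_start, c_end in customer:
                    start, end = max(a_start, c_start), min(a_end, c_end)
                    if start < end:
                        out.append((start, end))
            return sorted(out)

        def silences(agent: List[Interval], customer: List[Interval],
                     min_gap: float = 0.5) -> List[Interval]:
            """Return gaps of at least `min_gap` seconds where nobody speaks."""
            speech = sorted(agent + customer)
            gaps, cur_end = [], speech[0][1]
            for start, end in speech[1:]:
                if start - cur_end >= min_gap:
                    gaps.append((cur_end, start))
                cur_end = max(cur_end, end)
            return gaps

        # Toy example: two short overlaps, no silence longer than 0.5 s.
        agent = [(0.0, 2.1), (3.0, 5.4)]
        customer = [(2.0, 2.9), (5.2, 7.0)]
        print(overlaps(agent, customer))  # [(2.0, 2.1), (5.2, 5.4)]
        print(silences(agent, customer))  # []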

    A review of affective computing: From unimodal analysis to multimodal fusion

    Affective computing is an emerging interdisciplinary research field bringing together researchers and practitioners from areas ranging from artificial intelligence and natural language processing to the cognitive and social sciences. With the proliferation of videos posted online (e.g., on YouTube, Facebook, Twitter) for product reviews, movie reviews, political views, and more, affective computing research has increasingly evolved from conventional unimodal analysis to more complex forms of multimodal analysis. This is the primary motivation behind our first-of-its-kind, comprehensive literature review of the diverse field of affective computing. Furthermore, existing literature surveys lack a detailed discussion of the state of the art in multimodal affect analysis frameworks, which this review aims to address. Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze. In this paper, we focus mainly on the use of audio, visual, and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. Following an overview of different techniques for unimodal affect analysis, we outline existing methods for fusing information from different modalities. As part of this review, we carry out an extensive study of different categories of state-of-the-art fusion techniques, followed by a critical analysis of the potential performance improvements of multimodal analysis over unimodal analysis. This comprehensive overview of two complementary fields aims to give readers the building blocks they need to better understand this challenging and exciting research field.
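    To make the review's unimodal-vs-multimodal distinction concrete, the following Python sketch (illustrative only, not code from the review) contrasts the two fusion families it discusses: early (feature-level) fusion, which concatenates modality features before training a single classifier, and late (decision-level) fusion, which averages the predictions of per-modality classifiers. The synthetic data, feature dimensions, and choice of logistic regression are assumptions.

        # Illustrative sketch only; data are synthetic and the classifier choice is arbitrary.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 200
        y = rng.integers(0, 2, n)                      # toy binary affect label
        audio = rng.normal(size=(n, 20)) + y[:, None]  # each modality is weakly informative
        visual = rng.normal(size=(n, 30)) + y[:, None]
        text = rng.normal(size=(n, 50)) + y[:, None]

        # Early (feature-level) fusion: concatenate modalities, train one classifier.
        fused = np.hstack([audio, visual, text])
        early = LogisticRegression(max_iter=1000).fit(fused, y)

        # Late (decision-level) fusion: one classifier per modality, average probabilities.
        models = [LogisticRegression(max_iter=1000).fit(X, y) for X in (audio, visual, text)]
        late_proba = np.mean(
            [m.predict_proba(X)[:, 1] for m, X in zip(models, (audio, visual, text))], axis=0
        )

        print("early-fusion train accuracy:", early.score(fused, y))
        print("late-fusion train accuracy:", ((late_proba > 0.5) == y).mean())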

    Developing Secondary Language Identity in the Context of Professional Communication

    VOICE QUALITY AND TV INTERPRETING: A PROPOSAL FOR A GESTALTIC EVALUATION

    The present thesis is a corpus-based interpreting study consisting of a proposal for a gestaltic subjective evaluation of quality in simultaneous interpreting broadcast on television. The main objective of the research was to build and test a model of quality assessment based on the gestaltic perception of both speech and the sound-image perceived through the audiovisual medium. The model of gestaltic perception adopted is the one formed by voice-syllable-prosody-sense-context-(linguistic) knowledge of the world, proposed in "Il volto fonico delle parole" (Albano Leoni 2009), which is a re-elaboration of the model based on melody-rhythm-words-sentences proposed by Karl Bühler in his "Theory of Language" (1934). A thematic corpus was built consisting of 2 Italian and 2 Spanish (Spain and United States) interpretations of the 2012 US Presidential Debates: the ORenesit corpus (Obama-Romney English español italiano) is included in the reference corpus CorIT (Italian Television Interpreting Corpus). The assessment model was tested in a questionnaire-based pilot survey including 3 video excerpts from the Italian interpretations of the 2008 Third US Presidential Debate (Obama vs. McCain), since the ORenesit corpus had not yet been completed. One of the 3 video excerpts was modified for experimental purposes: the interpreter's voice was replaced with that of a professional actor and dubber, who imitated the original interpretation in studio while reading the transcript and listening to the speaker. This choice was made to meet two needs, mainly related to the ecological validity of the experiment: i) to test the effect of a telegenic voice; and ii) to use the subject's natural and personal expression. The questionnaire was built on categories extracted from "La vive voix" (Fónagy 1983) and "L'Audio-Vision" (Chion 1990). The data obtained were treated statistically. The results of the qualitative and quantitative study seem to confirm a gestaltic perception of interpreted speech received through the audiovisual medium and formed by the following components: sound-image; syllable-melody(-voice-personality); words-sentences. The results also seem to raise doubts about the effectiveness of a purely quantitative approach to the analysis of speech perception.

    More than a condition: an examination of synaesthesia as a key cognitive factor in the processing of reality and in its literary and pictorial renditions

    What seems to influence language at its most profound level is the Lebenswelt, the experience of the body in its relationship to the environment and to others. […] Meaning is thus firstly anchored in anticipation, qualitative, often synesthesic feelings, and not in a directional grasping of "objects". (my emphasis; Cadiot 41) The issue of synaesthesia, whether it be addressed as a benign anomaly or as an artistic device, is a highly composite and heterogeneous matter. It is more complex than the mere…