    A Just-in-Time Document Retrieval System for Dialogues or Monologues

    The Automatic Content Linking Device is a just-in-time document retrieval system that monitors an ongoing dialogue or monologue and enriches it with potentially related documents from local repositories or from the Web. The documents are found using queries built from the dialogue words, which are obtained through automatic speech recognition. Results are displayed in real time to the dialogue participants, or to people watching a recorded dialogue or a talk. The system can be demonstrated in both settings.

    Interaction Analysis in Smart Work Environments through Fuzzy Temporal Logic

    Interaction analysis is defined as the generation of situation descriptions from machine perception. World models created through machine perception are used by a reasoning engine based on fuzzy metric temporal logic and situation graph trees, with optional parameter learning and clustering as preprocessing, to deduce knowledge about the observed scene. The system is evaluated in a case study on automatic behavior report generation for staff training purposes in crisis response control rooms.

    Computationally Efficient and Robust BIC-Based Speaker Segmentation

    An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are performed not at every window shift, as in previous work, but only when a speaker change is most likely to occur. This is done by estimating the next probable change point using a model of utterance durations. It is found that the inverse Gaussian best fits the distribution of utterance durations. As a result, fewer BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on a branch and bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and application of BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield superior performance compared to existing approaches.
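
For orientation, the standard BIC change-point test that this abstract builds on can be sketched as follows. This is a minimal illustration and not the paper's method: it assumes a one-dimensional feature per frame, scans every candidate index exhaustively (exactly the cost the paper aims to avoid), and uses hypothetical names (`delta_bic`, `best_change_point`, `lam`, `margin`) and synthetic data.

```python
import math
import random


def variance(xs):
    """Maximum-likelihood variance (divides by N), floored to avoid log(0)."""
    mean = sum(xs) / len(xs)
    return max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-12)


def delta_bic(frames, i, lam=1.0):
    """Delta-BIC for a hypothesised change at index i of a 1-D feature sequence.

    Compares one Gaussian for the whole window against two Gaussians
    (frames[:i] and frames[i:]). A positive value favours a change.
    """
    n = len(frames)
    left, right = frames[:i], frames[i:]
    gain = 0.5 * (n * math.log(variance(frames))
                  - len(left) * math.log(variance(left))
                  - len(right) * math.log(variance(right)))
    d = 1  # feature dimension (assumption: scalar features)
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * math.log(n)
    return gain - penalty


def best_change_point(frames, margin=10, lam=1.0):
    """Exhaustively scan candidates; return (index, score) or (None, score)."""
    candidates = range(margin, len(frames) - margin)
    i = max(candidates, key=lambda k: delta_bic(frames, k, lam))
    score = delta_bic(frames, i, lam)
    return (i, score) if score > 0 else (None, score)


# Synthetic example: two "speakers" with different feature means.
rng = random.Random(0)
frames = ([rng.gauss(0.0, 1.0) for _ in range(200)]
          + [rng.gauss(5.0, 1.0) for _ in range(200)])
change, score = best_change_point(frames)
```

In this sketch the penalty weight `lam` trades missed changes against false alarms. The approach described in the abstract would replace the exhaustive scan with tests placed at instants predicted by the utterance-duration model.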

    Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues

    Compared to traditional visual question answering, video-grounded dialogues require additional reasoning over dialogue context to answer questions in a multi-turn setting. Previous approaches to video-grounded dialogues mostly use dialogue context as a simple text input without modelling the inherent information flows at the turn level. In this paper, we propose a novel framework of Reasoning Paths in Dialogue Context (PDC). The PDC model discovers information flows among dialogue turns through a semantic graph constructed from lexical components in each question and answer. The PDC model then learns to predict reasoning paths over this semantic graph. Our path prediction model predicts a path from the current turn through past dialogue turns that contain additional visual cues to answer the current question. Our reasoning model sequentially processes both visual and textual information through this reasoning path, and the propagated features are used to generate the answer. Our experimental results demonstrate the effectiveness of our method and provide additional insights on how models use semantic dependencies in a dialogue context to retrieve visual cues. Comment: Accepted at ICLR (International Conference on Learning Representations) 202
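
The idea of linking dialogue turns through shared lexical components can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the stopword list, the word-overlap edge rule, and the greedy `backward_path` heuristic are hypothetical stand-ins, whereas the PDC model in the abstract learns to predict the path rather than following a fixed rule.

```python
import re

# Hypothetical, deliberately tiny stopword list for this example only.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "what", "does", "do",
             "he", "she", "it", "in", "on", "of", "to", "and", "then",
             "there", "yes", "no", "with"}


def content_words(text):
    """Lower-cased alphabetic tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}


def build_turn_graph(turns):
    """Map each turn index to the earlier turns sharing a content word."""
    words = [content_words(t) for t in turns]
    return {j: [i for i in range(j) if words[i] & words[j]]
            for j in range(len(turns))}


def backward_path(graph, start):
    """Greedy heuristic: repeatedly hop to the most recent linked turn."""
    path = [start]
    while graph[path[-1]]:
        path.append(max(graph[path[-1]]))
    return path


turns = [
    "Is there a man in the video? Yes, a man enters the kitchen.",
    "What is the weather outside? It looks sunny.",
    "Does the man carry anything? He carries a red cup.",
    "What does he do with the cup? He puts the cup on the table.",
]
graph = build_turn_graph(turns)
path = backward_path(graph, 3)  # path from the last turn back through context
```

Here the last turn links to turn 2 through "cup", and turn 2 links to turn 0 through "man", while the unrelated weather turn is skipped.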

    How did the discussion go: Discourse act classification in social media conversations

    We propose a novel attention-based hierarchical LSTM model to classify discourse act sequences in social media conversations, aimed at mining data from online discussion using textual meanings beyond the sentence level. The task is distinctive in its complete categorization of possible pragmatic roles in informal textual discussions, in contrast to question-answer extraction, stance detection, or sarcasm identification, which are role-specific tasks. An early attempt was made on a Reddit discussion dataset. We train our model on the same data and present test results on two different datasets, one from Reddit and one from Facebook. Our proposed model outperformed the previous one in terms of domain independence; without using platform-dependent structural features, our hierarchical LSTM with a word relevance attention mechanism achieved F1-scores of 71% and 66%, respectively, in predicting discourse roles of comments in Reddit and Facebook discussions. We present and analyze the efficiency of recurrent and convolutional architectures in learning discursive representations on the same task, with different word and comment embedding schemes. Our attention mechanism enables us to inquire into the relevance ordering of text segments according to their roles in discourse. We present a human annotator experiment to unveil important observations about modeling and data annotation. Equipped with our text-based discourse identification model, we inquire into how heterogeneous non-textual features like location, time, leaning of information, etc. play their roles in characterizing online discussions on Facebook.

    Pragmatics and Prosody

    Most of the papers collected in this book resulted from presentations and discussions undertaken during the V Lablita Workshop that took place at the Federal University of Minas Gerais, Brazil, on August 23-25, 2011. The workshop was held in conjunction with the II Brazilian Seminar on Pragmatics and Prosody. The guiding themes for the joint event were illocution, modality, attitude, information patterning and speech annotation. Thus, all papers presented here are concerned with theoretical and methodological issues related to the study of speech. The papers in this volume reflect different theoretical orientations, which are mirrored in the methodological designs of the studies pursued. However, all papers are based on the analysis of actual speech, be it from corpora or from experimental contexts trying to emulate natural speech. Prosody is the keyword that emerges from all the papers in this publication, which indicates the high standing of this category in studies geared towards understanding the major elements that structure speech.