27 research outputs found

    Meeting decision detection: multimodal information fusion for multi-party dialogue understanding

    Modern advances in multimedia and storage technologies have led to huge archives of human conversations in widely ranging areas. These archives offer a wealth of information in organizational contexts, yet retrieving and managing that information is a time-consuming and labor-intensive task. Previous research applied keyword- and computer-vision-based methods to this task. However, spontaneous conversations, complex in their use of multimodal cues and intricate in the interactions between multiple speakers, pose new challenges to these methods. We need new techniques that can leverage the information hidden in multiple communication modalities, including not just "what" the speakers say but also "how" they express themselves and interact with others.

    In response to this need, the thesis inquires into the multimodal nature of meeting dialogues and computational means to retrieve and manage recorded meeting information. In particular, it develops the Meeting Decision Detector (MDD) to detect and track decisions, one of the most important outcomes of meetings. The MDD involves not only the generation of extractive summaries pertaining to the decisions ("decision detection"), but also the organization of a continuous stream of meeting speech into locally coherent segments ("discourse segmentation").

    The inquiry starts with a corpus analysis that constitutes a comprehensive empirical study of the decision-indicative and segment-signalling cues in the meeting corpora. These cues are uncovered from a variety of communication modalities, including the words spoken, gesture and head movements, pitch and energy level, rate of speech, pauses, and use of subjective terms. While some of the cues match previous findings on speech segmentation, others have not been studied before. The analysis also provides empirical grounding for computing features and integrating them into a computational model.

    To handle the high-dimensional multimodal feature space in the meeting domain, the thesis empirically compares feature-discriminability and feature-pattern-finding criteria. As the different knowledge sources are expected to capture different types of features, the thesis also experiments with methods that can harness the synergy between multiple knowledge sources.

    The problem formalization and the modeling algorithm so far correspond to an optimal setting: an off-line, post-meeting analysis scenario. Ultimately, however, the MDD is expected to operate online, right after a meeting or while a meeting is still in progress. The thesis therefore also explores techniques that help relax the optimal setting, especially those using only features that can be generated with a higher degree of automation, and designs empirically motivated experiments to handle the corresponding performance degradation.

    Finally, with the users in mind, the thesis evaluates the use of query-focused summaries in a decision-debriefing task, which is common in organizational contexts. The decision-focused extracts (compressions of 1%) are compared against general-purpose extractive summaries (compressions of 10-40%). To examine the effect of model automation on the debriefing task, the evaluation experiments with three versions of decision-focused extracts, each relaxing one manual annotation constraint. Task performance is measured in actual task effectiveness, user-generated report quality, and user-perceived success. The users' clicking behaviors are also recorded and analyzed to understand how users leverage the different versions of extractive summaries to produce abstractive summaries.

    The analysis framework and computational means developed in this work are expected to be useful for the creation of other dialogue-understanding applications, especially those that require uncovering the implicit semantics of meeting dialogues.
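    As a concrete illustration of the fusion step described above, the following minimal Python sketch concatenates per-utterance lexical, prosodic, and gestural features (early fusion) and trains a binary decision-detection classifier. The feature groups, their dimensions, the fuse_features helper, and the classifier choice are illustrative assumptions, not the thesis's actual implementation.

```python
# A minimal sketch of multimodal decision detection via early fusion.
# All feature names, dimensions, and the classifier are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def fuse_features(lexical, prosodic, gestural):
    """Early fusion: concatenate per-utterance feature vectors from
    the separate knowledge sources into one high-dimensional vector."""
    return np.hstack([lexical, prosodic, gestural])

# Hypothetical per-utterance features: bag-of-words scores, pitch/energy/
# speech-rate statistics, and gesture/head-movement counts.
rng = np.random.default_rng(0)
n = 200
X = fuse_features(rng.random((n, 50)),   # lexical cues
                  rng.random((n, 6)),    # prosodic cues
                  rng.random((n, 4)))    # gestural cues
y = rng.integers(0, 2, n)                # 1 = decision-indicative utterance

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict(X[:5]))                # predicted labels for 5 utterances
```

    Late fusion (one model per knowledge source, combined at the decision level) would be an equally plausible reading of the abstract; the sketch shows only the simpler early-fusion variant.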

    Capturing Synchronous Collaborative Design Activities: A State-Of-The-Art Technology Review


    Towards Context-free Information Importance Estimation

    The amount of information contained in heterogeneous text documents such as news articles, blogs, social media posts, scientific articles, discussion forums, and microblogging platforms is already huge and will only increase. Humans cannot cope with this flood of information, so important information can neither be found nor utilized. This situation is unfortunate, since information is the key driver in many areas of society in the present Information Age. Hence, developing automatic means that assist people in handling the information overload is crucial, and automatic estimation of information importance is an essential step towards this goal. The guiding hypothesis of this work is that prior methods for automatic information importance estimation are inherently limited because they are based on signals that are merely correlated with, but not causally linked to, information importance. To resolve this issue, this work lays the foundations for a fundamentally new approach to importance estimation. The key idea of context-free information importance estimation is to equip machine learning models with world knowledge so that they can estimate information importance based on causal reasons.

    In the first part of this work, we lay the theoretical foundations for context-free information importance estimation. First, we discuss how the abstract concept of information importance can be formally defined; such a formal definition has so far been missing in the research community. We close this gap with two definitions that equate the importance of information with its impact on the behavior and on the course of life of the information recipients, respectively. Second, we discuss how information importance estimation abilities can be assessed. Usually this is done by performing automatic summarization of text documents, but we find that this approach is not ideal and instead propose ranking, regression, and preference-prediction tasks as alternatives for future work. Third, we deduce context-free information importance estimation as a logical consequence of the introduced importance definitions, and find that reliable importance estimation, in particular for heterogeneous text documents, is only possible with context-free methods.

    In the second part, we develop the first machine learning models based on the idea of context-free information importance estimation. We first tackle the lack of suitable datasets required to train and test such models: large, heterogeneous datasets for multi-document summarization are missing because their construction is complicated and costly. We therefore present a simple and cost-efficient corpus construction approach and demonstrate its applicability by creating new multi-document summarization datasets. Second, we develop a new machine learning approach for context-free information importance estimation, implement a concrete realization, and demonstrate its advantages over contextual importance estimators. Third, we develop a new method to evaluate automatic summarization methods. Previous approaches rely on expensive reference summaries and unreliable semantic comparisons of text documents; ours instead uses cheap pairwise preference annotations and only much simpler sentence-level similarity estimation.

    This work lays the foundations for context-free information importance estimation. We hope that future research will explore whether this fundamentally new type of information importance estimation can eventually lead to human-level importance estimation abilities.
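    The evaluation idea in the third contribution can be sketched as follows: given pairwise preference annotations over sentences, a system summary is scored by how often it reproduces the annotated ordering, using only a cheap sentence-level similarity. All names here (sentence_sim, importance, preference_accuracy) and the similarity choice are hypothetical stand-ins, not the dissertation's concrete method.

```python
# A minimal sketch of preference-based summary evaluation.
# The similarity function and scoring scheme are assumptions.
from difflib import SequenceMatcher

def sentence_sim(a: str, b: str) -> float:
    """Cheap sentence-level similarity (placeholder for any estimator)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def importance(sentence: str, summary: list) -> float:
    """Proxy importance: how strongly a source sentence is reflected in
    the system summary, via its best-matching summary sentence."""
    return max(sentence_sim(sentence, s) for s in summary)

def preference_accuracy(prefs, summary) -> float:
    """Fraction of annotated pairs (more_important, less_important)
    whose ordering the system summary reproduces."""
    hits = sum(importance(a, summary) > importance(b, summary)
               for a, b in prefs)
    return hits / len(prefs)

# Hypothetical annotations: in each pair, the first sentence was judged
# more important than the second.
prefs = [("The dam broke and flooded the town.",
          "The mayor wore a blue tie."),
         ("Hundreds were evacuated overnight.",
          "Reporters arrived by bus.")]
summary = ["A dam failure flooded the town and forced overnight evacuations."]
print(preference_accuracy(prefs, summary))  # 1.0 if both orderings match
```

    The appeal of this scheme, as the abstract argues, is that it needs neither reference summaries nor document-level semantic comparison, only pairwise judgments and sentence-level matching.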

    Text Encoding and Decoding from Global Perspectives

    As an important application scenario of deep learning, Natural Language Processing (NLP) is receiving more and more attention and developing rapidly. Learning representations for words or documents via neural networks is gradually replacing feature engineering in almost all text-related applications. On the other hand, how to decode these representations or encodings is also vital for sequence-to-sequence text generation tasks such as Neural Abstractive Summarization (NAS) and Neural Machine Translation (NMT). Towards a more comprehensive representation and decoding strategy, this dissertation explores several global perspectives that previous studies have ignored. We treat "global" as a relative concept indicating higher-level knowledge conducive to enriching representation or improving decoding; its specific definition may vary across tasks.

    In text representation or encoding, "global" refers to relatively higher-level context information. There are usually three natural contextual relationships for mapping words or documents into latent space: (1) co-occurrence relationships between words, (2) coherence relationships between sentences, and (3) subordinate relationships between documents/sentences and their words. Beyond these naturally occurring contexts, there may be hidden context relationships between dependent documents from the perspective of the whole corpus (i.e., the global perspective). Although documents in a corpus are often assumed to be independent of each other, this assumption may not hold for corpora such as news corpora, since the events reported by news documents interact in the real world. To capture this global-contextual information, we construct a news network over the whole corpus to model the latent relationships between news articles, and design a network embedding algorithm that produces news representations based on the above-mentioned subordinate relationships and news dependencies. Such cross-document relationships also play a vital role in tasks that must represent or encode a cluster of multiple documents, e.g., Multi-document Summarization (MDS). Some studies concatenate all documents into one flat sequence, which is detrimental to modeling both cross-document and long-term dependencies. To alleviate these two problems, we design a Parallel Hierarchical Transformer (PHT) whose local and global attention mechanisms simultaneously capture cross-token and cross-document relationships.

    In text decoding, on the other hand, "global" refers to a higher-level optimum, i.e., the global optimum as opposed to the local optimum. Since a neural text generator can hardly generate a whole sentence at once, the heuristic beam search algorithm has been the natural choice for text decoding. Because it decodes word by word, beam search inevitably tends to get stuck in local optima. Although the global optimum is hard to reach directly, it is feasible to make a one-shot prediction of how the globally optimal hypothesis attends to the source tokens. We therefore propose a global scoring mechanism that evaluates generated sentences at each step based on the predicted global attention distribution, calibrating beam search stepwise to return a hypothesis whose attention distribution over the source is closer to the global optimum. Decoding with such global awareness alleviates the local-optimum problem, significantly enhances generation quality, and can be developed and used in various text generation fields.
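    A minimal sketch of the stepwise calibration idea: rescore each beam hypothesis by combining its log-probability with a divergence penalty between the attention mass it has accumulated over the source tokens and the one-shot predicted global attention. The scoring formula, the KL-style penalty, and the weight alpha are assumptions for illustration, not the dissertation's exact mechanism.

```python
# A minimal sketch of attention-calibrated beam rescoring.
# The score combination and penalty form are assumptions.
import math

def global_score(log_prob, step_attn, predicted_attn, alpha=0.5):
    """Combine the usual log-probability with how closely the attention
    accumulated so far matches the predicted global attention over the
    source tokens (lower divergence = higher score)."""
    acc = [sum(col) for col in zip(*step_attn)]   # mass per source token
    total = sum(acc) or 1.0
    acc = [a / total for a in acc]                # normalize to a distribution
    div = sum(p * math.log(p / max(q, 1e-9))      # KL(predicted || accumulated)
              for p, q in zip(predicted_attn, acc) if p > 0)
    return log_prob - alpha * div

# Hypothetical: two beam hypotheses, each two decoding steps over a
# 3-token source; rows are per-step attention distributions.
predicted = [0.5, 0.3, 0.2]                       # one-shot global attention
hyp_a = global_score(-2.1, [[0.6, 0.3, 0.1], [0.5, 0.3, 0.2]], predicted)
hyp_b = global_score(-2.0, [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]], predicted)
print("keep hypothesis:", "A" if hyp_a > hyp_b else "B")
```

    In this toy case hypothesis B has the slightly better log-probability, but its attention concentrates on the wrong source token, so the calibrated score prefers A; this is the intended effect of steering beam search toward the predicted global attention.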

    XVIII. Magyar Számítógépes Nyelvészeti Konferencia (18th Hungarian Conference on Computational Linguistics)
