1,283 research outputs found

    CluCDD: Contrastive Dialogue Disentanglement via Clustering

    A huge number of multi-participant dialogues take place online every day, which makes it difficult for both humans and machines to follow dialogue dynamics. Dialogue disentanglement aims to separate an entangled dialogue into detached sessions, thus increasing the readability of long, disordered dialogues. Previous studies mainly focus on two-step methods built on message-pair classification and clustering, which cannot guarantee whole-dialogue clustering performance. To address this challenge, we propose a simple yet effective model named CluCDD, which aggregates utterances by contrastive learning. More specifically, our model pulls utterances in the same session together and pushes away utterances in different ones. A clustering method is then adopted to generate predicted clustering labels. Comprehensive experiments conducted on the Movie Dialogue and IRC datasets demonstrate that our model achieves a new state-of-the-art result. Comment: 5 pages
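    The pull/push objective described above can be sketched as follows. This is a minimal illustration with toy 2-D utterance embeddings and a generic supervised contrastive (NT-Xent-style) loss, not CluCDD's actual implementation; the function name and temperature value are placeholders:

```python
import numpy as np

def supervised_contrastive_loss(embeddings, session_ids, temperature=0.5):
    """Pull utterances from the same session together, push the rest apart:
    for each anchor, maximise similarity to same-session utterances relative
    to all other utterances (NT-Xent-style, over L2-normalised embeddings)."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(session_ids)
    loss, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and session_ids[j] == session_ids[i]]
        if not positives:
            continue
        denom = sum(np.exp(sim[i, k]) for k in range(n) if k != i)
        for j in positives:
            loss += -np.log(np.exp(sim[i, j]) / denom)
            count += 1
    return loss / count

# Embeddings grouped by session score a lower loss than embeddings that mix sessions.
sessions = np.array([0, 0, 1, 1])
tight = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])   # same-session vectors close
mixed = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])   # same-session vectors far
assert supervised_contrastive_loss(tight, sessions) < supervised_contrastive_loss(mixed, sessions)
```

    After training with such a loss, any off-the-shelf clustering method (e.g. k-means over the learned embeddings) can produce the predicted session labels.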

    Conversation Disentanglement with Bi-Level Contrastive Learning

    Conversation disentanglement aims to group utterances into detached sessions, a fundamental task in processing multi-party conversations. Existing methods have two main drawbacks. First, they overemphasize pairwise utterance relations but pay inadequate attention to modeling the utterance-to-context relation. Second, a huge amount of human-annotated data is required for training, which is expensive to obtain in practice. To address these issues, we propose a general disentanglement model based on bi-level contrastive learning. It brings utterances in the same session closer together while encouraging each utterance to be near its clustered session prototypes in the representation space. Unlike existing approaches, our disentanglement model works both in the supervised setting with labeled data and in the unsupervised setting when no such data is available. The proposed method achieves new state-of-the-art performance in both settings across several public datasets.
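    The prototype level of a bi-level objective of this kind can be sketched roughly as below. Session prototypes here are simply the mean of each session's normalised utterance embeddings; that choice, the function name, and the temperature are illustrative simplifications, not the paper's architecture:

```python
import numpy as np

def prototype_contrastive_loss(embeddings, session_ids, temperature=0.5):
    """Pull each utterance toward the prototype (mean embedding) of its own
    session and away from the prototypes of other sessions."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels = np.unique(session_ids)
    protos = np.stack([z[session_ids == s].mean(axis=0) for s in labels])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    sim = z @ protos.T / temperature                # utterance-to-prototype similarity
    own = np.searchsorted(labels, session_ids)      # column index of each utterance's session
    log_prob = sim[np.arange(len(z)), own] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()

sessions = np.array([0, 0, 1, 1])
separated = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
shuffled = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]])
assert prototype_contrastive_loss(separated, sessions) < prototype_contrastive_loss(shuffled, sessions)
```

    Combined with an utterance-level term like the one in CluCDD above, the two levels jointly tighten sessions locally (utterance pairs) and globally (utterance-to-prototype).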

    Revisiting Disentanglement and Fusion on Modality and Context in Conversational Multimodal Emotion Recognition

    Enabling machines to understand human emotions in multimodal contexts under dialogue scenarios has been a hot research topic, formulated as the task of multimodal emotion recognition in conversation (MM-ERC). MM-ERC has received consistent attention in recent years, and a diverse range of methods has been proposed to secure better task performance. Most existing works treat MM-ERC as a standard multimodal classification problem and perform multimodal feature disentanglement and fusion to maximize feature utility. Yet after revisiting the characteristics of MM-ERC, we argue that both feature multimodality and conversational contextualization should be properly modeled simultaneously during the feature disentanglement and fusion steps. In this work, we aim to push the task performance further by taking full consideration of the above insights. On the one hand, during feature disentanglement, based on the contrastive learning technique, we devise a Dual-level Disentanglement Mechanism (DDM) to decouple the features into both the modality space and the utterance space. On the other hand, during the feature fusion stage, we propose a Contribution-aware Fusion Mechanism (CFM) and a Context Refusion Mechanism (CRM) for multimodal and context integration, respectively; together they schedule the proper integration of multimodal and context features. Specifically, CFM explicitly manages the multimodal feature contributions dynamically, while CRM flexibly coordinates the introduction of dialogue contexts. On two public MM-ERC datasets, our system consistently achieves new state-of-the-art performance. Further analyses demonstrate that all our proposed mechanisms greatly facilitate the MM-ERC task by making full use of multimodal and context features adaptively. Note that our proposed methods have great potential to facilitate a broader range of other conversational multimodal tasks. Comment: Accepted by ACM MM 202
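    A dynamic, contribution-aware fusion step of the kind CFM describes can be sketched as follows. The shared gate vector and the dot-product scoring rule are hypothetical placeholders standing in for whatever learned scoring the paper uses:

```python
import numpy as np

def contribution_aware_fusion(modality_feats, gate_w):
    """Score each modality's feature vector with a shared gate, turn the scores
    into softmax weights (the dynamic 'contributions'), and fuse by weighted sum."""
    feats = np.stack(modality_feats)        # (num_modalities, dim)
    scores = feats @ gate_w                 # one scalar score per modality
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                # contributions sum to 1
    fused = weights @ feats                 # convex combination of modality features
    return fused, weights

# Toy text/audio/video features: the gate ranks text highest, so it contributes most.
text_f = np.array([1.0, 0.0, 0.0])
audio_f = np.array([0.0, 1.0, 0.0])
video_f = np.array([0.0, 0.0, 1.0])
fused, w = contribution_aware_fusion([text_f, audio_f, video_f], gate_w=np.array([2.0, 1.0, 0.0]))
assert abs(w.sum() - 1.0) < 1e-9 and w[0] > w[1] > w[2]
```

    Because the weights depend on the features themselves, the contribution of each modality can shift from utterance to utterance, which is the "dynamic" behaviour the abstract attributes to CFM.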

    An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

    Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have managed to master the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions -- aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, as the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive overview of this exciting field. Comment: Submitted to the Proceedings of IEE

    Interactional Slingshots: Providing Support Structure to User Interactions in Hybrid Intelligence Systems

    The proliferation of artificial intelligence (AI) systems has enabled us to engage more deeply and powerfully with our digital and physical environments, from chatbots to autonomous vehicles to robotic assistive technology. Unfortunately, these state-of-the-art systems often fail in contexts that require human understanding, are never-before-seen, or are complex. In such cases, though AI-only approaches cannot solve the full task, their ability to solve a piece of the task can be combined with human effort to handle complexity and uncertainty more robustly. A hybrid intelligence system, one that combines human and machine skill sets, can make intelligent systems more operable in real-world settings. In this dissertation, we propose the idea of using interactional slingshots as a means of providing support structure to user interactions in hybrid intelligence systems. Much like gravitational slingshots provide boosts to spacecraft en route to their final destinations, interactional slingshots provide boosts to user interactions en route to solving tasks. Several challenges arise: What does this support structure look like? How much freedom does the user have in their interactions? How is user expertise paired with that of the machine? To make this a tractable socio-technical problem, we explore the idea in the context of data annotation problems, especially in domains where AI methods fail to solve the overall task. Annotated (labeled) data is crucial for successful AI methods, and it becomes especially difficult to obtain in domains where AI fails, since problems in such domains require human understanding to fully solve, but they also present challenges related to annotator expertise, annotation freedom, and context curation from the data. To explore data annotation problems in this space, we develop techniques and workflows whose interactional slingshot support structure harnesses the user's interaction with data.
    First, we explore providing support in the form of nudging non-expert users' interactions as they annotate text data for the task of creating conversational memory. Second, we add support structure in the form of assisting non-expert users during the annotation process itself for the task of grounding natural language references to objects in 3D point clouds. Finally, we supply support in the form of guiding expert and non-expert users both before and during their annotations for the task of conversational disentanglement across multiple domains. We demonstrate that building hybrid intelligence systems with each of these interactional slingshot support mechanisms (nudging, assisting, and guiding a user's interaction with data) improves annotation outcomes such as annotation speed, accuracy, and effort level, even when annotators' expertise and skill levels vary. Thesis statement: By providing support structure that nudges, assists, and guides user interactions, it is possible to create hybrid intelligence systems that enable more efficient (faster and/or more accurate) data annotation. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163138/1/sairohit_1.pd

    A Novel Information-Theoretic Objective to Disentangle Representations for Fair Classification

    One of the pursued objectives of deep learning is to provide tools that learn abstract representations of reality from the observation of multiple contextual situations. More precisely, one wishes to extract disentangled representations which are (i) low dimensional and (ii) whose components are independent and correspond to concepts capturing the essence of the objects under consideration (Locatello et al., 2019b). One step towards this ambitious project consists in learning representations disentangled with respect to a predefined (sensitive) attribute, e.g., the gender or age of the writer. Perhaps one of the main applications for such disentangled representations is fair classification. Existing methods extract the last layer of a neural network trained with a loss composed of a cross-entropy objective and a disentanglement regularizer. In this work, we adopt an information-theoretic view of this problem, which motivates a novel family of regularizers that minimize the mutual information between the latent representation and the sensitive attribute, conditioned on the target. The resulting set of losses, called CLINIC, is parameter-free and thus easier and faster to train. CLINIC losses are studied through extensive numerical experiments, training over 2k neural networks. We demonstrate that our methods offer a better disentanglement/accuracy trade-off than previous techniques and generalize better than training with the cross-entropy loss alone, provided that the disentanglement task is not too constraining. Comment: Findings AACL 202
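    The quantity such regularizers target, I(Z; S | Y), can be illustrated with a plug-in estimate on discrete toy variables. Real CLINIC losses operate on continuous neural representations via bounds on this quantity, so the sketch below is only conceptual:

```python
import numpy as np
from collections import Counter

def conditional_mutual_information(z, s, y):
    """Plug-in estimate of I(Z; S | Y) for discrete samples: how much the
    representation Z reveals about the sensitive attribute S once the
    target label Y is known.  A fair representation drives this to zero."""
    n = len(z)
    mi = 0.0
    for yv in set(y):
        idx = [i for i in range(n) if y[i] == yv]
        p_y = len(idx) / n
        m = len(idx)
        pz = Counter(z[i] for i in idx)
        ps = Counter(s[i] for i in idx)
        pzs = Counter((z[i], s[i]) for i in idx)
        for (zv, sv), c in pzs.items():
            p_joint = c / m
            mi += p_y * p_joint * np.log(p_joint / ((pz[zv] / m) * (ps[sv] / m)))
    return mi

# A representation that copies S leaks it; one that copies only Y does not.
y = [0, 0, 1, 1] * 10
s = [0, 1, 0, 1] * 10
z_leaky = list(s)     # encodes the sensitive attribute
z_clean = list(y)     # encodes only the target
assert conditional_mutual_information(z_leaky, s, y) > conditional_mutual_information(z_clean, s, y)
```

    Minimising a bound on this conditional mutual information, added to the cross-entropy objective, is the information-theoretic view the abstract describes.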

    Beyond denial and exclusion: The history of relations between Christians and Muslims in the Cape Colony during the 17th–18th centuries with lessons for a post-colonial theology of religions

    Learning from the past prepares one to cope with the future. History is made up of strings of relationships. This article follows a historical line from colonialism, through apartheid, to post-colonialism in order to illustrate inter-religious relations in South Africa and how each context determines these relations. Social cohesion is enhanced by a post-colonial theology of religions based on the current context. By describing the relationship between Christians and Muslims during the 17th–18th centuries in the Cape Colony, lessons can be deduced to guide inter-religious relations in a post-colonial era in South Africa. One of the most prominent Muslim leaders during the 17th century in the Cape Colony was Sheik Yusuf al-Makassari. His influence determined the future face of Islam in the Cape Colony, and here, during the 18th century, ethics started playing a crucial role in determining the relationship between Christians and Muslims. The ethical guidance of the Imams formed the Muslim communities, whilst ethical decline was apparent amongst the Christian colonists during the same period. The place of ethics as determinative of future inter-religious dialogue is emphasised. Denial and exclusion characterised relationships between Christians and Muslims. According to a post-colonial understanding of inter-religious contact, the equality and dignity of non-Christian religions are to be acknowledged. In the post-colonial and post-apartheid struggle for equality, also of religions, Prof Graham Duncan, to whom this article is dedicated, contributed to the process of acknowledging the plurality of the religious reality in South Africa.

    A Study on Improving Conditional Generation of Musical Components: Focusing on Harmony and Expression

    Doctoral dissertation -- Seoul National University Graduate School: Graduate School of Convergence Science and Technology, Department of Convergence Science (Digital Information Convergence major), February 2023. Advisor: Kyogu Lee.
    Conditional generation of musical components (CGMC) creates a part of music based on partial musical components such as melody or chords. CGMC is beneficial for discovering complex relationships among musical attributes, and it can also assist non-experts who face difficulties in making music. Recent studies have leveraged deep learning to improve the performance of CGMC systems. However, they still face two challenges in terms of generation quality and model controllability. First, the structure of the generated music is not robust. Second, only limited ranges of musical factors and tasks have been examined as targets for flexible control of generation. In this thesis, we aim to mitigate these two challenges to improve CGMC systems. For musical structure, we focus on intuitive modeling of the musical hierarchy to help the model explicitly learn musically meaningful dependencies. To this end, we utilize alignment paths between the raw music data and musical units such as notes or chords. For musical creativity, we facilitate smooth control of novel musical attributes using latent representations. We attempt to achieve disentangled representations of the intended factors by regularizing them with data-driven inductive bias. This thesis verifies the proposed approaches in two representative CGMC tasks: melody harmonization and expressive performance rendering. A variety of experimental results show that the proposed approaches can expand musical creativity under stable generation quality.
    Table of Contents:
    Chapter 1 Introduction
        1.1 Motivation
        1.2 Definitions
        1.3 Tasks of Interest
            1.3.1 Generation Quality
            1.3.2 Controllability
        1.4 Approaches
            1.4.1 Modeling Musical Hierarchy
            1.4.2 Regularizing Latent Representations
            1.4.3 Target Tasks
        1.5 Outline of the Thesis
    Chapter 2 Background
        2.1 Music Generation Tasks
            2.1.1 Melody Harmonization
            2.1.2 Expressive Performance Rendering
        2.2 Structure-enhanced Music Generation
            2.2.1 Hierarchical Music Generation
            2.2.2 Transformer-based Music Generation
        2.3 Disentanglement Learning
            2.3.1 Unsupervised Approaches
            2.3.2 Supervised Approaches
            2.3.3 Self-supervised Approaches
        2.4 Controllable Music Generation
            2.4.1 Score Generation
            2.4.2 Performance Rendering
        2.5 Summary
    Chapter 3 Translating Melody to Chord: Structured and Flexible Harmonization of Melody with Transformer
        3.1 Introduction
        3.2 Proposed Methods
            3.2.1 Standard Transformer Model (STHarm)
            3.2.2 Variational Transformer Model (VTHarm)
            3.2.3 Regularized Variational Transformer Model (rVTHarm)
            3.2.4 Training Objectives
        3.3 Experimental Settings
            3.3.1 Datasets
            3.3.2 Comparative Methods
            3.3.3 Training
            3.3.4 Metrics
        3.4 Evaluation
            3.4.1 Chord Coherence and Diversity
            3.4.2 Harmonic Similarity to Human
            3.4.3 Controlling Chord Complexity
            3.4.4 Subjective Evaluation
            3.4.5 Qualitative Results
            3.4.6 Ablation Study
        3.5 Conclusion and Future Work
    Chapter 4 Sketching the Expression: Flexible Rendering of Expressive Piano Performance with Self-supervised Learning
        4.1 Introduction
        4.2 Proposed Methods
            4.2.1 Data Representation
            4.2.2 Modeling Musical Hierarchy
            4.2.3 Overall Network Architecture
            4.2.4 Regularizing the Latent Variables
            4.2.5 Overall Objective
        4.3 Experimental Settings
            4.3.1 Dataset and Implementation
            4.3.2 Comparative Methods
        4.4 Evaluation
            4.4.1 Generation Quality
            4.4.2 Disentangling Latent Representations
            4.4.3 Controllability of Expressive Attributes
            4.4.4 KL Divergence
            4.4.5 Ablation Study
            4.4.6 Subjective Evaluation
            4.4.7 Qualitative Examples
            4.4.8 Extent of Control
        4.5 Conclusion
    Chapter 5 Conclusion and Future Work
        5.1 Conclusion
        5.2 Future Work
            5.2.1 Deeper Investigation of Controllable Factors
            5.2.2 More Analysis of Qualitative Evaluation Results
            5.2.3 Improving Diversity and Scale of Dataset
    Bibliography
    Abstract (in Korean)
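    The "data-driven inductive bias" used to disentangle an intended factor can be illustrated with a toy attribute-ordering penalty: one latent dimension is pushed to rank examples the same way a known musical attribute does, leaving the other dimensions free. The penalty form and names below are hypothetical stand-ins for the thesis's actual regularizers:

```python
import numpy as np

def attribute_ordering_penalty(latents, attribute, dim=0):
    """Penalise pairs whose ordering along one chosen latent dimension
    disagrees with their ordering by a known attribute (e.g. chord
    complexity); other latent dimensions stay unconstrained."""
    z = latents[:, dim]
    dz = z[:, None] - z[None, :]                   # pairwise latent differences
    da = attribute[:, None] - attribute[None, :]   # pairwise attribute differences
    return float(np.mean(np.maximum(0.0, -dz * np.sign(da))))  # hinge on mismatched order

attr = np.array([0.0, 1.0, 2.0, 3.0])              # e.g. increasing chord complexity
noise = np.random.default_rng(0).normal(size=4)    # an unrelated second latent dimension
aligned = np.stack([attr, noise], axis=1)          # dim 0 tracks the attribute
reversed_ = np.stack([-attr, noise], axis=1)       # dim 0 inverts the attribute
assert attribute_ordering_penalty(aligned, attr) == 0.0
assert attribute_ordering_penalty(reversed_, attr) > 0.0
```

    Adding such a term to the generative model's objective ties the designated dimension to the attribute, which is what makes smooth, interpretable control of that attribute possible at generation time.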
    • โ€ฆ