206 research outputs found

    Personalized Dialogue Generation with Diversified Traits

    Endowing a dialogue system with particular personality traits is essential to deliver more human-like conversations. However, due to the challenge of embodying personality via language expression and the lack of large-scale persona-labeled dialogue data, this research problem is still far from well-studied. In this paper, we investigate the problem of incorporating explicit personality traits in dialogue generation to deliver personalized dialogues. To this end, first, we construct PersonalDialog, a large-scale multi-turn dialogue dataset containing various traits from a large number of speakers. The dataset consists of 20.83M sessions and 56.25M utterances from 8.47M speakers. Each utterance is associated with a speaker who is marked with traits like Age, Gender, Location, Interest Tags, etc. Several anonymization schemes are designed to protect the privacy of each speaker. This large-scale dataset will facilitate not only the study of personalized dialogue generation, but also other research in sociolinguistics or social science. Second, to study how personality traits can be captured and addressed in dialogue generation, we propose persona-aware dialogue generation models within the sequence-to-sequence learning framework. Explicit personality traits (structured as key-value pairs) are embedded using a trait fusion module. During the decoding process, two techniques, namely persona-aware attention and persona-aware bias, are devised to capture and address trait-related information. Experiments demonstrate that our model is able to address proper traits in different contexts. Case studies also show interesting results for this challenging research problem. Comment: Please contact [zhengyinhe1 at 163 dot com] for the PersonalDialog dataset

    Out-of-domain Detection for Natural Language Understanding in Dialog Systems

    Natural Language Understanding (NLU) is a vital component of dialogue systems, and its ability to detect Out-of-Domain (OOD) inputs is critical in practical applications, since the acceptance of an OOD input that is unsupported by the current system may lead to catastrophic failure. However, most existing OOD detection methods rely heavily on manually labeled OOD samples and cannot take full advantage of unlabeled data. This limits the feasibility of these models in practical applications. In this paper, we propose a novel model to generate high-quality pseudo OOD samples that are akin to In-Domain (IND) input utterances, and thereby improve the performance of OOD detection. To this end, an autoencoder is trained to map an input utterance into a latent code, and the codes of IND and OOD samples are trained to be indistinguishable by utilizing a generative adversarial network. To provide more supervision signals, an auxiliary classifier is introduced to regularize the generated OOD samples to have indistinguishable intent labels. Experiments show that the pseudo OOD samples generated by our model can be used to effectively improve OOD detection in NLU. We also demonstrate that the effectiveness of these pseudo OOD data can be further improved by efficiently utilizing unlabeled data. Comment: Accepted by TALS

    Lessons from Computational Modelling of Reference Production in Mandarin and English

    Referring expression generation (REG) algorithms offer computational models of the production of referring expressions. In earlier work, a corpus of referring expressions (REs) in Mandarin was introduced. In the present paper, we annotate this corpus, evaluate classic REG algorithms on it, and compare the results with earlier results on the evaluation of REG for English referring expressions. Next, we offer an in-depth analysis of the corpus, focusing on issues that arise from the grammar of Mandarin. We discuss shortcomings of previous REG evaluations that came to light during our investigation and we highlight some surprising results. Perhaps most strikingly, we found a much higher proportion of under-specified expressions than previous studies had suggested, not just in Mandarin but in English as well. Comment: Long paper accepted at INLG 202

    Modelling Pro-drop with the Rational Speech Acts Model


    Computational Modelling of Plurality and Definiteness in Chinese Noun Phrases

    Theoretical linguists have suggested that some languages (e.g., Chinese and Japanese) are "cooler" than other languages based on the observation that the intended meaning of phrases in these languages depends more on their contexts. As a result, many expressions in these languages are shortened, and their meaning is inferred from the context. In this paper, we focus on the omission of the plurality and definiteness markers in Chinese noun phrases (NPs) to investigate the predictability of their intended meaning given the contexts. To this end, we built a corpus of Chinese NPs, each of which is accompanied by its corresponding context, and by labels indicating its singularity/plurality and definiteness/indefiniteness. We carried out corpus assessments and analyses. The results suggest that Chinese speakers indeed drop plurality and definiteness markers very frequently. Building on the corpus, we train a bank of computational models using both classic machine learning models and state-of-the-art pre-trained language models to predict the plurality and definiteness of each NP. We report on the performance of these models and analyse their behaviours. Comment: Accepted to LREC-COLING 202

    DrivingBeacon: Driving Behaviour Change Support System Considering Mobile Use and Geo-information


    Highly Efficient Knowledge Graph Embedding Learning with Orthogonal Procrustes Analysis

    Knowledge Graph Embeddings (KGEs) have been intensively explored in recent years due to their promise for a wide range of applications. However, existing studies focus on improving the final model performance without acknowledging the computational cost of the proposed approaches, in terms of execution time and environmental impact. This paper proposes a simple yet effective KGE framework which can reduce the training time and carbon footprint by orders of magnitude compared with state-of-the-art approaches, while producing competitive performance. We highlight three technical innovations: full batch learning via relational matrices, closed-form Orthogonal Procrustes Analysis for KGEs, and non-negative-sampling training. In addition, as the first KGE method whose entity embeddings also store full relation information, our trained models encode rich semantics and are highly interpretable. Comprehensive experiments and ablation studies involving 13 strong baselines and two standard datasets verify the effectiveness and efficiency of our algorithm. Comment: To appear at NAACL 202
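
    For readers unfamiliar with the term, the classical orthogonal Procrustes problem named in the abstract above has a well-known closed-form solution. The statement below is the generic textbook result, not the paper's specific formulation; the symbols A, B, and R are assumptions introduced here purely for illustration.

```latex
% Generic orthogonal Procrustes problem (illustrative symbols, not the paper's notation):
% find the orthogonal matrix R that best maps the rows of A onto the rows of B.
\min_{R \in \mathbb{R}^{d \times d}} \; \lVert A R - B \rVert_F
\quad \text{subject to} \quad R^\top R = I,
\qquad
R^\ast = U V^\top
\quad \text{where} \quad A^\top B = U \Sigma V^\top \ \text{(singular value decomposition)}.
```

    Because the optimum of this generic problem comes from a single singular value decomposition, solving it requires no iterative gradient updates.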

    BASIC: A Comprehensive Model for SOx Formation Mechanism and Optimization in Municipal Solid Waste (MSW) Combustion

    Municipal solid waste (MSW) incineration is one of the main techniques currently used for waste to energy (WTE) conversion in China. Although the sulfur content in MSW is lower than that in coal, its emission cannot be neglected due to environmental pollution, malodor, health problems, and global climate change. Therefore, it is particularly important to effectively predict and control the sulfur pollutants. In this study, a comprehensive model was developed and coupled with the full combustion process bed model bulk accumulated solids incineration code (BASIC) to investigate the formation and transformation processes of sulfur in MSW incineration. The submodels of the four stages in the MSW combustion processes; governing equations of mass, momentum, and energy conservation; and various chemical reactions were included in the model. Based on this model, the effects of different parameters on the formation of sulfur pollutants during the incineration process were studied under different operating conditions. The study finds that for SOx formation, initial temperature, primary air volume, and material particle size have significant impacts, whereas pressure shows a less significant effect. This article also considers H2S, COS, and CS2 formation under different conditions. An optimization study was performed to reduce SOx pollutants

    Improving Variational Autoencoder for Text Modelling with Timestep-Wise Regularisation

    Accepted by COLING 2020, final camera-ready version.