215 research outputs found

    A User-Centered Concept Mining System for Query and Document Understanding at Tencent

    Full text link
    Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which are not representing the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity conforming to user interests, by mining a large amount of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feeds recommendation in Tencent QQ Browser. We performed extensive offline evaluation to demonstrate that our approach could extract concepts of higher quality compared to several other existing methods. Our system has been deployed in Tencent QQ Browser. Results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework of Tencent QQ Browser.Comment: Accepted by KDD 201

    ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task

    Full text link
    Transformer-based language models, including ChatGPT, have demonstrated exceptional performance in various natural language generation tasks. However, there has been limited research evaluating ChatGPT's keyphrase generation ability, which involves identifying informative phrases that accurately reflect a document's content. This study seeks to address this gap by comparing ChatGPT's keyphrase generation performance with state-of-the-art models, while also testing its potential as a solution for two significant challenges in the field: domain adaptation and keyphrase generation from long documents. We conducted experiments on six publicly available datasets from scientific articles and news domains, analyzing performance on both short and long documents. Our results show that ChatGPT outperforms current state-of-the-art models in all tested datasets and environments, generating high-quality keyphrases that adapt well to diverse domains and document lengths

    Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech

    Full text link
    Tackling online hatred using informed textual responses - called counter narratives - has been brought under the spotlight recently. Accordingly, a research line has emerged to automatically generate counter narratives in order to facilitate the direct intervention in the hate discussion and to prevent hate content from further spreading. Still, current neural approaches tend to produce generic/repetitive responses and lack grounded and up-to-date evidence such as facts, statistics, or examples. Moreover, these models can create plausible but not necessarily true arguments. In this paper we present the first complete knowledge-bound counter narrative generation pipeline, grounded in an external knowledge repository that can provide more informative content to fight online hatred. Together with our approach, we present a series of experiments that show its feasibility to produce suitable and informative counter narratives in in-domain and cross-domain settings.Comment: To appear in "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL): Findings

    Topic Distiller:distilling semantic topics from documents

    Get PDF
    Abstract. This thesis details the design and implementation of a system that can find relevant and latent semantic topics from textual documents. The design of this system, named Topic Distiller, is inspired by research conducted on automatic keyphrase extraction and automatic topic labeling, and it employs entity linking and knowledge bases to reduce text documents to their semantic topics. The Topic Distiller is evaluated using methods and datasets used in information retrieval and automatic keyphrase extraction. On top of the common datasets used in the literature three additional datasets are created to evaluate the system. The evaluation reveals that the Topic Distiller is able to find relevant and latent topics from textual documents, beating the state-of-the-art automatic keyphrase methods in performance when used on news articles and social media posts.Semanttisten aiheiden suodattaminen dokumenteista. Tiivistelmä. Tässä diplomityössä tarkastellaan järjestelmää, joka pystyy löytämään tekstistä relevantteja ja piileviä semanttisia aihealueita, sekä kyseisen järjestelmän suunnittelua ja implementaatiota. Tämän Topic Distiller -järjestelmän suunnittelu ammentaa inspiraatiota automaattisen termintunnistamisen ja automaattisen aiheiden nimeämisen tutkimuksesta sekä hyödyntää automaattista semanttista annotointia ja tietämyskantoja tekstin aihealueiden löytämisessä. Topic Distiller -järjestelmän suorituskykyä mitataan hyödyntämällä kirjallisuudessa paljon käytettyjä automaattisen termintunnistamisen evaluontimenetelmiä ja aineistoja. Näiden yleisten aineistojen lisäksi esittelemme kolme uutta aineistoa, jotka on luotu Topic Distiller -järjestelmän arviointia varten. Evaluointi tuo ilmi, että Topic Distiller kykenee löytämään relevantteja ja piileviä aiheita tekstistä. Se päihittää kirjallisuuden viimeisimmät automaattisen termintunnistamisen menetelmät suorituskyvyssä, kun sitä käytetään uutisartikkelien sekä sosiaalisen median julkaisujen analysointiin
    corecore