215 research outputs found
A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which are not representing the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.Comment: Accepted by KDD 201
ChatGPT vs State-of-the-Art Models: A Benchmarking Study in Keyphrase Generation Task
Transformer-based language models, including ChatGPT, have demonstrated
exceptional performance in various natural language generation tasks. However,
there has been limited research evaluating ChatGPT's keyphrase generation
ability, which involves identifying informative phrases that accurately reflect
a document's content. This study seeks to address this gap by comparing
ChatGPT's keyphrase generation performance with state-of-the-art models, while
also testing its potential as a solution for two significant challenges in the
field: domain adaptation and keyphrase generation from long documents. We
conducted experiments on six publicly available datasets from scientific
articles and news domains, analyzing performance on both short and long
documents. Our results show that ChatGPT outperforms current state-of-the-art
models in all tested datasets and environments, generating high-quality
keyphrases that adapt well to diverse domains and document lengths
Towards Knowledge-Grounded Counter Narrative Generation for Hate Speech
Tackling online hatred using informed textual responses - called counter
narratives - has been brought under the spotlight recently. Accordingly, a
research line has emerged to automatically generate counter narratives in order
to facilitate the direct intervention in the hate discussion and to prevent
hate content from further spreading. Still, current neural approaches tend to
produce generic/repetitive responses and lack grounded and up-to-date evidence
such as facts, statistics, or examples. Moreover, these models can create
plausible but not necessarily true arguments. In this paper we present the
first complete knowledge-bound counter narrative generation pipeline, grounded
in an external knowledge repository that can provide more informative content
to fight online hatred. Together with our approach, we present a series of
experiments that show its feasibility to produce suitable and informative
counter narratives in in-domain and cross-domain settings.Comment: To appear in "Proceedings of the 59th Annual Meeting of the
Association for Computational Linguistics (ACL): Findings
Topic Distiller:distilling semantic topics from documents
Abstract. This thesis details the design and implementation of a system that can find relevant and latent semantic topics from textual documents. The design of this system, named Topic Distiller, is inspired by research conducted on automatic keyphrase extraction and automatic topic labeling, and it employs entity linking and knowledge bases to reduce text documents to their semantic topics.
The Topic Distiller is evaluated using methods and datasets used in information retrieval and automatic keyphrase extraction. On top of the common datasets used in the literature three additional datasets are created to evaluate the system.
The evaluation reveals that the Topic Distiller is able to find relevant and latent topics from textual documents, beating the state-of-the-art automatic keyphrase methods in performance when used on news articles and social media posts.Semanttisten aiheiden suodattaminen dokumenteista. Tiivistelmä. Tässä diplomityössä tarkastellaan järjestelmää, joka pystyy löytämään tekstistä relevantteja ja piileviä semanttisia aihealueita, sekä kyseisen järjestelmän suunnittelua ja implementaatiota. Tämän Topic Distiller -järjestelmän suunnittelu ammentaa inspiraatiota automaattisen termintunnistamisen ja automaattisen aiheiden nimeämisen tutkimuksesta sekä hyödyntää automaattista semanttista annotointia ja tietämyskantoja tekstin aihealueiden löytämisessä.
Topic Distiller -järjestelmän suorituskykyä mitataan hyödyntämällä kirjallisuudessa paljon käytettyjä automaattisen termintunnistamisen evaluontimenetelmiä ja aineistoja. Näiden yleisten aineistojen lisäksi esittelemme kolme uutta aineistoa, jotka on luotu Topic Distiller -järjestelmän arviointia varten.
Evaluointi tuo ilmi, että Topic Distiller kykenee löytämään relevantteja ja piileviä aiheita tekstistä. Se päihittää kirjallisuuden viimeisimmät automaattisen termintunnistamisen menetelmät suorituskyvyssä, kun sitä käytetään uutisartikkelien sekä sosiaalisen median julkaisujen analysointiin
- …