Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network
Personalized review generation (PRG) aims to automatically produce review
text reflecting user preference, which is a challenging natural language
generation task. Most previous studies do not explicitly model factual
description of products, tending to generate uninformative content. Moreover,
they mainly focus on word-level generation, but cannot accurately reflect more
abstractive user preference in multiple aspects. To address the above issues,
we propose a novel knowledge-enhanced PRG model based on capsule graph neural
network (Caps-GNN). We first construct a heterogeneous knowledge graph (HKG)
for utilizing rich item attributes. We adopt Caps-GNN to learn graph capsules
for encoding underlying characteristics from the HKG. Our generation process
contains two major steps, namely aspect sequence generation and sentence
generation. First, based on graph capsules, we adaptively learn aspect capsules
for inferring the aspect sequence. Then, conditioned on the inferred aspect
label, we design a graph-based copy mechanism to generate sentences by
incorporating related entities or words from the HKG. To our knowledge, we are the
first to utilize a knowledge graph for the PRG task. The incorporated KG
information is able to enhance user preference at both aspect and word levels.
Extensive experiments on three real-world datasets have demonstrated the
effectiveness of our model on the PRG task. Comment: Accepted by CIKM 2020 (Long Paper)
Table-to-Text: Generating Descriptive Text for Scientific Tables from Randomized Controlled Trials
Unprecedented amounts of data have been generated in the biomedical domain, and the bottleneck for biomedical research has shifted from data generation to data management, interpretation, and communication. Therefore, it is highly desirable to develop systems to assist in text generation from biomedical data, which will greatly improve the dissemination of scientific findings. However, very few studies have investigated issues of data-to-text generation in the biomedical domain. Here I present a systematic study for generating descriptive text from tables in randomized clinical trial (RCT) articles, which includes: (1) an information model for representing RCT tables; (2) annotated corpora containing pairs of RCT tables and descriptive text, and labeled structural and semantic information of RCT tables; (3) methods for recognizing structural and semantic information of RCT tables; and (4) methods for generating text from RCT tables, evaluated by a user study on three aspects: relevance, grammatical quality, and matching. The proposed hybrid text generation method achieved a low bilingual evaluation understudy (BLEU) score of 5.69, but human review achieved scores of 9.3, 9.9 and 9.3 for relevance, grammatical quality and matching, respectively, which are comparable to the review scores of the original human-written text. To the best of our knowledge, this is the first study to generate text from scientific tables in the biomedical domain. The proposed information model, labeled corpora and developed methods for recognizing tables and generating descriptive text could also facilitate other biomedical and informatics research and applications.
Neural natural language generation with unstructured contextual information
In this work, we present a novel task for automatic natural language generation, based on the exploitation of unstructured contextual information. The main aim of the task is to enable the evaluation of a system's capability to generate coherent text based on previously unseen and unstructured information. A new corpus was prepared specifically for the task, based on the Amazon Review corpus with product descriptions used as input for the generation of user reviews. Different deep learning generation models were implemented and compared under the proposed task, with significant differences in terms of their ability to exploit unstructured contextual information.
Crowd and AI Powered Manipulation: Characterization and Detection
User reviews are ubiquitous. They power online review aggregators that influence our daily decisions, from what products to purchase (e.g., Amazon), movies to view (e.g., Netflix, HBO, Hulu), restaurants to patronize (e.g., Yelp), and hotels to book (e.g., TripAdvisor, Airbnb). In addition, policy makers rely on online commenting platforms like Regulations.gov and FCC.gov as a means for citizens to voice their opinions about public policy issues. However, showcasing the opinions of fellow users has a dark side, as these reviews and comments are vulnerable to manipulation. And as advances in AI continue, fake reviews generated by AI agents rather than users pose even more scalable and dangerous manipulation attacks. These attacks on online discourse can sway ratings of products, manipulate opinions and perceived support of key issues, and degrade our trust in online platforms. Previous efforts have mainly focused on highly visible anomalous behaviors captured by statistical modeling or clustering algorithms. While detection of such anomalous behaviors helps to improve the reliability of online interactions, it misses subtle and difficult-to-detect behaviors.
This research investigates two major research thrusts centered around manipulation strategies. In the first thrust, we study crowd-based manipulation strategies wherein crowds of paid workers organize to spread fake reviews. In the second thrust, we explore AI-based manipulation strategies, where crowd workers are replaced by scalable and potentially undetectable generative models of fake reviews. In particular, one of the key aspects of this work is to address the research gap in previous efforts for anomaly detection where ground truth data is missing (and hence, evaluation can be challenging). In addition, this work studies the capabilities and impact of model-based attacks as the next generation of online threats. We propose inter-related methods for collecting evidence of these attacks, and create new countermeasures for defending against them. The performance of the proposed methods is compared against other state-of-the-art approaches in the literature. We find that although crowd campaigns do not show obvious anomalous behavior, they can be detected given a careful formulation of their behaviors. And, although model-generated fake reviews may appear on the surface to be legitimate, we find that they do not completely mimic the underlying distribution of human-written reviews, so we can leverage this signal to detect them.