Scientific Opinion Summarization: Meta-review Generation with Checklist-guided Iterative Introspection
Opinions in the scientific domain can diverge, leading to controversy among
reviewers, or align, leading to consensus. Current opinion summarization
datasets, however, mostly focus on the product review domain and, under the
assumption that input opinions are non-controversial, do not account for this variability.
To address this gap, we propose the task of scientific opinion summarization,
where research paper reviews are synthesized into meta-reviews. To facilitate
this task, we introduce ORSUM, a new dataset covering 10,989 paper meta-reviews
and 40,903 paper reviews from 39 conferences. Furthermore, we propose the
Checklist-guided Iterative Introspection (CGI) approach, which breaks down
the task into several stages and iteratively refines the summary under the
guidance of questions from a checklist. We conclude that (1) human-written
summaries are not always reliable since many do not follow the guidelines, and
(2) the combination of task decomposition and iterative self-refinement shows
a promising ability to engage with the reviewers' discussion and can be applied
to other complex text generation tasks using black-box LLMs.
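To make the CGI idea concrete, below is a minimal sketch of checklist-guided iterative refinement with a black-box LLM. The `call_llm` function is a hypothetical stand-in for any text-completion API, and the checklist questions are illustrative assumptions, not the paper's actual prompts.

```python
# Sketch: task decomposition + checklist-guided iterative refinement.
# Assumed/hypothetical: call_llm(), the checklist wording, n_rounds.

CHECKLIST = [
    "Does the summary state the main strengths raised by reviewers?",
    "Does it state the main weaknesses and points of disagreement?",
    "Does it give a clear final recommendation?",
]

def call_llm(prompt: str) -> str:
    """Placeholder for a black-box LLM call (e.g., a chat-completion API)."""
    raise NotImplementedError

def generate_meta_review(reviews: list[str], n_rounds: int = 3) -> str:
    # Stage 1: draft an initial meta-review from the raw reviews.
    draft = call_llm(
        "Write a meta-review synthesizing these reviews:\n\n"
        + "\n\n".join(reviews)
    )
    # Stage 2: introspect against each checklist question and revise
    # the draft whenever the critique reveals a gap.
    for _ in range(n_rounds):
        for question in CHECKLIST:
            critique = call_llm(
                f"Summary:\n{draft}\n\nQuestion: {question}\n"
                "Answer the question and point out anything missing."
            )
            draft = call_llm(
                f"Summary:\n{draft}\n\nCritique:\n{critique}\n\n"
                "Rewrite the summary to address the critique."
            )
    return draft
```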
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Machine generated text is increasingly difficult to distinguish from human
authored text. Powerful open-source models are freely available, and
user-friendly tools that democratize access to generative models are
proliferating. ChatGPT, which was released shortly after the first preprint of
this survey, epitomizes these trends. The great potential of state-of-the-art
natural language generation (NLG) systems is tempered by the multitude of
avenues for abuse. Detection of machine generated text is a key countermeasure
for reducing abuse of NLG models, with significant technical challenges and
numerous open problems. We provide a survey that includes both 1) an extensive
analysis of threat models posed by contemporary NLG systems, and 2) the most
complete review of machine generated text detection methods to date. This
survey places machine generated text within its cybersecurity and social
context, and provides strong guidance for future work addressing the most
critical threat models, and ensuring detection systems themselves demonstrate
trustworthiness through fairness, robustness, and accountability.
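As a concrete illustration of one classic detection family such surveys cover, here is a minimal sketch of zero-shot detection by thresholding perplexity under a language model, using the Hugging Face `transformers` package. The threshold value is an illustrative assumption; real detectors calibrate it on held-out data.

```python
# Sketch: perplexity-thresholding detector for machine-generated text.
# Assumed/hypothetical: the threshold of 30.0 and the choice of GPT-2
# as the scoring model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return mean cross-entropy loss.
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

def looks_machine_generated(text: str, threshold: float = 30.0) -> bool:
    # Machine-generated text tends to be lower-perplexity (more
    # predictable) than human-authored text under the scoring model.
    return perplexity(text) < threshold
```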
Graph Learning for Anomaly Analytics: Algorithms, Applications, and Challenges
Anomaly analytics is a popular and vital task in various research contexts
and has been studied for several decades. At the same time, deep learning has
shown its capacity to solve many graph-based tasks, such as node classification,
link prediction, and graph classification. Recently, many studies have extended
graph learning models to anomaly analytics problems, resulting in
beneficial advances in graph-based anomaly analytics techniques. In this
survey, we provide a comprehensive overview of graph learning methods for
anomaly analytics tasks. We classify them into four categories based on their
model architectures, namely graph convolutional network (GCN), graph attention
network (GAT), graph autoencoder (GAE), and other graph learning models. The
differences between these methods are also compared in a systematic manner.
Furthermore, we outline several graph-based anomaly analytics applications
across various domains in the real world. Finally, we discuss five potential
future research directions in this rapidly growing field.
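To illustrate the graph-autoencoder (GAE) family the survey describes, here is a minimal plain-PyTorch sketch: embed nodes with a GCN-style encoder, reconstruct the adjacency matrix, and treat poorly reconstructed nodes as anomalous. Layer sizes and the scoring rule are illustrative assumptions, not a specific method from the survey.

```python
# Sketch: graph autoencoder with per-node reconstruction error as the
# anomaly score. Assumed/hypothetical: hidden size, two-layer encoder.
import torch
import torch.nn as nn

def normalize_adj(a: torch.Tensor) -> torch.Tensor:
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in a basic GCN.
    a_hat = a + torch.eye(a.size(0))
    d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt

class GAE(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int = 32):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)
        self.w2 = nn.Linear(hid_dim, hid_dim)

    def encode(self, x, a_norm):
        # Two propagation steps: A_norm @ X @ W.
        h = torch.relu(a_norm @ self.w1(x))
        return a_norm @ self.w2(h)

    def forward(self, x, a_norm):
        z = self.encode(x, a_norm)
        return torch.sigmoid(z @ z.T)  # reconstructed adjacency

def anomaly_scores(model, x, a, a_norm):
    # Anomalous nodes connect in ways the embedding cannot explain,
    # so their rows of the adjacency are reconstructed poorly.
    a_hat = model(x, a_norm)
    return ((a_hat - a) ** 2).mean(dim=1)
```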
Artificial Intelligence and Bank Soundness: Between the Devil and the Deep Blue Sea - Part 2
Banks have experienced chronic weaknesses as well as frequent crises over the years. Because bank failures are costly and affect global economies, banks are under constant, intense scrutiny by regulators, making banking the most highly regulated industry in the world today. As banks grow into the 21st-century framework, they need to embrace Artificial Intelligence (AI), not only to provide personalized, world-class service to their large customer bases but, most importantly, to survive. The chapter provides a taxonomy of bank soundness in the face of AI through the lens of CAMELS: C (Capital), A (Asset), M (Management), E (Earnings), L (Liquidity), S (Sensitivity). The taxonomy partitions the AI-related challenges along the CAMELS dimensions into 1 (C), 4 (A), 17 (M), 8 (E), 1 (L), and 2 (S) distinct categories that banks and regulatory teams need to consider when evaluating AI use in banks. Although AI offers numerous opportunities for banks to operate more efficiently and effectively, banks also need to give assurance that AI does no harm to stakeholders. Posing many unresolved questions, it seems that banks are trapped between the devil and the deep blue sea for now.
FACTIFY3M: A Benchmark for Multimodal Fact Verification with Explainability through 5W Question-Answering
Combating disinformation is one of the burning societal crises -- about 67%
of the American population believes that disinformation produces a lot of
uncertainty, and 10% of them knowingly propagate disinformation. Evidence shows
that disinformation can manipulate democratic processes and public opinion,
causing disruption in the share market, panic and anxiety in society, and even
death during crises. Therefore, disinformation should be identified promptly
and, if possible, mitigated. With approximately 3.2 billion images and 720,000
hours of video shared online daily on social media platforms, scalable
detection of multimodal disinformation requires efficient fact verification.
Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR),
the research community lacks substantial effort in multimodal fact
verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3
million samples that pushes the boundaries of the domain of fact verification
via a multimodal fake news dataset, in addition to offering explainability
through the concept of 5W question-answering. Salient features of the dataset
include: (i) textual claims, (ii) ChatGPT-generated paraphrased claims, (iii)
associated images, (iv) stable diffusion-generated additional images (i.e.,
visual paraphrases), (v) pixel-level image heatmap to foster image-text
explainability of the claim, (vi) 5W QA pairs, and (vii) adversarial fake news
stories.
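To make the dataset's structure concrete, here is a minimal sketch of what one FACTIFY 3M record might look like, mirroring the seven salient features listed above. The field names are assumptions for illustration, not the dataset's actual schema.

```python
# Sketch: a plausible record layout for one FACTIFY 3M sample.
# All field names are hypothetical; only the seven features are sourced
# from the abstract.
from dataclasses import dataclass

@dataclass
class Factify3MSample:
    claim: str                          # (i) textual claim
    paraphrased_claims: list[str]       # (ii) ChatGPT-generated paraphrases
    image_paths: list[str]              # (iii) associated images
    visual_paraphrase_paths: list[str]  # (iv) Stable Diffusion images
    heatmap_path: str                   # (v) pixel-level image heatmap
    qa_pairs: dict[str, str]            # (vi) 5W QA: who/what/when/where/why
    adversarial_story: str              # (vii) adversarial fake news story
```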
Evaluating GPT-3 Generated Explanations for Hateful Content Moderation
Recent research has focused on using large language models (LLMs) to generate
explanations for hate speech through fine-tuning or prompting. Despite the
growing interest in this area, the effectiveness and potential limitations of
these generated explanations remain poorly understood. A key concern is that these
explanations, generated by LLMs, may lead to erroneous judgments about the
nature of flagged content by both users and content moderators. For instance,
an LLM-generated explanation might inaccurately convince a content moderator
that a benign piece of content is hateful. In light of this, we propose an
analytical framework for examining hate speech explanations and conduct an
extensive survey evaluating such explanations. Specifically, we prompted
GPT-3 to generate explanations for both hateful and non-hateful content, and a
survey was conducted with 2,400 unique respondents to evaluate the generated
explanations. Our findings reveal that (1) human evaluators rated the
GPT-generated explanations as high quality in terms of linguistic fluency,
informativeness, persuasiveness, and logical soundness, (2) the persuasive
nature of these explanations, however, varied depending on the prompting
strategy employed, and (3) this persuasiveness may result in incorrect
judgments about the hatefulness of the content. Our study underscores the need
for caution in applying LLM-generated explanations for content moderation. Code
and results are available at https://github.com/Social-AI-Studio/GPT3-HateEval.
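As a rough illustration of the kind of prompting setup the paper describes, here is a minimal sketch that asks GPT-3 to explain why a flagged post is or is not hateful. It uses the legacy OpenAI completions API (openai-python < 1.0, as in the GPT-3 era); the prompt wording and parameters are assumptions, not the authors' actual template.

```python
# Sketch: prompting GPT-3 for a hate-speech explanation.
# Assumed/hypothetical: prompt wording, max_tokens, temperature.
import openai

openai.api_key = "YOUR_API_KEY"

def explain(post: str, label: str) -> str:
    # label is "hateful" or "non-hateful".
    prompt = (
        f"Post: {post}\n"
        f"This post was flagged as {label}.\n"
        f"Explain in 2-3 sentences why the post is {label}, "
        "citing specific phrases."
    )
    resp = openai.Completion.create(
        model="text-davinci-003",
        prompt=prompt,
        max_tokens=128,
        temperature=0.7,
    )
    return resp.choices[0].text.strip()
```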