3,633 research outputs found
Analysis of Abstractive and Extractive Summarization Methods
This paper explains the existing approaches employed for (automatic) text summarization. The summarizing method is part of the natural language processing (NLP) field and is applied to the source document to produce a compact version that preserves its aggregate meaning and key concepts. On a broader scale, approaches for text-based summarization are categorized into two groups: abstractive and extractive. In abstractive summarization, the main contents of the input text are paraphrased, possibly using vocabulary that is not present in the source document, while in extractive summarization, the output summary is a subset of the input text and is generated by using the sentence ranking technique. In this paper, the main ideas behind the existing methods used for abstractive and extractive summarization are discussed broadly. A comparative study of these methods is also highlighted
UMSL Bulletin 2023-2024
The 2023-2024 Bulletin and Course Catalog for the University of Missouri St. Louis.https://irl.umsl.edu/bulletin/1088/thumbnail.jp
Multidisciplinary perspectives on Artificial Intelligence and the law
This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.info:eu-repo/semantics/publishedVersio
Explainable text-based features in predictive models of crowdfunding campaigns
Reward-Based Crowdfunding offers an opportunity for innovative ventures that would not be supported through traditional financing. A key problem for those seeking funding is understanding which features of a crowdfunding campaign will sway the decisions of a sufficient number of funders. Predictive models of fund-raising campaigns used in combination with Explainable AI methods promise to provide such insights. However, previous work on Explainable AI has largely focused on quantitative structured data. In this study, our aim is to construct explainable models of human decisions based on analysis of natural language text, thus contributing to a fast-growing body of research on the use of Explainable AI for text analytics. We propose a novel method to construct predictions based on text via semantic clustering of sentences, which, compared with traditional methods using individual words and phrases, allows complex meaning contained in the text to be operationalised. Using experimental evaluation, we compare our proposed method to keyword extraction and topic modelling, which have traditionally been used in similar applications. Our results demonstrate that the sentence clustering method produces features with significant predictive power, compared to keyword-based methods and topic models, but which are much easier to interpret for human raters. We furthermore conduct a SHAP analysis of the models incorporating sentence clusters, demonstrating concrete insights into the types of natural language content that influence the outcome of crowdfunding campaigns
UMSL Bulletin 2022-2023
The 2022-2023 Bulletin and Course Catalog for the University of Missouri St. Louis.https://irl.umsl.edu/bulletin/1087/thumbnail.jp
Low- and high-resource opinion summarization
Customer reviews play a vital role in the online purchasing decisions we make. The reviews
express user opinions that are useful for setting realistic expectations and uncovering important
details about products. However, some products receive hundreds or even thousands of
reviews, making them time-consuming to read. Moreover, many reviews contain uninformative
content, such as irrelevant personal experiences. Automatic summarization offers an
alternative – short text summaries capturing the essential information expressed in reviews.
Automatically produced summaries can reflect overall or particular opinions and be tailored to
user preferences. Besides being presented on major e-commerce platforms, home assistants
can also vocalize them. This approach can improve user satisfaction by assisting in making
faster and better decisions.
Modern summarization approaches are based on neural networks, often requiring thousands of
annotated samples for training. However, human-written summaries for products are expensive
to produce because annotators need to read many reviews. This has led to annotated data
scarcity where only a few datasets are available. Data scarcity is the central theme of our
works, and we propose a number of approaches to alleviate the problem. The thesis consists
of two parts where we discuss low- and high-resource data settings.
In the first part, we propose self-supervised learning methods applied to customer reviews
and few-shot methods for learning from small annotated datasets. Customer reviews without
summaries are available in large quantities, contain a breadth of in-domain specifics, and
provide a powerful training signal. We show that reviews can be used for learning summarizers
via a self-supervised objective. Further, we address two main challenges associated with
learning from small annotated datasets. First, large models rapidly overfit on small datasets
leading to poor generalization. Second, it is not possible to learn a wide range of in-domain
specifics (e.g., product aspects and usage) from a handful of gold samples. This leads to
subtle semantic mistakes in generated summaries, such as ‘great dead on arrival battery.’ We
address the first challenge by explicitly modeling summary properties (e.g., content coverage
and sentiment alignment). Furthermore, we leverage small modules – adapters – that are
more robust to overfitting. As we show, despite their size, these modules can be used to
store in-domain knowledge to reduce semantic mistakes. Lastly, we propose a simple method
for learning personalized summarizers based on aspects, such as ‘price,’ ‘battery life,’ and
‘resolution.’ This task is harder to learn, and we present a few-shot method for training a
query-based summarizer on small annotated datasets.
In the second part, we focus on the high-resource setting and present a large dataset with
summaries collected from various online resources. The dataset has more than 33,000 humanwritten
summaries, where each is linked up to thousands of reviews. This, however, makes it
challenging to apply an ‘expensive’ deep encoder due to memory and computational costs. To
address this problem, we propose selecting small subsets of informative reviews. Only these
subsets are encoded by the deep encoder and subsequently summarized. We show that the
selector and summarizer can be trained end-to-end via amortized inference and policy gradient
methods
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Recent breakthroughs in natural language processing (NLP) have permitted the
synthesis and comprehension of coherent text in an open-ended way, therefore
translating the theoretical algorithms into practical applications. The large
language models (LLMs) have significantly impacted businesses such as report
summarization software and copywriters. Observations indicate, however, that
LLMs may exhibit social prejudice and toxicity, posing ethical and societal
dangers of consequences resulting from irresponsibility. Large-scale benchmarks
for accountable LLMs should consequently be developed. Although several
empirical investigations reveal the existence of a few ethical difficulties in
advanced LLMs, there is little systematic examination and user study of the
risks and harmful behaviors of current LLM usage. To further educate future
efforts on constructing ethical LLMs responsibly, we perform a qualitative
research method called ``red teaming'' on OpenAI's ChatGPT\footnote{In this
paper, ChatGPT refers to the version released on Dec 15th.} to better
understand the practical features of ethical dangers in recent LLMs. We analyze
ChatGPT comprehensively from four perspectives: 1) \textit{Bias} 2)
\textit{Reliability} 3) \textit{Robustness} 4) \textit{Toxicity}. In accordance
with our stated viewpoints, we empirically benchmark ChatGPT on multiple sample
datasets. We find that a significant number of ethical risks cannot be
addressed by existing benchmarks, and hence illustrate them via additional case
studies. In addition, we examine the implications of our findings on AI ethics
and harmal behaviors of ChatGPT, as well as future problems and practical
design considerations for responsible LLMs. We believe that our findings may
give light on future efforts to determine and mitigate the ethical hazards
posed by machines in LLM applications.Comment: Technical Repor
- …