251 research outputs found
MILDSum: A Novel Benchmark Dataset for Multilingual Summarization of Indian Legal Case Judgments
Automatic summarization of legal case judgments is a practically important
problem that has attracted substantial research efforts in many countries. In
the context of the Indian judiciary, there is an additional complexity --
Indian legal case judgments are mostly written in complex English, but a
significant portion of India's population lacks command of the English
language. Hence, it is crucial to summarize the legal documents in Indian
languages to ensure equitable access to justice. While prior research primarily
focuses on summarizing legal case judgments in their source languages, this
study presents a pioneering effort toward cross-lingual summarization of
English legal documents into Hindi, the most frequently spoken Indian language.
We construct the first high-quality legal corpus comprising 3,122 case
judgments from prominent Indian courts in English, along with their summaries
in both English and Hindi, drafted by legal practitioners. We benchmark the
performance of several diverse summarization approaches on our corpus and
demonstrate the need for further research in cross-lingual summarization in the
legal domain.
Comment: Accepted at EMNLP 2023 (Main Conference)
A Study on Keyword-Reflecting Extractive Summarization Using Transformers
Master's thesis -- Seoul National University Graduate School: College of Engineering, Interdisciplinary Program in Bioengineering, February 2023.
Text summarization is a representative task in natural language processing. Text summarization methods generate brief written summaries of documents such as journal articles. In recent years, the performance of text summarization methods has improved significantly with the development of pretrained language models based on Transformer architectures such as BERT and GPT-3.
Recently, the development of language models designed to generate controllable output based on user preferences has attracted considerable research attention. Controllable summarization methods, such as query-focused and aspect-oriented summarization, have also emerged as promising approaches. In particular, aspect-oriented summarization generates a summary in terms of specific aspects provided as user input.
In this study, we propose a method to improve the performance of an aspect-oriented extractive summarization model presented in a previous work. The proposed method helps the model to generate
aspect-oriented summaries by reflecting the relevance between sentence features and keyword features representing the aspect. To evaluate the performance of the proposed method, we constructed a new dataset consisting of articles on COVID-19 labeled in terms of two aspects: Trend and Action. The results showed that our proposed method outperformed a baseline model on the new dataset.
The proposed method outperformed the baseline by roughly 3.6-4.3% on the Trend aspect, while the gain on the Action aspect was smaller, at less than 1%. However, for both aspects, we observed that even incorrect sentences included in a generated summary tended to be related to the defined aspect. Thus, we demonstrate that the proposed method generates more aspect-oriented summaries with content relevant to the defined aspect.
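The abstract does not spell out the scoring scheme, but the idea of reflecting sentence-keyword relevance in an extractive model can be sketched as follows. This is a minimal illustration, assuming cosine similarity between feature vectors, a max-over-keywords aggregation, and a blending weight `alpha`; none of these choices are confirmed by the thesis itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def aspect_relevance_scores(sentence_vecs, keyword_vecs):
    """Score each sentence by its maximum similarity to any aspect keyword."""
    return [max(cosine(s, k) for k in keyword_vecs) for s in sentence_vecs]

def rerank_for_aspect(base_scores, sentence_vecs, keyword_vecs, alpha=0.5):
    """Blend the extractor's own sentence scores with aspect relevance."""
    rel = aspect_relevance_scores(sentence_vecs, keyword_vecs)
    return [(1 - alpha) * b + alpha * r for b, r in zip(base_scores, rel)]
```

Under this sketch, a sentence that the base extractor ranks low can still be selected if its features align closely with the aspect keywords, which matches the observed tendency of even incorrect selections to stay on-aspect.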
1. Introduction
1.1. Background
1.2. Task Description
1.2.1. Text Summarization
1.2.2. Aspect Oriented Summarization
2. Related Works
2.1. Extractive Summarization
2.2. Aspect Oriented Summarization
2.3. AOSUMM
3. Materials and Method
3.1. Dataset for Training
3.2. Dataset for Evaluation
3.2.1. Aspect Definition
3.2.2. Annotation
3.3. Evaluation Metric
3.4. Keyword Selection
3.5. Method
3.5.1. Extraction of Keyword Features
3.5.2. Relevance Score
3.5.3. Proposed Method
4. Results and Discussion
4.1. Experiment Settings
4.2. Results
4.2.1. Automatic Evaluation
4.2.2. Qualitative Evaluation
4.3. Discussion
5. Conclusion
Bibliography
Appendix
Abstract in Korean
GreekT5: A Series of Greek Sequence-to-Sequence Models for News Summarization
Text summarization (TS) is a natural language processing (NLP) subtask
pertaining to the automatic formulation of a concise and coherent summary that
covers the major concepts and topics from one or multiple documents. Recent
advancements in deep learning have led to the development of abstractive
summarization transformer-based models, which outperform classical approaches.
However, research in this field has focused mainly on high-resource languages such as
English, while the corresponding work for low-resource languages is still
underdeveloped. Taking the above into account, this paper proposes a series of
novel TS models for Greek news articles. The proposed models were thoroughly
evaluated on the same dataset against GreekBART, which is the state-of-the-art
model in Greek abstractive news summarization. Our evaluation results reveal
that most of the proposed models significantly outperform GreekBART on various
evaluation metrics. We make our evaluation code public, aiming to increase the
reproducibility of this work and facilitate future research in the field.
Comment: 26 pages, 0 figures
How Ready are Pre-trained Abstractive Models and LLMs for Legal Case Judgement Summarization?
Automatic summarization of legal case judgements has traditionally been
attempted by using extractive summarization methods. However, in recent years,
abstractive summarization models are gaining popularity since they can generate
more natural and coherent summaries. Legal domain-specific pre-trained
abstractive summarization models are now available. Moreover, general-domain
pre-trained Large Language Models (LLMs), such as ChatGPT, are known to
generate high-quality text and have the capacity for text summarization. Hence
it is natural to ask if these models are ready for off-the-shelf application to
automatically generate abstractive summaries for case judgements. To explore
this question, we apply several state-of-the-art domain-specific abstractive
summarization models and general-domain LLMs on Indian court case judgements,
and check the quality of the generated summaries. In addition to standard
metrics for summary quality, we check for inconsistencies and hallucinations in
the summaries. We see that abstractive summarization models generally achieve
slightly higher scores than extractive models in terms of standard summary
evaluation metrics such as ROUGE and BLEU. However, we often find inconsistent
or hallucinated information in the generated abstractive summaries. Overall,
our investigation indicates that the pre-trained abstractive summarization
models and LLMs are not yet ready for fully automatic deployment for case
judgement summarization; rather a human-in-the-loop approach including manual
checks for inconsistencies is more suitable at present.
Comment: Accepted at the 3rd Workshop on Artificial Intelligence and
Intelligent Assistance for Legal Professionals in the Digital Workplace
(LegalAIIA 2023), in conjunction with the ICAIL 2023 conference
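The ROUGE scores mentioned above reduce to n-gram overlap statistics between a generated summary and a reference. As a minimal stdlib sketch, ROUGE-1 F1 can be computed from unigram counts (the actual evaluations use the full ROUGE toolkit with stemming and multiple n-gram variants):

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall
    between a reference summary and a candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Such overlap metrics are exactly why the paper also checks for hallucinations by hand: a summary can score well on ROUGE while containing inconsistent content.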
Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization
In long document controllable summarization, where labeled data is scarce,
pretrained models struggle to adapt to the task and effectively respond to user
queries. In this paper, we introduce Socratic pretraining, a question-driven,
unsupervised pretraining objective specifically designed to improve
controllability in summarization tasks. By training a model to generate and
answer relevant questions in a given context, Socratic pretraining enables the
model to more effectively adhere to user-provided queries and identify relevant
content to be summarized. We demonstrate the effectiveness of this approach
through extensive experimentation on two summarization domains, short stories
and dialogue, and multiple control strategies: keywords, questions, and factoid
QA pairs. Our pretraining method relies only on unlabeled documents and a
question generation system and outperforms pre-finetuning approaches that use
additional supervised data. Furthermore, our results show that Socratic
pretraining cuts task-specific labeled data requirements in half, is more
faithful to user-provided queries, and achieves state-of-the-art performance on
QMSum and SQuALITY.
Comment: To appear at ACL 2023
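The question-driven idea above can be sketched as follows: each unlabeled passage yields two training pairs, one asking the model to generate a relevant question and one asking it to answer that question in context. This is a hedged illustration, not the paper's exact objective; `question_generator` stands in for the question generation system, and the `ask:`/`answer:` prefixes are hypothetical.

```python
def make_socratic_examples(passages, question_generator):
    """Turn unlabeled passages into question-driven (input, target) pairs:
    an 'ask' step that generates a relevant question, and an 'answer' step
    that answers it from the passage."""
    examples = []
    for passage in passages:
        question, answer = question_generator(passage)
        # Step 1: given the passage, produce a relevant question.
        examples.append((f"ask: {passage}", question))
        # Step 2: given passage plus question, produce the answer.
        examples.append((f"answer: {question} context: {passage}", answer))
    return examples
```

The appeal of this setup is that it needs only unlabeled documents, which is how the paper avoids the supervised data that pre-finetuning approaches require.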
Scientific Opinion Summarization: Meta-review Generation with Checklist-guided Iterative Introspection
Opinions in the scientific domain can be divergent, leading to controversy or
consensus among reviewers. However, current opinion summarization datasets
mostly focus on product review domains, which do not account for this
variability under the assumption that the input opinions are non-controversial.
To address this gap, we propose the task of scientific opinion summarization,
where research paper reviews are synthesized into meta-reviews. To facilitate
this task, we introduce a new ORSUM dataset covering 10,989 paper meta-reviews
and 40,903 paper reviews from 39 conferences. Furthermore, we propose the
Checklist-guided Iterative Introspection (CGI) approach, which breaks down
the task into several stages and iteratively refines the summary under the
guidance of questions from a checklist. We conclude that (1) human-written
summaries are not always reliable since many do not follow the guidelines, and
(2) the combination of task decomposition and iterative self-refinement shows
promising discussion involvement ability and can be applied to other complex
text generation using black-box LLMs.
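The decomposition-plus-self-refinement loop can be sketched as below, with `draft_fn`, `check_fn`, and `revise_fn` standing in for black-box LLM calls. These interfaces are assumptions for illustration; the paper's actual prompts, stages, and checklist differ.

```python
def checklist_guided_refine(reviews, draft_fn, check_fn, revise_fn,
                            checklist, max_rounds=3):
    """Draft a meta-review from the input reviews, then iteratively revise
    it until every checklist question passes (or max_rounds is reached)."""
    summary = draft_fn(reviews)
    for _ in range(max_rounds):
        # Introspection step: which checklist questions does the draft fail?
        failed = [q for q in checklist if not check_fn(summary, q)]
        if not failed:
            break
        # Refinement step: revise the draft guided by the failed questions.
        summary = revise_fn(summary, failed)
    return summary
```

Bounding the loop with `max_rounds` is one simple way to keep a black-box refinement process from oscillating indefinitely.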
SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression
Neural sequence-to-sequence models are currently the dominant approach in
several natural language processing tasks, but require large parallel corpora.
We present a sequence-to-sequence-to-sequence autoencoder (SEQ^3), consisting
of two chained encoder-decoder pairs, with words used as a sequence of discrete
latent variables. We apply the proposed model to unsupervised abstractive
sentence compression, where the first and last sequences are the input and
reconstructed sentences, respectively, while the middle sequence is the
compressed sentence. Constraining the length of the latent word sequences
forces the model to distill important information from the input. A pretrained
language model, acting as a prior over the latent sequences, encourages the
compressed sentences to be human-readable. Continuous relaxations enable us to
sample from categorical distributions, allowing gradient-based optimization,
unlike alternatives that rely on reinforcement learning. The proposed model
does not require parallel text-summary pairs, achieving promising results in
unsupervised sentence compression on benchmark datasets.
Comment: Accepted to NAACL 2019
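The continuous relaxation mentioned above is commonly realized with the Gumbel-softmax trick: perturb the logits with Gumbel noise, then apply a temperature-controlled softmax so that sampling stays differentiable. A minimal sketch, assuming this standard formulation (the paper's exact estimator and temperature schedule may differ):

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Relaxed sample from a categorical distribution: add Gumbel(0, 1)
    noise to the logits, then softmax at temperature tau. Low tau pushes
    the output toward a near one-hot sample."""
    # Gumbel noise via inverse transform; rng.random() is in [0, 1),
    # so a tiny epsilon guards against log(0).
    noisy = [l - math.log(-math.log(rng.random() + 1e-12)) for l in logits]
    m = max(n / tau for n in noisy)  # subtract the max for numerical stability
    exps = [math.exp(n / tau - m) for n in noisy]
    z = sum(exps)
    return [e / z for e in exps]
```

Because the output is a proper probability vector rather than a hard index, gradients can flow through the sampling step, which is what lets the model avoid reinforcement-learning-style estimators.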