SECaps: A Sequence Enhanced Capsule Model for Charge Prediction
Automatic charge prediction aims to predict appropriate final charges
according to the fact descriptions for a given criminal case. Automatic charge
prediction plays a critical role in assisting judges and lawyers to improve the
efficiency of legal decisions, and thus has received much attention.
Nevertheless, most existing works on automatic charge prediction perform
adequately on high-frequency charges but are not yet capable of predicting
few-shot charges with limited cases. In this paper, we propose a Sequence
Enhanced Capsule model, dubbed as SECaps model, to relieve this problem.
Specifically, following the work of capsule networks, we propose the seq-caps
layer, which considers sequence information and spatial information of legal
texts simultaneously. Then we design an attention residual unit, which provides
auxiliary information for charge prediction. In addition, our SECaps model
introduces focal loss, which relieves the problem of imbalanced charges.
Compared with state-of-the-art methods, our SECaps model obtains considerable
absolute improvements of 4.5% and 6.4% in Macro F1 on Criminal-S and
Criminal-L, respectively. The experimental results consistently demonstrate the
superiority and competitiveness of our proposed model.
Comment: 13 pages, 3 figures, 5 tables
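The focal loss mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard per-example form, -alpha * (1 - p_t)^gamma * log(p_t). The function names and the values gamma=2.0 and alpha=1.0 are assumptions chosen for illustration; the abstract does not state the settings used in SECaps.

```python
import math


def cross_entropy(probs, target):
    """Plain cross-entropy for one example: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are illustrative defaults, not values from the paper.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct prediction (typical of a frequent charge) ...
easy = [0.9, 0.1]
# ... and a poorly predicted one (typical of a rare charge).
hard = [0.1, 0.9]

# Fraction of the cross-entropy that survives the focal down-weighting.
easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)
```

With gamma=2.0 the easy example keeps only (1 - 0.9)^2 of its cross-entropy, while the hard example keeps (1 - 0.1)^2, so training signal shifts toward examples the model gets wrong. With gamma=0 the expression reduces to ordinary cross-entropy.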
Few-Shot and Zero-Shot Learning for Historical Text Normalization
Historical text normalization often relies on small training datasets. Recent
work has shown that multi-task learning can lead to significant improvements by
exploiting synergies with related datasets, but there has been no systematic
study of different multi-task learning architectures. This paper evaluates
63 multi-task learning configurations for sequence-to-sequence-based historical
text normalization across ten datasets from eight languages, using
autoencoding, grapheme-to-phoneme mapping, and lemmatization as auxiliary
tasks. We observe consistent, significant improvements across languages when
training data for the target task is limited, but minimal or no improvements
when training data is abundant. We also show that zero-shot learning
outperforms the simple, but relatively strong, identity baseline.
Comment: Accepted at DeepLo-201
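The identity baseline named in the abstract above leaves each historical token unchanged and scores the result against the normalized reference. A minimal sketch follows; the helper names and the five-token example are invented for illustration and are not taken from the paper's datasets.

```python
def identity_baseline(tokens):
    """Return every historical token unchanged as its 'normalization'."""
    return list(tokens)


def word_accuracy(predicted, gold):
    """Fraction of tokens whose prediction matches the gold normalization."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy data (hypothetical): historical spellings and their modern forms.
historical = ["vnto", "the", "kynge", "and", "his"]
modern = ["unto", "the", "king", "and", "his"]

accuracy = word_accuracy(identity_baseline(historical), modern)
```

The baseline is relatively strong because many historical tokens already match their modern spelling; in this toy example three of the five tokens are unchanged.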