
    SECaps: A Sequence Enhanced Capsule Model for Charge Prediction

    Automatic charge prediction aims to predict the appropriate final charges for a given criminal case according to its fact description. It plays a critical role in helping judges and lawyers improve the efficiency of legal decisions, and has therefore received much attention. Nevertheless, most existing works on automatic charge prediction perform adequately on high-frequency charges but are not yet capable of predicting few-shot charges with limited cases. In this paper, we propose a Sequence Enhanced Capsule model, dubbed SECaps, to relieve this problem. Specifically, following the work on capsule networks, we propose the seq-caps layer, which considers the sequence information and spatial information of legal texts simultaneously. We then design an attention residual unit, which provides auxiliary information for charge prediction. In addition, our SECaps model introduces focal loss, which relieves the problem of imbalanced charges. Compared with state-of-the-art methods, our SECaps model obtains considerable absolute improvements of 4.5% and 6.4% in Macro F1 on Criminal-S and Criminal-L, respectively. The experimental results consistently demonstrate the superiority and competitiveness of our proposed model.
    Comment: 13 pages, 3 figures, 5 tables
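    The abstract does not spell out how its focal loss is configured; the sketch below is a minimal PyTorch version of the standard multi-class focal loss it refers to, showing how well-classified (typically high-frequency) charges are down-weighted so that few-shot charges contribute more to the gradient. The gamma and alpha values and the toy charge-label layout are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Focal loss: -alpha * (1 - p_t)**gamma * log(p_t), averaged over the batch.

    logits:  (batch, num_charges) raw classifier scores
    targets: (batch,) integer charge labels
    """
    log_probs = F.log_softmax(logits, dim=-1)                      # log p for every candidate charge
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p_t of the gold charge
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()          # down-weights easy examples via (1 - p_t)**gamma

# Toy example: 4 cases, 6 candidate charges (random data, shapes only)
logits = torch.randn(4, 6)
targets = torch.tensor([0, 3, 5, 2])
print(focal_loss(logits, targets))
```

    With gamma set to 0 and alpha to 1 this reduces to ordinary cross-entropy; larger gamma values push the loss to focus on the rare, hard-to-classify charges.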

    Few-Shot and Zero-Shot Learning for Historical Text Normalization

    Historical text normalization often relies on small training datasets. Recent work has shown that multi-task learning can lead to significant improvements by exploiting synergies with related datasets, but there has been no systematic study of different multi-task learning architectures. This paper evaluates 63 multi-task learning configurations for sequence-to-sequence-based historical text normalization across ten datasets from eight languages, using autoencoding, grapheme-to-phoneme mapping, and lemmatization as auxiliary tasks. We observe consistent, significant improvements across languages when training data for the target task is limited, but minimal or no improvements when training data is abundant. We also show that zero-shot learning outperforms the simple, but relatively strong, identity baseline.
    Comment: Accepted at DeepLo-201
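    As one concrete illustration of the kind of multi-task configuration being compared, the sketch below shows hard parameter sharing in PyTorch: a character-level GRU encoder shared between the normalization task and one auxiliary task (e.g. grapheme-to-phoneme mapping), each with its own decoder head. The layer sizes, the sharing scheme, and the vocabulary are illustrative assumptions, not any of the paper's 63 exact configurations, and the decoder here reads the encoder outputs directly rather than teacher-forced target characters, for brevity.

```python
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    """Shared character encoder with one decoder head per task."""

    def __init__(self, vocab_size, hidden=64, num_tasks=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)     # shared across tasks
        self.decoders = nn.ModuleList(                               # task-specific decoders
            [nn.GRU(hidden, hidden, batch_first=True) for _ in range(num_tasks)]
        )
        self.out = nn.ModuleList(
            [nn.Linear(hidden, vocab_size) for _ in range(num_tasks)]
        )

    def forward(self, src, task_id):
        enc_out, state = self.encoder(self.embed(src))               # shared representation
        dec_out, _ = self.decoders[task_id](enc_out, state)          # task-specific decoding
        return self.out[task_id](dec_out)                            # (batch, length, vocab) logits

# One batch for the target task (task_id=0) and one for the auxiliary task (task_id=1)
model = SharedEncoderMTL(vocab_size=40)
src = torch.randint(0, 40, (8, 12))
print(model(src, task_id=0).shape)  # torch.Size([8, 12, 40])
print(model(src, task_id=1).shape)  # torch.Size([8, 12, 40])
```

    Training would alternate batches from the target and auxiliary datasets, updating the shared encoder on both while each decoder head only sees its own task.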