Cross-Lingual Zero Pronoun Resolution
In languages like Arabic, Chinese, Italian, Japanese, Korean, Portuguese, Spanish, and many others, predicate arguments in certain syntactic positions are omitted rather than realized as overt pronouns, and are thus called zero- or null-pronouns. Identifying and resolving such omitted arguments is crucial to machine translation, information extraction and other NLP tasks, but depends heavily on semantic coherence and lexical relationships. We propose a BERT-based cross-lingual model for zero pronoun resolution, and evaluate it on the Arabic and Chinese portions of OntoNotes 5.0. As far as we know, ours is the first neural model of zero-pronoun resolution for Arabic; our model also outperforms the state-of-the-art for Chinese. In the paper we also evaluate BERT feature extraction and fine-tuned models on the task, and compare them with our model. We also report on an investigation of BERT layers indicating which layer encodes the most suitable representation for the task. Our code is available at https://github.com/amaloraini/cross-lingual-Z
Learning to Check Contract Inconsistencies
Contract consistency is important in ensuring the legal validity of the
contract. In many scenarios, a contract is written by filling the blanks in a
precompiled form. Due to carelessness, two blanks that should be filled with
the same (or different) content may be incorrectly filled with different (or
same) content. This will result in the issue of contract inconsistencies, which
may severely impair the legal validity of the contract. Traditional methods to
address this issue mainly rely on manual contract review, which is
labor-intensive and costly. In this work, we formulate a novel Contract
Inconsistency Checking (CIC) problem, and design an end-to-end framework,
called Pair-wise Blank Resolution (PBR), to solve the CIC problem with high
accuracy. Our PBR model contains a novel BlankCoder to address the challenge of
modeling meaningless blanks. BlankCoder adopts a two-stage attention mechanism
that adequately associates a meaningless blank with its relevant descriptions
while avoiding the incorporation of irrelevant context words. Experiments
conducted on real-world datasets show the promising performance of our method
with a balanced accuracy of 94.05% and an F1 score of 90.90% in the CIC
problem. Comment: Accepted by AAAI 202
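The two metrics reported in this abstract can be illustrated with a minimal sketch. It assumes a binary labeling (consistent vs. inconsistent blank pairs) and uses hypothetical confusion counts; the function names are illustrative and nothing here comes from the paper's code or data.

```python
def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of the recall on the positive class and the recall on the negative class."""
    sensitivity = tp / (tp + fn)  # recall on positives
    specificity = tn / (tn + fp)  # recall on negatives
    return (sensitivity + specificity) / 2


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to make the arithmetic easy to follow.
    tp, fp, tn, fn = 8, 2, 18, 2
    print(round(balanced_accuracy(tp, fp, tn, fn), 4))
    print(round(f1_score(tp, fp, fn), 4))
```

Balanced accuracy is often preferred over plain accuracy when one class is much rarer than the other, since it weights both classes equally.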
A Sequence-to-Sequence Approach for Arabic Pronoun Resolution
This paper proposes a sequence-to-sequence learning approach for Arabic
pronoun resolution, which explores the effectiveness of using advanced natural
language processing (NLP) techniques, specifically Bi-LSTM and the BERT
pre-trained Language Model, in solving the pronoun resolution problem in
Arabic. The proposed approach is evaluated on the AnATAr dataset, and its
performance is compared to several baseline models, including traditional
machine learning models and handcrafted feature-based models. Our results
demonstrate that the proposed model outperforms the baseline models, which
include KNN, logistic regression, and SVM, across all metrics. In addition, we
explore the effectiveness of various modifications to the model, including
concatenating the anaphor text beside the paragraph text as input, adding a
mask to focus on candidate scores, and filtering candidates based on gender and
number agreement with the anaphor. Our results show that these modifications
significantly improve the model's performance, achieving up to 81% on MRR and
71% for F1 score while also demonstrating higher precision, recall, and
accuracy. These findings suggest that the proposed model is an effective
approach to Arabic pronoun resolution and highlight the potential benefits of
leveraging advanced NLP neural models
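Two ideas mentioned in this abstract, filtering candidates by gender and number agreement and scoring with MRR, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Mention` class, its feature values, and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record; feature inventory is an assumption."""
    text: str
    gender: str  # e.g. "m" or "f"
    number: str  # e.g. "sg", "du", "pl"


def filter_by_agreement(anaphor: Mention, candidates: list[Mention]) -> list[Mention]:
    """Keep only candidates that agree with the anaphor in gender and number."""
    return [
        c for c in candidates
        if c.gender == anaphor.gender and c.number == anaphor.number
    ]


def mean_reciprocal_rank(rankings: list[list[str]], gold: list[str]) -> float:
    """Average of 1/rank of the gold antecedent; 0 when it is not ranked at all."""
    total = 0.0
    for ranking, answer in zip(rankings, gold):
        if answer in ranking:
            total += 1.0 / (ranking.index(answer) + 1)
    return total / len(rankings)


if __name__ == "__main__":
    anaphor = Mention("hiya", "f", "sg")
    candidates = [
        Mention("al-mudarrisa", "f", "sg"),
        Mention("al-walad", "m", "sg"),
        Mention("al-banat", "f", "pl"),
    ]
    print([c.text for c in filter_by_agreement(anaphor, candidates)])
    print(mean_reciprocal_rank([["a", "b"], ["b", "a"], ["c"]], ["a", "a", "x"]))
```

Filtering before ranking shrinks the candidate set, which may help MRR when the gold antecedent would otherwise be outranked by a candidate that disagrees with the anaphor.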
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyze five semantic processing tasks: word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missing in the published version due
to the publication policies. Please contact Prof. Erik Cambria for detail