A Chinese Dataset with Negative Full Forms for General Abbreviation Prediction
Abbreviation is a common phenomenon across languages, especially in Chinese.
In most cases, if an expression can be abbreviated, its abbreviation is used
more often than its fully expanded form, since people tend to convey
information in the most concise way. For various language processing tasks,
abbreviations are an obstacle to improving performance, as the textual form
of an abbreviation does not express useful information unless it is expanded to
the full form. Abbreviation prediction means associating fully expanded
forms with their abbreviations. However, because abbreviation corpora are
scarce, current studies of this task are limited, especially considering that
general abbreviation prediction should also include full-form expressions
that do not have valid abbreviations, namely the negative full forms (NFFs).
Few corpora incorporate negative full forms for general abbreviation
prediction. To promote research in this area, we build a dataset for general
Chinese abbreviation prediction, which requires a few preprocessing steps,
and evaluate several different models on the built dataset. The dataset is
available at
https://github.com/lancopku/Chinese-abbreviation-datase
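
As a rough illustration of the task described in the abstract above, abbreviation prediction can be pictured as deciding, for each character of a full form, whether it is kept in the abbreviation. The sketch below is only one possible framing and is not taken from the paper: the function name `keep_skip_labels`, the keep/skip encoding, and the choice to give a negative full form an all-zero labelling are assumptions made for illustration.

```python
from typing import List, Optional


def keep_skip_labels(full_form: str, abbreviation: Optional[str]) -> Optional[List[int]]:
    """Align an abbreviation to its full form, character by character.

    Returns one label per character of the full form: 1 if the character is
    kept in the abbreviation, 0 if it is dropped. A negative full form
    (abbreviation is None) gets all zeros; this encoding is an assumption of
    this sketch. Returns None when the abbreviation is not a subsequence of
    the full form, so this simple scheme cannot represent the pair.
    """
    if abbreviation is None:
        return [0] * len(full_form)

    labels: List[int] = []
    next_index = 0  # position of the next abbreviation character to match
    for char in full_form:
        if next_index < len(abbreviation) and char == abbreviation[next_index]:
            labels.append(1)
            next_index += 1
        else:
            labels.append(0)

    # Every abbreviation character must have been matched, in order.
    return labels if next_index == len(abbreviation) else None


# A positive pair: the full form 北京大学 with the abbreviation 北大.
print(keep_skip_labels("北京大学", "北大"))
# A negative full form: no valid abbreviation is supplied.
print(keep_skip_labels("北京大学", None))
```

The alignment is greedy and left to right, so when a character repeats in the full form it always picks the first occurrence; a real dataset or model may resolve such cases differently.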
End to End Chinese Lexical Fusion Recognition with Sememe Knowledge
In this paper, we present Chinese lexical fusion recognition, a new task
which can be regarded as a kind of coreference recognition. First, we
introduce the task in detail, showing its relationship to coreference
recognition and its differences from existing tasks. Second, we propose an
end-to-end joint model for the task, which uses state-of-the-art BERT
representations as the encoder and is further enhanced with sememe knowledge
from HowNet through graph attention networks. We manually annotate a benchmark
dataset for the task and then conduct experiments on it. Results demonstrate
that our joint model is effective and competitive for the task. We also
provide a detailed analysis to give a comprehensive understanding of the new
task and our proposed model.