Partially-Typed NER Datasets Integration: Connecting Practice to Theory
While typical named entity recognition (NER) models require the training set
to be annotated with all target types, each available dataset may cover only a
subset of them. Instead of relying on fully-typed NER datasets, many efforts
have been made to leverage multiple partially-typed ones for training so that
the resulting model covers the full type set. However, there is neither a
guarantee on the quality of the integrated datasets nor guidance on the design
of training algorithms. Here, we conduct a systematic analysis and comparison
between partially-typed NER datasets and fully-typed ones, in both a
theoretical and an empirical manner. First, we derive a bound establishing
that models trained with partially-typed annotations can reach performance
similar to that of models trained with fully-typed annotations, which also
provides guidance on algorithm design. Moreover, we conduct controlled
experiments showing that partially-typed datasets lead to performance similar
to that of models trained with the same amount of fully-typed annotations.
Comment: Work in progress
One Model to Recognize Them All: Marginal Distillation from NER Models with Different Tag Sets
Named entity recognition (NER) is a fundamental component in the modern
language understanding pipeline. Public NER resources such as annotated data
and model services are available in many domains. However, given a particular
downstream application, there is often no single NER resource that supports all
the desired entity types, so users must leverage multiple resources with
different tag sets. This paper presents a marginal distillation (MARDI)
approach for training a unified NER model from resources with disjoint or
heterogeneous tag sets. In contrast to recent works, MARDI merely requires
access to pre-trained models rather than the original training datasets. This
flexibility makes it easier to work with sensitive domains like healthcare and
finance. Furthermore, our approach is general enough to integrate with
different NER architectures, including local models (e.g., BiLSTM) and global
models (e.g., CRF). Experiments on two benchmark datasets show that MARDI
performs on par with a strong marginal CRF baseline, while being more flexible
in the form of required NER resources. MARDI also sets a new state of the art
on the progressive NER task, significantly outperforming the previous
state-of-the-art model.
Comment: 9 pages, LaTeX; column header of Table 2 corrected
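As a hedged sketch of the marginal-matching idea (the function names and the tag-collapsing scheme below are my own illustration, not MARDI's actual formulation), a student distribution over the union tag set can be marginalized onto each teacher's smaller tag set before computing a KL distillation loss, so only pre-trained teacher outputs are needed:

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(student_probs, teacher_probs, teacher_tags, all_tags):
    """Match the student's marginals to one teacher's token distribution.

    student_probs: student distribution over all_tags (the union tag set).
    teacher_probs: teacher distribution over teacher_tags.
    The teacher cannot distinguish tags outside its own set, so every
    such student tag is collapsed into the teacher's 'O' class.
    """
    idx = {t: i for i, t in enumerate(all_tags)}
    marg = []
    for t in teacher_tags:
        if t == 'O':
            mass = sum(p for tag, p in zip(all_tags, student_probs)
                       if tag == 'O' or tag not in teacher_tags)
            marg.append(mass)
        else:
            marg.append(student_probs[idx[t]])
    return kl(teacher_probs, marg)
```

For a uniform student over {'O', 'PER', 'LOC'} and a teacher covering only {'O', 'PER'}, the student's 'LOC' mass folds into 'O', so a teacher predicting (2/3, 1/3) incurs zero loss.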