144 research outputs found

    Bipartite Flat-Graph Network for Nested Named Entity Recognition

    In this paper, we propose a novel bipartite flat-graph network (BiFlaG) for nested named entity recognition (NER), which contains two subgraph modules: a flat NER module for outermost entities and a graph module for all entities located in inner layers. A bidirectional LSTM (BiLSTM) and a graph convolutional network (GCN) are adopted to jointly learn flat entities and their inner dependencies. Unlike previous models, which only consider the unidirectional delivery of information from innermost layers to outer ones (or from outside to inside), our model effectively captures the bidirectional interaction between them. We first use the entities recognized by the flat NER module to construct an entity graph, which is fed to the graph module. The richer representation learned by the graph module carries the dependencies of inner entities and can be exploited to improve outermost entity predictions. Experimental results on three standard nested NER datasets demonstrate that our BiFlaG outperforms previous state-of-the-art models. Comment: Accepted by ACL 2020
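
    The following is a minimal PyTorch sketch of the two-module design described in this abstract: a BiLSTM-based flat tagger for outermost entities plus one GCN layer over an entity graph built from the flat predictions. It is not the authors' implementation; the layer sizes, the adjacency construction, and the inner-span scorer are illustrative assumptions.

```python
# Conceptual sketch only: a BiLSTM flat tagger plus one GCN layer over an entity
# graph derived from the flat predictions. Sizes and heads are assumptions, not
# the BiFlaG authors' code.
import torch
import torch.nn as nn

class BiFlaGSketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=200, num_tags=9, num_types=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden // 2, batch_first=True, bidirectional=True)
        self.flat_tagger = nn.Linear(hidden, num_tags)         # BIO tags for outermost entities
        self.gcn = nn.Linear(hidden, hidden)                   # one GCN layer over the entity graph
        self.inner_scorer = nn.Linear(2 * hidden, num_types)   # types for candidate inner spans

    def forward(self, token_ids, adjacency):
        # adjacency: (batch, seq, seq) entity graph built from spans the flat module predicted
        h, _ = self.bilstm(self.emb(token_ids))                # (batch, seq, hidden)
        flat_logits = self.flat_tagger(h)                      # outermost-entity tag scores
        deg = adjacency.sum(-1, keepdim=True).clamp(min=1)
        g = torch.relu(self.gcn(adjacency @ h / deg))          # neighbour aggregation (one GCN step)
        return flat_logits, g                                  # g feeds inner-entity span scoring

    def score_inner_span(self, g, i, j):
        # Score a candidate inner span (i, j) from the graph-enriched endpoint states.
        return self.inner_scorer(torch.cat([g[:, i], g[:, j]], dim=-1))
```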

    A Boundary-aware Neural Model for Nested Named Entity Recognition


    S2F-NER: Exploring Sequence-to-Forest Generation for Complex Entity Recognition

    Named Entity Recognition (NER) remains challenging due to complex entities, such as nested, overlapping, and discontinuous entities. Existing approaches, such as sequence-to-sequence (Seq2Seq) generation and span-based classification, have shown impressive performance on various NER subtasks, but they are difficult to scale to datasets with longer input text because of either the exposure bias issue or inefficient computation. In this paper, we propose a novel Sequence-to-Forest generation paradigm, S2F-NER, which can directly extract entities in a sentence via a Forest decoder that decodes multiple entities in parallel rather than sequentially. Specifically, our model generates each path of each tree in the forest autoregressively, where the maximum depth of each tree is three (the shortest feasible length for complex NER, and far smaller than the decoding length of Seq2Seq). Based on this novel paradigm, our model elegantly mitigates the exposure bias problem while keeping the simplicity of Seq2Seq. Experimental results show that our model significantly outperforms the baselines on three discontinuous NER datasets and two nested NER datasets, especially for discontinuous entity recognition.
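
    Below is a conceptual PyTorch sketch of the parallel-forest idea described in this abstract: trees in the forest are decoded independently, so they run as one batch, while each path is generated autoregressively for at most three steps. The GRU-based cell and the meaning of each step (for example, a fragment boundary or an entity type) are assumptions made for illustration, not the paper's actual decoder.

```python
# Conceptual sketch (not the S2F-NER implementation): trees are independent and
# decoded in parallel as a batch; each path is generated autoregressively with a
# bounded depth of three steps.
import torch
import torch.nn as nn

class ForestDecoderSketch(nn.Module):
    MAX_DEPTH = 3  # bounded path length, per the abstract; far shorter than Seq2Seq decoding

    def __init__(self, hidden=256, vocab=128):
        super().__init__()
        self.cell = nn.GRUCell(hidden, hidden)   # shared autoregressive cell per path step
        self.step_emb = nn.Embedding(vocab, hidden)
        self.out = nn.Linear(hidden, vocab)      # scores the decision made at each step

    def forward(self, root_states):
        # root_states: (num_trees, hidden), one state per tree root; the whole forest
        # is decoded as a single batch because the trees do not depend on each other.
        state = root_states
        prev = torch.zeros_like(root_states)
        decisions = []
        for _ in range(self.MAX_DEPTH):
            state = self.cell(prev, state)
            logits = self.out(state)
            choice = logits.argmax(-1)           # greedy decision at this path step
            decisions.append(choice)
            prev = self.step_emb(choice)         # feed the decision back in (autoregressive)
        return torch.stack(decisions, dim=1)     # (num_trees, MAX_DEPTH)
```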

    Cell line name recognition in support of the identification of synthetic lethality in cancer from text

    Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating, for instance, the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers.
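
    As a rough illustration of the approach described here (a CRF tagger supported by a dictionary of cell line names), the sketch below uses sklearn-crfsuite as a stand-in for NERsuite. The feature set, the toy dictionary, and the single training sentence are illustrative assumptions, not the Gellus/CLL setup.

```python
# Minimal sketch of a dictionary-supported CRF tagger for cell line mentions.
# sklearn-crfsuite stands in for NERsuite; features and data are toy examples.
import sklearn_crfsuite

CELL_LINE_DICT = {"hela", "mcf-7", "k562"}  # toy dictionary; a real one would be far larger

def token_features(sent, i):
    w = sent[i]
    return {
        "lower": w.lower(),
        "is_upper": w.isupper(),
        "has_digit": any(c.isdigit() for c in w),
        "in_dictionary": w.lower() in CELL_LINE_DICT,          # the dictionary-support feature
        "prev": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# One toy training sentence in the B/I/O scheme commonly used for NER corpora.
sentences = [["HeLa", "cells", "were", "cultured", "overnight", "."]]
labels = [["B-CellLine", "O", "O", "O", "O", "O"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, labels)
print(crf.predict(X))   # predicted tag sequence for each sentence
```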

    Named Entity Recognition as Dependency Parsing

    Named Entity Recognition (NER) is a fundamental task in Natural Language Processing, concerned with identifying spans of text that refer to entities. NER research often focuses on flat entities only (flat NER), ignoring the fact that entity references can be nested, as in [Bank of [China]] (Finkel and Manning, 2009). In this paper, we use ideas from graph-based dependency parsing to provide our model with a global view of the input via a biaffine model (Dozat and Manning, 2017). The biaffine model scores pairs of start and end tokens in a sentence, which we use to explore all spans, so that the model is able to predict named entities accurately. We show that the model works well for both nested and flat NER through evaluation on 8 corpora, achieving SoTA performance on all of them, with accuracy gains of up to 2.2 percentage points.
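
    The biaffine span scoring this abstract builds on can be sketched in a few lines of PyTorch: every (start, end) token pair receives a score per entity class, which is why nested spans are handled as naturally as flat ones. The projection sizes and the use of a single shared encoder output are assumptions for illustration, not the authors' released code.

```python
# Sketch of biaffine span scoring for NER: score every (start, end) token pair
# per entity class. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class BiaffineSpanScorer(nn.Module):
    def __init__(self, hidden=256, proj=128, num_classes=5):
        super().__init__()
        self.start_proj = nn.Linear(hidden, proj)   # start-of-span representation
        self.end_proj = nn.Linear(hidden, proj)     # end-of-span representation
        # Biaffine tensor U: one (proj+1) x (proj+1) matrix per class (the +1 adds bias terms).
        self.U = nn.Parameter(torch.randn(num_classes, proj + 1, proj + 1) * 0.01)

    def forward(self, h):
        # h: (batch, seq, hidden) contextual token states from any encoder (e.g. BiLSTM or BERT)
        ones = h.new_ones(h.size(0), h.size(1), 1)
        s = torch.cat([torch.relu(self.start_proj(h)), ones], dim=-1)  # (batch, seq, proj+1)
        e = torch.cat([torch.relu(self.end_proj(h)), ones], dim=-1)    # (batch, seq, proj+1)
        # scores[b, c, i, j] = s_i^T U_c e_j : class-c score for the span starting at i, ending at j
        scores = torch.einsum("bip,cpq,bjq->bcij", s, self.U, e)
        return scores  # an entity is predicted where the best class score for i <= j is high enough
```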