
    LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation

    Successfully training a deep neural network demands a huge corpus of labeled data. However, each label provides only limited information to learn from, and collecting the requisite number of labels involves massive human effort. In this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN framework for sequence labeling and classification tasks, with an easy-to-use UI that not only allows an annotator to provide the needed labels for a task, but also enables LearnIng From Explanations for each labeling decision. Such explanations enable us to generate useful additional labeled data from unlabeled instances, bolstering the pool of available training data. On three popular NLP tasks (named entity recognition, relation extraction, sentiment analysis), we find that using this enhanced supervision allows our models to surpass competitive baseline F1 scores by more than 5-10 percentage points while using 2X fewer labeled instances. Our framework is the first to utilize this enhanced supervision technique and does so for three important tasks, thus providing improved annotation recommendations to users and the ability to build datasets of (data, label, explanation) triples instead of the regular (data, label) pairs. Comment: Accepted to ACL 2020 (demo). The first two authors contributed equally. Project page: http://inklab.usc.edu/leanlife
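    To make the (data, label, explanation) idea concrete, here is a minimal Python sketch of how explanation text could be turned into extra pseudo-labeled data by matching a quoted cue phrase against unlabeled sentences. The `AnnotatedExample` class, the `extract_cue` and `pseudo_label` helpers, and the surface-matching heuristic are illustrative assumptions, not LEAN-LIFE's actual mechanism.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AnnotatedExample:
    # A (data, label, explanation) triple as collected by an annotation UI.
    text: str
    label: str
    explanation: str  # free-text rationale, e.g. "contains the phrase 'born in'"

def extract_cue(explanation: str) -> Optional[str]:
    # Hypothetical helper: pull a quoted cue phrase out of the rationale.
    start = explanation.find("'")
    end = explanation.rfind("'")
    if 0 <= start < end:
        return explanation[start + 1:end]
    return None

def pseudo_label(unlabeled: List[str], seeds: List[AnnotatedExample]) -> List[AnnotatedExample]:
    # Naive surface matching: if an unlabeled sentence contains the cue phrase
    # from an explanation, propagate that seed example's label to it.
    generated = []
    for ex in seeds:
        cue = extract_cue(ex.explanation)
        if cue is None:
            continue
        for sent in unlabeled:
            if cue.lower() in sent.lower():
                generated.append(AnnotatedExample(sent, ex.label, ex.explanation))
    return generated

seeds = [AnnotatedExample("Marie Curie was born in Warsaw.", "place_of_birth",
                          "contains the phrase 'born in'")]
pool = ["Alan Turing was born in London.", "The meeting starts at noon."]
print(pseudo_label(pool, seeds))  # propagates the label to the first sentence only
```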

    Why Attention? Analyze BiLSTM Deficiency and Its Remedies in the Case of NER

    BiLSTM has been prevalently used as a core module for NER in a sequence-labeling setup. State-of-the-art approaches use BiLSTM with additional resources such as gazetteers, language modeling, or multi-task supervision to further improve NER. This paper instead takes a step back and focuses on analyzing problems of BiLSTM itself and how exactly self-attention can bring improvements. We formally show the limitation of (CRF-)BiLSTM in modeling cross-context patterns for each word -- the XOR limitation. Then, we show that two types of simple cross-structures -- self-attention and Cross-BiLSTM -- can effectively remedy the problem. We test the practical impacts of the deficiency on real-world NER datasets, OntoNotes 5.0 and WNUT 2017, with clear and consistent improvements over the baseline, up to 8.7% on some of the multi-token entity mentions. We give in-depth analyses of the improvements across several aspects of NER, especially the identification of multi-token mentions. This study should lay a sound foundation for future improvements on sequence-labeling NER. (Source code: https://github.com/jacobvsdanniel/cross-ner) Comment: In proceedings of AAAI 2020.
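    As a rough illustration of the remedy discussed above, the following PyTorch sketch stacks a single self-attention layer on top of BiLSTM outputs so that each token's representation can combine information from both contexts jointly. Layer sizes, the residual combination, and the absence of a CRF decoder are simplifications assumed for brevity, not the paper's exact architecture (which also evaluates Cross-BiLSTM as the second cross-structure).

```python
import torch
import torch.nn as nn

class BiLSTMSelfAttnTagger(nn.Module):
    """Toy tagger: BiLSTM encoder followed by one self-attention layer,
    one of the two cross-structures discussed in the abstract above."""
    def __init__(self, vocab_size, emb_dim, hidden_dim, num_tags, num_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads, batch_first=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        h, _ = self.bilstm(self.embed(token_ids))  # (B, T, 2H): concatenated directions
        ctx, _ = self.attn(h, h, h)                # each position attends over all positions
        return self.out(h + ctx)                   # residual mix, then per-token tag scores

model = BiLSTMSelfAttnTagger(vocab_size=1000, emb_dim=64, hidden_dim=64, num_tags=9)
scores = model(torch.randint(0, 1000, (2, 12)))    # (batch=2, seq_len=12, num_tags)
print(scores.shape)
```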

    Multi-directional gated recurrent unit and convolutional neural network for load and energy forecasting: A novel hybridization

    Energy operations and schedules are significantly impacted by load and energy forecasting systems. An effective system is a requirement for a sustainable and equitable environment. Additionally, a trustworthy forecasting management system enhances the resilience of power systems by reducing power- and load-forecast errors. However, due to the numerous inherent nonlinear properties of huge and diverse data, classical statistical methods cannot appropriately learn this non-linearity in the data. Advanced techniques allow energy systems to evaluate data appropriately and regulate energy consumption. In comparison to machine learning, deep learning techniques have lately been used to predict energy consumption as well as to learn long-term dependencies. In this work, a fusion of a novel multi-directional gated recurrent unit (MD-GRU) with a convolutional neural network (CNN) using global average pooling (GAP) is proposed as a hybrid model for load and energy forecasting. The spatial and temporal aspects, along with the high dimensionality of the data, are addressed by employing the capabilities of the MD-GRU and CNN integration. The obtained results are compared to baseline algorithms including CNN, Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (Bi-LSTM), Gated Recurrent Unit (GRU), and Bidirectional Gated Recurrent Unit (Bi-GRU). The experimental findings indicate that the proposed approach surpasses these conventional approaches in terms of accuracy, Mean Absolute Percentage Error (MAPE), and Root Mean Square Error (RMSE).
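    A minimal PyTorch sketch of such a hybrid is shown below: a recurrent branch captures temporal dependencies while a convolutional branch's feature maps are reduced with global average pooling, and the two branch outputs are concatenated before a forecasting head. The layer sizes, the use of a bidirectional GRU as a stand-in for the multi-directional GRU, and the single-layer CNN are assumptions for illustration, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class HybridForecaster(nn.Module):
    """Sketch of a GRU + CNN hybrid for load forecasting: recurrent branch for
    temporal dependencies, convolutional branch with global average pooling (GAP)
    for local patterns; branch outputs are concatenated before the forecast head."""
    def __init__(self, n_features, horizon, hidden=64, channels=32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(n_features, channels, kernel_size=3, padding=1)
        self.head = nn.Linear(2 * hidden + channels, horizon)

    def forward(self, x):                                      # x: (batch, time, features)
        _, h = self.gru(x)                                     # h: (2, batch, hidden)
        rnn_feat = torch.cat([h[0], h[1]], dim=-1)             # (batch, 2*hidden)
        conv_feat = self.conv(x.transpose(1, 2)).mean(dim=-1)  # GAP over time -> (batch, channels)
        return self.head(torch.cat([rnn_feat, conv_feat], dim=-1))  # (batch, horizon)

model = HybridForecaster(n_features=5, horizon=24)
print(model(torch.randn(8, 168, 5)).shape)  # one week of hourly inputs -> 24-step forecast
```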

    Win-Win Cooperation: Bundling Sequence and Span Models for Named Entity Recognition

    For Named Entity Recognition (NER), the sequence labeling-based and span-based paradigms are quite different. Previous research has demonstrated that the two paradigms have clear complementary advantages, but, as far as we know, few models have attempted to leverage these advantages in a single NER model. In our previous work, we proposed a paradigm known as Bundling Learning (BL) to address this problem. The BL paradigm bundles the two NER paradigms, enabling NER models to jointly tune their parameters through a weighted sum of each paradigm's training loss. However, three critical issues remain unresolved: When does BL work? Why does BL work? Can BL enhance existing state-of-the-art (SOTA) NER models? To address the first two issues, we implement three NER models: a sequence labeling-based model (SeqNER), a span-based NER model (SpanNER), and BL-NER, which bundles SeqNER and SpanNER together. We draw two conclusions regarding these issues based on experimental results on eleven NER datasets from five domains. We then apply BL to five existing SOTA NER models (three sequence labeling-based and two span-based) to investigate the third issue. Experimental results indicate that BL consistently enhances their performance, suggesting that it is possible to construct a new SOTA NER system by incorporating BL into the current SOTA system. Moreover, we find that BL reduces both entity boundary and type prediction errors. In addition, we compare two commonly used tagging schemes as well as three types of span semantic representations.
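    A minimal sketch of the loss-bundling step follows, assuming a single weight alpha trades off the two heads' losses; the actual weighting scheme and loss definitions in the paper may differ.

```python
import torch

def bundling_loss(seq_loss: torch.Tensor, span_loss: torch.Tensor,
                  alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of the sequence-labeling loss and the span-based loss,
    so both paradigms tune the shared parameters jointly."""
    return alpha * seq_loss + (1.0 - alpha) * span_loss

# Usage within one training step (losses would come from the two heads of a bundled model):
seq_loss = torch.tensor(1.3, requires_grad=True)   # e.g. CRF negative log-likelihood
span_loss = torch.tensor(0.9, requires_grad=True)  # e.g. cross-entropy over candidate spans
loss = bundling_loss(seq_loss, span_loss, alpha=0.6)
loss.backward()
```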

    Transfer learning for Turkish named entity recognition on noisy text

    This is an accepted manuscript of an article published by Cambridge University Press in Natural Language Engineering on 28/01/2020, available online at https://doi.org/10.1017/S1351324919000627. The accepted version of the publication may differ from the final published version. © Cambridge University Press 2020. In this article, we investigate using deep neural networks with different word representation techniques for named entity recognition (NER) on Turkish noisy text. We argue that valuable latent features for NER can, in fact, be learned without using any hand-crafted features and/or domain-specific resources such as gazetteers and lexicons. In this regard, we utilize character-level, character n-gram-level, morpheme-level, and orthographic character-level word representations. Since noisy data with NER annotation are scarce for Turkish, we introduce a transfer learning model, as an extension to the Bi-LSTM-CRF architecture, in order to learn infrequent entity types by incorporating an additional conditional random field (CRF) layer that is trained on a larger (but formal) text and a noisy text simultaneously. This allows us to learn from both formal and informal/noisy text, thus further improving the performance of our model for rarely seen entity types. We experimented on Turkish as a morphologically rich language and English as a relatively morphologically poor language. We obtained an entity-level F1 score of 67.39% on Turkish noisy data and 45.30% on English noisy data, which outperforms the current state-of-the-art models on noisy text. The English scores are lower than the Turkish scores because of the intense sparsity in the data introduced by user writing styles. The results prove that using subword information significantly contributes to learning latent features for morphologically rich languages.
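    The transfer-learning idea, one shared Bi-LSTM encoder feeding two task-specific output layers trained on a formal corpus and a noisy corpus in alternation, can be sketched as below. Plain softmax heads stand in for the CRF layers of the actual Bi-LSTM-CRF model, and all names and sizes are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class SharedEncoderTagger(nn.Module):
    """Sketch of the transfer-learning setup: a shared Bi-LSTM encoder with two
    task-specific output layers, one for formal text and one for noisy text."""
    def __init__(self, vocab_size, emb_dim, hidden, n_tags_formal, n_tags_noisy):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head_formal = nn.Linear(2 * hidden, n_tags_formal)
        self.head_noisy = nn.Linear(2 * hidden, n_tags_noisy)

    def forward(self, token_ids, domain):        # domain: "formal" or "noisy"
        h, _ = self.encoder(self.embed(token_ids))
        head = self.head_formal if domain == "formal" else self.head_noisy
        return head(h)                           # (batch, seq_len, n_tags)

model = SharedEncoderTagger(vocab_size=5000, emb_dim=100, hidden=128,
                            n_tags_formal=9, n_tags_noisy=9)
loss_fn = nn.CrossEntropyLoss()
# Alternate batches from the two corpora so the shared encoder learns from both:
for domain in ("formal", "noisy"):
    tokens = torch.randint(0, 5000, (4, 20))     # dummy token ids
    gold = torch.randint(0, 9, (4, 20))          # dummy tag ids
    logits = model(tokens, domain)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), gold.reshape(-1))
    loss.backward()
```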