    Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

    Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost classifier performance, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has quality comparable to the original data with respect to readability and class consistency. Comment: In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). Online

    NECE: Narrative Event Chain Extraction Toolkit

    To understand a narrative, it is essential to comprehend its temporal event flows, especially those associated with the main characters; however, this can be challenging with lengthy and unstructured narrative texts. To address this, we introduce NECE, an open-access, document-level toolkit that automatically extracts and aligns narrative events in the temporal order of their occurrence. Through extensive evaluations, we show the high quality of the NECE toolkit and demonstrate its downstream application in analyzing gender-related narrative bias. We also openly discuss the shortcomings of the current approach and the potential of leveraging generative models in future work. Lastly, the NECE toolkit includes both a Python library and a user-friendly web interface, which offer equal access to professionals and lay audiences alike to visualize event chains, obtain narrative flows, or study narrative bias.

    Genome-wide identification of cystathionine beta synthase genes in wheat and its relationship with anther male sterility under heat stress

    Cystathionine beta synthase (CBS) domain-containing proteins (CDCPs) play an important role in plant development through their regulation of the thioredoxin system and their ability to respond to biotic and abiotic stress conditions. Despite this, no systematic study has examined the wheat CBS gene family and its relation to high-temperature-induced male sterility. In this study, 66 CBS family members were identified in the wheat genome, and their gene and protein sequences were used for subsequent analysis. The TaCBS genes were unevenly distributed across 21 chromosomes and were classified into four subgroups according to their gene structure and phylogeny. Collinearity analysis revealed 25 shared orthologous genes among wheat, rice, and Brachypodium distachyon, and one shared orthologous gene among wheat, millet, and barley. The cis-regulatory elements of the TaCBS genes were related to JA, IAA, MYB, etc. GO and KEGG pathway analyses associated these TaCBS genes with pollination and reproduction, and with signaling and cellular processes, respectively. A heatmap based on transcriptome data showed that TaCBS genes were more highly expressed in spikelets than in other tissues. In addition, 29 putative tae-miRNAs were identified, targeting 41 TaCBS genes. Moreover, qRT-PCR validation of six TaCBS genes indicated their critical role in anther development, as five of them were expressed at lower levels in heat-stressed male-sterile anthers than in normal anthers. Based on anther phenotypes, paraffin sections, starch potassium iodide staining, and qRT-PCR data, we hypothesize that TaCBS genes are closely connected with heat-stress-induced sterility in wheat, and these data provide a basis for further insight into this relationship.

    The Joint Training of Transition-Based AMR Parser

    Abstract Meaning Representation (AMR) parsing converts a natural language sentence into a specially designed semantic graph (AMR) that captures the most essential semantic entities and relations of the input sentence. While the recent introduction of pretrained sequence-to-sequence models has brought performance improvements and pipeline simplification, the problem of how best to encode structural information into seq2seq models remains open. This exploratory work proposes joint training of transition-based AMR parsers that incorporates not only the parsing objective but also a denoising objective into training; it seeks to answer whether an improved understanding of structural alignment can benefit sequence-to-sequence AMR parsers. It also shows a potential application of the jointly trained models: the joint-training setup can largely free transition-based parsers from the State Machine's alignment constraints and allow them to be easily repurposed for a set of related tasks that could theoretically benefit from the structural training, such as paraphrase generation and generation from keywords.
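One generic way to write a joint objective that combines a parsing loss with a denoising loss is a weighted sum of the two. This is an illustrative assumption, not the formulation from the paper: $x$ is the input sentence, $y$ the target output sequence (for example, the parser's transition actions), $\tilde{y}$ a corrupted copy of $y$ produced by some noise function, and $\lambda \ge 0$ a hypothetical weighting hyperparameter.

$$
\mathcal{L}(\theta) = \underbrace{-\log p_\theta(y \mid x)}_{\text{parsing}} + \lambda \, \underbrace{\bigl(-\log p_\theta(y \mid \tilde{y})\bigr)}_{\text{denoising}}
$$

Setting $\lambda = 0$ in this sketch recovers ordinary single-objective parser training.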

    A Fast Point Clouds Registration Algorithm for Laser Scanners

    Point cloud registration is an important step in laser scanner data processing, and numerous methods have been proposed for it. However, existing methods often suffer from low accuracy and low speed when registering large point clouds. To meet this challenge, an improved iterative closest point (ICP) algorithm is proposed that combines the random sample consensus (RANSAC) algorithm, intrinsic shape signatures (ISS), and 3D shape context (3DSC). The proposed method first down-samples the point clouds with a voxel grid filter. Next, feature points are extracted by the ISS algorithm and described by 3DSC. The ISS-3DSC features are then used for coarse registration with the RANSAC algorithm. Finally, the ICP algorithm is used for fine registration. The experimental results show that the proposed algorithm registers faster than the compared algorithms while maintaining high registration accuracy.
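The first and last stages of the pipeline described above (voxel grid down-sampling and ICP fine registration) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the voxel filter replaces every occupied voxel by the centroid of its points, and ICP is the plain point-to-point variant with a closed-form SVD (Kabsch) alignment step. The ISS/3DSC feature stage and the RANSAC coarse registration are omitted, so the sketch assumes the two clouds already start roughly aligned. The helper names `voxel_downsample`, `best_rigid_transform`, and `icp`, the brute-force nearest-neighbour search, and the stopping tolerance are hypothetical choices.

```python
import numpy as np


def voxel_downsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Voxel grid filter: replace all points in one cubic voxel by their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)  # integer voxel index per point
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # voxel id of each point, 1-D on all NumPy versions
    counts = np.bincount(inverse)
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)  # unbuffered per-voxel accumulation
    return sums / counts[:, None]


def best_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Kabsch/SVD: R, t minimising sum ||R @ src_i + t - dst_i||^2 over paired points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # rule out reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(source: np.ndarray, target: np.ndarray, max_iters: int = 50, tol: float = 1e-10):
    """Point-to-point ICP. Returns (R, t, aligned) with aligned ~= source @ R.T + t."""
    R_total, t_total = np.eye(3), np.zeros(3)
    current = source.copy()
    prev_err = np.inf
    for _ in range(max_iters):
        # Brute-force closest points, O(N*M); a k-d tree would replace this at scale.
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = d2.min(axis=1).mean()  # mean squared closest-point distance
        if prev_err - err < tol:  # stop once the error no longer improves
            break
        prev_err = err
        R, t = best_rigid_transform(current, target[nn])
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t  # compose with earlier steps
    return R_total, t_total, current


# Usage: recover a small, known rigid motion of a synthetic cloud.
rng = np.random.default_rng(0)
target = rng.uniform(-0.5, 0.5, size=(300, 3))
theta = np.deg2rad(2.0)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.01, 0.0, -0.01])
source = (target - t_true) @ R_true  # so that target == source @ R_true.T + t_true
R_est, t_est, aligned = icp(source, target)
print(np.abs(R_est - R_true).max(), np.abs(t_est - t_true).max())
```

The brute-force correspondence search costs O(NM) time and memory, which is why practical implementations use a spatial index such as a k-d tree. Because ICP only converges to a nearby local minimum, a coarse initial alignment (the role of the RANSAC stage in the abstract) is what makes the final ICP step reliable on real scans.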