58 research outputs found
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation
Data augmentation is proven to be effective in many NLU tasks, especially for
those suffering from data scarcity. In this paper, we present a powerful and
easy to deploy text augmentation framework, Data Boost, which augments data
through reinforcement learning guided conditional generation. We evaluate Data
Boost on three diverse text classification tasks under five different
classifier architectures. The results show that Data Boost can boost the
performance of classifiers especially in low-resource data scenarios. For
instance, Data Boost improves F1 for the three tasks by 8.7% on average when
given only 10% of the whole data for training. We also compare Data Boost with
six prior text augmentation methods. Through human evaluations (N=178), we
confirm that Data Boost augmentation is comparable in quality to the original
data with respect to readability and class consistency.
Comment: In proceedings of the 2020 Conference on Empirical Methods in Natural
Language Processing (EMNLP 2020). Onlin
NECE: Narrative Event Chain Extraction Toolkit
To understand a narrative, it is essential to comprehend the temporal event
flows, especially those associated with main characters; however, this can be
challenging with lengthy and unstructured narrative texts. To address this, we
introduce NECE, an open-access, document-level toolkit that automatically
extracts and aligns narrative events in the temporal order of their occurrence.
Through extensive evaluations, we show the high quality of the NECE toolkit and
demonstrate its downstream application in analyzing narrative bias regarding
gender. We also openly discuss the shortcomings of the current approach and the
potential of leveraging generative models in future work. Lastly, the NECE
toolkit includes both a Python library and a user-friendly web interface, which
offer equal access to professionals and lay audiences alike, to visualize
event chains, obtain narrative flows, or study narrative bias.
Genome-wide identification of cystathionine beta synthase genes in wheat and its relationship with anther male sterility under heat stress
Cystathionine beta synthase (CBS) domain-containing proteins (CDCPs) play an important role in plant development through regulation of the thioredoxin system, as well as in responses to biotic and abiotic stress conditions. Despite this, no systematic study has examined the wheat CBS gene family and its relation to high temperature-induced male sterility. In this study, 66 CBS family members were identified in the wheat genome, and their gene or protein sequences were used for subsequent analysis. The TaCBS gene family was found to be unevenly distributed on 21 chromosomes, and its members were classified into four subgroups according to their gene structure and phylogeny. The results of collinearity analysis showed that there were 25 shared orthologous genes between wheat, rice and Brachypodium distachyon, and one shared orthologous gene between wheat, millet and barley. The cis-regulatory elements of the TaCBS genes were related to JA, IAA, MYB, etc. GO and KEGG pathway analyses associated these TaCBS genes with pollination, reproduction, and signaling and cellular processes, respectively. A heatmap of wheat plants based on transcriptome data showed that TaCBS genes were expressed to a higher extent in spikelets relative to other tissues. In addition, 29 putative tae-miRNAs were identified, targeting 41 TaCBS genes. Moreover, qRT-PCR validation of six TaCBS genes indicated their critical role in anther development, as five of them were expressed at lower levels in heat-stressed male sterile anthers than in normal anthers. Based on anther phenotypes, paraffin sections, starch potassium iodide staining, and qRT-PCR data, we hypothesize that TaCBS genes are closely connected with the heat-stressed sterility process in wheat, and these data provide a basis for further insight into their relationship.
The Joint Training of Transition-Based AMR Parser
Abstract Meaning Representation (AMR) parsing converts a natural language sentence into a specially designed semantic graph (AMR), which captures the most essential semantic entities and relations of the input sentence. While the recent introduction of pretrained sequence-to-sequence models has brought performance improvement and pipeline simplification, the problem of how best to encode structural information into seq2seq models remains. This exploratory work proposes joint training of transition-based AMR parsers that incorporates not only the parsing objective but also a denoising objective into training; it seeks to answer whether an improved understanding of structural alignment can benefit sequence-to-sequence AMR parsers. It also shows a potential application of the jointly trained models: the joint-training setup can greatly liberate the transition-based parsers from the State Machine’s alignment constraints and allow them to be easily repurposed for a set of related tasks that could theoretically benefit from the structural training, such as paraphrase generation and generation from keywords.
A Fast Point Clouds Registration Algorithm for Laser Scanners
Point cloud registration is an important step in laser scanner data processing, and numerous methods exist. However, the existing methods often suffer from low accuracy and low speed when registering large point clouds. To meet this challenge, an improved iterative closest point (ICP) algorithm combining the random sample consensus (RANSAC) algorithm, intrinsic shape signatures (ISS), and 3D shape context (3DSC) is proposed. The proposed method first uses a voxel grid filter for down-sampling. Next, the feature points are extracted by the ISS algorithm and described by the 3DSC. Afterwards, the ISS-3DSC features are used for rough registration with the RANSAC algorithm. Finally, the ICP algorithm is used for accurate registration. The experimental results show that the proposed algorithm has faster registration speed than the compared algorithms, while maintaining high registration accuracy.
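The first stage of the pipeline described above, voxel grid down-sampling, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the filter is realized, so the code below assumes the common variant that replaces all points in a cubic voxel with their centroid. The function name `voxel_grid_downsample`, the `voxel_size` parameter, and the sample cloud are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import floor


def voxel_grid_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid.

    Assumption: centroid-based filtering; the abstract does not specify
    which voxel grid variant the paper uses.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Group points by the integer index of the voxel that contains them.
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))

    # Emit one centroid per occupied voxel, in sorted voxel order
    # so the output is deterministic.
    centroids = []
    for key in sorted(buckets):
        members = buckets[key]
        n = len(members)
        centroids.append(tuple(sum(axis) / n for axis in zip(*members)))
    return centroids


# Hypothetical sample cloud: five points occupying three voxels of edge 1.0.
cloud = [
    (0.1, 0.1, 0.1),
    (0.3, 0.1, 0.1),
    (1.2, 0.0, 0.0),
    (1.4, 0.0, 0.0),
    (-0.2, 0.0, 0.0),
]
reduced = voxel_grid_downsample(cloud, voxel_size=1.0)
```

A larger `voxel_size` merges more points per voxel, which shrinks the cloud passed to the later feature extraction and registration stages at the cost of geometric detail.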
- …