58 research outputs found

Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation

Author: Jia Chenyan
Liu Ruibo
Ma Weicheng
Vosoughi Soroush
Wang Lili
Xu Guangxuan
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

Data augmentation has proven effective in many NLU tasks, especially those suffering from data scarcity. In this paper, we present a powerful and easy-to-deploy text augmentation framework, Data Boost, which augments data through reinforcement learning guided conditional generation. We evaluate Data Boost on three diverse text classification tasks under five different classifier architectures. The results show that Data Boost can boost the performance of classifiers, especially in low-resource data scenarios. For instance, Data Boost improves F1 for the three tasks by 8.7% on average when given only 10% of the whole data for training. We also compare Data Boost with six prior text augmentation methods. Through human evaluations (N=178), we confirm that Data Boost augmentation has quality comparable to the original data with respect to readability and class consistency. Comment: In proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). Onlin

arXiv.org e-Print Archive

Crossref

NECE: Narrative Event Chain Extraction Toolkit

Author: Adebiyi Aminat
Hou Yufang
Isaza Paulina Toro
Li Moshi
Oloko Akintoye
Peng Nanyun
Sanctos Cassia
Wang Dakuo
Xu Guangxuan
Yao Bingsheng
Publication date: 14/08/2023

To understand a narrative, it is essential to comprehend the temporal event flows, especially those associated with main characters; however, this can be challenging with lengthy and unstructured narrative texts. To address this, we introduce NECE, an open-access, document-level toolkit that automatically extracts and aligns narrative events in the temporal order of their occurrence. Through extensive evaluations, we show the high quality of the NECE toolkit and demonstrate its downstream application in analyzing narrative bias regarding gender. We also openly discuss the shortcomings of the current approach and the potential of leveraging generative models in future work. Lastly, the NECE toolkit includes both a Python library and a user-friendly web interface, which offer equal access to professionals and lay audiences alike to visualize event chains, obtain narrative flows, or study narrative bias

arXiv.org e-Print Archive

Genome-wide identification of cystathionine beta synthase genes in wheat and its relationship with anther male sterility under heat stress

Author: Fuli Zhang
Guangxuan Tan
Hongzhan Liu
Kedong Xu
Lili Li
Liuyong Xie
Qi Wang
Xianle Ruan
Publication venue: 'Frontiers Media SA'
Publication date: 01/12/2022

Cystathionine beta synthase (CBS) domain-containing proteins (CDCPs) play an important role in plant development through regulation of the thioredoxin system, as well as in responses to biotic and abiotic stress conditions. Despite this, no systematic study has examined the wheat CBS gene family and its relation to high temperature-induced male sterility. In this study, 66 CBS family members were identified in the wheat genome, and their gene or protein sequences were used for subsequent analysis. The TaCBS gene family was found to be unevenly distributed on 21 chromosomes, and its members were classified into four subgroups according to their gene structure and phylogeny. The results of collinearity analysis showed that there were 25 shared orthologous genes between wheat, rice and Brachypodium distachyon, and one shared orthologous gene between wheat, millet and barley. The cis-regulatory elements of the TaCBS genes were related to JA, IAA, MYB, etc. GO and KEGG pathway analysis identified these TaCBS genes to be associated with pollination, reproduction, and signaling and cellular processes, respectively. A heatmap of wheat plants based on transcriptome data showed that TaCBS genes were expressed to a higher extent in spikelets relative to other tissues. In addition, 29 putative tae-miRNAs were identified, targeting 41 TaCBS genes. Moreover, qRT-PCR validation of six TaCBS genes indicated their critical role in anther development, as five of them were expressed at lower levels in heat-stressed male sterile anthers than in normal anthers. Based on anther phenotypes, paraffin sections, starch potassium iodide staining, and qRT-PCR data, we hypothesize that the TaCBS genes have a very important connection with the heat-stressed sterility process in wheat, and these data provide a basis for further insight into their relationship

Directory of Open Access Journals

The spatial distribution characteristics of soil salinity in coastal zone of the Yellow River Delta

Author: A Basile
A Hussein
A Lax
AB Arun
AX Cai
B Li
B Williams
BK Khosla
Bo Guan
BR Spies
C Ben Ahmed
C Huang
C Shi
CW Team
D Liu
Di Zhou
DL Corwin
DN Rietz
E Christen
G Liu
GI Metternicht
GJ Hoffman
Guangmei Wang
Guangxuan Han
H Fang
Huifeng Wu
I Celik
J Akhter
J Leeuw de
J Lei
J Yu
JD Rhoades
JE Northey
Jihong Wang
JR Thomas
Junbao Yu
K Johnston
Kai Ning
L Zhang
M Cetin
M Ren
M Yang
M Zarroca
MK Mondal
MR Sampford
PT Cedfeldt
Q Liu
Q Wang
Q Ye
QE Guo
R Chhabra
R Yao
RL Dehaan
S Wang
T Zhang
X Fan
X Xu
X Zhang
X Zhao
Y Guan
Y Li
Y Liu
Y Xue
Yunzhao Li
Yuqin Fu
Z Wu
Z Zheng
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Toward a general evaluation model for soil respiration (GEMSR)

Author: A. R. Townsend
A. Tufekcioglu
B. K. Northup
B. R. Jia
BingRui Jia
C. B. Zhang
C. Fang
C. L. Kucera
D. M. Eissenstat
D. R. Bryla
D. S. Schimel
E. A. Davidson
E. D. Sotta
F. Y. Wang
G. A. Buyanovsky
G. H. Lu
G. M. Berntson
G. P. Yang
G. S. Chen
G. S. Zhou
G. X. Han
GuangSheng Zhou
GuangXuan Han
H. Keith
H. S. Liu
J. B. Zhao
J. Cavelier
J. D. Reeder
J. F. Espeleta
J. J. Reinke
J. Lloyd
J. M. Craine
J. S. Amthor
K. H. Cui
K. M. Peterson
L. F. Yang
L. H. Li
L. Meng
Li Zhou
M. S. Lee
M. Xu
O. K. Atkin
P. Rochette
Q. S. Chen
R. A. Chimner
R. E. Wildung
R. F. Keeling
S. C. Wofsy
S. F. Oberbauer
S. Kang
S. Q. Chen
T. J. Bouma
W. A. Reiners
W. Wang
X. Wand
X. Wang
Y. B. Xie
Y. Dong
Y. L. Jiang
Y. Liu
Y. P. Zhuge
Y. S. Dong
Y. S. Yang
Z. Q. Chang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

The Joint Training of Transition-Based AMR Parser

Author: Xu Guangxuan
Publication date: 01/01/2022

Abstract Meaning Representation (AMR) parsing converts a natural language sentence into a specially designed semantic graph (AMR), which captures the most essential semantic entities and relations of the input sentence. While the recent introduction of pretrained sequence-to-sequence models has brought performance improvement and pipeline simplification, the problem of how best to encode structural information into seq2seq models remains. This exploratory work proposes joint training of transition-based AMR parsers that incorporates not only the parsing objective but also a denoising objective into training; it seeks to answer whether an improved understanding of structural alignment can benefit sequence-to-sequence AMR parsers. It also shows a potential application of the jointly trained models: the joint-training setup can greatly liberate the transition-based parsers from the State Machine’s alignment constraints and allow them to be easily repurposed for a set of related tasks that could theoretically benefit from the structural training, such as paraphrase generation and generation from keywords

Ezid

eScholarship - University of California

The Joint Training of Transition-Based AMR Parser

Author: Xu Guangxuan
Publication date: 09/12/2022

Ezid

A Fast Point Clouds Registration Algorithm for Laser Scanners

Author: Guangxuan Xu
Publication venue: 'MDPI AG'
Publication date: 12/04/2021

Point cloud registration is an important step in laser scanner data processing, and numerous methods exist. However, the existing methods often suffer from low accuracy and low speed when registering large point clouds. To meet this challenge, an improved iterative closest point (ICP) algorithm combining the random sample consensus (RANSAC) algorithm, intrinsic shape signatures (ISS), and 3D shape context (3DSC) is proposed. The proposed method first uses a voxel grid filter for down-sampling. Next, the feature points are extracted by the ISS algorithm and described by the 3DSC. Afterwards, the ISS-3DSC features are used for rough registration with the RANSAC algorithm. Finally, the ICP algorithm is used for accurate registration. The experimental results show that the proposed algorithm has a faster registration speed than the compared algorithms, while maintaining high registration accuracy

Multidisciplinary Digital Publishing Institute
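
The first stage named in the abstract above, voxel grid down-sampling, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `voxel_grid_downsample`, the centroid-per-voxel rule, and the tuple-based point format are assumptions chosen for illustration, and the later ISS, 3DSC, RANSAC and ICP stages are not shown.

```python
from collections import defaultdict
from math import floor


def voxel_grid_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel by their centroid.

    points: iterable of (x, y, z) tuples (assumed format).
    voxel_size: edge length of each cubic voxel, in the same units as points.
    """
    buckets = defaultdict(list)
    for p in points:
        # Integer voxel index of the point along each axis.
        key = tuple(floor(c / voxel_size) for c in p)
        buckets[key].append(p)

    # One centroid per occupied voxel, ordered by voxel index for determinism.
    return [
        tuple(sum(axis) / len(group) for axis in zip(*group))
        for _, group in sorted(buckets.items())
    ]


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.5, 1.5, 1.5)]
    # The first two points share a voxel and are merged; the third is kept.
    print(voxel_grid_downsample(cloud, voxel_size=1.0))
```

A larger `voxel_size` merges more points per voxel, trading geometric detail for a smaller cloud to pass to the later registration stages.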