18 research outputs found
Improving Coreference Resolution by Leveraging Entity-Centric Features with Graph Neural Networks and Second-order Inference
One of the major challenges in coreference resolution is how to make use of
entity-level features defined over clusters of mentions rather than mention
pairs. However, coreferent mentions usually spread far apart in an entire text,
which makes it extremely difficult to incorporate entity-level features. We
propose a graph neural network-based coreference resolution method that can
capture the entity-centric information by encouraging the sharing of features
across all mentions that probably refer to the same real-world entity. Mentions
are linked to each other via the edges modeling how likely two linked mentions
point to the same entity. With such graphs, features can be shared between
mentions by message-passing operations in an entity-centric
manner. A global inference algorithm up to second-order features is also
presented to optimally cluster mentions into consistent groups. Experimental
results show that our graph neural network-based method combined with the
second-order decoding algorithm (named GNNCR) achieved close to
state-of-the-art performance on the English CoNLL-2012 Shared Task dataset.
INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback
Automatically evaluating the quality of language generation is critical.
Although recent learned metrics show high correlation with human judgement,
these metrics cannot explain their verdict or associate the scores with
defects in generated text. To address this limitation, we present
InstructScore, an explainable evaluation metric for text generation. By
harnessing both explicit human instruction and the implicit knowledge of GPT-4,
we fine-tune a text evaluation metric based on LLaMA, producing both a score
for generated text and a human readable diagnostic report. We evaluate
InstructScore on a variety of generation tasks, including translation,
captioning, data-to-text and commonsense generation. Experiments show that our
7B model surpasses all other unsupervised metrics, including those based on
175B GPT-3 and GPT-4. Surprisingly, our InstructScore, even without direct
supervision from human-rated data, achieves performance levels on par with
state-of-the-art metrics like COMET22, which were fine-tuned on human ratings. Comment: Accepted to EMNLP2023 Main Conference
Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions
How can large language models (LLMs) process and translate endangered
languages? Many languages lack a large corpus to train a decent LLM; therefore
existing LLMs rarely perform well in unseen, endangered languages. On the
contrary, we observe that 2000 endangered languages, though without a large
corpus, have a grammar book or a dictionary. We propose LINGOLLM, a
training-free approach to enable an LLM to process unseen languages that hardly
occur in its pre-training. Our key insight is to demonstrate linguistic
knowledge of an unseen language in an LLM's prompt, including a dictionary, a
grammar book, and morphologically analyzed input text. We implement LINGOLLM on
top of two models, GPT-4 and Mixtral, and evaluate their performance on 5 tasks
across 8 endangered or low-resource languages. Our results show that LINGOLLM
elevates translation capability from GPT-4's 0 to 10.5 BLEU for 10 language
directions. Our findings demonstrate the tremendous value of linguistic
knowledge in the age of LLMs for endangered languages. Our data, code, and
model generations can be found at https://github.com/LLiLab/llm4endangeredlang
Curative efficacy of entomopathogenic nematodes against white grubs in honeysuckle fields
Root-feeding white grubs are among the most serious pests of honeysuckle trees (Lonicera japonica) in China, directly damaging their roots and facilitating infection by soil pathogens. Entomopathogenic nematodes (EPNs) are considered potential control agents against soil-dwelling insect pests. This study aimed to identify effective EPN species against white grubs through bioassay and field experiments. Among the EPN species screened against Holotrichia oblita under laboratory conditions, Steinernema feltiae and Heterorhabditis indica had low virulence, while S. longicaudum, S. glaseri, and H. bacteriophora applied at a rate of 400 IJs/larva caused a higher corrected mortality (80.00 ± 5.77%), which made them good candidates for future applications. The field experiments showed that both S. longicaudum and H. bacteriophora were approximately as effective in reducing white grubs as the insecticide phoxim, whereas S. glaseri caused a significantly lower reduction compared with these two EPNs and phoxim. Plant mortality in the S. longicaudum, H. bacteriophora, and insecticide treatment plots was significantly lower than that observed in the water-treated control plots. All EPNs examined could establish well in the treated honeysuckle fields for 42 d, as confirmed by the Tenebrio molitor larval baiting technique. Our findings suggest that EPNs could provide curative efficacy against white grubs and significantly reduce plant death in honeysuckle fields.
The Posttranscriptional Mechanism in Salvia miltiorrhiza Bunge Leaves in Response to Drought Stress Using Phosphoproteomics
Drought stress is a major constraint to the quality and production of Salvia miltiorrhiza Bunge (Danshen). This study aimed to investigate the posttranslational molecular mechanisms in S. miltiorrhiza leaves in response to drought stress using quantitative phosphoproteomics analysis. S. miltiorrhiza plants were stressed by withholding water for two weeks (moderate drought stress) and four weeks (high drought stress). Leaf samples were prepared with tandem mass tag labeling. Liquid chromatography-tandem mass spectrometry was performed for the quantitative phosphoproteomics. Bioinformatics methods were used to identify the phosphosites and phosphoproteins whose phosphorylation levels changed significantly upon drought stress. A total of 119 common phosphoproteins were significantly changed by both high and moderate drought stresses. The phosphorylation levels of proteins related to protein processing, photosynthesis, RNA binding, and splicing were significantly changed upon high drought, but not moderate drought. Additionally, we identified that the Ser phosphorylation levels of most proteins related to terpene metabolism and RNA splicing were regulated by drought stresses. The Ser and Thr phosphorylation levels of energy metabolism proteins (including FBA2/8, PPC4, and PPCC) and heat shock proteins (including HSP70 and HSP90) were upregulated by drought stresses. Our study showed the posttranscriptional mechanisms in S. miltiorrhiza leaves in response to drought stress.
De Novo Assembly and Comparative Transcriptome Analysis Provide Insight into Lysine Biosynthesis in Toona sinensis Roem
Toona sinensis Roem is a popular leafy vegetable in Chinese cuisine and is also used as a traditional Chinese medicine. In this study, leaf samples were collected from the same plant at two developmental stages and then used for high-throughput Illumina RNA-sequencing (RNA-Seq). 125,884 transcripts and 54,628 unigenes were obtained through de novo assembly. A total of 25,570 could be annotated with known biological functions, which indicated that the T. sinensis leaves and shoots were undergoing multiple developmental processes, especially active metabolic processes. Analysis of differentially expressed unigenes between the two libraries showed that lysine biosynthesis was an enriched KEGG pathway, and candidate genes involved in the lysine biosynthesis pathway in T. sinensis leaves and shoots were identified. Our results provide a primary analysis of the gene expression profiles of T. sinensis leaf and shoot at different developmental stages and afford a valuable resource for genetic and genomic research on plant lysine biosynthesis.
Identification of the Genome-Wide Expression Patterns of Non-Coding RNAs Associated with Tanshinones Synthesis Pathway in Salvia miltiorrhiza
The red color of the root of Salvia miltiorrhiza Bunge, a famous traditional Chinese medicine (TCM), is caused by tanshinone in epidermal cells. In order to study the biological function of ncRNAs in tanshinone synthesis, the expression patterns of mRNA and ncRNAs were comprehensively analyzed in red (high tanshinone content) and white (low tanshinone content) root tissues derived from the same plant. A total of 731 differentially expressed genes (DEGs) were mainly enriched in primary metabolic pathways such as galactose and nitrogen, and some secondary metabolic pathways such as phenylpropanoid and terpenoids. A total of 70 miRNAs, 48 lncRNAs, and 26 circRNAs were identified as differentially expressed (DE). The targets of the DE-lncRNAs were mainly enriched in ribosome, carbon metabolism, plant hormone signal transduction, and glycerophospholipid metabolism pathways. The target genes of 59 miRNAs combined with DE-circRNAs were mainly involved in plant–pathogen interaction, endocytosis, phenylpropanoid biosynthesis, and sesquiterpenoid and triterpenoid biosynthesis pathways. Most genes of the tanshinone synthesis pathway had a higher expression. Some ncRNAs were predicted to regulate several key enzyme genes of the tanshinone synthesis pathway, such as SmDXS2, SmGGPPS1, and SmKSL. Furthermore, most target genes were related to pathogen resistance. The present study revealed the tissue-specific expression patterns of ncRNAs, which provide a basis for further research into the regulatory mechanism of ncRNAs in the tanshinone synthesis process.
Comparative RNA-Sequence Transcriptome Analysis of Phenolic Acid Metabolism in Salvia miltiorrhiza, a Traditional Chinese Medicine Model Plant
Salvia miltiorrhiza Bunge is an important traditional Chinese medicine (TCM). In this study, two S. miltiorrhiza genotypes (BH18 and ZH23) with different phenolic acid concentrations were used for de novo RNA sequencing (RNA-seq). A total of 170,787 transcripts and 56,216 unigenes were obtained. There were 670 differentially expressed genes (DEGs) identified between BH18 and ZH23, 250 of which were upregulated in ZH23, with genes involved in the phenylpropanoid biosynthesis pathway being the most upregulated. Nine genes involved in the lignin biosynthesis pathway were upregulated in BH18 and thus resulted in higher lignin content in BH18. However, expression levels of most genes involved in the core common upstream phenylpropanoid biosynthesis pathway were higher in ZH23 than in BH18. These results indicated that genes involved in the core common upstream phenylpropanoid biosynthesis pathway might play an important role in downstream secondary metabolism and demonstrated that lignin biosynthesis was a putative partially competing pathway with phenolic acid biosynthesis. The results of this study expanded our understanding of the regulation of phenolic acid biosynthesis in S. miltiorrhiza.