    Improving Coreference Resolution by Leveraging Entity-Centric Features with Graph Neural Networks and Second-order Inference

    One of the major challenges in coreference resolution is how to make use of entity-level features defined over clusters of mentions rather than over mention pairs. However, coreferent mentions are usually spread far apart across a text, which makes it extremely difficult to incorporate entity-level features. We propose a graph neural network-based coreference resolution method that captures entity-centric information by encouraging the sharing of features across all mentions that probably refer to the same real-world entity. Mentions are linked to each other via edges that model how likely the two linked mentions are to point to the same entity. By modeling mentions with such graphs, features can be shared between mentions through message-passing operations in an entity-centric manner. A global inference algorithm using up to second-order features is also presented to optimally cluster mentions into consistent groups. Experimental results show that our graph neural network-based method combined with the second-order decoding algorithm (named GNNCR) achieved close to state-of-the-art performance on the English CoNLL-2012 Shared Task dataset.
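The idea of sharing features along coreference edges can be illustrated with a deliberately simplified, non-learned message-passing step. This is a sketch under stated assumptions, not the GNNCR architecture: the function name `message_passing_step`, the self-loop of weight 1, and the use of edge probabilities as fixed aggregation weights are all hypothetical choices. A real GNN layer would also apply learned transformations and nonlinearities.

```python
from typing import Dict, List, Tuple


def message_passing_step(
    features: Dict[str, List[float]],
    edges: List[Tuple[str, str, float]],
) -> Dict[str, List[float]]:
    """One synchronous round of probability-weighted averaging over a mention graph.

    Hypothetical sketch: each edge (u, v, p) carries an assumed coreference
    probability p, and every mention keeps a self-loop of weight 1.
    """
    dim = len(next(iter(features.values())))
    # Start from the mention's own features (self-loop weight 1).
    sums = {m: list(h) for m, h in features.items()}
    weights = {m: 1.0 for m in features}
    for u, v, p in edges:
        # Edges are undirected: each endpoint receives the other's ORIGINAL features scaled by p.
        for src, dst in ((u, v), (v, u)):
            weights[dst] += p
            for i in range(dim):
                sums[dst][i] += p * features[src][i]
    # Normalize so each updated vector is a weighted mean of self and neighbours.
    return {m: [s / weights[m] for s in sums[m]] for m in features}


features = {"Obama": [1.0, 0.0], "he": [0.0, 1.0], "the bank": [0.0, 0.0]}
edges = [("Obama", "he", 0.9), ("he", "the bank", 0.1)]
updated = message_passing_step(features, edges)
print(updated["he"])  # [0.45, 0.5]
```

Because the high-probability edge dominates the weighted mean, "he" moves substantially toward "Obama", while the weak edge barely changes the features of "the bank". Stacking several such rounds lets information reach mentions that are far apart in the text but connected through the graph.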

    INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback

    Automatically evaluating the quality of language generation is critical. Although recent learned metrics show high correlation with human judgement, these metrics cannot explain their verdicts or associate their scores with defects in the generated text. To address this limitation, we present InstructScore, an explainable evaluation metric for text generation. By harnessing both explicit human instruction and the implicit knowledge of GPT-4, we fine-tune a text evaluation metric based on LLaMA that produces both a score for the generated text and a human-readable diagnostic report. We evaluate InstructScore on a variety of generation tasks, including translation, captioning, data-to-text, and commonsense generation. Experiments show that our 7B model surpasses all other unsupervised metrics, including those based on 175B GPT-3 and GPT-4. Surprisingly, InstructScore, even without direct supervision from human-rated data, achieves performance on par with state-of-the-art metrics such as COMET22, which were fine-tuned on human ratings. Comment: Accepted to EMNLP 2023 Main Conference.
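The abstract does not say how the single score relates to the diagnostic report. One plausible scheme, assumed here for illustration, is an MQM-style severity weighting in which each major error costs 5 points and each minor error costs 1 point. The class `DiagnosedError`, the function `score_from_report`, and the penalty table are hypothetical names, not part of any released InstructScore API.

```python
from dataclasses import dataclass
from typing import List

# Assumed MQM-style severity penalties (hypothetical for this sketch).
SEVERITY_PENALTY = {"major": -5, "minor": -1}


@dataclass
class DiagnosedError:
    """One entry of a hypothetical diagnostic report."""
    error_type: str
    severity: str  # expected to be "major" or "minor"
    explanation: str


def score_from_report(errors: List[DiagnosedError]) -> int:
    """Sum the severity penalties; a report with no errors scores 0, the best score."""
    return sum(SEVERITY_PENALTY[e.severity] for e in errors)


report = [
    DiagnosedError("mistranslation", "major", "Negation in the source is dropped."),
    DiagnosedError("grammar", "minor", "Wrong verb agreement."),
    DiagnosedError("word choice", "minor", "Unidiomatic preposition."),
]
print(score_from_report(report))  # -7
```

Deriving the score from the listed errors keeps the number and the explanation consistent by construction: every point lost can be traced to a specific entry in the report.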

    Hire a Linguist!: Learning Endangered Languages with In-Context Linguistic Descriptions

    How can large language models (LLMs) process and translate endangered languages? Many languages lack a corpus large enough to train a decent LLM; therefore, existing LLMs rarely perform well on unseen, endangered languages. On the other hand, we observe that 2000 endangered languages, though lacking a large corpus, have a grammar book or a dictionary. We propose LINGOLLM, a training-free approach that enables an LLM to process unseen languages that hardly occur in its pre-training data. Our key insight is to present linguistic knowledge of an unseen language in the LLM's prompt, including a dictionary, a grammar book, and morphologically analyzed input text. We implement LINGOLLM on top of two models, GPT-4 and Mixtral, and evaluate their performance on 5 tasks across 8 endangered or low-resource languages. Our results show that LINGOLLM raises GPT-4's translation performance from 0 to 10.5 BLEU for 10 language directions. Our findings demonstrate the tremendous value of linguistic knowledge for endangered languages in the age of LLMs. Our data, code, and model generations can be found at https://github.com/LLiLab/llm4endangeredlang
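Putting a dictionary, grammar notes, and a morphological analysis into the prompt amounts to assembling several text sections around the input sentence. The following is a minimal sketch under stated assumptions: the function `build_linguistic_prompt`, its parameters, and the prompt layout are hypothetical and are not taken from the LINGOLLM repository. The words "zib" and "narq" are invented placeholders, not words of any real language.

```python
from typing import Dict, List


def build_linguistic_prompt(
    source_sentence: str,
    morph_gloss: List[str],
    dictionary: Dict[str, str],
    grammar_excerpt: str,
    target_language: str = "English",
) -> str:
    """Assemble a translation prompt that carries linguistic descriptions in context.

    Hypothetical layout: grammar notes, dictionary lookups, morphological
    analysis, the sentence itself, and finally the instruction.
    """
    # Look up each whitespace-separated word of the sentence; unknown words are flagged, not guessed.
    entries = []
    for token in source_sentence.split():
        meaning = dictionary.get(token.lower())
        entries.append(f"{token}: {meaning if meaning else '(not in dictionary)'}")
    sections = [
        "Grammar notes:\n" + grammar_excerpt.strip(),
        "Dictionary entries:\n" + "\n".join(entries),
        "Morphological analysis:\n" + " ".join(morph_gloss),
        f"Sentence:\n{source_sentence}",
        f"Translate the sentence into {target_language}.",
    ]
    return "\n\n".join(sections)


prompt = build_linguistic_prompt(
    source_sentence="zib narq",
    morph_gloss=["zib=water", "narq=flow-PRS"],
    dictionary={"zib": "water"},
    grammar_excerpt="Verbs follow their subject.",
)
print(prompt)
```

Flagging missing dictionary entries explicitly, rather than silently dropping them, lets the model see which words it must infer from the grammar notes and the morphological gloss.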

    Curative efficacy of entomopathogenic nematodes against white grubs in honeysuckle fields

    Root-feeding white grubs are among the most serious pests of honeysuckle trees (Lonicera japonica) in China, directly damaging their roots and facilitating infection by soil pathogens. Entomopathogenic nematodes (EPNs) are considered potential control agents against soil-dwelling insect pests. This study aimed to identify effective EPN species against white grubs through bioassays and field experiments. Among the EPN species screened against Holotrichia oblita under laboratory conditions, Steinernema feltiae and Heterorhabditis indica had low virulence, while S. longicaudum, S. glaseri, and H. bacteriophora applied at a rate of 400 infective juveniles (IJs) per larva caused higher corrected mortality (80.00 ± 5.77%), making them good candidates for future applications. The field experiments showed that both S. longicaudum and H. bacteriophora were approximately as effective in reducing white grubs as the insecticide phoxim, whereas S. glaseri caused a significantly lower reduction than these two EPNs and phoxim. Plant mortality in the S. longicaudum, H. bacteriophora, and insecticide treatment plots was significantly lower than in the water-treated control plots. All EPNs examined became well established in the treated honeysuckle fields for 42 d, as confirmed by the Tenebrio molitor larval baiting technique. Our findings suggest that EPNs can provide curative efficacy against white grubs and significantly reduce plant death in honeysuckle fields.
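"Corrected mortality" adjusts treatment mortality for deaths that also occur in the untreated control. The abstract does not state which correction was used; Abbott's formula, corrected = 100 × (T − C) / (100 − C) with T and C as percentage mortality in the treatment and control, is the standard choice and is assumed here. The function name `abbott_corrected_mortality` and the example numbers are illustrative, not data from the study.

```python
def abbott_corrected_mortality(treated_pct: float, control_pct: float) -> float:
    """Abbott-corrected mortality (%) from treatment and control mortality (%).

    Assumes both inputs are percentages in [0, 100] and that the control
    mortality is below 100%, otherwise the correction is undefined.
    """
    if not (0.0 <= control_pct < 100.0):
        raise ValueError("control mortality must be in [0, 100)")
    if not (0.0 <= treated_pct <= 100.0):
        raise ValueError("treated mortality must be in [0, 100]")
    # Fraction of control survivors that the treatment killed, as a percentage.
    return 100.0 * (treated_pct - control_pct) / (100.0 - control_pct)


# Illustrative numbers only: 84% mortality with nematodes, 20% in the water control.
print(abbott_corrected_mortality(84.0, 20.0))  # 80.0
```

When treatment mortality is below control mortality the formula returns a negative value; reports usually treat such cases as zero effect, which this sketch leaves to the caller.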

    The Posttranscriptional Mechanism in Salvia miltiorrhiza Bunge Leaves in Response to Drought Stress Using Phosphoproteomics

    Drought stress is a major constraint on the quality and production of Salvia miltiorrhiza Bunge (Danshen). This study aimed to investigate the posttranslational molecular mechanisms in S. miltiorrhiza leaves in response to drought stress using quantitative phosphoproteomics. S. miltiorrhiza plants were stressed by withholding water for two weeks (moderate drought stress) and four weeks (high drought stress). Leaf samples were prepared with tandem mass tag (TMT) labeling, and liquid chromatography-tandem mass spectrometry was performed for quantitative phosphoproteomics. Bioinformatics methods were used to identify the phosphosites and phosphoproteins whose phosphorylation levels changed significantly under drought stress. A total of 119 common phosphoproteins were significantly changed by both high and moderate drought stress. The phosphorylation levels of proteins related to protein processing, photosynthesis, RNA binding, and splicing changed significantly under high, but not moderate, drought stress. Additionally, we found that the Ser phosphorylation levels of most proteins related to terpene metabolism and RNA splicing were regulated by drought stress. The Ser and Thr phosphorylation levels of energy metabolism proteins (including FBA2/8, PPC4, and PPCC) and heat shock proteins (including HSP70 and HSP90) were upregulated by drought stress. Our study revealed posttranslational mechanisms in S. miltiorrhiza leaves in response to drought stress.

    De Novo Assembly and Comparative Transcriptome Analysis Provide Insight into Lysine Biosynthesis in Toona sinensis Roem

    Toona sinensis Roem is a popular leafy vegetable in Chinese cuisine and is also used as a traditional Chinese medicine. In this study, leaf samples were collected from the same plant at two developmental stages and used for high-throughput Illumina RNA sequencing (RNA-Seq). A total of 125,884 transcripts and 54,628 unigenes were obtained through de novo assembly. A total of 25,570 could be annotated with known biological functions, indicating that T. sinensis leaves and shoots were undergoing multiple developmental processes, especially active metabolic processes. Analysis of differentially expressed unigenes between the two libraries showed that lysine biosynthesis was an enriched KEGG pathway, and candidate genes involved in the lysine biosynthesis pathway in T. sinensis leaves and shoots were identified. Our results provide a preliminary analysis of the gene expression profiles of T. sinensis leaves and shoots at different developmental stages and offer a valuable resource for genetic and genomic research on plant lysine biosynthesis.
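Calling a KEGG pathway "enriched" among differentially expressed genes usually rests on an over-representation test. The abstract does not name the test; the one-sided hypergeometric test sketched below is a common choice and is assumed here. With N annotated background genes, K of them in the pathway, and n differentially expressed genes of which k fall in the pathway, the p-value is P(X ≥ k). The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= k) for pathway over-representation.

    k: DE genes in the pathway, n: all DE genes,
    K: background genes in the pathway, N: all background genes.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    total = comb(N, n)
    upper = min(n, K)
    # Count the ways to draw at least k pathway genes among n DE genes;
    # math.comb returns 0 when n - i exceeds N - K, so impossible draws add nothing.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, not taken from the study above.
print(hypergeom_enrichment_p(k=12, n=500, K=40, N=20000))
```

Exact integer arithmetic via `math.comb` avoids overflow for genome-scale counts. When many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before a pathway is reported as enriched.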

    Identification of the Genome-Wide Expression Patterns of Non-Coding RNAs Associated with Tanshinones Synthesis Pathway in Salvia miltiorrhiza

    The red color of the root of Salvia miltiorrhiza Bunge, a famous traditional Chinese medicine (TCM), is caused by tanshinones in the epidermal cells. To study the biological function of ncRNAs in tanshinone synthesis, the expression patterns of mRNAs and ncRNAs were comprehensively analyzed in red (high tanshinone content) and white (low tanshinone content) root tissues derived from the same plant. A total of 731 differentially expressed genes (DEGs) were mainly enriched in primary metabolic pathways, such as galactose and nitrogen metabolism, and in some secondary metabolic pathways, such as the phenylpropanoid and terpenoid pathways. A total of 70 miRNAs, 48 lncRNAs, and 26 circRNAs were identified as differentially expressed (DE). The targets of DE-lncRNAs were mainly enriched in ribosome, carbon metabolism, plant hormone signal transduction, and glycerophospholipid metabolism pathways. The target genes of 59 miRNAs combined with DE-circRNAs were mainly involved in plant–pathogen interaction, endocytosis, phenylpropanoid biosynthesis, and sesquiterpenoid and triterpenoid biosynthesis pathways. Most genes of the tanshinone synthesis pathway showed higher expression. Some ncRNAs were predicted to regulate several key enzyme genes of the tanshinone synthesis pathway, such as SmDXS2, SmGGPPS1, and SmKSL. Furthermore, most target genes were related to pathogen resistance. The present study revealed tissue-specific expression patterns of ncRNAs, which provide a basis for further research into the regulatory mechanisms of ncRNAs in tanshinone synthesis.

    Comparative RNA-Sequence Transcriptome Analysis of Phenolic Acid Metabolism in Salvia miltiorrhiza, a Traditional Chinese Medicine Model Plant

    Salvia miltiorrhiza Bunge is an important traditional Chinese medicine (TCM). In this study, two S. miltiorrhiza genotypes (BH18 and ZH23) with different phenolic acid concentrations were used for de novo RNA sequencing (RNA-seq). A total of 170,787 transcripts and 56,216 unigenes were obtained. There were 670 differentially expressed genes (DEGs) identified between BH18 and ZH23, 250 of which were upregulated in ZH23, with genes involved in the phenylpropanoid biosynthesis pathway being the most upregulated. Nine genes involved in the lignin biosynthesis pathway were upregulated in BH18, resulting in higher lignin content in BH18. However, the expression of most genes involved in the core common upstream phenylpropanoid biosynthesis pathway was higher in ZH23 than in BH18. These results indicate that genes involved in the core common upstream phenylpropanoid biosynthesis pathway might play an important role in downstream secondary metabolism, and that lignin biosynthesis is a putative, partially competing pathway of phenolic acid biosynthesis. This study expands our understanding of the regulation of phenolic acid biosynthesis in S. miltiorrhiza.
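Counts such as "670 DEGs, 250 of which were upregulated" come from filtering per-gene test results by fold change and a multiple-testing-adjusted p-value. The abstract does not state its thresholds or correction method; the sketch below assumes Benjamini-Hochberg adjustment, |log2 fold change| ≥ 1, and an adjusted p-value below 0.05, which are common defaults rather than the authors' settings. Function names, gene names, and values are hypothetical, and the sign of the log2 fold change is assumed to be "second condition over first".

```python
from typing import List, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(
    genes: List[str],
    log2fc: List[float],
    pvalues: List[float],
    fc_cutoff: float = 1.0,
    fdr_cutoff: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Split significant genes into up- and downregulated lists (assumed thresholds)."""
    padj = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q < fdr_cutoff and abs(lfc) >= fc_cutoff:
            (up if lfc > 0 else down).append(gene)
    return up, down


genes = ["g1", "g2", "g3", "g4"]
print(call_degs(genes, [2.0, -1.5, 0.5, 3.0], [0.001, 0.002, 0.03, 0.5]))  # (['g1'], ['g2'])
```

In this toy input, g3 passes the adjusted p-value cutoff but is excluded by the fold-change threshold, and g4 has a large fold change but is not significant, which is why both filters are usually applied together.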