11 research outputs found

    Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

    Recent advancements in Large Language Models (LLMs) have expanded the horizons of natural language understanding and generation. Notably, the output control and alignment with the input of LLMs can be refined through instruction tuning. However, as highlighted in several studies, low-quality data in the training set are usually detrimental to instruction tuning, resulting in inconsistent or even misleading LLM outputs. We propose a novel method, termed "reflection-tuning," which addresses the problem by leveraging the self-improvement and judging capabilities of LLMs. This approach utilizes an oracle LLM to recycle the original training data by introspecting on and enhancing the quality of the instructions and responses in the data. Extensive experiments on widely used evaluation benchmarks show that LLMs trained with our recycled data consistently outperform those trained with existing datasets.
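    The recycling loop this abstract describes can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: `oracle_improve` stands in for a call to an oracle LLM that critiques and rewrites an (instruction, response) pair, and here it only normalizes whitespace so the loop runs end to end.

```python
def oracle_improve(instruction: str, response: str) -> tuple[str, str]:
    # Placeholder for an oracle-LLM call: a real implementation would prompt
    # the oracle to introspect on the pair (clarity, correctness, coverage)
    # and return an enhanced version. Here we only normalize whitespace.
    return instruction.strip(), response.strip()

def recycle_dataset(dataset: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Pass every (instruction, response) pair through the oracle, producing
    # a "recycled" dataset of the same size but (ideally) higher quality.
    return [oracle_improve(ins, res) for ins, res in dataset]

recycled = recycle_dataset([("  Summarize the paper.  ", " The paper shows... ")])
```

The point of the sketch is the data flow: the original pairs are never discarded, only rewritten by a stronger judge model before instruction tuning.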

    From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning

    In the realm of Large Language Models, the balance between instruction data quality and quantity has become a focal point. Recognizing this, we introduce a self-guided methodology for LLMs to autonomously discern and select cherry samples from vast open-source datasets, effectively minimizing manual curation and the potential cost of instruction tuning an LLM. Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal tool to identify discrepancies between a model's expected responses and its autonomous generation prowess. Through the application of IFD, cherry samples are pinpointed, leading to a marked improvement in model training efficiency. Empirical validations on renowned datasets like Alpaca and WizardLM underpin our findings: with a mere 10% of the conventional data input, our strategy shows improved results. This synthesis of self-guided cherry-picking and the IFD metric promises both efficient and resource-conscious optimization of LLMs. Codes, data, and models are available: https://github.com/MingLiiii/Cherry_LL
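    As the abstract describes it, IFD compares how hard a response is for the model with and without its instruction; a common formulation is the ratio of the conditioned per-token loss to the unconditioned one. The sketch below assumes those two loss values have already been computed (e.g. from a causal LM's cross-entropy); the function names, the 10% fraction, and the "select the highest-IFD samples" rule are illustrative, not the paper's exact thresholds.

```python
def ifd_score(conditioned_loss: float, direct_loss: float) -> float:
    # IFD = loss of response given instruction / loss of response alone.
    # A higher ratio suggests the instruction helps the model less, i.e.
    # the sample is harder to follow and thus more informative to train on.
    return conditioned_loss / direct_loss

def select_cherry_samples(samples: list[dict], fraction: float = 0.10) -> list[dict]:
    # Keep the top `fraction` of samples ranked by IFD (highest first).
    ranked = sorted(samples, key=lambda s: s["ifd"], reverse=True)
    k = max(1, int(len(samples) * fraction))
    return ranked[:k]
```

With this, the "10% of conventional data" result corresponds to calling `select_cherry_samples(dataset, fraction=0.10)` after scoring every pair.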

    Stemphylium lycopersici Nep1-like Protein (NLP) Is a Key Virulence Factor in Tomato Gray Leaf Spot Disease

    The fungus Stemphylium lycopersici (S. lycopersici) is an economically important plant pathogen that causes gray leaf spot disease in tomato. However, functional genomic studies in S. lycopersici are lacking, and the factors influencing its pathogenicity remain largely unknown. Here, we present the first example of genetic transformation and targeted gene replacement in S. lycopersici. We functionally analyzed the NLP gene, which encodes a necrosis- and ethylene-inducing peptide 1 (Nep1)-like protein (NLP). We found that targeted disruption of the NLP gene in S. lycopersici significantly compromised its virulence on tomato. Moreover, our data suggest that NLP affects S. lycopersici conidiospore production and weakly affects its adaptation to osmotic and oxidative stress. Interestingly, we found that NLP suppressed the production of reactive oxygen species (ROS) in tomato leaves during S. lycopersici infection. Further, expressing the fungal NLP in tomato resulted in constitutive transcription of immune-responsive genes and inhibited plant growth. Through gene manipulation, we demonstrated the function of NLP in S. lycopersici virulence and development. Our work provides a paradigm for functional genomics studies in a non-model fungal pathogen system.

    A Snapshot of the Emerging Tomato Genome Sequence

    The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger "International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation" initiative. The tomato genome sequencing project uses an ordered bacterial artificial chromosome (BAC) approach to generate a high-quality tomato euchromatic genome sequence for use as a reference genome for the Solanaceae and euasterids. Sequence is deposited at GenBank and at the SOL Genomics Network (SGN). Currently, there are around 1000 BACs finished or in progress, representing more than a third of the projected euchromatic portion of the genome. An annotation effort is also underway by the International Tomato Annotation Group. The expected number of genes in the euchromatin is ∼40,000, based on an estimate from a preliminary annotation of 11% of the finished sequence. Here, we present this first snapshot of the emerging tomato genome and its annotation, a short comparison with potato (Solanum tuberosum L.) sequence data, and the tools available for researchers to exploit this new resource. In the future, whole-genome shotgun techniques will be combined with the BAC-by-BAC approach to cover the entire tomato genome. The high-quality reference euchromatic tomato sequence is expected to be near completion by 2010.