144 research outputs found

    CLEAN-EVAL: Clean Evaluation on Contaminated Large Language Models

    We are currently in an era of fierce competition among large language models (LLMs) that continuously push the boundaries of benchmark performance. However, genuinely assessing the capabilities of these LLMs has become a challenging and critical issue due to potential data contamination, and researchers and engineers waste considerable time and effort downloading and trying contaminated models. To save this time, we propose a novel and useful method, Clean-Eval, which mitigates the issue of data contamination and evaluates LLMs in a cleaner manner. Clean-Eval employs an LLM to paraphrase and back-translate the contaminated data into a candidate set, generating expressions with the same meaning but in different surface forms. A semantic detector is then used to filter out low-quality generated samples and narrow down this candidate set. The best candidate is finally selected from this set based on the BLEURT score. According to human assessment, this best candidate is semantically similar to the original contaminated data but expressed differently. All candidates can form a new benchmark to evaluate the model. Our experiments illustrate that Clean-Eval substantially restores the actual evaluation results on contaminated LLMs under both few-shot learning and fine-tuning scenarios.
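
    The three stages described in this abstract (generate candidates, filter them semantically, select one by a score) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `clean_candidate`, `token_overlap`, the threshold `min_semantic`, and the hard-coded rewriters are all hypothetical, and a crude token-overlap measure stands in for both the semantic detector and BLEURT. The abstract does not say whether a higher or lower BLEURT score is preferred, so the direction is left as a parameter.

```python
from typing import Callable, List, Optional

Rewriter = Callable[[str], str]        # stand-in for an LLM paraphrase or back-translation call
Scorer = Callable[[str, str], float]   # stand-in for a semantic detector or a BLEURT scorer


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens. A toy stand-in, NOT BLEURT."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def clean_candidate(
    original: str,
    rewriters: List[Rewriter],
    semantic_score: Scorer,
    select_score: Scorer,
    min_semantic: float = 0.3,   # hypothetical threshold
    prefer_high: bool = True,    # assumption: selection direction is not given in the abstract
) -> Optional[str]:
    # Stage 1: build the candidate set from every rewriter.
    candidates = [rewrite(original) for rewrite in rewriters]
    # Stage 2: drop unchanged copies and candidates the detector rates too low.
    kept = [
        c for c in candidates
        if c != original and semantic_score(original, c) >= min_semantic
    ]
    if not kept:
        return None
    # Stage 3: pick one candidate by the selection score.
    pick = max if prefer_high else min
    return pick(kept, key=lambda c: select_score(original, c))


original = "What is the capital of France?"
rewriters = [
    lambda s: "Which city is the capital of France?",  # pretend paraphrase
    lambda s: "What city is the capital of France?",   # pretend back-translation
    lambda s: "How tall is Mount Everest?",            # off-topic output, should be filtered
]

closest = clean_candidate(original, rewriters, token_overlap, token_overlap)
most_different = clean_candidate(
    original, rewriters, token_overlap, token_overlap, prefer_high=False
)
print(closest)
print(most_different)
```

    In a real setting the rewriters would call a language model and the two scorers would be separate learned models; only the control flow above is meant to carry over.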

    Experimental studies of instability process and energy evolution of tunnels under true triaxial stresses: The role of pre-existed flaws

    In the natural geological environment, there are many joints, faults and cavities. These natural defects have an impact on the stability of tunnels. This paper investigates three conditions of surrounding rock under true triaxial testing: intact surrounding rock, surrounding rock with an open flaw, and surrounding rock with a filled flaw. The effect of these surrounding rock conditions on the internal failure characteristics of the tunnel under true triaxial conditions is explored. According to the characteristics of energy evolution and chaos theory, the failure characteristics inside the tunnel are divided into stages. The results show that: 1) The failure characteristics in the tunnel differ for different surrounding rock conditions, and they do not represent the stability of the surrounding rock of the tunnel; 2) The trend of energy dissipation differs under different surrounding rock conditions. The elastic stage of the surrounding rock is shortened and the dissipated energy shows an earlier upward trend as its integrity declines; 3) When analysing the tunnel, chaos theory can give early warning of the instability of the surrounding rock, but it cannot give early warning of particle spray and spalling inside the tunnel.

    TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration

    Large language models (LLMs) have demonstrated exceptional performance in planning the use of various functional tools, such as calculators and retrievers, particularly in question-answering tasks. In this paper, we expand the definition of these tools, centering on conceptual tools within the context of dialogue systems. A conceptual tool specifies a cognitive concept that aids systematic or investigative thought. These conceptual tools play important roles in practice, such as multiple psychological or tutoring strategies being dynamically applied in a single turn to compose helpful responses. To further enhance the reasoning and planning capability of LLMs with these conceptual tools, we introduce a multi-persona collaboration framework: Think-Plan-Execute (TPE). This framework decouples the response generation process into three distinct roles: Thinker, Planner, and Executor. Specifically, the Thinker analyzes the internal status exhibited in the dialogue context, such as user emotions and preferences, to formulate a global guideline. The Planner then generates executable plans to call different conceptual tools (e.g., sources or strategies), while the Executor compiles all intermediate results into a coherent response. This structured approach not only enhances the explainability and controllability of responses but also reduces token redundancy. We demonstrate the effectiveness of TPE across various dialogue response generation tasks, including multi-source (FoCus) and multi-strategy interactions (CIMA and PsyQA). This reveals its potential to handle real-world dialogue interactions that require more complicated tool learning beyond just functional tools. The full code and data will be released for reproduction

    The effect of mixed La-Y doping on water resistance of phosphate glass

    In this work, the effect of mixed La-Y doping on the water resistance of xLa2O3–(16-x)Y2O3–8Al2O3–10Na2O–66P2O5 (x = 0, 4, 8, 12, 16 mol%) glasses was studied. The glass structure, glass transition temperature (Tg), dc conductivity (σdc) and water resistance of the glasses were characterized by Fourier transform infrared spectroscopy (FTIR), differential scanning calorimetry (DSC), an electrochemical workstation and a water resistance test, respectively. The results show that with the gradual replacement of Y2O3 by La2O3, the value of Q(2) (Q2 content as a percentage of the sum of Q1 and Q2 contents in the glass structure) and the water resistance, characterized by mass loss per unit surface area, indicate a strong “mixed rare earth effect”. The change of glass structure causes the water resistance of the glass to vary nonlinearly and exhibit a positive deviation from linearity. The results can provide useful information for tailoring the chemical durability of glass by mixed rare earth doping.

    Antimicrobial Resistance in Non-typhoidal Salmonella from Retail Foods Collected in 2020 in China

    Non-typhoidal Salmonella (NTS) is a major cause of human salmonellosis globally. Food animals are major NTS reservoirs. An increase in antimicrobial resistance (AMR) in foodborne NTS has led to clinical treatment failures. Here, to examine the prevalence of foodborne NTS with AMR in China and to characterize these isolates, we tested the antimicrobial susceptibility of 1,256 NTS isolates cultured from retail foods in 2020 in China. Susceptibility to 26 antimicrobial agents representing 12 classes was evaluated with the broth-microdilution method; the presence of ten mcr genes was screened with multi-PCR. The complete closed genomes of mcr-gene-carrying isolates were generated by hybrid assembly through whole genome sequencing on both the PacBio and Illumina platforms. Genomic features and genetic environments of the mcr-1 gene were analysed. The overall drug resistance rate was 92.28%, and the multi-drug resistance (MDR) rate was 76.53%. A total of 341 AMR profiles were determined, and resistance was highest to nalidixic acid (63.38%). Among 887 NTS isolates with MDR, 232 showed co-resistance to cefotaxime and ciprofloxacin, and 25 were resistant to ten classes of antimicrobial agents. The resistance of NTS isolated from different regions varied. Isolates from raw chicken sources most frequently showed resistance. Four NTS isolates carried the mcr-1 gene and represented four different serotypes. Four mcr-1 gene-bearing plasmids from the four Salmonella isolates were classified into two replicon types (IncI2 and IncHI2A). Two mcr-1 genes in IncI2 type plasmids were found to be located between a PAP2 family protein-encoding gene and a relaxase-encoding gene, whereas the other two mcr-1 gene structures in IncHI2A type plasmids showed variations in the presence of insertion sequences. Our data demonstrated severe AMR among foodborne NTS isolated from food in China, thus highlighting the importance of antimicrobial susceptibility surveillance to decrease the spread of AMR, particularly to critical drugs in human medicine.
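
    The percentages in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only figures quoted above (1,256 isolates, 92.28%, 76.53%, 887 MDR isolates); the count of resistant isolates is not stated in the abstract and is derived here by rounding, so it should be read as an assumption. Under that assumption, the quoted MDR rate appears to match 887 taken as a share of the resistant isolates, while 887 as a share of all isolates gives a different figure.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100 * numerator / denominator, 2)


total_isolates = 1256   # stated in the abstract
mdr_isolates = 887      # stated in the abstract

# Derived, NOT stated in the abstract: resistant count implied by the 92.28% rate.
resistant_isolates = round(total_isolates * 0.9228)

overall_rate = pct(resistant_isolates, total_isolates)
mdr_among_resistant = pct(mdr_isolates, resistant_isolates)
mdr_among_all = pct(mdr_isolates, total_isolates)

print(resistant_isolates, overall_rate, mdr_among_resistant, mdr_among_all)
```

    This is a consistency check on the reported numbers only; the denominators the authors actually used would need to be confirmed in the full paper.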

    Efficacy and Safety/Toxicity Study of Recombinant Vaccinia Virus JX-594 in Two Immunocompetent Animal Models of Glioma

    The purpose of this study was to investigate the oncolytic potential of the recombinant, granulocyte macrophage colony-stimulating factor (GM-CSF)-expressing vaccinia virus (VV) JX-594 in experimental malignant gliomas (MGs) in vitro and in immunocompetent rodent models. We found that JX-594 killed all MG cell lines tested in vitro. Intratumoral (i.t.) administration of JX-594 significantly inhibited tumor growth and prolonged survival in rats bearing RG2 intracranial (i.c.) tumors and mice bearing GL261 brain tumors. Combination therapy with JX-594 and rapamycin significantly increased viral replication and further prolonged survival in both immunocompetent i.c. MG models, with several animals considered “cured” (three out of seven rats >120 days, terminated experiment). JX-594 infected and killed brain tumor-initiating cells (BTICs) from patient samples grown ex vivo, and did so more efficiently than the other oncolytic viruses MYXV, Reovirus type-3, and VSVΔM51. Additional safety/toxicity studies in nontumor-bearing rodents treated with a supratherapeutic dose of JX-594 demonstrated GM-CSF-dependent inflammation and necrosis. These results suggest that i.c. administered JX-594 triggers a predictable GM-CSF-mediated inflammation in murine models. Before proceeding to clinical trials, JX-594 should be evaluated in the brains of nonhuman primates and optimized for viral dose, delivery route, and combination agents (e.g., mTOR inhibitors).

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
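
    To give a sense of scale for the quoted accuracy and gap figures, here is a small back-of-the-envelope calculation. It uses only the rounded numbers in the abstract (2.85 billion nucleotides, about 1 error per 100,000 bases, 341 gaps). The mean segment length additionally assumes that every gap is interior and therefore splits the sequence into 342 pieces, which is a simplification made for illustration and is not stated in the abstract.

```python
sequence_length = 2_850_000_000   # nucleotides in Build 35, as quoted
bases_per_error = 100_000         # roughly 1 error event per 100,000 bases, as quoted
gap_count = 341                   # as quoted

# Expected number of error events across the whole sequence at the quoted rate.
expected_errors = sequence_length // bases_per_error

# Assumption: all gaps are interior, so 341 gaps yield 342 contiguous segments.
segment_count = gap_count + 1
mean_segment_length = sequence_length / segment_count

print(f"expected error events: about {expected_errors:,}")
print(f"mean contiguous segment: about {mean_segment_length / 1e6:.2f} Mb")
```

    Because the inputs are themselves approximate, the outputs are order-of-magnitude estimates rather than properties reported by the consortium.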