CLEAN-EVAL: Clean Evaluation on Contaminated Large Language Models
We are currently in an era of fierce competition among various large language
models (LLMs) continuously pushing the boundaries of benchmark performance.
However, genuinely assessing the capabilities of these LLMs has become a
challenging and critical issue due to potential data contamination, and
researchers and engineers waste a great deal of time and effort downloading and
trying those contaminated models. To save our precious time, we propose a novel
and useful method, Clean-Eval, which mitigates the issue of data contamination
and evaluates the LLMs in a cleaner manner. Clean-Eval employs an LLM to
paraphrase and back-translate the contaminated data into a candidate set,
generating expressions with the same meaning but in different surface forms. A
semantic detector is then used to filter the generated low-quality samples to
narrow down this candidate set. The best candidate is finally selected from
this set based on the BLEURT score. According to human assessment, this best
candidate is semantically similar to the original contamination data but
expressed differently. All candidates can form a new benchmark to evaluate the
model. Our experiments illustrate that Clean-Eval substantially restores the
actual evaluation results on contaminated LLMs under both few-shot learning and
fine-tuning scenarios.
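The filter-then-select flow described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the candidate list is passed in by hand instead of being produced by an LLM, `token_overlap` is a hypothetical stand-in for the semantic detector, `surface_similarity` is a hypothetical stand-in for BLEURT, and choosing the candidate with the lowest surface similarity is an assumption (the abstract only says selection is "based on the BLEURT score").

```python
from difflib import SequenceMatcher
from typing import List, Optional


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (stand-in for a semantic detector)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (stand-in for a BLEURT score)."""
    return SequenceMatcher(None, a, b).ratio()


def select_candidate(original: str, candidates: List[str],
                     min_semantic: float = 0.3) -> Optional[str]:
    """Filter low-quality candidates, then pick one by the surface score."""
    # Step 1: drop verbatim copies and candidates judged semantically too far.
    kept = [c for c in candidates
            if c != original and token_overlap(original, c) >= min_semantic]
    if not kept:
        return None
    # Step 2 (assumption): prefer the candidate whose surface form differs most.
    return min(kept, key=lambda c: surface_similarity(original, c))


original = "The cat sat on the mat"
candidates = [
    "The cat sat on the mat",      # verbatim copy, removed by the filter
    "On the mat sat the cat",      # same tokens, different surface form
    "Dogs bark loudly at night",   # unrelated, removed by the filter
]
best = select_candidate(original, candidates)
```

In a real pipeline the two stand-in functions would be replaced by a learned semantic model and an actual BLEURT scorer; the control flow is the only part this sketch is meant to show.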
Experimental studies of instability process and energy evolution of tunnels under true triaxial stresses: The role of pre-existed flaws
In the natural geological environment, there are many joints, faults and cavities. These natural defects affect the stability of tunnels. This paper investigates three surrounding rock conditions under true triaxial tests: intact surrounding rock, surrounding rock with an open flaw and surrounding rock with a filled flaw. The effect of these conditions on the internal failure characteristics of the tunnel under true triaxial conditions is explored. According to the characteristics of energy evolution and chaos theory, the failure process inside the tunnel is divided into stages. The results show that: 1) The failure characteristics in the tunnel differ between surrounding rock conditions, and they do not represent the stability of the surrounding rock of the tunnel; 2) The trend of energy dissipation differs between surrounding rock conditions: as the integrity of the surrounding rock declines, its elastic stage shortens and the dissipated energy begins to rise earlier; 3) When analysing the tunnel, chaos theory can give early warning of instability of the surrounding rock, but it cannot give early warning of particle spray and spalling inside the tunnel.
TPE: Towards Better Compositional Reasoning over Conceptual Tools with Multi-persona Collaboration
Large language models (LLMs) have demonstrated exceptional performance in
planning the use of various functional tools, such as calculators and
retrievers, particularly in question-answering tasks. In this paper, we expand
the definition of these tools, centering on conceptual tools within the context
of dialogue systems. A conceptual tool specifies a cognitive concept that aids
systematic or investigative thought. These conceptual tools play important
roles in practice, such as multiple psychological or tutoring strategies being
dynamically applied in a single turn to compose helpful responses. To further
enhance the reasoning and planning capability of LLMs with these conceptual
tools, we introduce a multi-persona collaboration framework: Think-Plan-Execute
(TPE). This framework decouples the response generation process into three
distinct roles: Thinker, Planner, and Executor. Specifically, the Thinker
analyzes the internal status exhibited in the dialogue context, such as user
emotions and preferences, to formulate a global guideline. The Planner then
generates executable plans to call different conceptual tools (e.g., sources or
strategies), while the Executor compiles all intermediate results into a
coherent response. This structured approach not only enhances the
explainability and controllability of responses but also reduces token
redundancy. We demonstrate the effectiveness of TPE across various dialogue
response generation tasks, including multi-source (FoCus) and multi-strategy
interactions (CIMA and PsyQA). This reveals its potential to handle real-world
dialogue interactions that require more complicated tool learning beyond just
functional tools. The full code and data will be released for reproduction.
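The three-role decomposition described in this abstract might be sketched as below. This is an illustration under stated assumptions, not the TPE code: in the paper each role is played by an LLM, whereas here `thinker`, `planner`, `executor`, the `Step` type, the `TOOLS` table and all canned strings are hypothetical rule-based stand-ins chosen only to show how a guideline, a plan, and a final response are passed from one role to the next.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    tool: str      # name of a conceptual tool (a source or a strategy)
    argument: str  # the text the tool is applied to


def thinker(dialogue: List[str]) -> str:
    """Stand-in Thinker: read the user's state and form a global guideline."""
    last = dialogue[-1].lower()
    if "stuck" in last or "confused" in last:
        return "user is frustrated; encourage first, then hint"
    return "user is neutral; answer directly"


def planner(guideline: str, dialogue: List[str]) -> List[Step]:
    """Stand-in Planner: turn the guideline into an ordered list of tool calls."""
    topic = dialogue[-1]
    if "encourage" in guideline:
        return [Step("encourage", topic), Step("hint", topic)]
    return [Step("answer", topic)]


# Hypothetical conceptual tools; each returns one fragment of the reply.
TOOLS: Dict[str, Callable[[str], str]] = {
    "encourage": lambda arg: "You are close.",
    "hint": lambda arg: "Try checking the gender of the noun.",
    "answer": lambda arg: "Here is the answer.",
}


def executor(plan: List[Step]) -> str:
    """Stand-in Executor: run each step and compile the results into one reply."""
    return " ".join(TOOLS[step.tool](step.argument) for step in plan)


def respond(dialogue: List[str]) -> str:
    guideline = thinker(dialogue)
    plan = planner(guideline, dialogue)
    return executor(plan)


reply = respond(["I am stuck on this sentence"])
```

Keeping the plan as explicit `Step` objects is what makes the intermediate reasoning inspectable, which is the explainability and controllability benefit the abstract points to.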
The effect of mixed La-Y doping on water resistance of phosphate glass
In this work, the effect of mixed La-Y doping on the water resistance of xLa2O3–(16-x)Y2O3–8Al2O3–10Na2O–66P2O5 (x = 0, 4, 8, 12, 16 mol%) glasses was studied. The glass structure, glass transition temperature (Tg), dc conductivity (σdc) and water resistance of the glasses were characterized by Fourier transform infrared spectroscopy (FTIR), differential scanning calorimetry (DSC), an electrochemical workstation and a water resistance test, respectively. The results show that with the gradual replacement of Y2O3 by La2O3, the value of Q(2) (the Q2 content as a percentage of the sum of the Q1 and Q2 contents in the glass structure) and the water resistance, characterized by mass loss per unit surface area, indicate a strong “mixed rare earth effect”. The change in glass structure causes the water resistance of the glass to vary nonlinearly and exhibit a positive deviation from linearity. The results can provide some useful information for tailoring the chemical durability of glass by mixed rare earth doping.
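The two quantities named in this abstract can be written out as follows. This is a sketch in assumed notation, not the authors' own: [Q^1] and [Q^2] stand for the Q1 and Q2 contents, P(x) stands for any measured property at La2O3 content x (for example the water resistance), and the deviation is taken relative to a straight line joining the two end-member glasses at x = 0 and x = 16.

```latex
% Q(2) as defined in the abstract: Q2 content as a percentage of Q1 + Q2.
\[
  Q(2) = \frac{[Q^{2}]}{[Q^{1}] + [Q^{2}]} \times 100\%
\]

% Deviation of a property P from linear mixing between the end members
% (assumed notation). A positive \Delta P(x) at intermediate x is what
% "positive deviation from linearity" refers to.
\[
  \Delta P(x) = P(x) - \left[ P(0) + \frac{x}{16}\,\bigl(P(16) - P(0)\bigr) \right],
  \qquad 0 \le x \le 16
\]
```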
Antimicrobial Resistance in Non-typhoidal Salmonella from Retail Foods Collected in 2020 in China
Non-typhoidal Salmonella (NTS) is a major cause of human salmonellosis globally. Food animals are major NTS reservoirs. An increase in antimicrobial resistance (AMR) in foodborne NTS has led to clinical treatment failures. Here, to examine the prevalence and characterize foodborne NTS with AMR in China, we tested the antimicrobial susceptibility of 1,256 NTS isolates cultured from retail foods in 2020 in China. The antimicrobial susceptibility of 26 antimicrobial agents representing 12 classes was evaluated with the broth-microdilution method; the presence of ten mcr genes was screened with multi-PCR. The complete closed genomes of mcr-gene-carrying isolates were generated by hybrid assembly through whole genome sequencing on both the PacBio and Illumina platforms. Genomic features and genetic environments of the mcr-1 gene were analysed. The overall drug resistance rate was 92.28%, and the multi-drug resistance (MDR) rate was 76.53%. A total of 341 AMR profiles were determined, and resistance was highest to nalidixic acid (63.38%). Among 887 NTS isolates with MDR, 232 showed co-resistance to cefotaxime and ciprofloxacin, and 25 were resistant to ten classes of antimicrobial agents. The resistance of NTS isolated from different regions varied. Isolates from raw chicken sources most frequently showed resistance. Four NTS carried the mcr-1 gene and represented four different serotypes. Four mcr-1 gene-bearing plasmids from the four Salmonella isolates were classified into two replicon types (IncI2 and IncHI2A). Two mcr-1 genes in IncI2 type plasmids were found to be located between a PAP2 family protein-encoding gene and a relaxase-encoding gene, whereas the other two mcr-1 gene structures in IncHI2A type plasmids showed variations in the presence of insertion sequences.
Our data demonstrated severe AMR among foodborne NTS isolated from food in China, thus highlighting the importance of antimicrobial susceptibility surveillance to decrease the spread of AMR, particularly to critical drugs in human medicine.
Efficacy and Safety/Toxicity Study of Recombinant Vaccinia Virus JX-594 in Two Immunocompetent Animal Models of Glioma
The purpose of this study was to investigate the oncolytic potential of the recombinant, granulocyte macrophage colony-stimulating factor (GM-CSF)-expressing vaccinia virus (VV) JX-594 in experimental malignant gliomas (MGs) in vitro and in immunocompetent rodent models. We have found that JX-594 killed all MG cell lines tested in vitro. Intratumoral (i.t.) administration of JX-594 significantly inhibited tumor growth and prolonged survival in rats bearing RG2 intracranial (i.c.) tumors and mice bearing GL261 brain tumors. Combination therapy with JX-594 and rapamycin significantly increased viral replication and further prolonged survival in both immunocompetent i.c. MG models with several animals considered “cured” (three out of seven rats >120 days, terminated experiment). JX-594 infected and killed brain tumor-initiating cells (BTICs) from patient samples grown ex vivo, and did so more efficiently than other oncolytic viruses MYXV, Reovirus type-3, and VSVΔM51. Additional safety/toxicity studies in nontumor-bearing rodents treated with a supratherapeutic dose of JX-594 demonstrated GM-CSF-dependent inflammation and necrosis. These results suggest that i.c. administered JX-594 triggers a predictable GM-CSF-mediated inflammation in murine models. Before proceeding to clinical trials, JX-594 should be evaluated in the brains of nonhuman primates and optimized for the viral doses, delivery routes as well as the combination agents (e.g., mTOR inhibitors).
Finishing the euchromatic sequence of the human genome
The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.