109 research outputs found

    HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models

    Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e., content that conflicts with the source or cannot be verified against factual knowledge. To understand what types of content and to what extent LLMs are apt to hallucinate, we introduce the Hallucination Evaluation benchmark for Large Language Models (HaluEval), a large collection of generated and human-annotated hallucinated samples for evaluating the performance of LLMs in recognizing hallucination. To generate these samples, we propose a ChatGPT-based two-step framework, i.e., sampling-then-filtering. In addition, we hire human labelers to annotate the hallucinations in ChatGPT responses. The empirical results suggest that ChatGPT is likely to generate hallucinated content on specific topics by fabricating unverifiable information (i.e., in about 19.5% of responses). Moreover, existing LLMs face great challenges in recognizing hallucinations in texts. However, our experiments also show that providing external knowledge or adding reasoning steps can help LLMs recognize hallucinations. Our benchmark can be accessed at https://github.com/RUCAIBox/HaluEval. Comment: Accepted to EMNLP 2023 Main Conference (Long Paper)

    The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models

    In the era of large language models (LLMs), hallucination (i.e., the tendency to generate factually incorrect content) poses a great challenge to the trustworthy and reliable deployment of LLMs in real-world applications. To tackle LLM hallucination, three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation). To address these challenges, this work presents a systematic empirical study on LLM hallucination, focused on the three aspects of hallucination detection, source, and mitigation. Specifically, we construct a new hallucination benchmark, HaluEval 2.0, and design a simple yet effective detection method for LLM hallucination. Furthermore, we zoom into the different training or utilization stages of LLMs and extensively analyze the potential factors that lead to LLM hallucination. Finally, we implement and examine a series of widely used techniques to mitigate hallucinations in LLMs. Our work has led to several important findings for understanding the origin of hallucination and mitigating hallucinations in LLMs. Our code and data can be accessed at https://github.com/RUCAIBox/HaluEval-2.0. Comment: 24 pages, 8 figures, 13 tables

    Present-day kinematics and seismic potential of the Ganzi-Yushu fault, eastern Tibetan plateau, constrained from InSAR

    In recent years, earthquakes have occurred frequently on the southeastern edge of the Tibetan Plateau, and the seismic hazard is high. However, because of the remote location of the Ganzi-Yushu fault zone, no high-resolution geodetic measurements of this region have been made. The radar line-of-sight deformation field of the Ganzi-Yushu fault was obtained using seven-track ascending and descending Sentinel-A/B interferometric synthetic aperture radar (InSAR) data from 2014 to 2020. Using the InSAR and published Global Navigation Satellite System (GNSS) data, we calculated the 3D deformation field in the study area, investigated the segment-specific fault slip rate, and inverted the fault slip distribution pattern using the steepest descent method. We then evaluated the seismic hazard using the strain rate field and slip deficit rate. The main findings of this study include the following. 1) The slip rate of the Ganzi-Yushu fault gradually increases from 2.5 to 6.8 mm/yr from northwest to southeast. 2) A high-resolution strain rate map shows high-value anomalies in the Yushu and Dangjiang areas. 3) Our comprehensive analysis suggests that the seismic hazard of the Dangjiang and Dengke segments with high slip deficits cannot be ignored.

    A review on fundamentals for designing hydrogen evolution electrocatalyst

    As a clean, efficient, and renewable energy source, hydrogen has long been recognized as a favourable replacement for fossil fuels. A primary challenge is generating hydrogen efficiently enough to meet demand on a commercial scale. The electrocatalytic hydrogen evolution reaction (HER), a primary step in water electrolysis for H2 production, has been studied comprehensively in recent decades. Electrolytic water splitting presents a promising route to efficient hydrogen generation for energy conversion and storage, with electrolysis or catalysis playing a pivotal role. The development of electrocatalysts that are effective, durable, and economical is a necessary prerequisite for practical electrolytic hydrogen generation from water splitting, and it is the primary emphasis of this article. In this extensive review, we first summarize the basics of the hydrogen evolution reaction and examine the latest progress in economical and highly efficient catalysts utilizing both non-noble and noble metals. Moreover, we discuss recent breakthroughs in electrolytic HER employing more affordable and widely available nanoparticles, with a specific focus on economical, non-platinum electrocatalysts based on metal-free (MF) and transition metal composite catalysts.

    TextBox 2.0: A Text Generation Library with Pre-trained Language Models

    To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs). To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets and further incorporates 45 PLMs covering general, translation, Chinese, dialogue, controllable, distilled, prompting, and lightweight PLMs. We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch. To be unified, we design the interfaces to support the entire research pipeline (from data loading to training and evaluation), ensuring that each step can be fulfilled in a unified way. Despite the rich functionality, it is easy to use our library, either through the friendly Python API or command line. To validate the effectiveness of our library, we conduct extensive experiments and exemplify four types of research scenarios. The project is released at the link: https://github.com/RUCAIBox/TextBox. Comment: Accepted by EMNLP 202

    Integrated single-cell and bulk RNA sequencing analyses reveal a prognostic signature of cancer-associated fibroblasts in head and neck squamous cell carcinoma

    Objectives: To identify a prognosis-related subtype of cancer-associated fibroblasts (CAFs) in head and neck squamous cell carcinoma (HNSCC) and comprehend its contributions to molecular characteristics, immune characteristics, and their potential benefits in immunotherapy and chemotherapy for HNSCC. Materials and Methods: We performed single-cell RNA sequencing (scRNA-seq) analysis of CAFs from the samples of HNSCC patients derived from Gene Expression Omnibus (GEO), to identify the prognosis-related subtype of CAFs. CAFs were clustered into five subtypes, and a prognosis-related subtype was identified. Univariate and multivariate Cox regression analyses were performed on the cohort selected from The Cancer Genome Atlas (TCGA) to determine signature construction, which was validated in GSE65858 and GSE42743. A prognostic signature based on 4 genes was constructed, which were derived from prognosis-related CAFs. The molecular characteristics, immune characteristics as well as the predicted chemosensitivity and immunotherapeutic response in the signature-defined subgroups were analyzed subsequently. Results: The patients with higher CAF scores correlated with poor survival outcomes. Additionally, a high CAF score correlated with lower infiltration levels of many immune cells including M1 macrophages, CD8+ T cells, follicular T helper cells, monocytes, and naïve B cells. High CAF score also demonstrated different enrichment pathways, mutation genes and copy number variated genes. Furthermore, patients with high CAF scores showed lower sensitivity for chemotherapy and immunotherapy than those with low CAF scores. Conclusion: The results of our study indicate the potential of the CAF signature as a biomarker for the prognosis of HNSCC patients. Furthermore, the signature could be a prospective therapeutic target in HNSCC.

    Fixed-Time Synchronization for Different Dimensional Complex Network Systems with Unknown Parameters via Adaptive Control

    This article addresses fixed-time synchronization of different dimensional complex network systems with unknown parameters. Two suitable adaptive controllers and dynamic parameter estimations are proposed such that the complex network driving and response systems can be synchronized within the settling time. Based on fixed-time control theory and the Lyapunov functional method, novel sufficient conditions are provided to guarantee synchronization within a fixed time, and the settling times are explicitly evaluated and are independent of the initial synchronization errors. Finally, a numerical example is given to illustrate the effectiveness of the proposed control algorithms.

    Sequential FISH and GISH karyotypes of M8003 (a), Austrian rye (b), N9116H (c) and N9116M (d).

    (a, c, d) 4',6-diamidino-2-phenylindole (DAPI), blue fluorescence; rye genomic DNA and Oligo-pTa535, red fluorescence; Oligo-pSc119.2, green fluorescence. (b) Oligo-pSc119.2, red fluorescence. Alterations of wheat chromosomes are indicated by white boxes.