HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models
Large language models (LLMs), such as ChatGPT, are prone to generate
hallucinations, i.e., content that conflicts with the source or cannot be
verified by factual knowledge. To understand what types of content and to
what extent LLMs are apt to hallucinate, we introduce the Hallucination
Evaluation benchmark for Large Language Models (HaluEval), a large collection
of generated and human-annotated hallucinated samples for evaluating the
performance of LLMs in recognizing hallucination. To generate these samples, we
propose a ChatGPT-based two-step framework, i.e., sampling-then-filtering.
Besides, we also hire some human labelers to annotate the hallucinations in
ChatGPT responses. The empirical results suggest that ChatGPT is likely to
generate hallucinated content in specific topics by fabricating unverifiable
information (i.e., about responses). Moreover, existing LLMs face
great challenges in recognizing the hallucinations in texts. However, our
experiments also prove that providing external knowledge or adding reasoning
steps can help LLMs recognize hallucinations. Our benchmark can be accessed at
https://github.com/RUCAIBox/HaluEval.
Comment: Accepted to EMNLP 2023 Main Conference (Long Paper)
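The sampling-then-filtering idea named in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sample_then_filter`, the `samplers` list, and the `score` callable are hypothetical placeholders standing in for whatever generation and selection prompts the benchmark actually uses.

```python
from typing import Callable, List


def sample_then_filter(
    question: str,
    samplers: List[Callable[[str], str]],
    score: Callable[[str, str], float],
) -> str:
    """Generic two-step selection (illustrative only).

    Step 1 (sampling): draw one candidate answer from each sampler.
    Step 2 (filtering): keep the candidate with the highest score.
    Both `samplers` and `score` are hypothetical stand-ins for model calls.
    """
    if not samplers:
        raise ValueError("at least one sampler is required")
    candidates = [sample(question) for sample in samplers]
    return max(candidates, key=lambda candidate: score(question, candidate))
```

In practice each sampler and the scorer would wrap a model call; here they are plain callables so the control flow can be read and tested in isolation.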
The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models
In the era of large language models (LLMs), hallucination (i.e., the tendency
to generate factually incorrect content) poses a great challenge to the trustworthy
and reliable deployment of LLMs in real-world applications. To tackle LLM
hallucination, three key questions should be well studied: how to detect
hallucinations (detection), why do LLMs hallucinate (source), and what can be
done to mitigate them (mitigation). To address these challenges, this work
presents a systematic empirical study on LLM hallucination, focused on the
three aspects of hallucination detection, source and mitigation. Specifically, we
construct a new hallucination benchmark, HaluEval 2.0, and design a simple yet
effective detection method for LLM hallucination. Furthermore, we zoom into the
different training or utilization stages of LLMs and extensively analyze the
potential factors that lead to the LLM hallucination. Finally, we implement and
examine a series of widely used techniques to mitigate the hallucinations in
LLMs. Our work has led to several important findings to understand the
hallucination origin and mitigate the hallucinations in LLMs. Our code and data
can be accessed at https://github.com/RUCAIBox/HaluEval-2.0.
Comment: 24 pages, 8 figures, 13 tables
Present-day kinematics and seismic potential of the Ganzi-Yushu fault, eastern Tibetan plateau, constrained from InSAR
In recent years, earthquakes have occurred frequently on the southeastern edge of the Tibetan Plateau, and the seismic hazard is high. However, because of the remote location of the Ganzi-Yushu fault zone, no high-resolution geodetic measurements of this region have been made. The radar line-of-sight deformation field of the Ganzi-Yushu fault was obtained using seven-track ascending and descending Sentinel-1A/B interferometric synthetic aperture radar (InSAR) data from 2014 to 2020. Using the InSAR and published Global Navigation Satellite System (GNSS) data, we calculated the 3D deformation field in the study area, investigated the segment-specific fault slip rate, and inverted the fault slip distribution pattern using the steepest descent method. We then evaluated the seismic hazard using the strain rate field and slip deficit rate. The main findings of this study include the following. 1) The slip rate of the Ganzi-Yushu fault gradually increases from 2.5 to 6.8 mm/yr from northwest to southeast. 2) A high-resolution strain rate map shows high-value anomalies in the Yushu and Dangjiang areas. 3) Our comprehensive analysis suggests that the seismic hazard of the Dangjiang and Dengke segments with high slip deficits cannot be ignored.
A review on fundamentals for designing hydrogen evolution electrocatalyst
As a clean, efficient, and renewable energy source, hydrogen has long been recognized as a favourable replacement for fossil fuels. A primary challenge is generating hydrogen efficiently enough to meet demand on a commercial scale. The electrocatalytic hydrogen evolution reaction (HER), a primary step in water electrolysis for H2 production, has been studied comprehensively in recent decades. Electrolytic water splitting presents a promising route to efficient hydrogen generation for energy conversion and storage, with electrolysis or catalysis playing a pivotal role. The development of electrocatalysts that are effective, durable and economical is a necessary prerequisite for practical electrolytic hydrogen generation from water splitting, and it is the primary emphasis of this article. In this extensive review, we first summarize the basics of the hydrogen evolution reaction and examine the latest progress in economical and highly efficient catalysts based on both non-noble and noble metals. Moreover, we discuss recent breakthroughs in electrolytic HER employing more affordable and widely available nanoparticles, with a specific focus on economical, non-platinum electrocatalysts based on metal-free (MF) and transition metal composite catalysts.
TextBox 2.0: A Text Generation Library with Pre-trained Language Models
To facilitate research on text generation, this paper presents a
comprehensive and unified library, TextBox 2.0, focusing on the use of
pre-trained language models (PLMs). To be comprehensive, our library covers
common text generation tasks and their corresponding datasets and
further incorporates PLMs covering general, translation, Chinese,
dialogue, controllable, distilled, prompting, and lightweight PLMs. We also
implement efficient training strategies and provide generation
objectives for pre-training new PLMs from scratch. To be unified, we design the
interfaces to support the entire research pipeline (from data loading to
training and evaluation), ensuring that each step can be fulfilled in a unified
way. Despite the rich functionality, it is easy to use our library, either
through the friendly Python API or command line. To validate the effectiveness
of our library, we conduct extensive experiments and exemplify four types of
research scenarios. The project is released at the link:
https://github.com/RUCAIBox/TextBox.
Comment: Accepted by EMNLP 202
Integrated single-cell and bulk RNA sequencing analyses reveal a prognostic signature of cancer-associated fibroblasts in head and neck squamous cell carcinoma
Objectives: To identify a prognosis-related subtype of cancer-associated fibroblasts (CAFs) in head and neck squamous cell carcinoma (HNSCC) and to understand its contributions to molecular characteristics, immune characteristics, and potential benefits in immunotherapy and chemotherapy for HNSCC. Materials and Methods: We performed single-cell RNA sequencing (scRNA-seq) analysis of CAFs from samples of HNSCC patients obtained from the Gene Expression Omnibus (GEO) to identify the prognosis-related subtype of CAFs. CAFs were clustered into five subtypes, and a prognosis-related subtype was identified. Univariate and multivariate Cox regression analyses were performed on a cohort selected from The Cancer Genome Atlas (TCGA) to construct the signature, which was validated in GSE65858 and GSE42743. A prognostic signature based on 4 genes derived from prognosis-related CAFs was constructed. The molecular characteristics, immune characteristics, and predicted chemosensitivity and immunotherapeutic response in the signature-defined subgroups were subsequently analyzed. Results: Higher CAF scores correlated with poor survival outcomes. Additionally, a high CAF score correlated with lower infiltration levels of many immune cells, including M1 macrophages, CD8+ T cells, follicular T helper cells, monocytes, and naïve B cells. The high CAF score group also showed different enriched pathways, mutated genes, and genes with copy number variation. Furthermore, patients with high CAF scores showed lower sensitivity to chemotherapy and immunotherapy than those with low CAF scores. Conclusion: The results of our study indicate the potential of the CAF signature as a biomarker for the prognosis of HNSCC patients. Furthermore, the signature could be a prospective therapeutic target in HNSCC.
Fixed-Time Synchronization for Different Dimensional Complex Network Systems with Unknown Parameters via Adaptive Control
This article addresses fixed-time synchronization of complex network systems of different dimensions with unknown parameters. Two suitable adaptive controllers and dynamic parameter estimators are proposed such that the complex network drive and response systems can be synchronized within the settling time. Based on fixed-time control theory and the Lyapunov functional method, novel sufficient conditions are provided to guarantee synchronization within a fixed time, and the settling times are explicitly evaluated and are independent of the initial synchronization errors. Finally, a numerical example is given to illustrate the effectiveness of the proposed control algorithms.
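To make concrete what a settling time independent of the initial errors means, the standard fixed-time stability lemma from the control literature is sketched below. It is given only as background: the abstract does not state which bound the article derives, and the symbols $V$, $\alpha$, $\beta$, $p$, $q$ are assumptions introduced here.

```latex
% Standard fixed-time stability lemma (background; not necessarily the article's own bound).
% Let V(e(t)) \ge 0 be a Lyapunov function of the synchronization error e(t), and suppose
\dot{V}(e(t)) \le -\alpha V^{p}(e(t)) - \beta V^{q}(e(t)),
\qquad \alpha, \beta > 0,\quad 0 < p < 1 < q .
% Then V reaches zero in finite time, and the settling time T satisfies
T \le T_{\max} = \frac{1}{\alpha (1 - p)} + \frac{1}{\beta (q - 1)} ,
% a bound that depends only on \alpha, \beta, p, q and not on the initial error e(0).
```

For example, with $\alpha = \beta = 1$, $p = 1/2$ and $q = 2$, the bound is $T_{\max} = 2 + 1 = 3$ regardless of how large the initial synchronization error is.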
Sequential FISH and GISH karyotypes of M8003 (a), Austrian rye (b), N9116H (c) and N9116M (d).
(a, c, d) 4',6-diamidino-2-phenylindole (DAPI), blue fluorescence; rye genomic DNA and Oligo-pTa535, red fluorescence; Oligo-pSc119.2, green fluorescence. (b) Oligo-pSc119.2, red fluorescence. Alterations of wheat chromosomes are indicated by white boxes.