Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States
Large Language Models (LLMs) can make up answers that are not real, and this
is known as hallucination. This research aims to see if, how, and to what
extent LLMs are aware of hallucination. More specifically, we check whether and
how an LLM reacts differently in its hidden states when it answers a question
right versus when it hallucinates. To do this, we introduce an experimental
framework that allows examining an LLM's hidden states in different hallucination
situations. Building upon this framework, we conduct a series of experiments
with language models in the LLaMA family (Touvron et al., 2023). Our empirical
findings suggest that LLMs react differently when processing a genuine response
versus a fabricated one. We then apply various model interpretation techniques
to help understand and explain the findings better. Moreover, informed by the
empirical observations, we show the great potential of using guidance derived
from an LLM's hidden representation space to mitigate hallucination. We believe
this work provides insights into how LLMs produce hallucinated answers and how
to make them occur less often.
Comment: 9 pages, 8 figures, 2 tables (13 pages, 12 figures, 13 tables
including references and appendices)
Exploring the Relationship between In-Context Learning and Instruction Tuning
In-Context Learning (ICL) and Instruction Tuning (IT) are two primary
paradigms of adapting Large Language Models (LLMs) to downstream applications.
However, they are significantly different. In ICL, a set of demonstrations is
provided at inference time but the LLM's parameters are not updated. In IT, a
set of demonstrations is used to tune the LLM's parameters at training time but no
demonstrations are used at inference time. Although a growing body of
literature has explored ICL and IT, studies on these topics have largely been
conducted in isolation, leading to a disconnect between these two paradigms. In
this work, we explore the relationship between ICL and IT by examining how the
hidden states of LLMs change in these two paradigms. Through carefully designed
experiments conducted with LLaMA-2 (7B and 13B), we find that ICL is implicit
IT. In other words, ICL changes an LLM's hidden states as if the demonstrations
were used to instructionally tune the model. Furthermore, the convergence
between ICL and IT is largely contingent upon several factors related to the
provided demonstrations. Overall, this work offers a unique perspective to
explore the connection between ICL and IT and sheds light on understanding the
behaviors of LLMs.
Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads
Transformer-based pretrained large language models (PLM) such as BERT and GPT
have achieved remarkable success in NLP tasks. However, PLMs are prone to
encoding stereotypical biases. Although a burgeoning literature has emerged on
stereotypical bias mitigation in PLMs, such as work on debiasing gender and
racial stereotyping, how such biases manifest and behave internally within PLMs
remains largely unknown. Understanding the internal stereotyping mechanisms may
allow better assessment of model fairness and guide the development of
effective mitigation strategies. In this work, we focus on attention heads, a
major component of the Transformer architecture, and propose a bias analysis
framework to explore and identify a small set of biased heads that are found to
contribute to a PLM's stereotypical bias. We conduct extensive experiments to
validate the existence of these biased heads and to better understand how they
behave. We investigate gender and racial bias in the English language in two
types of Transformer-based PLMs: the encoder-based BERT model and the
decoder-based autoregressive GPT model. Overall, the results shed light on
understanding the bias behavior in pretrained language models.
Exploring the Common Appearance-Boundary Adaptation for Nighttime Optical Flow
We investigate a challenging task of nighttime optical flow, which suffers
from weakened texture and amplified noise. These degradations weaken
discriminative visual features, thus causing invalid motion feature matching.
Typically, existing methods employ domain adaptation to transfer knowledge from
an auxiliary domain to the nighttime domain in either the input visual space or the output
motion space. However, this direct adaptation is ineffective, since there
exists a large domain gap due to the intrinsic heterogeneous nature of the
feature representations between auxiliary and nighttime domains. To overcome
this issue, we explore a common-latent space as the intermediate bridge to
reinforce the feature alignment between auxiliary and nighttime domains. In
this work, we exploit two auxiliary daytime and event domains, and propose a
novel common appearance-boundary adaptation framework for nighttime optical
flow. In appearance adaptation, we employ the intrinsic image decomposition to
embed the auxiliary daytime image and the nighttime image into a
reflectance-aligned common space. We discover that motion distributions of the
two reflectance maps are very similar, which allows us to consistently transfer
motion appearance knowledge from the daytime to the nighttime domain. In boundary
adaptation, we theoretically derive the motion correlation formula between
nighttime image and accumulated events within a spatiotemporal gradient-aligned
common space. We find that the correlation of the two spatiotemporal
gradient maps shows a significant discrepancy, which allows us to contrastively
transfer boundary knowledge from the event to the nighttime domain. Moreover,
appearance adaptation and boundary adaptation are complementary to each other,
since they can jointly transfer global motion and local boundary knowledge to
the nighttime domain.
Potential Blood Pressure Goals in IgA Nephropathy: Prevalence, Awareness, and Treatment Rates in Chronic Kidney Disease Among Patients with Hypertension in China (PATRIOTIC) Study
Background/Aims: IgA nephropathy is the most prevalent form of primary glomerulonephritis worldwide. Among patients with kidney disease, hypertension is one of the most important risk factors of disease progression. Considering the limited evidence regarding the appropriate blood pressure (BP) goal for patients with IgA nephropathy, our aim was to critically appraise the potential BP goal in IgA nephropathy. Methods: We performed a retrospective analysis of the BP data from 1055 patients with IgA nephropathy, extracted from the database of a nationwide, multi-center, cross-sectional study, including 61 tertiary hospitals in China. Hypertension was defined by a BP ≥140/90 mmHg. Three BP cutoff levels were evaluated as control values: < 140/90 mmHg, < 130/80 mmHg and < 125/75 mmHg. The primary outcome of our study was the prevalence of BP control among patients with a 24-h proteinuria < 1 g/d or ≥ 1 g/d. Multivariate logistic regression analysis was used to identify demographic and clinical factors associated with a decrease in renal function for the different target levels of BP. Results: The overall prevalence of hypertension was 63.3%. BP was controlled under 140/90 mmHg in 49.1% of patients, with 34.3% of patients with proteinuria < 1 g/d reaching the target BP < 130/80 mmHg and only 12.9% of patients with proteinuria > 1 g/d achieving a BP < 125/75 mmHg. Among patients with proteinuria < 1 g/d, the adjusted odds ratios (OR) and 95% confidence interval (95% CI) of a decrease in renal function, for the 3 target BP levels, were as follows (P > 0.05): < 140/90 mmHg, 0.9 (0.5 - 1.6); < 130/80 mmHg, 1.0 (0.5 - 1.8); and < 125/75 mmHg, 1.0 (0.5 - 2.0). With proteinuria ≥1 g/d, the adjusted ORs (95%CI) of attaining the BP targets of < 140/90 mmHg, < 130/80 mmHg and < 125/75 mmHg were 0.4 (0.2 - 0.6), 0.2 (0.1 - 0.4) and 0.3 (0.1 - 0.5), respectively (P < 0.05). Conclusion: Hypertension was common in IgA nephropathy and hypertensive control was suboptimal. 
Our results support a benefit of intensive BP control (< 130/80 mmHg) for patients with proteinuria ≥1 g/d. However, in patients with proteinuria < 1 g/d, a renoprotective effect of this BP goal was not identified.
Construction of an immunogenic cell death-based risk score prognosis model in breast cancer
Immunogenic cell death (ICD) is a form of regulated cell death that elicits an immune response. Common inducers of ICD include cancer chemotherapy and radiation therapy. A better understanding of ICD might help to modify current anti-cancer therapy regimens, especially immunotherapy. This study aimed to identify ICD-related prognostic gene signatures in breast cancer (BC). An ICD-based prognostic gene signature was developed using LASSO-Cox regression and Kaplan-Meier survival analysis based on datasets acquired from The Cancer Genome Atlas and the Gene Expression Omnibus. A nomogram model was developed to predict the prognosis of BC patients. Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA) were used to explore the differentially expressed signaling pathways in the high- and low-risk groups. The CIBERSORT and ESTIMATE algorithms were used to investigate differences in the immune status of the tumor microenvironment between risk groups. Six genes (CALR, CLEC9A, BAX, TLR4, CXCR3, and PIK3CA) were selected for construction and validation of the BC prognosis model based on public data. GSEA and GSVA found that immune-related gene sets were enriched in the low-risk group. Moreover, immune cell infiltration analysis showed that the high-risk group was characterized by higher infiltration of tumor-associated macrophages and a lower proportion of CD8+ T cells, suggesting an immune-evasive tumor microenvironment. We constructed and validated an ICD-based gene signature for predicting the prognosis of breast cancer patients. Our model provides a tool with good discrimination and calibration abilities to predict the prognosis of BC, especially triple-negative breast cancer (TNBC).
Quantum hydrogen-bond symmetrization in the superconducting hydrogen sulfide system.
The quantum nature of the proton can crucially affect the structural and physical properties of hydrogen compounds. For example, in the high-pressure phases of H2O, quantum proton fluctuations lead to symmetrization of the hydrogen bond and reduce the boundary between asymmetric and symmetric structures in the phase diagram by 30 gigapascals (ref. 3). Here we show that an analogous quantum symmetrization occurs in the recently discovered sulfur hydride superconductor with a superconducting transition temperature Tc of 203 kelvin at 155 gigapascals, the highest Tc reported for any superconductor so far. Superconductivity occurs via the formation of a compound with chemical formula H3S (sulfur trihydride) with sulfur atoms arranged on a body-centred cubic lattice. If the hydrogen atoms are treated as classical particles, then for pressures greater than about 175 gigapascals they are predicted to sit exactly halfway between two sulfur atoms in a structure with Im3m symmetry. At lower pressures, the hydrogen atoms move to an off-centre position, forming a short H-S covalent bond and a longer H···S hydrogen bond in a structure with R3m symmetry. X-ray diffraction experiments confirm the H3S stoichiometry and the sulfur lattice sites, but were unable to discriminate between the two phases. Ab initio density-functional-theory calculations show that quantum nuclear motion lowers the symmetrization pressure by 72 gigapascals for H3S and by 60 gigapascals for D3S. Consequently, we predict that the Im3m phase dominates the pressure range within which the high Tc was measured. The observed pressure dependence of Tc is accurately reproduced in our calculations for the Im3m phase, but not for the R3m phase. Therefore, the quantum nature of the proton fundamentally changes the superconducting phase diagram of H3S.
We acknowledge financial support from the Spanish Ministry of Economy and Competitiveness (FIS2013-48286-C2-2-P), French Agence Nationale de la Recherche (Grant No. ANR-13-IS10-0003- 392 01), EPSRC (UK) (Grant No. EP/J017639/1), Cambridge Commonwealth Trust, National Natural Science Foundation of China (Grants No. 11204111, 11404148, and 11274136), and 2012 Changjiang Scholars Program of China. Work at Carnegie was supported by EFree, an Energy Frontier Research Center funded by the DOE, Office of Science, Basic Energy Sciences under Award No. DE-SC-0001057. Computer facilities were provided by the PRACE project AESFT and the Donostia International Physics Center (DIPC).
This is the author accepted manuscript. The final version is available from Nature Publishing Group via http://dx.doi.org/10.1038/nature1717
The Phylogenetic Origin of oskar Coincided with the Origin of Maternally Provisioned Germ Plasm and Pole Cells at the Base of the Holometabola
The establishment of the germline is a critical, yet surprisingly evolutionarily
labile, event in the development of sexually reproducing animals. In the fly
Drosophila, germ cells acquire their fate early during
development through the inheritance of the germ plasm, a specialized maternal
cytoplasm localized at the posterior pole of the oocyte. The gene
oskar (osk) is both necessary and
sufficient for assembling this substance. Both maternal germ plasm and
oskar are evolutionary novelties within the insects, as the
germline is specified by zygotic induction in basally branching insects, and
osk has until now only been detected in dipterans. In order
to understand the origin of these evolutionary novelties, we used comparative
genomics, parental RNAi, and gene expression analyses in multiple insect
species. We have found that the origin of osk and its role in
specifying the germline coincided with the innovation of maternal germ plasm and
pole cells at the base of the holometabolous insects and that losses of
osk are correlated with changes in germline determination
strategies within the Holometabola. Our results indicate that the invention of
the novel gene osk was a key innovation that allowed the
transition from the ancestral late zygotic mode of germline induction to a
maternally controlled establishment of the germline found in many holometabolous
insect species. We propose that the ancestral role of osk was
to connect an upstream network ancestrally involved in mRNA localization and
translational control to a downstream regulatory network ancestrally involved in
executing the germ cell program.
Space advanced technology demonstration satellite
The Space Advanced Technology demonstration satellite (SATech-01), a mission for low-cost space science and new technology experiments organized by the Chinese Academy of Sciences (CAS), was successfully launched into a Sun-synchronous orbit at an altitude of approximately 500 km on July 27, 2022, from the Jiuquan Satellite Launch Centre. Serving as an experimental platform for space science exploration and the demonstration of advanced common technologies in orbit, SATech-01 is equipped with 16 experimental payloads, including the solar upper transition region imager (SUTRI), the lobster eye imager for astronomy (LEIA), the high energy burst searcher (HEBS), and a High Precision Magnetic Field Measurement System based on a CPT Magnetometer (CPT). It also incorporates an imager with freeform optics, an integrated thermal imaging sensor, and a multi-functional integrated imager, among others. This paper provides an overview of SATech-01, including a technical description of the satellite and its scientific payloads, along with their on-orbit performance.