Prompt-NER: Zero-shot Named Entity Recognition in Astronomy Literature via Large Language Models
This study delves into the application of Large Language Models (LLMs) for
Named Entity Recognition (NER) tasks in the field of astronomy literature. To
enhance the zero-shot recognition capabilities of LLMs for astronomical named
entities, we propose a strategy called Prompt-NER. Prompt-NER includes five
prompt elements: Task Descriptions, Entity Definitions, Task Emphasis, Task
Examples, and Second Conversation. To assess the effectiveness of the
Prompt-NER strategy, we utilize three representative LLMs (Claude-2, GPT-3.5,
and LLaMA-2-70b) to identify telescope and celestial object named entities in
astronomical literature. Our experiments are conducted on two distinct
datasets. The first comprises 30 original PDF documents, which we split
into paragraphs in sequential order, resulting in a second dataset consisting
of 30 paragraph collections. Additionally, we incorporate 30 astronomical
telegrams to diversify our experiments and assess the performance of LLMs based
on Prompt-NER on concise, complete texts. Our experimental results indicate
that the Prompt-NER strategy enables LLMs to effectively accomplish NER tasks
in the field of astronomy, even without prior astronomical knowledge during
training. We carefully analyze the experimental results, including how the
different prompt elements work and how the characteristics of long and short
texts influence performance. This research offers practical guidance for
zero-shot NER in astronomical literature and suggests directions for future
work in this area.
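The five prompt elements named above can be sketched as a simple prompt builder. This is a hypothetical reconstruction: the element names and the two entity types (telescope, celestial object) come from the abstract, but all wording, function names, and examples below are illustrative guesses, not the paper's actual prompts.

```python
# Hypothetical sketch of assembling a Prompt-NER style prompt from the five
# elements named in the abstract; the exact wording the paper uses is unknown.

def build_prompt_ner(text: str, example: str) -> str:
    task_description = (
        "You are an expert reader of astronomy literature. Extract all named "
        "entities of the types defined below from the given text."
    )
    entity_definitions = (
        "Telescope: an instrument used for astronomical observation "
        "(e.g. 'Hubble Space Telescope').\n"
        "CelestialObject: a named astronomical object (e.g. 'NGC 1275')."
    )
    task_emphasis = "Output only entities that appear verbatim in the text."
    task_example = f"Example:\n{example}"
    return "\n\n".join([task_description, entity_definitions, task_emphasis,
                        task_example, f"Text:\n{text}", "Entities:"])

def second_conversation(previous_answer: str) -> str:
    # The fifth element: a follow-up turn asking the model to re-check its own
    # extraction (our guess at what "Second Conversation" means).
    return ("Review your previous answer and remove any entity that does not "
            f"appear verbatim in the text:\n{previous_answer}")

prompt = build_prompt_ner(
    "We observed M87 with the Event Horizon Telescope.",
    "Text: JWST imaged SMACS 0723.\n"
    "Entities: Telescope: JWST; CelestialObject: SMACS 0723",
)
```

In this sketch the first prompt carries four elements in one message, and the fifth element arrives as a second turn after the model answers.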
mmPlace: Robust Place Recognition with Intermediate Frequency Signal of Low-cost Single-chip Millimeter Wave Radar
Place recognition is crucial for tasks like loop-closure detection and
re-localization. Single-chip millimeter wave radar (single-chip radar in short)
emerges as a low-cost sensor option for place recognition, with the advantage
of insensitivity to degraded visual environments. However, it encounters two
challenges. Firstly, sparse point cloud from single-chip radar leads to poor
performance when using current place recognition methods, which assume much
denser data. Secondly, its performance significantly declines in scenarios
involving rotational and lateral variations, due to limited overlap in its
field of view (FOV). We propose mmPlace, a robust place recognition system to
address these challenges. Specifically, mmPlace transforms intermediate
frequency (IF) signal into range azimuth heatmap and employs a spatial encoder
to extract features. Additionally, to improve the performance in scenarios
involving rotational and lateral variations, mmPlace employs a rotating
platform and concatenates heatmaps in a rotation cycle, effectively expanding
the system's FOV. We evaluate mmPlace's performance on the milliSonic dataset,
which is collected on the University of Science and Technology of China (USTC)
campus, the city roads surrounding the campus, and an underground parking
garage. The results demonstrate that mmPlace outperforms point cloud-based
methods and achieves 87.37% recall@1 in scenarios involving rotational and
lateral variations.
Comment: 8 pages, 8 figures
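The IF-signal-to-heatmap step described above follows the standard FMCW radar pipeline: one FFT over fast time yields range bins, a second FFT over the antenna array yields azimuth bins. The sketch below is a simplified stand-in for mmPlace's actual preprocessing (array sizes and the concatenation layout are assumptions), shown here only to make the transform concrete:

```python
import numpy as np

def range_azimuth_heatmap(if_samples: np.ndarray, n_angle_bins: int = 64) -> np.ndarray:
    """Turn one frame of IF samples (n_rx_antennas, n_fast_time) into a
    range-azimuth magnitude heatmap via two FFTs (toy FMCW pipeline)."""
    range_fft = np.fft.fft(if_samples, axis=1)                 # fast time -> range bins
    angle_fft = np.fft.fft(range_fft, n=n_angle_bins, axis=0)  # antennas  -> azimuth bins
    return np.abs(np.fft.fftshift(angle_fft, axes=0))          # center zero azimuth

# Expanding the effective FOV: concatenate the heatmaps captured during one
# rotation cycle along the azimuth axis (illustrative frame count and sizes).
frames = [np.random.randn(4, 256) for _ in range(8)]
panorama = np.concatenate([range_azimuth_heatmap(f) for f in frames], axis=0)
```

The concatenated panorama is what the spatial encoder would consume, making the descriptor less sensitive to the rotational and lateral variations the paper targets.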
RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning
Tool learning has generated widespread interest as a vital means of
interaction between Large Language Models (LLMs) and the physical world.
Current research predominantly emphasizes LLMs' capacity to utilize tools in
well-structured environments while overlooking their stability when confronted
with the inevitable noise of the real world. To bridge this gap, we introduce
RoTBench, a multi-level benchmark for evaluating the robustness of LLMs in tool
learning. Specifically, we establish five external environments, each featuring
varying levels of noise (i.e., Clean, Slight, Medium, Heavy, and Union),
providing an in-depth analysis of the model's resilience across three critical
phases: tool selection, parameter identification, and content filling.
Experiments involving six widely-used models underscore the urgent necessity
for enhancing the robustness of LLMs in tool learning. For instance, GPT-4's
performance drops sharply from 80.00 to 58.10 under noise that causes no
substantial change in manual accuracy. More surprisingly, the noise
correction capability inherent in the GPT family paradoxically impedes its
adaptability in the face of mild noise. In light of these findings, we propose
RoTTuning, a strategy that enriches the diversity of training environments to
bolster the robustness of LLMs in tool learning. The code and data are
available at https://github.com/Junjie-Ye/RoTBench
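The five noise levels can be pictured as progressively stronger perturbations of the tool interface the model sees. The toy perturbation below is not RoTBench's actual scheme (its level names come from the abstract, but the character-swap mechanic and counts are invented for illustration):

```python
import random

# Level names from the abstract; the perturbation itself is a toy stand-in.
NOISE_LEVELS = ["Clean", "Slight", "Medium", "Heavy", "Union"]

def perturb_tool_name(name: str, level: str, rng: random.Random) -> str:
    """Corrupt a tool name with a number of random character swaps that
    grows with the noise level; "Clean" leaves it untouched."""
    if level == "Clean":
        return name
    chars = list(name)
    swaps = {"Slight": 1, "Medium": 2, "Heavy": 3, "Union": 3}[level]
    for _ in range(swaps):
        i = rng.randrange(len(chars))
        chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)
```

A benchmark along these lines would then check whether the model still selects the right tool, identifies parameters, and fills content correctly as the level rises.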
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
Existing evaluations of tool learning primarily focus on validating the
alignment of selected tools for large language models (LLMs) with expected
outcomes. However, these approaches rely on a limited set of scenarios where
answers can be pre-determined, diverging from genuine needs. Furthermore, a
sole emphasis on outcomes disregards the intricate capabilities essential for
LLMs to effectively utilize tools. To tackle this issue, we propose ToolEyes, a
fine-grained system tailored for the evaluation of the LLMs' tool learning
capabilities in authentic scenarios. The system meticulously examines seven
real-world scenarios, analyzing five dimensions crucial to LLMs in tool
learning: format alignment, intent comprehension, behavior planning, tool
selection, and answer organization. Additionally, ToolEyes incorporates a tool
library boasting approximately 600 tools, serving as an intermediary between
LLMs and the physical world. Evaluations involving ten LLMs across three
categories reveal a preference for specific scenarios and limited cognitive
abilities in tool learning. Intriguingly, increasing model size can even
worsen tool-learning ability. These findings offer instructive
insights aimed at advancing the field of tool learning. The data is available
at https://github.com/Junjie-Ye/ToolEyes
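The five evaluation dimensions can be combined into an overall score; the minimal aggregation below is a hypothetical stand-in (the dimension names come from the abstract, but ToolEyes' actual scoring is more involved than a plain average):

```python
# Dimension names from the abstract; equal-weight averaging is an assumption.
DIMENSIONS = ["format_alignment", "intent_comprehension", "behavior_planning",
              "tool_selection", "answer_organization"]

def aggregate_scores(per_dim: dict) -> float:
    """Average per-dimension scores in [0, 1] into one overall score,
    refusing incomplete evaluations."""
    missing = [d for d in DIMENSIONS if d not in per_dim]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(per_dim[d] for d in DIMENSIONS) / len(DIMENSIONS)

overall = aggregate_scores(dict(zip(DIMENSIONS, [1.0, 0.8, 0.6, 0.9, 0.7])))
```

Scoring each dimension separately is what makes the evaluation fine-grained: a model can pass tool selection yet fail behavior planning, which a single outcome check would hide.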
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
The advancement of large language models (LLMs) has significantly propelled
the field of code generation. Previous work integrated reinforcement learning
(RL) with compiler feedback for exploring the output space of LLMs to enhance
code generation quality. However, the lengthy code generated by LLMs in
response to complex human requirements makes RL exploration a challenge. Also,
since the unit tests may not cover the complicated code, optimizing LLMs by
using these unexecuted code snippets is ineffective. To tackle these
challenges, we introduce StepCoder, a novel RL framework for code generation,
consisting of two main components: CCCS addresses the exploration challenge by
breaking the long-sequence code generation task into a Curriculum of Code
Completion Subtasks, while FGO provides Fine-Grained Optimization by masking
unexecuted code segments and optimizing the model only on executed code. In
addition, we construct the APPS+ dataset for RL training, which is manually
verified to ensure the correctness of unit tests. Experimental results show
that our method improves the ability to explore the output space and
outperforms state-of-the-art approaches in corresponding benchmarks. Our
dataset APPS+ and StepCoder are available online.
Comment: 13 pages, 5 figures
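The masking idea behind FGO needs one ingredient: knowing which lines of a generated snippet actually ran under the unit tests. The sketch below shows one way to obtain that signal with a Python trace hook and turn it into a per-line loss mask; it is an illustrative stand-in, not StepCoder's implementation (which masks at the token level inside an RL loop).

```python
import sys

def executed_lines(source: str) -> set:
    """Run a snippet under a trace function and record which of its lines
    actually execute -- the coverage signal an FGO-style mask needs."""
    executed = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code.co_filename == "<fgo>":
            executed.add(frame.f_lineno)
        return tracer
    code = compile(source, "<fgo>", "exec")
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)
    return executed

def loss_mask(source: str) -> list:
    """1 for executed lines (contribute to the update), 0 for unexecuted
    lines (masked out)."""
    hit = executed_lines(source)
    return [1 if i + 1 in hit else 0 for i in range(len(source.splitlines()))]

snippet = "x = 1\nif x > 5:\n    x = 99\ny = x + 1\n"
mask = loss_mask(snippet)  # the untaken branch body is masked out
```

Optimizing only where the mask is 1 avoids rewarding or penalizing code the tests never exercised, which is the failure mode the abstract describes.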
Abnormal regional signal in the left cerebellum as a potential neuroimaging biomarker of sudden sensorineural hearing loss
Objective: While prior reports have characterized visible changes in neuroimaging findings in individuals suffering from sudden sensorineural hearing loss (SSNHL), the utility of regional homogeneity (ReHo) as a means of diagnosing SSNHL has yet to be established. The present study was thus conducted to assess ReHo abnormalities in SSNHL patients and to establish whether these abnormalities offer value as a diagnostic neuroimaging biomarker of SSNHL through a support vector machine (SVM) analysis approach.
Methods: Resting-state functional magnetic resonance imaging (rs-fMRI) analyses of 27 SSNHL patients and 27 normal controls were conducted, with the resultant imaging data then being analyzed based on a combination of ReHo and SVM approaches.
Results: Relative to normal control individuals, patients diagnosed with SSNHL exhibited significant reductions in ReHo values in the left cerebellum, bilateral inferior temporal gyrus (ITG), left superior temporal pole (STP), right parahippocampal gyrus (PHG), left posterior cingulum cortex (PCC), and right superior frontal gyrus (SFG). SVM analyses suggested that reduced ReHo values in the left cerebellum were associated with high levels of diagnostic accuracy (96.30%, 52/54), sensitivity (92.59%, 25/27), and specificity (100.00%, 27/27) when distinguishing between SSNHL patients and control individuals.
Conclusion: These data suggest that SSNHL patients exhibit abnormal resting-state neurological activity, with changes in the ReHo of the left cerebellum offering value as a diagnostic neuroimaging biomarker associated with this condition.
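The classification step can be illustrated with a leave-one-out evaluation on a single feature. The toy classifier below is a deliberate simplification (a threshold between class means standing in for the paper's SVM); the data are fabricated, and the only fact carried over from the abstract is that patients show *reduced* ReHo, so low values predict SSNHL:

```python
def loo_accuracy(values, labels):
    """Leave-one-out accuracy of a single-feature threshold classifier,
    a minimal stand-in for an SVM on left-cerebellum ReHo values."""
    correct = 0
    for i in range(len(values)):
        train_v = values[:i] + values[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        # threshold halfway between class means learned on the training fold
        patient_mean = sum(v for v, y in zip(train_v, train_y) if y == 1) / train_y.count(1)
        control_mean = sum(v for v, y in zip(train_v, train_y) if y == 0) / train_y.count(0)
        thresh = (patient_mean + control_mean) / 2
        pred = 1 if values[i] < thresh else 0  # reduced ReHo -> patient
        correct += pred == labels[i]
    return correct / len(values)

# fabricated toy data: patients (label 1) with reduced ReHo, controls higher
reho = [0.8, 0.9, 0.85, 1.4, 1.5, 1.45]
labels = [1, 1, 1, 0, 0, 0]
acc = loo_accuracy(reho, labels)
```

Leave-one-out is the natural cross-validation choice at this sample size (27 + 27), since every fold retains nearly all subjects for training.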
Towards Flexible Wireless Charging for Medical Implants Using Distributed Antenna System
This paper presents the design, implementation and evaluation of In-N-Out, a
software-hardware solution for far-field wireless power transfer. In-N-Out can
continuously charge a medical implant residing in deep tissues at near-optimal
beamforming power, even when the implant moves around inside the human body. To
accomplish this, we exploit the unique energy ball pattern of distributed
antenna array and devise a backscatter-assisted beamforming algorithm that can
concentrate RF energy on a tiny spot surrounding the medical implant.
Meanwhile, power levels on other body parts stay low, reducing the
risk of overheating. We prototype In-N-Out on 21 software-defined radios and a
printed circuit board (PCB). Extensive experiments demonstrate that In-N-Out
achieves 0.37~mW average charging power inside a 10~cm-thick pork belly, which
is sufficient to wirelessly power a range of commercial medical devices. Our
head-to-head comparison with the state-of-the-art approach shows that In-N-Out
achieves a 5.4--18.1x power gain when the implant is stationary,
and a 5.3--7.4x power gain when the implant is in motion.
Comment: In MobiCom 2020: The 26th Annual International Conference on Mobile
Computing and Networking, London, 15 pages
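The core of distributed beamforming is choosing per-antenna phases so the signals add coherently at one point. The free-space toy model below shows only that geometry; it is not In-N-Out's method (the paper estimates phases from the implant's backscatter rather than from known positions, precisely because propagation through tissue is unknown):

```python
import cmath
import math

def coherent_power(antenna_positions, target, wavelength, phases=None):
    """Received power at `target` from unit-amplitude transmitters, given
    per-antenna phase offsets (free-space toy model, no tissue losses)."""
    total = 0j
    for (x, y), ph in zip(antenna_positions,
                          phases or [0.0] * len(antenna_positions)):
        d = math.hypot(x - target[0], y - target[1])
        total += cmath.exp(1j * (2 * math.pi * d / wavelength + ph))
    return abs(total) ** 2

def focus_phases(positions, target, wavelength):
    # cancel each antenna's propagation delay toward the target
    return [-2 * math.pi * math.hypot(x - target[0], y - target[1]) / wavelength
            for x, y in positions]

antennas = [(0, 0), (1, 0), (2, 0), (3, 0)]  # illustrative layout
target = (1.5, 2.0)
wl = 0.33  # metres, roughly 915 MHz
p_focused = coherent_power(antennas, target, wl,
                           focus_phases(antennas, target, wl))
```

With N antennas aligned, the field amplitudes add so focused power scales as N^2 at the target while staying near N elsewhere, which is the "energy ball" intuition behind concentrating RF power on the implant.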
PERK-Mediated Cholesterol Excretion from IDH Mutant Glioma Determines Anti-Tumoral Polarization of Microglia
Isocitrate dehydrogenase (IDH) mutation, a known pathologic classifier, initiates metabolic reprogramming in glioma cells and has been linked to the reaction status of glioma-associated microglia/macrophages (GAMs). However, it remains unclear how IDH genotypes contribute to GAM phenotypes. Here, it is demonstrated that gliomas expressing mutant IDH determine M1-like polarization of GAMs, while archetypal IDH induces M2-like polarization. Intriguingly, IDH-mutant gliomas secrete excess cholesterol, resulting in cholesterol-rich, pro-inflammatory GAMs without altering their cholesterol biosynthesis, and simultaneously exhibiting low levels of tumoral cholesterol due to expression remodeling of cholesterol transport molecules, particularly upregulation of ABCA1 and downregulation of LDLR. Mechanistically, a miR-19a/LDLR axis-mediated novel post-transcriptional regulation of cholesterol uptake is identified, modulated by IDH mutation, and influencing tumor cell proliferation and invasion. IDH mutation-induced PERK activation enhances cholesterol export from glioma cells via the miR-19a/LDLR axis and ABCA1/APOE upregulation. Further, a synthetic PERK activator, CCT020312 is introduced, which markedly stimulates cholesterol efflux from IDH wild-type glioma cells, induces M1-like polarization of GAMs, and consequently suppresses glioma cell invasion. The findings reveal an essential role of the PERK/miR-19a/LDLR signaling pathway in orchestrating gliomal cholesterol transport and the subsequent phenotypes of GAMs, thereby highlighting a novel potential target pathway for glioma therapy
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Reinforcement Learning from Human Feedback (RLHF) has become a crucial
technology for aligning language models with human values and intentions,
enabling models to produce more helpful and harmless responses. Reward models
are trained as proxies for human preferences to drive reinforcement learning
optimization. While reward models are often considered central to achieving
high performance, they face the following challenges in practical applications:
(1) Incorrect and ambiguous preference pairs in the dataset may hinder the
reward model from accurately capturing human intent. (2) Reward models trained
on data from a specific distribution often struggle to generalize to examples
outside that distribution and are not suitable for iterative RLHF training.
In this report, we attempt to address these two issues. (1) From a data
perspective, we propose a method to measure the strength of preferences within
the data, based on a voting mechanism of multiple reward models. Experimental
results confirm that data with varying preference strengths have different
impacts on reward model performance. We introduce a series of novel methods to
mitigate the influence of incorrect and ambiguous preferences in the dataset
and fully leverage high-quality preference data. (2) From an algorithmic
standpoint, we introduce contrastive learning to enhance the ability of reward
models to distinguish between chosen and rejected responses, thereby improving
model generalization. Furthermore, we employ meta-learning to enable the reward
model to maintain the ability to differentiate subtle differences in
out-of-distribution samples, and this approach can be utilized for iterative
RLHF optimization.
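The voting idea in point (1) can be made concrete: score each preference pair with several reward models and measure how many agree with the labeled ordering. The sketch below is an illustrative simplification of that mechanism (the paper's actual strength measure may differ):

```python
def preference_strength(pair_scores):
    """Given per-model (chosen_reward, rejected_reward) tuples for one
    preference pair, return the fraction of reward models that agree with
    the labeled ordering -- a simple voting notion of preference strength."""
    votes = [1 if chosen > rejected else 0 for chosen, rejected in pair_scores]
    return sum(votes) / len(votes)

# an ambiguous pair: the ensemble is split, suggesting a noisy label
scores = [(0.9, 0.2), (0.4, 0.6), (0.7, 0.1), (0.3, 0.5)]
strength = preference_strength(scores)
```

Pairs with strength near 0.5 are the incorrect or ambiguous ones the report targets: they can be down-weighted or filtered, while high-strength pairs are kept as high-quality preference data.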