CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models
Large Language Models (LLMs) have made significant progress in utilizing
tools, but their ability is limited by API availability and the instability of
implicit reasoning, particularly when both planning and execution are involved.
To overcome these limitations, we propose CREATOR, a novel framework that
enables LLMs to create their own tools using documentation and code
realization. CREATOR disentangles abstract tool creation and concrete decision
execution, resulting in improved performance. We evaluate CREATOR on MATH and
TabMWP benchmarks, respectively consisting of challenging math competition
problems and diverse tabular contents. Remarkably, CREATOR outperforms existing
chain-of-thought, program-of-thought, and tool-using baselines. Additionally,
we introduce the Creation Challenge dataset, featuring 2K diverse questions, to
emphasize the necessity and benefits of LLMs' tool creation ability. Further
research demonstrates that leveraging LLMs as tool creators facilitates
knowledge transfer, and LLMs exhibit varying levels of tool creation abilities,
enabling them to adapt to diverse situations. The tool creation ability
revolutionizes the LLM's problem-solving paradigm, driving us closer to the
next frontier of artificial intelligence. All code and data are publicly released.
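The create-then-execute split described above can be illustrated with a minimal sketch. The tool source here is a hard-coded stand-in for what the LLM would generate in CREATOR; the two stages (abstract tool creation, concrete execution) are the only part taken from the abstract.

```python
# Minimal sketch of disentangled tool creation and execution.
# Stage 1 yields a reusable tool as source code (a fixed placeholder
# here, standing in for LLM-generated code); stage 2 realizes it and
# applies it to a concrete problem instance.

def create_tool(task_description: str) -> str:
    # Placeholder for the LLM's abstract tool creation step.
    return (
        "def tool(a, b, c):\n"
        "    d = (b * b - 4 * a * c) ** 0.5\n"
        "    return sorted([(-b - d) / (2 * a), (-b + d) / (2 * a)])\n"
    )

def execute_tool(tool_source: str, *args):
    namespace = {}
    exec(tool_source, namespace)  # realize the abstract tool as code
    return namespace["tool"](*args)

source = create_tool("solve a quadratic equation")
roots = execute_tool(source, 1, -3, 2)  # x^2 - 3x + 2 = 0 -> [1.0, 2.0]
```

Keeping creation and execution in separate steps means an incorrect answer can be traced either to a flawed tool or to a flawed application of it.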
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Large language models (LLMs) are often augmented with tools to solve complex
tasks. By generating code snippets and executing them through task-specific
Application Programming Interfaces (APIs), they can offload certain functions
to dedicated external modules, such as image encoding and performing
calculations. However, most existing approaches to augment LLMs with tools are
constrained by general-purpose APIs and lack the flexibility for tailoring them
to specific tasks. In this work, we present CRAFT, a general tool creation and
retrieval framework for LLMs. It creates toolsets specifically curated for the
tasks and equips LLMs with a component that retrieves tools from these sets to
enhance their capability to solve complex tasks. For each task, we collect
specific code solutions by prompting GPT-4 to solve the training examples.
Following a validation step that ensures correctness, these solutions are
abstracted into code snippets to enhance reusability and deduplicated for
higher quality. At inference time, the language model retrieves snippets from
the toolsets and then executes them or generates the output conditioning on the
retrieved snippets. Our method is designed to be flexible and offers a
plug-and-play approach to adapt off-the-shelf LLMs to unseen domains and
modalities, without any finetuning. Experiments on vision-language, tabular
processing, and mathematical reasoning tasks show that our approach achieves
substantial improvements compared to strong baselines. In addition, our
in-depth analysis reveals that: (1) consistent performance improvement can be
achieved by scaling up the number of tools and the capability of the backbone
models; (2) each component of our approach contributes to the performance
gains; (3) the created tools are well-structured and reliable with low
complexity and atomicity. The code is available at
https://github.com/lifan-yuan/CRAFT.
Comment: Accepted to ICLR 2024. Code is available at https://github.com/lifan-yuan/CRAFT.
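The retrieval step can be sketched as matching a query against snippet descriptions and returning the best-scoring tool. The toy snippets, descriptions, and token-overlap scoring below are illustrative stand-ins, not CRAFT's actual retriever or released toolsets.

```python
# Toy CRAFT-style retrieval: score stored snippets against a query by
# token overlap on their descriptions and return the best match.

def tokenize(text):
    return set(text.lower().replace("_", " ").split())

# Hypothetical toolset: snippet -> natural-language description.
TOOLSET = {
    "def table_max(rows, col): return max(r[col] for r in rows)":
        "find the maximum value in a table column",
    "def gcd(a, b):\n    while b:\n        a, b = b, a % b\n    return a":
        "greatest common divisor of two integers",
}

def retrieve(query, toolset):
    query_tokens = tokenize(query)
    return max(toolset,
               key=lambda code: len(query_tokens & tokenize(toolset[code])))

best = retrieve("maximum value in a column of the table", TOOLSET)
```

The retrieved snippet can then be executed directly or passed back to the model as conditioning context, as the abstract describes.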
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly
Norm discovery is important for understanding and reasoning about the
acceptable behaviors and potential violations in human communication and
interactions. We introduce NormSAGE, a framework for addressing the novel task
of conversation-grounded multi-lingual, multi-cultural norm discovery, based on
language model prompting and self-verification. NormSAGE leverages the
expressiveness and implicit knowledge of the pretrained GPT-3 language model
backbone, to elicit knowledge about norms through directed questions
representing the norm discovery task and conversation context. It further
addresses the risk of language model hallucination with a self-verification
mechanism ensuring that the norms discovered are correct and are substantially
grounded to their source conversations. Evaluation results show that our
approach discovers significantly more relevant and insightful norms for
conversations on-the-fly compared to baselines (>10% higher Likert-scale ratings).
The norms discovered from Chinese conversation are also comparable to the norms
discovered from English conversation in terms of insightfulness and correctness
(<3% difference). In addition, the culture-specific norms discovered are of
promising quality, enabling 80% accuracy in human identification of culture pairs.
Finally, our grounding process in norm discovery self-verification can be
extended for instantiating the adherence and violation of any norm for a given
conversation on-the-fly, with explainability and transparency. NormSAGE
achieves an AUC of 95.4% in grounding, with natural language explanations
matching human-written quality.
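The discover-then-verify loop can be sketched schematically: one prompt proposes candidate norms, and a second, self-verification prompt keeps only those grounded in the source conversation. Both model calls are stubbed out below; the candidate norms and evidence keywords are invented for illustration.

```python
# Schematic NormSAGE-style loop: propose norms, then self-verify each
# one against the conversation before accepting it.

def propose_norms(conversation):
    # Stand-in for the norm-discovery prompt: candidate norm -> the
    # evidence token a verifier would look for (both hypothetical).
    return {
        "Greet before making a request.": "hello",
        "Interrupting a speaker is acceptable.": "interrupt",
    }

def discover(conversation):
    text = conversation.lower()
    # Stand-in for the self-verification prompt: a norm survives only
    # if its supporting evidence is grounded in the conversation.
    return [norm for norm, evidence in propose_norms(conversation).items()
            if evidence in text]

norms = discover("A: Hello! Could you pass the report? B: Of course.")
```

The same grounding check, run in reverse, is what lets the framework flag adherence or violation of a known norm in a new conversation.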
Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
Building cross-modal intelligence that can understand charts and communicate
the salient information hidden behind them is an appealing challenge in the
vision and language (V+L) community. The capability to uncover the underlying
table data of chart figures is key to automatic chart understanding.
We introduce ChartT5, a V+L model that learns how to interpret table
information from chart images via cross-modal pre-training on plot table pairs.
Specifically, we propose two novel pre-training objectives: Masked Header
Prediction (MHP) and Masked Value Prediction (MVP), to equip the model with
different skills for interpreting the table information. We have conducted
extensive experiments on chart question answering and chart summarization to
verify the effectiveness of the proposed pre-training strategies. In
particular, on the ChartQA benchmark, our ChartT5 outperforms the
state-of-the-art non-pretraining methods by over 8%.
Comment: Accepted by Findings of ACL 202
Quantum Hacking: Experimental demonstration of time-shift attack against practical quantum key distribution systems
Quantum key distribution (QKD) systems can send signals over more than 100 km
standard optical fiber and are widely believed to be secure. Here, we show
experimentally for the first time a technologically feasible attack, namely the
time-shift attack, against a commercial QKD system. Our result shows that,
contrary to popular belief, an eavesdropper, Eve, has a non-negligible
probability (~4%) to break the security of the system. Eve's success is due to
the well-known detection efficiency loophole in the experimental testing of
Bell inequalities. Therefore, the detection efficiency loophole plays a key
role not only in fundamental physics, but also in technological applications
such as QKD.
Comment: 5 pages, 3 figures. Substantially revised version.
Defining a New NLP Playground
The recent explosion of performance of large language models (LLMs) has
changed the field of Natural Language Processing (NLP) more abruptly and
seismically than any other shift in the field's 80-year history. This has
resulted in concerns that the field will become homogenized and
resource-intensive. The new status quo has put many academic researchers,
especially PhD students, at a disadvantage. This paper aims to define a new NLP
playground by proposing 20+ PhD-dissertation-worthy research directions,
covering theoretical analysis, new and challenging problems, learning
paradigms, and interdisciplinary applications.
Comment: EMNLP Findings 2023 "Theme Track: Large Language Models and the Future of NLP".
NewsClaims: A New Benchmark for Claim Detection from News with Attribute Knowledge
Claim detection and verification are crucial for news understanding and have
emerged as promising technologies for mitigating news misinformation. However,
most existing work has focused on claim sentence analysis while overlooking
crucial background attributes (e.g., claimer, claim objects). In this work, we
present NewsClaims, a new benchmark for knowledge-aware claim detection in the
news domain. We redefine the claim detection problem to include extraction of
additional background attributes related to each claim and release 889 claims
annotated over 143 news articles. NewsClaims aims to benchmark claim detection
systems in emerging scenarios, comprising unseen topics with little or no
training data. To this end, we provide a comprehensive evaluation of zero-shot
and prompt-based baselines for NewsClaims.
Comment: Preprint.
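The enriched claim objects the benchmark asks systems to extract can be represented as a simple record: the claim sentence plus its background attributes. The field names below (claimer, claim object, stance) follow the attributes named in the abstract, but the exact schema is an illustrative assumption.

```python
# A minimal record type for a NewsClaims-style annotated claim:
# the sentence plus background attributes beyond the sentence itself.

from dataclasses import dataclass, asdict

@dataclass
class Claim:
    sentence: str
    topic: str
    claimer: str       # who makes the claim
    claim_object: str  # what the claim is about
    stance: str        # e.g. "affirm" or "refute" (illustrative values)

c = Claim(
    sentence="The new vaccine is safe for children, the ministry said.",
    topic="vaccine safety",
    claimer="the ministry",
    claim_object="the new vaccine",
    stance="affirm",
)
record = asdict(c)  # serializable dict for evaluation or storage
```

Framing the task this way makes clear why sentence-level claim detection alone is insufficient: the attributes require reading beyond the claim sentence.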
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
The prominent large language models (LLMs) of today differ from past language
models not only in size, but also in the fact that they are trained on a
combination of natural language and formal language (code). As a medium between
humans and computers, code translates high-level goals into executable steps,
featuring standard syntax, logical consistency, abstraction, and modularity. In
this survey, we present an overview of the various benefits of integrating code
into LLMs' training data. Specifically, beyond enhancing LLMs in code
generation, we observe that these unique properties of code help (i) unlock the
reasoning ability of LLMs, enabling their applications to a range of more
complex natural language tasks; (ii) steer LLMs to produce structured and
precise intermediate steps, which can then be connected to external execution
ends through function calls; and (iii) take advantage of the code compilation
and execution environment, which also provides diverse feedback for model
improvement. In addition, we trace how these profound capabilities of LLMs,
brought by code, have led to their emergence as intelligent agents (IAs) in
situations where the ability to understand instructions, decompose goals, plan
and execute actions, and refine from feedback are crucial to their success on
downstream tasks. Finally, we present several key challenges and future
directions of empowering LLMs with code.
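Points (ii) and (iii) above can be sketched together: the model emits a structured function call, an execution end runs it, and the result or error is fed back for refinement. The "model" here is a fixed stub emitting one call, and the tool registry is hypothetical.

```python
# Sketch of connecting an LLM's structured intermediate step to an
# external execution end, with execution results returned as feedback.

import json

# Hypothetical execution ends the model may call.
TOOLS = {
    "add": lambda a, b: a + b,
    "div": lambda a, b: a / b,
}

def model_step(history):
    # Stand-in for an LLM emitting a structured function call.
    return json.dumps({"name": "div", "args": [10, 4]})

def run_step(call_json):
    call = json.loads(call_json)
    try:
        return {"ok": True, "result": TOOLS[call["name"]](*call["args"])}
    except Exception as exc:
        # Errors become feedback the model can use to refine its plan.
        return {"ok": False, "error": repr(exc)}

feedback = run_step(model_step([]))
```

The structured call format is what makes the step checkable and executable, in contrast to free-form natural language reasoning.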
Bacteremic community-acquired pneumonia due to Klebsiella pneumoniae: Clinical and microbiological characteristics in Taiwan, 2001-2008
Background: Klebsiella pneumoniae is the major cause of community-acquired pyogenic infections in Taiwan. This retrospective study evaluated the clinical and microbiological characteristics of bacteremic community-acquired pneumonia due to K. pneumoniae in Taiwanese adults.
Methods: The clinical characteristics of bacteremic community-acquired pneumonia (CAP) in adults due to K. pneumoniae were compared to those of adults with bacteremic CAP due to Streptococcus pneumoniae at a tertiary medical center in Taiwan from 2001-2008. Risk factors for mortality of bacteremic CAP due to K. pneumoniae were analyzed. All clinical isolates of K. pneumoniae were examined for capsular serotype, hypermucoviscosity phenotype, aerobactin, and the rmpA gene.
Results: K. pneumoniae was the dominant cause of bacteremic CAP and was associated with a more fulminant course and a worse prognosis than bacteremic CAP due to Streptococcus pneumoniae. Initial presentation with septic shock and respiratory failure were independent risk factors for both early and total mortality. Serotypes K1 and K2 comprised around half of all isolates. There were no significant differences in the clinical characteristics of patients with bacteremic CAP due to K1/K2 and non-K1/K2 isolates. The hypermucoviscosity phenotype as well as the aerobactin and rmpA genes were highly prevalent among the K. pneumoniae isolates.
Conclusions: K. pneumoniae continued to be the dominant cause of bacteremic CAP in Taiwanese adults during 2001-2008. Initial presentation with septic shock and respiratory failure were independent risk factors for both early and total mortality from K. pneumoniae bacteremic CAP. Serotypes K1/K2 comprised around half of all isolates but did not predispose patients to a poor clinical outcome. Physicians should be aware of the poor prognosis of any patient with bacteremic K. pneumoniae CAP and monitor these patients closely.
Subaru/HiCIAO imaging of LkHα 330 - multi-band detection of the gap and spiral-like structures
We present observations of the LkHα 330 disk in two bands, with a multi-band
detection of the large gap and spiral-like structures. The structures in the
outer disk (0.3″) at PA = 0°-45° and PA = 180°-290° are likely density-wave-induced
spirals, and comparison between our observational results and simulations
suggests planet formation. We have also investigated the azimuthal profiles at
the ring and the outer-disk regions, as well as radial profiles in the
directions of the spiral-like structures and the semi-major axis. The azimuthal
analysis shows a large variety with wavelength and implies that the disk has a
non-axisymmetric dust distribution. The radial profiles in the major-axis
direction suggest that the outer region (r ≥ 0.25″) may be influenced by
shadows of the inner region of the disk. The spiral-like directions
(PA = 10° and 230°) show different radial profiles, which suggests that the
surfaces of the spiral-like structures are highly flared and/or have different
dust properties. Finally, a color map of the disk shows a lack of an outer
eastern region in one band, which may hint at the presence of an inner object
that casts a directional shadow onto the disk.
Comment: 12 pages, 16 figures, 2 tables, accepted for publication in A
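The radial-profile analysis mentioned above amounts to averaging image intensity in annular bins around the star. The pure-Python sketch below works on a toy intensity map and is only a schematic stand-in for the actual reduction pipeline.

```python
# Hedged sketch of a radial profile: mean intensity in annular bins
# centred on (cx, cy), applied to a toy 5x5 intensity map.

def radial_profile(image, cx, cy, n_bins, bin_width):
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r = ((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
            b = int(r // bin_width)
            if b < n_bins:
                sums[b] += value
                counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

# Toy map: brightest at the centre, falling outward.
img = [[1.0 / (1.0 + abs(x - 2) + abs(y - 2)) for x in range(5)]
       for y in range(5)]
profile = radial_profile(img, cx=2, cy=2, n_bins=3, bin_width=1.0)
```

Comparing such profiles along different position angles is what reveals the asymmetries and shadowing the abstract describes.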