
    CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models

    Large Language Models (LLMs) have made significant progress in utilizing tools, but their ability is limited by API availability and the instability of implicit reasoning, particularly when both planning and execution are involved. To overcome these limitations, we propose CREATOR, a novel framework that enables LLMs to create their own tools using documentation and code realization. CREATOR disentangles abstract tool creation and concrete decision execution, resulting in improved performance. We evaluate CREATOR on the MATH and TabMWP benchmarks, respectively consisting of challenging math competition problems and diverse tabular contents. Remarkably, CREATOR outperforms existing chain-of-thought, program-of-thought, and tool-using baselines. Additionally, we introduce the Creation Challenge dataset, featuring 2K diverse questions, to emphasize the necessity and benefits of LLMs' tool creation ability. Further research demonstrates that leveraging LLMs as tool creators facilitates knowledge transfer, and that LLMs exhibit varying levels of tool creation ability, enabling them to adapt to diverse situations. The tool creation ability revolutionizes the LLM's problem-solving paradigm, driving us closer to the next frontier of artificial intelligence. All code and data are released.
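The split the abstract describes, between abstract tool creation and concrete execution, can be sketched in a few lines. The `llm` function below is a hypothetical stub standing in for a real language model call, and the canned tool it returns is invented for the demo; this is not the paper's implementation, only an illustration of the two phases.

```python
# Sketch of a CREATOR-style create/execute split, with a hypothetical
# `llm(prompt)` stub in place of a real language model backbone.

def llm(prompt: str) -> str:
    # Stub: a real system would query an LLM here. We return a canned
    # tool for the demo question "average of a list of numbers".
    return (
        "def tool(numbers):\n"
        "    return sum(numbers) / len(numbers)\n"
    )

def create_tool(question: str) -> str:
    """Abstract phase: ask the model to write a reusable tool as code."""
    return llm(f"Write a Python function `tool` that solves: {question}")

def execute_tool(tool_code: str, *args):
    """Concrete phase: realize the code, then run it on concrete inputs."""
    namespace: dict = {}
    exec(tool_code, namespace)  # code realization
    return namespace["tool"](*args)

code = create_tool("compute the average of a list of numbers")
answer = execute_tool(code, [3, 4, 8])
print(answer)  # 5.0
```

Keeping the created tool as documented code (rather than a hidden chain of reasoning) is what lets the second phase be checked and re-run independently of the first.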

    CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets

    Large language models (LLMs) are often augmented with tools to solve complex tasks. By generating code snippets and executing them through task-specific Application Programming Interfaces (APIs), they can offload certain functions to dedicated external modules, such as image encoding and performing calculations. However, most existing approaches to augmenting LLMs with tools are constrained by general-purpose APIs and lack the flexibility to tailor them to specific tasks. In this work, we present CRAFT, a general tool creation and retrieval framework for LLMs. It creates toolsets specifically curated for the tasks and equips LLMs with a component that retrieves tools from these sets to enhance their capability to solve complex tasks. For each task, we collect specific code solutions by prompting GPT-4 to solve the training examples. Following a validation step ensuring their correctness, these solutions are abstracted into code snippets to enhance reusability, and deduplicated for higher quality. At inference time, the language model retrieves snippets from the toolsets and then executes them or generates the output conditioned on the retrieved snippets. Our method is designed to be flexible and offers a plug-and-play approach to adapting off-the-shelf LLMs to unseen domains and modalities, without any finetuning. Experiments on vision-language, tabular processing, and mathematical reasoning tasks show that our approach achieves substantial improvements over strong baselines. In addition, our in-depth analysis reveals that: (1) consistent performance improvement can be achieved by scaling up the number of tools and the capability of the backbone models; (2) each component of our approach contributes to the performance gains; (3) the created tools are well-structured and reliable with low complexity and atomicity. The code is available at https://github.com/lifan-yuan/CRAFT. Comment: Accepted to ICLR 2024.
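The retrieve-then-execute step at inference time can be sketched as follows. The toolset contents and the token-overlap retriever below are invented for illustration; CRAFT's actual toolsets and its retrieval component are different and task-specific.

```python
# Toy sketch of CRAFT-style tool retrieval: a toolset of validated,
# deduplicated snippets is searched at inference time. Retrieval here
# is naive token overlap between the query and each tool's description.

TOOLSET = {
    "area_of_circle": ("compute the area of a circle given its radius",
                       "import math\ndef run(r):\n    return math.pi * r ** 2\n"),
    "sum_column":     ("sum a numeric column in a table of rows",
                       "def run(rows, key):\n    return sum(row[key] for row in rows)\n"),
}

def retrieve(query: str) -> str:
    """Return the name of the tool whose description best matches the query."""
    q = set(query.lower().split())
    def overlap(item):
        description = item[1][0]
        return len(q & set(description.split()))
    return max(TOOLSET.items(), key=overlap)[0]

def run_tool(name: str, *args):
    """Execute the retrieved snippet and call its entry point."""
    namespace: dict = {}
    exec(TOOLSET[name][1], namespace)
    return namespace["run"](*args)

name = retrieve("what is the area of a circle with radius 2?")
print(name)               # area_of_circle
print(run_tool(name, 2))  # ~12.566
```

Because the snippets are abstracted and deduplicated offline, the inference-time component only has to match a query against short descriptions and execute the winner.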

    NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly

    Norm discovery is important for understanding and reasoning about the acceptable behaviors and potential violations in human communication and interactions. We introduce NormSAGE, a framework for addressing the novel task of conversation-grounded multi-lingual, multi-cultural norm discovery, based on language model prompting and self-verification. NormSAGE leverages the expressiveness and implicit knowledge of the pretrained GPT-3 language model backbone to elicit knowledge about norms through directed questions representing the norm discovery task and conversation context. It further addresses the risk of language model hallucination with a self-verification mechanism ensuring that the norms discovered are correct and substantially grounded in their source conversations. Evaluation results show that our approach discovers significantly more relevant and insightful norms for conversations on-the-fly compared to baselines (>10% in Likert-scale rating). The norms discovered from Chinese conversation are also comparable to the norms discovered from English conversation in terms of insightfulness and correctness (<3% difference). In addition, the culture-specific norms are of promising quality, allowing for 80% accuracy in human identification of the culture pair. Finally, our grounding process in norm discovery self-verification can be extended to instantiate the adherence to and violation of any norm for a given conversation on-the-fly, with explainability and transparency. NormSAGE achieves an AUC of 95.4% in grounding, with natural language explanations matching human-written quality.
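The discover-then-self-verify loop can be sketched as two prompting passes over the same conversation. The `llm` stub and its canned answers below are hypothetical stand-ins for the GPT-3 backbone; only the control flow (keep a candidate norm iff the model affirms it is correct and grounded) reflects the framework described above.

```python
# Minimal sketch of NormSAGE-style norm discovery with self-verification.
# `llm` is a hypothetical stub; a real system queries a language model.

def llm(prompt: str) -> str:
    if prompt.startswith("List social norms"):
        # Discovery stub: one grounded norm, one hallucinated norm.
        return ("One should greet elders first.\n"
                "One should always pay in cash.")
    # Verification stub: the demo conversation supports the greeting
    # norm but says nothing about cash, so reject norms mentioning cash.
    return "No" if "cash" in prompt else "Yes"

def discover_norms(conversation: str) -> list[str]:
    out = llm(f"List social norms implied by this conversation:\n{conversation}")
    return [line.strip() for line in out.splitlines() if line.strip()]

def self_verify(norm: str, conversation: str) -> bool:
    answer = llm(f"Is the norm '{norm}' correct and grounded in:\n{conversation}?")
    return answer.strip().lower().startswith("yes")

conversation = "A: Grandma is here. B: Go greet her first, then we eat."
norms = [n for n in discover_norms(conversation) if self_verify(n, conversation)]
print(norms)  # ['One should greet elders first.']
```

The second pass is what turns free-form generation into a filtered, conversation-grounded output, which is the hallucination mitigation the abstract refers to.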

    Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

    Building cross-modal intelligence that can understand charts and communicate the salient information hidden behind them is an appealing challenge in the vision and language (V+L) community. The capability to uncover the underlying table data of chart figures is a critical key to automatic chart understanding. We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot-table pairs. Specifically, we propose two novel pre-training objectives, Masked Header Prediction (MHP) and Masked Value Prediction (MVP), to equip the model with different skills to interpret the table information. We have conducted extensive experiments on chart question answering and chart summarization to verify the effectiveness of the proposed pre-training strategies. In particular, on the ChartQA benchmark, our ChartT5 outperforms the state-of-the-art non-pretraining methods by over 8% performance gains. Comment: Accepted by Findings of ACL 2023.
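The two objectives can be illustrated on a table flattened to (header, value) cells: MHP masks the headers and MVP masks the values, and the model must recover the masked tokens. The toy table and the flattening format below are invented for illustration; the real ChartT5 pairs this text stream with the chart image.

```python
# Illustrative construction of Masked Header Prediction (MHP) and
# Masked Value Prediction (MVP) training examples from a flat table.

TABLE = [("Year", "2019"), ("Year", "2020"), ("Sales", "1.2"), ("Sales", "3.4")]

def make_example(table, objective):
    """Return (masked_input, targets) for objective 'MHP' or 'MVP'."""
    tokens, targets = [], []
    for header, value in table:
        if objective == "MHP":
            tokens += ["<mask>", value]   # hide the header, keep the value
            targets.append(header)
        else:  # MVP
            tokens += [header, "<mask>"]  # keep the header, hide the value
            targets.append(value)
    return " ".join(tokens), targets

inp, tgt = make_example(TABLE, "MVP")
print(inp)  # Year <mask> Year <mask> Sales <mask> Sales <mask>
print(tgt)  # ['2019', '2020', '1.2', '3.4']
```

Recovering headers exercises schema understanding, while recovering values forces the model to read quantities off the chart, which is why the two objectives teach different skills.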

    Quantum Hacking: Experimental demonstration of time-shift attack against practical quantum key distribution systems

    Quantum key distribution (QKD) systems can send signals over more than 100 km of standard optical fiber and are widely believed to be secure. Here, we show experimentally for the first time a technologically feasible attack, namely the time-shift attack, against a commercial QKD system. Our result shows that, contrary to popular belief, an eavesdropper, Eve, has a non-negligible probability (~4%) of breaking the security of the system. Eve's success is due to the well-known detection efficiency loophole in the experimental testing of Bell inequalities. Therefore, the detection efficiency loophole plays a key role not only in fundamental physics, but also in technological applications such as QKD. Comment: 5 pages, 3 figures. Substantially revised version.
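The loophole the abstract invokes can be shown with one line of arithmetic. By shifting a pulse's arrival time, Eve makes the two single-photon detectors (one per bit value) have mismatched efficiencies at that time; conditioned on a click, the bit is then biased toward the more efficient detector. The efficiency values below are illustrative, not measurements from the attacked system.

```python
# Toy illustration of the detection-efficiency loophole behind the
# time-shift attack. At the shifted arrival time, the detector for
# bit 0 is far more efficient than the detector for bit 1.

eta0, eta1 = 0.9, 0.1  # illustrative efficiencies at the shifted time

# With bits sent uniformly at random, the probability that a
# registered click corresponds to bit 0 is eta0 / (eta0 + eta1):
p_bit0_given_click = eta0 / (eta0 + eta1)
print(p_bit0_given_click)  # 0.9 -> Eve infers the detected bit with 90% accuracy
```

When the efficiencies are equal the ratio is 0.5 and a click leaks nothing; any time-dependent mismatch Eve can select gives her partial knowledge of the sifted key.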

    Defining a New NLP Playground

    The recent explosion of performance of large language models (LLMs) has changed the field of Natural Language Processing (NLP) more abruptly and seismically than any other shift in the field's 80-year history. This has resulted in concerns that the field will become homogenized and resource-intensive. The new status quo has put many academic researchers, especially PhD students, at a disadvantage. This paper aims to define a new NLP playground by proposing 20+ PhD-dissertation-worthy research directions, covering theoretical analysis, new and challenging problems, learning paradigms, and interdisciplinary applications. Comment: EMNLP Findings 2023, "Theme Track: Large Language Models and the Future of NLP".

    NewsClaims: A New Benchmark for Claim Detection from News with Attribute Knowledge

    Claim detection and verification are crucial for news understanding and have emerged as promising technologies for mitigating news misinformation. However, most existing work has focused on claim sentence analysis while overlooking crucial background attributes (e.g., claimer, claim objects). In this work, we present NewsClaims, a new benchmark for knowledge-aware claim detection in the news domain. We redefine the claim detection problem to include extraction of additional background attributes related to each claim, and release 889 claims annotated over 143 news articles. NewsClaims aims to benchmark claim detection systems in emerging scenarios, comprising unseen topics with little or no training data. To this end, we provide a comprehensive evaluation of zero-shot and prompt-based baselines for NewsClaims. Comment: Preprint.

    If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

    The prominent large language models (LLMs) of today differ from past language models not only in size, but also in the fact that they are trained on a combination of natural language and formal language (code). As a medium between humans and computers, code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity. In this survey, we present an overview of the various benefits of integrating code into LLMs' training data. Specifically, beyond enhancing LLMs in code generation, we observe that these unique properties of code help (i) unlock the reasoning ability of LLMs, enabling their application to a range of more complex natural language tasks; (ii) steer LLMs to produce structured and precise intermediate steps, which can then be connected to external execution ends through function calls; and (iii) take advantage of the code compilation and execution environment, which also provides diverse feedback for model improvement. In addition, we trace how these profound capabilities of LLMs, brought by code, have led to their emergence as intelligent agents (IAs) in situations where the abilities to understand instructions, decompose goals, plan and execute actions, and refine from feedback are crucial to success on downstream tasks. Finally, we present several key challenges and future directions of empowering LLMs with code.
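Points (ii) and (iii) of the survey, structured intermediate steps connected to execution ends with feedback flowing back, amount to a small agent loop. The `llm` stub, the tool names, and the JSON call format below are all hypothetical; the sketch only shows the loop's shape, not any particular system.

```python
# Minimal sketch of a code-empowered agent loop: the model emits a
# structured step (a function call), an execution end runs it, and the
# result is fed back for the next step. `llm` is a hypothetical stub.

import json

TOOLS = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def llm(task: str, feedback=None) -> str:
    # Stub: a real agent's model would plan these calls itself.
    if feedback is None:
        return json.dumps({"call": "mul", "args": [3, 4]})     # 3 * 4
    return json.dumps({"call": "add", "args": [feedback, 5]})  # + 5

def agent(task: str):
    step = json.loads(llm(task))
    result = TOOLS[step["call"]](*step["args"])    # execute step 1
    step = json.loads(llm(task, feedback=result))  # refine from feedback
    return TOOLS[step["call"]](*step["args"])      # execute step 2

print(agent("compute 3 * 4 + 5"))  # 17
```

The structured call format is what makes each intermediate step checkable and executable, in contrast to free-form natural-language reasoning.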

    Bacteremic community-acquired pneumonia due to Klebsiella pneumoniae: Clinical and microbiological characteristics in Taiwan, 2001-2008

    Background: Klebsiella pneumoniae is the major cause of community-acquired pyogenic infections in Taiwan. This retrospective study evaluated the clinical and microbiological characteristics of bacteremic community-acquired pneumonia due to K. pneumoniae in Taiwanese adults.
    Methods: The clinical characteristics of bacteremic community-acquired pneumonia (CAP) in adults due to K. pneumoniae were compared to those of adults with bacteremic CAP due to Streptococcus pneumoniae at a tertiary medical center in Taiwan from 2001-2008. Risk factors for mortality of bacteremic CAP due to K. pneumoniae were analyzed. All clinical isolates of K. pneumoniae were examined for capsular serotype, hypermucoviscosity phenotype, and the aerobactin and rmpA genes.
    Results: K. pneumoniae was the dominant cause of bacteremic CAP and was associated with a more fulminant course and a worse prognosis than bacteremic CAP due to Streptococcus pneumoniae. Initial presentation with septic shock and respiratory failure were independent risk factors for both early and total mortality. Serotypes K1 and K2 comprised around half of all isolates. There were no significant differences in the clinical characteristics of patients with bacteremic CAP due to K1/K2 and non-K1/K2 isolates. The hypermucoviscosity phenotype as well as the aerobactin and rmpA genes were highly prevalent among the K. pneumoniae isolates.
    Conclusions: K. pneumoniae continued to be the dominant cause of bacteremic CAP in Taiwanese adults during 2001-2008. Initial presentation with septic shock and respiratory failure were independent risk factors for both early and total mortality from K. pneumoniae bacteremic CAP. Serotypes K1/K2 comprised around half of all isolates, but did not predispose patients to a poor clinical outcome. Physicians should be aware of the poor prognosis of any patient with bacteremic K. pneumoniae CAP and monitor these patients more closely.

    Subaru/HiCIAO HKs imaging of LkHα 330 - multi-band detection of the gap and spiral-like structures

    Full text link
    We present H- and Ks-band observations of the LkHα 330 disk with a multi-band detection of the large gap and spiral-like structures. The structures in the outer disk (r ~ 0.3″) at PA = 0-45° and PA = 180-290° are likely density-wave-induced spirals, and comparison between our observational results and simulations suggests planet formation. We have also investigated the azimuthal profiles at the ring and the outer-disk regions, as well as radial profiles in the directions of the spiral-like structures and the semi-major axis. The azimuthal analysis shows a large variety in wavelength and implies that the disk has non-axisymmetric dust distributions. The radial profiles in the major-axis direction (PA = 271°) suggest that the outer region (r ≥ 0.25″) may be influenced by shadows from the inner region of the disk. The spiral-like directions (PA = 10° and 230°) show different radial profiles, which suggests that the surfaces of the spiral-like structures are highly flared and/or have different dust properties. Finally, a color map of the disk shows a lack of an outer eastern region in the H-band disk, which may hint at the presence of an inner object that casts a directional shadow onto the disk. Comment: 12 pages, 16 figures, 2 tables, accepted for publication in A
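The radial and azimuthal profile analysis mentioned above can be sketched generically: bin an image by radius for a radial profile, or by position angle within an annulus for an azimuthal profile. The image below is synthetic and face-on; the actual study works on deprojected H- and Ks-band data, so this is only an illustration of the binning.

```python
# Sketch of radial and azimuthal profile extraction from a disk image.
import numpy as np

n = 101
y, x = np.mgrid[:n, :n] - n // 2
r = np.hypot(x, y)                        # radius in pixels
pa = np.degrees(np.arctan2(x, y)) % 360   # position angle, N through E

image = np.exp(-r / 20.0)  # synthetic, radially declining disk

def radial_profile(image, r, nbins=10, rmax=50):
    """Mean brightness in concentric radial bins."""
    edges = np.linspace(0, rmax, nbins + 1)
    idx = np.digitize(r.ravel(), edges) - 1
    vals = image.ravel()
    return np.array([vals[idx == i].mean() for i in range(nbins)])

prof = radial_profile(image, r)
print(prof[0] > prof[-1])  # True: brightness falls with radius

# Azimuthal profile: mean brightness per 90-degree sector in an annulus.
annulus = (r > 15) & (r < 25)
sector_means = [image[annulus & (pa >= a) & (pa < a + 90)].mean()
                for a in (0, 90, 180, 270)]
```

For this axisymmetric synthetic disk the sector means are equal; in a real disk, sector-to-sector differences are what reveal the non-axisymmetric dust distributions and shadowing discussed in the abstract.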