11 research outputs found

    Zero-shot audio captioning with audio-language model guidance and audio context keywords

    Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Unlike speech recognition, which translates audio content that contains spoken language into text, audio captioning is commonly concerned with ambient sounds, or sounds produced by a human performing an action. Inspired by zero-shot image captioning methods, we propose ZerAuCap, a novel framework for summarising such general audio signals in a text caption without requiring task-specific training. In particular, our framework exploits a pre-trained large language model (LLM) for generating the text, which is guided by a pre-trained audio-language model to produce captions that describe the audio content. Additionally, we use audio context keywords that prompt the language model to generate text that is broadly relevant to sounds. Our proposed framework achieves state-of-the-art results in zero-shot audio captioning on the AudioCaps and Clotho datasets. Our code is available at https://github.com/ExplainableML/ZerAuCap. Comment: NeurIPS 2023 - Machine Learning for Audio Workshop (Oral)

    CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations

    Providing explanations in the context of Visual Question Answering (VQA) presents a fundamental problem in machine learning. To obtain detailed insights into the process of generating natural language explanations for VQA, we introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with natural language explanations. For each image-question pair in the CLEVR dataset, CLEVR-X contains multiple structured textual explanations which are derived from the original scene graphs. By construction, the CLEVR-X explanations are correct and describe the reasoning and visual information that is necessary to answer a given question. We conducted a user study to confirm that the ground-truth explanations in our proposed dataset are indeed complete and relevant. We present baseline results for generating natural language explanations in the context of VQA using two state-of-the-art frameworks on the CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation generation quality for different question and answer types. Additionally, we study the influence of using different numbers of ground-truth explanations on the convergence of natural language generation (NLG) metrics. The CLEVR-X dataset is publicly available at https://explainableml.github.io/CLEVR-X/.

    In-Context Impersonation Reveals Large Language Models' Strengths and Biases

    In everyday conversations, humans can take on different roles and adapt their vocabulary to their chosen roles. We explore whether LLMs can take on, that is, impersonate, different roles when they generate text in-context. We ask LLMs to assume different personas before solving vision and language tasks. We do this by prefixing the prompt with a persona that is associated either with a social identity or domain expertise. In a multi-armed bandit task, we find that LLMs pretending to be children of different ages recover human-like developmental stages of exploration. In a language-based reasoning task, we find that LLMs impersonating domain experts perform better than LLMs impersonating non-domain experts. Finally, we test whether LLMs' impersonations are complementary to visual information when describing different categories. We find that impersonation can improve performance: an LLM prompted to be a bird expert describes birds better than one prompted to be a car expert. However, impersonation can also uncover LLMs' biases: an LLM prompted to be a man describes cars better than one prompted to be a woman. These findings demonstrate that LLMs are capable of taking on diverse roles and that this in-context impersonation can be used to uncover their hidden strengths and biases.

    DIII-D research advancing the physics basis for optimizing the tokamak approach to fusion energy

    Funding Information: This material is based upon work supported by the US Department of Energy, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under Awards DE-FC02-04ER54698 and DE-AC52-07NA27344. Publisher Copyright: © 2022 IAEA, Vienna. DIII-D physics research addresses critical challenges for the operation of ITER and the next generation of fusion energy devices. This is done through a focus on innovations to provide solutions for high performance long pulse operation, coupled with fundamental plasma physics understanding and model validation, to drive scenario development by integrating high performance core and boundary plasmas. Substantial increases in off-axis current drive efficiency from an innovative top launch system for EC power, and in pressure broadening for Alfvén eigenmode control from a co-/counter-I_p steerable off-axis neutral beam, all improve the prospects for optimization of future long pulse/steady state high performance tokamak operation. Fundamental studies into the modes that drive the evolution of the pedestal pressure profile and electron vs ion heat flux validate predictive models of pedestal recovery after ELMs. Understanding the physics mechanisms of ELM control and density pumpout by 3D magnetic perturbation fields leads to confident predictions for ITER and future devices. Validated modeling of high-Z shattered pellet injection for disruption mitigation, runaway electron dissipation, and techniques for disruption prediction and avoidance including machine learning, give confidence in handling disruptivity for future devices. For the non-nuclear phase of ITER, two actuators are identified to lower the L-H threshold power in hydrogen plasmas.
    With this physics understanding and suite of capabilities, a high poloidal beta optimized-core scenario with an internal transport barrier that projects nearly to Q = 10 in ITER at ∼8 MA was coupled to a detached divertor, and a near super H-mode optimized-pedestal scenario with co-I_p beam injection was coupled to a radiative divertor. The hybrid core scenario was achieved directly, without the need for anomalous current diffusion, using off-axis current drive actuators. Also, a controller to assess proximity to stability limits and regulate β_N in the ITER baseline scenario, based on plasma response to probing 3D fields, was demonstrated. Finally, innovative tokamak operation using a negative triangularity shape showed many attractive features for future pilot plant operation. Peer reviewed.