11 research outputs found
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Zero-shot audio captioning aims at automatically generating descriptive
textual captions for audio content without prior training for this task.
Different from speech recognition which translates audio content that contains
spoken language into text, audio captioning is commonly concerned with ambient
sounds, or sounds produced by a human performing an action. Inspired by
zero-shot image captioning methods, we propose ZerAuCap, a novel framework for
summarising such general audio signals in a text caption without requiring
task-specific training. In particular, our framework exploits a pre-trained
large language model (LLM) for generating the text which is guided by a
pre-trained audio-language model to produce captions that describe the audio
content. Additionally, we use audio context keywords that prompt the language
model to generate text that is broadly relevant to sounds. Our proposed
framework achieves state-of-the-art results in zero-shot audio captioning on
the AudioCaps and Clotho datasets. Our code is available at
https://github.com/ExplainableML/ZerAuCap.
Comment: NeurIPS 2023 - Machine Learning for Audio Workshop (Oral)
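The guidance idea in the abstract above can be illustrated with a minimal sketch. This is not the ZerAuCap implementation: the abstract does not say how the language model and the audio-language model are combined, so the weighted sum over the top-k candidates used here is an assumption, and `guided_step`, `lm_logprobs`, `audio_text_score` and `weight` are hypothetical names with toy stand-ins for both models.

```python
import math

def guided_step(lm_logprobs, audio_text_score, prefix, top_k=3, weight=1.0):
    """Choose the next token from the language model's top-k candidates.

    lm_logprobs: dict mapping candidate token -> log-probability (toy stand-in for an LLM).
    audio_text_score: callable scoring how well a token list matches the audio
        (toy stand-in for an audio-language model).
    The weighted-sum combination is an assumption made for this sketch.
    """
    # Keep only the k most likely tokens according to the language model.
    candidates = sorted(lm_logprobs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    best_token, best_score = None, -math.inf
    for token, logp in candidates:
        # Combine fluency (LM) with relevance to the audio (audio-text score).
        score = logp + weight * audio_text_score(prefix + [token])
        if score > best_score:
            best_token, best_score = token, score
    return best_token

# Toy stand-ins: the "audio" is assumed to contain a dog bark.
toy_lm = {"music": -0.5, "dog": -1.0, "the": -2.0}
toy_audio_score = lambda tokens: 1.0 if "dog" in tokens else 0.0

unguided = guided_step(toy_lm, toy_audio_score, ["a"], weight=0.0)
guided = guided_step(toy_lm, toy_audio_score, ["a"], weight=1.0)
```

With the guidance weight set to zero the language model's most likely token wins; with a positive weight the audio-relevant token can overtake it.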
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Providing explanations in the context of Visual Question Answering (VQA)
presents a fundamental problem in machine learning. To obtain detailed insights
into the process of generating natural language explanations for VQA, we
introduce the large-scale CLEVR-X dataset that extends the CLEVR dataset with
natural language explanations. For each image-question pair in the CLEVR
dataset, CLEVR-X contains multiple structured textual explanations which are
derived from the original scene graphs. By construction, the CLEVR-X
explanations are correct and describe the reasoning and visual information that
is necessary to answer a given question. We conducted a user study to confirm
that the ground-truth explanations in our proposed dataset are indeed complete
and relevant. We present baseline results for generating natural language
explanations in the context of VQA using two state-of-the-art frameworks on the
CLEVR-X dataset. Furthermore, we provide a detailed analysis of the explanation
generation quality for different question and answer types. Additionally, we
study the influence of using different numbers of ground-truth explanations on
the convergence of natural language generation (NLG) metrics. The CLEVR-X
dataset is publicly available at
https://explainableml.github.io/CLEVR-X/.
In-Context Impersonation Reveals Large Language Models' Strengths and Biases
In everyday conversations, humans can take on different roles and adapt their
vocabulary to their chosen roles. We explore whether LLMs can take on, that is,
impersonate, different roles when they generate text in-context. We ask LLMs to
assume different personas before solving vision and language tasks. We do this
by prefixing the prompt with a persona that is associated either with a social
identity or domain expertise. In a multi-armed bandit task, we find that LLMs
pretending to be children of different ages recover human-like developmental
stages of exploration. In a language-based reasoning task, we find that LLMs
impersonating domain experts perform better than LLMs impersonating non-domain
experts. Finally, we test whether LLMs' impersonations are complementary to
visual information when describing different categories. We find that
impersonation can improve performance: an LLM prompted to be a bird expert
describes birds better than one prompted to be a car expert. However,
impersonation can also uncover LLMs' biases: an LLM prompted to be a man
describes cars better than one prompted to be a woman. These findings
demonstrate that LLMs are capable of taking on diverse roles and that this
in-context impersonation can be used to uncover their hidden strengths and
biases.
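Persona prefixing as described in the abstract above can be sketched as follows. The template wording, the function name `impersonation_prompt` and the example personas and task are hypothetical; the abstract only states that the prompt is prefixed with a persona, not the exact phrasing used.

```python
def impersonation_prompt(persona, task):
    """Prefix a task prompt with a persona (the template wording is an assumption)."""
    return f"You are {persona}. {task}"

# Hypothetical personas and task, chosen to mirror the abstract's bird/car example.
personas = ["a bird expert", "a car expert"]
task = "Describe a robin."

prompts = [impersonation_prompt(p, task) for p in personas]
for prompt in prompts:
    print(prompt)
```

Each resulting string would be sent to the same model, so any difference in the answers can be attributed to the persona prefix alone.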
DIII-D research advancing the physics basis for optimizing the tokamak approach to fusion energy
Funding Information: This material is based upon work supported by the US Department of Energy, Office of Science, Office of Fusion Energy Sciences, using the DIII-D National Fusion Facility, a DOE Office of Science user facility, under Awards DE-FC02-04ER54698 and DE-AC52-07NA27344. Publisher Copyright: © 2022 IAEA, Vienna.
DIII-D physics research addresses critical challenges for the operation of ITER and the next generation of fusion energy devices. This is done through a focus on innovations to provide solutions for high performance long pulse operation, coupled with fundamental plasma physics understanding and model validation, to drive scenario development by integrating high performance core and boundary plasmas. Substantial increases in off-axis current drive efficiency from an innovative top launch system for EC power, and in pressure broadening for Alfvén eigenmode control from a co-/counter-I_p steerable off-axis neutral beam, all improve the prospects for optimization of future long pulse/steady state high performance tokamak operation. Fundamental studies into the modes that drive the evolution of the pedestal pressure profile and electron vs ion heat flux validate predictive models of pedestal recovery after ELMs. Understanding the physics mechanisms of ELM control and density pumpout by 3D magnetic perturbation fields leads to confident predictions for ITER and future devices. Validated modeling of high-Z shattered pellet injection for disruption mitigation, runaway electron dissipation, and techniques for disruption prediction and avoidance including machine learning, give confidence in handling disruptivity for future devices. For the non-nuclear phase of ITER, two actuators are identified to lower the L-H threshold power in hydrogen plasmas.
With this physics understanding and suite of capabilities, a high poloidal beta optimized-core scenario with an internal transport barrier that projects nearly to Q = 10 in ITER at ∼8 MA was coupled to a detached divertor, and a near super H-mode optimized-pedestal scenario with co-I_p beam injection was coupled to a radiative divertor. The hybrid core scenario was achieved directly, without the need for anomalous current diffusion, using off-axis current drive actuators. Also, a controller to assess proximity to stability limits and regulate β_N in the ITER baseline scenario, based on plasma response to probing 3D fields, was demonstrated. Finally, innovative tokamak operation using a negative triangularity shape showed many attractive features for future pilot plant operation.
Peer reviewed