Multilingual Audio Captioning using machine translated data
Automated Audio Captioning (AAC) systems attempt to generate a natural
language sentence, a caption, that describes the content of an audio recording,
in terms of sound events. Existing datasets provide audio-caption pairs, with
captions written in English only. In this work, we explore multilingual AAC,
using machine-translated captions. We automatically translated two prominent
AAC datasets, AudioCaps and Clotho, from English into French, German, and
Spanish.
We trained and evaluated monolingual systems in the four languages, on
AudioCaps and Clotho. In all cases, the models achieved similar performance,
about 75% CIDEr on AudioCaps and 43% on Clotho. In French, we acquired manual
captions of the AudioCaps eval subset. The French system, trained on the
machine translated version of AudioCaps, achieved significantly better results
on the manual eval subset, compared to the English system for which we
automatically translated the outputs to French. This supports building systems
directly in a target language rather than translating the English system's
captions into the target language. Finally, we built a multilingual model that
achieved results in each language comparable to those of the corresponding
monolingual system, while using far fewer parameters than a collection of
monolingual systems.
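The dataset-construction step can be sketched as follows; the `translate` interface and the toy caption below are hypothetical stand-ins for a real machine-translation system and the actual AudioCaps/Clotho data:

```python
def translate_dataset(pairs, translate, target_lang):
    """Build a machine-translated caption dataset from English audio-caption
    pairs. `translate` is a stand-in for any MT system (hypothetical API)."""
    return [(audio_id, translate(caption, target_lang))
            for audio_id, caption in pairs]

# Toy MT stand-in: a lookup table instead of a real translation model.
toy_mt = {("a dog barks", "fr"): "un chien aboie"}
translate = lambda text, lang: toy_mt[(text, lang)]

french_pairs = translate_dataset([("clip_001", "a dog barks")], translate, "fr")
print(french_pairs)  # [('clip_001', 'un chien aboie')]
```

The same audio recordings are reused for every language; only the text side of each pair changes, which is what makes a shared multilingual model attractive.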
Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?
Automated Audio Captioning (AAC) aims to develop systems capable of
describing an audio recording using a textual sentence. In contrast, Audio-Text
Retrieval (ATR) systems seek to find the best matching audio recording(s) for a
given textual query (Text-to-Audio) or vice versa (Audio-to-Text). These tasks
require different types of systems: AAC employs a sequence-to-sequence model,
while ATR utilizes a ranking model that compares audio and text representations
within a shared projection subspace. In this work, we investigate the
relationship between AAC and ATR by exploring the ATR capabilities of an
unmodified AAC system, without fine-tuning it for the new task. Our AAC system
consists of an audio encoder (ConvNeXt-Tiny) trained on AudioSet for audio
tagging, and a transformer decoder responsible for generating sentences. For
AAC, it achieves a high SPIDEr-FL score of 0.298 on Clotho and 0.472 on
AudioCaps on average. For ATR, we propose using the standard Cross-Entropy loss
values obtained for any audio/caption pair. Experimental results on the Clotho
and AudioCaps datasets demonstrate decent recall values using this simple
approach. For instance, we obtained a Text-to-Audio R@1 value of 0.382 for
AudioCaps, which is above the current state-of-the-art method that uses no
external data. Interestingly, we observed that normalizing the loss values was
necessary for Audio-to-Text retrieval.
Comment: camera-ready version (14/08/23)
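The retrieval recipe described here, i.e. scoring every audio/caption pair by the loss an unmodified AAC model assigns to the caption, can be sketched as follows; `caption_nll` and the toy log-probabilities are hypothetical stand-ins for real decoder outputs:

```python
def caption_nll(token_logprobs):
    """Mean negative log-likelihood (cross-entropy) of one caption under the
    AAC decoder, given the log-probabilities it assigned to the tokens."""
    return -sum(token_logprobs) / len(token_logprobs)

def rank_audio_for_text(loss_row):
    """Text-to-Audio retrieval: rank audio clips by ascending loss."""
    return sorted(range(len(loss_row)), key=lambda i: loss_row[i])

# Toy log-probs for one caption scored against three audio clips;
# clip 1 fits best, so its tokens get the highest log-probabilities.
losses = [caption_nll(lp) for lp in ([-2.1, -1.9, -2.4],
                                     [-0.3, -0.5, -0.2],
                                     [-1.2, -1.0, -1.5])]
print(rank_audio_for_text(losses))  # [1, 2, 0]
```

For the Audio-to-Text direction, the abstract notes that raw loss values must first be normalized (captions with inherently high NLL would otherwise never be retrieved) before ranking captions for a given audio clip.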
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
In this work, we propose to study the performance of a model trained with a
sentence embedding regression loss component for the Automated Audio Captioning
task. This task aims to build systems that can describe audio content with a
single sentence written in natural language. Most systems are trained with the
standard Cross-Entropy loss, which does not take the semantic closeness of
sentences into account. We found that adding a sentence-embedding loss term not
only reduced overfitting but also increased SPIDEr from 0.397 to 0.418 in our first
setting on the AudioCaps corpus. When we increased the weight decay value, we
found our model to be much closer to the current state-of-the-art methods, with
a SPIDEr score of up to 0.444, compared with 0.475, while using eight times
fewer trainable parameters. In this training setting, however, the
sentence-embedding loss no longer has an impact on model performance.
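The training objective described above can be sketched as a weighted sum of the two terms; the embedding vectors and the mixing weight `lam` are hypothetical, and a real system would obtain the embeddings from a pretrained sentence encoder:

```python
def mse(pred, ref):
    """Mean squared error between two sentence-embedding vectors."""
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)

def multitask_loss(ce_loss, pred_emb, ref_emb, lam=0.25):
    """Cross-entropy plus a sentence-embedding regression term.
    `lam` is a hypothetical mixing weight, not the paper's value."""
    return ce_loss + lam * mse(pred_emb, ref_emb)

# Identical embeddings: the regression term vanishes and only CE remains.
print(multitask_loss(1.2, [0.1, 0.9], [0.1, 0.9]))  # 1.2
```

The regularizing effect comes from the second term pulling generated captions toward the reference's semantics even when the exact token sequence differs.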
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Automated Audio Captioning (AAC) involves generating natural language
descriptions of audio content, using encoder-decoder architectures. An audio
encoder produces audio embeddings fed to a decoder, usually a Transformer
decoder, for caption generation. In this work, we describe our model, whose
novelty compared to existing models lies in the use of a ConvNeXt architecture,
adapted from the vision domain to audio classification, as the audio encoder.
This model, called CNext-trans, achieved state-of-the-art
scores on the AudioCaps (AC) dataset and performed competitively on Clotho
(CL), while using four to forty times fewer parameters than existing models. We
examine potential biases in the AC dataset, which originates from AudioSet, by
investigating an unbiased encoder's impact on performance. Using the well-known
CNN14 from PANN as an unbiased encoder, for instance, we observed a 1.7% absolute
reduction in SPIDEr score (where higher scores indicate better performance). To
improve cross-dataset performance, we conducted experiments by combining
multiple AAC datasets (AC, CL, MACS, WavCaps) for training. Although this
strategy enhanced overall model performance across datasets, it still fell
short compared to models trained specifically on a single target dataset,
indicating the absence of a one-size-fits-all model. To mitigate performance
gaps between datasets, we introduced a Task Embedding (TE) token, allowing the
model to identify the source dataset for each input sample. We provide insights
into the impact of these TEs on both the form (words) and content (sound event
types) of the generated captions. The resulting model, named CoNeTTE, an
unbiased CNext-trans model enriched with dataset-specific Task Embeddings,
achieved SPIDEr scores of 44.1% and 30.5% on AC and CL, respectively. Code
available: https://github.com/Labbeti/conette-audio-captioning
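The Task Embedding mechanism can be sketched as a dataset-identity token prepended to the decoder's input sequence; the token ids below are hypothetical, not the ones used by CoNeTTE:

```python
# Hypothetical vocabulary slots reserved for dataset-identity tokens.
TASK_TOKENS = {"audiocaps": 0, "clotho": 1, "macs": 2, "wavcaps": 3}
BOS_ID = 4

def decoder_input(caption_token_ids, dataset):
    """Prepend a Task Embedding token so the decoder knows which dataset's
    caption style (vocabulary, length, sound-event focus) to imitate."""
    return [TASK_TOKENS[dataset], BOS_ID] + caption_token_ids

print(decoder_input([17, 42], "clotho"))  # [1, 4, 17, 42]
```

At inference time, choosing the token selects which dataset's captioning style the shared model reproduces, which is how one model can serve several datasets.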
A prequel to the Dantean Anomaly: The precipitation seesaw and droughts of 1302 to 1307 in Europe
The cold/wet anomaly of the 1310s ("Dantean Anomaly") has attracted a lot of attention from scholars, as it is commonly interpreted as a signal of the transition between the Medieval Climate Anomaly (MCA) and the Little Ice Age (LIA). The huge variability that can be observed during this decade, like the high interannual variability observed in the 1340s, has been highlighted as a side effect of this rapid climatic transition. In this paper, we demonstrate that a multiseasonal drought of almost 2 years occurred in the Mediterranean between 1302 and 1304, followed by a series of hot, dry summers north of the Alps from 1304 to 1306. We suggest that this outstanding dry anomaly, unique in the 13th and 14th centuries, together with cold anomalies of the 1310s and the 1340s, is part of the climatic shift from the MCA to the LIA. Our reconstruction of the predominant weather patterns of the first decade of the 14th century based on both documentary and proxy data identifies multiple European precipitation seesaw events between 1302 and 1307, with similarities to the seesaw conditions which prevailed over continental Europe in 2018. It can be debated to what extent the 1302–1307 period can be compared to what is currently discussed regarding the influence of the phenomenon of Arctic amplification on the increasing frequency of persistent stable weather patterns that have occurred since the late 1980s. Additionally, this paper deals with socioeconomic and cultural responses to drought risks in the Middle Ages as outlined in contemporary sources and provides evidence that there is a significant correlation between pronounced dry seasons and fires that devastated cities. © 2020 Copernicus GmbH. All rights reserved.
Intrinsic colors and ages of extremely red elliptical galaxies at high redshift
In order to know the formation epoch of the oldest elliptical galaxies as a
function of mass and observed redshift, a statistical analysis for 333
extremely red objects (EROs) classified as old galaxies (OGs) at 0.8<z<2.3 is
carried out. Once we obtain the rest-frame M_V and (B-V) for each galaxy, we calculate
the average variation of this intrinsic color with redshift and derive the
average age through a synthesis model (the code for the calculation of the age
has been made publicly available). The average gradient of the rest-frame
(B-V) color of EROs/OGs is 0.07-0.10 Gyr^{-1} at fixed luminosity. The stars in
these extremely red elliptical galaxies were formed when the Universe was ~2
Gyr old on average. We have not found a statistically significant dependence on
the observed redshift or stellar mass: dt_{formation}/dt_{observed}=-0.46+/-0.32,
dt_{formation}/(d log_10 M_*)=-0.81+/-0.98 Gyr. This fits a scenario in which
the stellar formation of the objects that we denominate as EROs-OGs is more
intense at higher redshifts, at which the stellar populations of the most
massive galaxies form earlier than or at the same time as less massive
galaxies.
Comment: accepted to be published in A
Heterotrimeric G protein signaling functions with dynein to promote spindle positioning in C. elegans
Proper orientation and positioning of the mitotic spindle are essential for the correct segregation of fate determinants during asymmetric cell division. Although heterotrimeric G proteins and their regulators are essential for spindle positioning in many cell types, their mechanism of action remains unclear. In this study, we show that dyrb-1, which encodes a dynein light chain, provides a functional link between heterotrimeric G protein signaling and dynein activity during spindle positioning in Caenorhabditis elegans. Embryos depleted of dyrb-1 display phenotypes similar to a weak loss of function of dynein activity, indicating that DYRB-1 is a positive regulator of dynein. We find that the depletion of dyrb-1 enhances the spindle positioning defect of weak loss of function alleles of two regulators of G protein signaling, LIN-5 and GPR-1/2, and that DYRB-1 physically associates with these two proteins. These results indicate that dynein activity functions with regulators of G protein signaling to regulate common downstream effectors during spindle positioning in the early C. elegans embryo.
There's No Free Lunch: On the Hardness of Choosing a Correct Big-M in Bilevel Optimization
One of the most frequently used approaches to solve linear bilevel optimization problems consists in replacing the lower-level problem with its Karush-Kuhn-Tucker (KKT) conditions and reformulating the KKT complementarity conditions using techniques from mixed-integer linear optimization. The latter step requires determining some big-M constant in order to bound the lower level's dual feasible set such that no bilevel-optimal solution is cut off. In practice, heuristics are often used to find a big-M, although it is known that these approaches may fail. In this paper, we consider the hardness of two proxies for the above-mentioned concept of a bilevel-correct big-M. First, we prove that verifying that a given big-M does not cut off any feasible vertex of the lower level's dual polyhedron cannot be done in polynomial time unless P=NP. Second, we show that verifying that a given big-M does not cut off any optimal point of the lower level's dual problem (for any point in the projection of the high-point relaxation onto the leader's decision space) is as hard as solving the original bilevel problem.
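As a sketch of the reformulation the abstract refers to (the notation is an assumption here: lower-level constraints $Cy \le b - Ax$ with nonnegative dual variables $\lambda$), each KKT complementarity condition is linearized with one binary variable and two big-M constants:

```latex
% Complementarity:  0 \le \lambda_i \;\perp\; (b - Ax - Cy)_i \ge 0
% Linearization with binary z_i and constants M^{P}, M^{D}:
\lambda_i \le M^{D} z_i, \qquad
(b - Ax - Cy)_i \le M^{P} (1 - z_i), \qquad
z_i \in \{0, 1\}.
```

The hardness results of the paper concern exactly the choice of such constants: a too-small $M^{D}$ can cut off bilevel-optimal solutions, and verifying that a given value is safe is itself intractable.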
Closing the gap in linear bilevel optimization: a new valid primal-dual inequality
Linear bilevel optimization problems are often tackled by replacing the linear lower-level problem with its Karush–Kuhn–Tucker conditions. The resulting single-level problem can be solved in a branch-and-bound fashion by branching on the complementarity constraints of the lower-level problem’s optimality conditions. While in mixed-integer single-level optimization branch-and-cut has proven to be a powerful extension of branch-and-bound, in linear bilevel optimization not too many bilevel-tailored valid inequalities exist. In this paper, we briefly review existing cuts for linear bilevel problems and introduce a new valid inequality that exploits the strong duality condition of the lower level. We further discuss strengthened variants of the inequality that can be derived from McCormick envelopes. In a computational study, we show that the new valid inequalities can help to close the optimality gap very effectively on a large test set of linear bilevel instances.
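The strong-duality idea can be sketched as follows, under an assumed (not necessarily the paper's) notation: lower level $\min_y \{\, d^\top y : Cy \le b - Ax \,\}$ with dual-feasible multipliers $\lambda$. Weak duality gives $\lambda^\top (b - Ax) \le d^\top y$ for every primal/dual-feasible pair, while bilevel feasibility requires the lower-level response to be optimal, so equality must hold. Imposing the reverse direction is therefore valid:

```latex
% Valid primal-dual inequality (bilinear in x and \lambda; linearizable,
% e.g. via McCormick envelopes, as discussed in the paper):
d^\top y \;\le\; \lambda^\top (b - A x).
```

Together with weak duality, this inequality forces strong duality at feasible points, which is what allows it to cut off points of the single-level relaxation where the lower-level response is suboptimal.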
Forty Years of Erratic Insecticide Resistance Evolution in the Mosquito Culex pipiens
One view of adaptation is that it proceeds by the slow and steady accumulation of beneficial mutations with small effects. It is difficult to test this model, since in most cases the genetic basis of adaptation can only be studied a posteriori with traits that have evolved for a long period of time through an unknown sequence of steps. In this paper, we show how ace-1, a gene involved in resistance to organophosphorous insecticide in the mosquito Culex pipiens, has evolved during 40 years of an insecticide control program. Initially, a major resistance allele with strong deleterious side effects spread through the population. Later, a duplication combining a susceptible and a resistance ace-1 allele began to spread but did not replace the original resistance allele, as it is sublethal when homozygous. Last, a second duplication (also sublethal when homozygous) began to spread because heterozygotes for the two duplications do not exhibit deleterious pleiotropic effects. Double overdominance now maintains these four alleles across treated and nontreated areas. Thus, ace-1 evolution does not proceed via the steady accumulation of beneficial mutations. Instead, resistance evolution has been an erratic combination of mutation, positive selection, and the rearrangement of existing variation leading to complex genetic architecture.