
    Multilingual Audio Captioning using machine translated data

    Automated Audio Captioning (AAC) systems attempt to generate a natural language sentence, a caption, that describes the content of an audio recording in terms of sound events. Existing datasets provide audio-caption pairs with captions written in English only. In this work, we explore multilingual AAC using machine-translated captions. We automatically translated two prominent AAC datasets, AudioCaps and Clotho, from English into French, German, and Spanish. We trained and evaluated monolingual systems in the four languages on AudioCaps and Clotho. In all cases, the models achieved similar performance: about 75% CIDEr on AudioCaps and 43% on Clotho. For French, we acquired manual captions of the AudioCaps eval subset. The French system, trained on the machine-translated version of AudioCaps, achieved significantly better results on this manual eval subset than the English system whose outputs we automatically translated into French. This advocates for building systems directly in the target language rather than simply translating the English system's captions into that language. Finally, we built a multilingual model that achieved results in each language comparable to each monolingual system, while using far fewer parameters than a collection of monolingual systems.
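    The dataset construction step described above amounts to running an off-the-shelf machine translation model over every English caption. As an illustration only, the sketch below uses the Hugging Face MarianMT English-to-French checkpoint; the abstract does not state which translation system the authors actually used, and the model name and example caption are placeholders.

```python
# Minimal sketch: machine-translate English AAC captions into French.
# Helsinki-NLP/opus-mt-en-fr is one possible checkpoint, not necessarily
# the translation system used in the paper.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

english_captions = ["A dog barks while cars pass by in the distance."]  # placeholder
batch = tokenizer(english_captions, return_tensors="pt", padding=True)
translated_ids = model.generate(**batch)
french_captions = [tokenizer.decode(ids, skip_special_tokens=True)
                   for ids in translated_ids]
print(french_captions)
```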

    Killing two birds with one stone: Can an audio captioning system also be used for audio-text retrieval?

    Automated Audio Captioning (AAC) aims to develop systems capable of describing an audio recording with a textual sentence. In contrast, Audio-Text Retrieval (ATR) systems seek the best matching audio recording(s) for a given textual query (Text-to-Audio) or vice versa (Audio-to-Text). These tasks require different types of systems: AAC employs a sequence-to-sequence model, while ATR uses a ranking model that compares audio and text representations within a shared projection subspace. In this work, we investigate the relationship between AAC and ATR by exploring the ATR capabilities of an unmodified AAC system, without fine-tuning for the new task. Our AAC system consists of an audio encoder (ConvNeXt-Tiny) trained on AudioSet for audio tagging, and a Transformer decoder responsible for generating sentences. For AAC, it achieves high SPIDEr-FL scores of 0.298 on Clotho and 0.472 on AudioCaps on average. For ATR, we propose using the standard Cross-Entropy loss values obtained for any audio/caption pair. Experimental results on the Clotho and AudioCaps datasets demonstrate decent recall values with this simple approach. For instance, we obtained a Text-to-Audio R@1 value of 0.382 on AudioCaps, which is above the current state-of-the-art method without external data. Interestingly, we observe that normalizing the loss values was necessary for Audio-to-Text retrieval. Comment: camera-ready version (14/08/23)
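    The retrieval recipe in this abstract can be summarized as: score each audio/caption pair by the cross-entropy loss the frozen captioning decoder assigns to the caption, then rank by that score. The sketch below illustrates the idea; the `aac_model` interface, the ranking helpers, and the per-caption normalization across the audio bank are assumptions for illustration, since the abstract only states that some normalization of the loss values was needed for Audio-to-Text retrieval.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def caption_loss(aac_model, audio_emb, caption_ids):
    # Cross-entropy of one caption under the frozen AAC decoder, conditioned on
    # one audio clip. `aac_model(audio_emb, prefix_ids)` returning next-token
    # logits of shape (seq_len, vocab) is a hypothetical interface.
    logits = aac_model(audio_emb, caption_ids[:-1])
    return F.cross_entropy(logits, caption_ids[1:]).item()

def text_to_audio_ranking(aac_model, audio_bank, query_ids):
    # Rank candidate audio clips for one text query: lower loss = better match.
    losses = [caption_loss(aac_model, emb, query_ids) for emb in audio_bank]
    return sorted(range(len(audio_bank)), key=losses.__getitem__)

def audio_to_text_scores(loss_matrix):
    # loss_matrix[i, j] = loss of caption j given audio i. Raw losses favour
    # short or generic captions, so each caption's column is normalized across
    # the audio bank before ranking (one plausible normalization choice).
    return (loss_matrix - loss_matrix.mean(dim=0)) / (loss_matrix.std(dim=0) + 1e-8)
```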

    Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer

    In this work, we study the performance of a model trained with an additional sentence embedding regression loss for the Automated Audio Captioning task. This task aims to build systems that can describe audio content with a single sentence written in natural language. Most systems are trained with the standard Cross-Entropy loss, which does not take the semantic closeness of sentences into account. We found that adding a sentence embedding loss term not only reduced overfitting but also increased SPIDEr from 0.397 to 0.418 in our first setting on the AudioCaps corpus. When we increased the weight decay value, our model came much closer to the current state-of-the-art methods, with a SPIDEr score of up to 0.444 compared to a score of 0.475. Moreover, this model uses eight times fewer trainable parameters. In this training setting, however, the sentence embedding loss no longer had an impact on model performance.
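    A minimal sketch of the kind of combined objective described here: standard token-level cross-entropy plus a regression term that pulls a projection of the decoder's pooled hidden states towards a frozen sentence encoder's embedding of the reference caption. The pooling strategy, the projection head, and the weight of the regression term are assumptions made for illustration, not the authors' exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CaptionLossWithSentenceEmbedding(nn.Module):
    # Cross-entropy plus a sentence-embedding regression term (sketch).
    def __init__(self, d_model, d_sent, weight=0.25):
        super().__init__()
        self.proj = nn.Linear(d_model, d_sent)  # maps decoder space to sentence-embedding space
        self.weight = weight                    # relative weight of the regression term (assumed)

    def forward(self, logits, targets, decoder_states, ref_sent_emb):
        # logits: (batch, seq_len, vocab), targets: (batch, seq_len)
        ce = F.cross_entropy(logits.transpose(1, 2), targets, ignore_index=0)
        # decoder_states: (batch, seq_len, d_model) -> mean-pooled sentence representation
        pooled = decoder_states.mean(dim=1)
        # ref_sent_emb: (batch, d_sent), from a frozen sentence encoder run offline
        reg = F.mse_loss(self.proj(pooled), ref_sent_emb)
        return ce + self.weight * reg
```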

    CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding

    Automated Audio Captioning (AAC) involves generating natural language descriptions of audio content using encoder-decoder architectures: an audio encoder produces audio embeddings that are fed to a decoder, usually a Transformer decoder, for caption generation. In this work, we describe our model, whose novelty compared to existing models lies in the use of a ConvNeXt architecture as the audio encoder, adapted from the vision domain to audio classification. This model, called CNext-trans, achieved state-of-the-art scores on the AudioCaps (AC) dataset and performed competitively on Clotho (CL), while using four to forty times fewer parameters than existing models. We examine potential biases in the AC dataset due to its origin in AudioSet by investigating the impact of an unbiased encoder on performance. Using PANN's well-known CNN14 as an unbiased encoder, for instance, we observed a 1.7% absolute reduction in SPIDEr score (where higher scores indicate better performance). To improve cross-dataset performance, we conducted experiments combining multiple AAC datasets (AC, CL, MACS, WavCaps) for training. Although this strategy enhanced overall model performance across datasets, it still fell short compared to models trained specifically on a single target dataset, indicating the absence of a one-size-fits-all model. To mitigate performance gaps between datasets, we introduced a Task Embedding (TE) token that allows the model to identify the source dataset of each input sample. We provide insights into the impact of these TEs on both the form (words) and content (sound event types) of the generated captions. The resulting model, named CoNeTTE, an unbiased CNext-trans model enriched with dataset-specific Task Embeddings, achieved SPIDEr scores of 44.1% and 30.5% on AC and CL, respectively. Code is available at https://github.com/Labbeti/conette-audio-captioning
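    The Task Embedding idea can be pictured as a learned per-dataset vector prepended to the decoder's input sequence, so the model can adapt its caption style to the source dataset. The sketch below shows one straightforward way to implement such conditioning; the layer sizes and the exact placement of the task token are assumptions, not necessarily CoNeTTE's implementation.

```python
import torch
import torch.nn as nn

class TaskConditionedDecoderInput(nn.Module):
    # Prepend a learned dataset ("task") embedding to the caption token
    # embeddings before they enter the Transformer decoder (sketch).
    def __init__(self, vocab_size, d_model, num_datasets=4):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.task_emb = nn.Embedding(num_datasets, d_model)  # e.g. AC, CL, MACS, WavCaps

    def forward(self, token_ids, dataset_id):
        # token_ids: (batch, seq_len), dataset_id: (batch,)
        tok = self.tok_emb(token_ids)                   # (batch, seq_len, d_model)
        task = self.task_emb(dataset_id).unsqueeze(1)   # (batch, 1, d_model)
        return torch.cat([task, tok], dim=1)            # (batch, seq_len + 1, d_model)
```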

    A prequel to the Dantean Anomaly: The precipitation seesaw and droughts of 1302 to 1307 in Europe

    The cold/wet anomaly of the 1310s (the "Dantean Anomaly") has attracted a lot of attention from scholars, as it is commonly interpreted as a signal of the transition between the Medieval Climate Anomaly (MCA) and the Little Ice Age (LIA). The high variability observed during this decade, like the high interannual variability observed in the 1340s, has been highlighted as a side effect of this rapid climatic transition. In this paper, we demonstrate that a multiseasonal drought of almost 2 years occurred in the Mediterranean between 1302 and 1304, followed by a series of hot, dry summers north of the Alps from 1304 to 1306. We suggest that this outstanding dry anomaly, unique in the 13th and 14th centuries, is, together with the cold anomalies of the 1310s and the 1340s, part of the climatic shift from the MCA to the LIA. Our reconstruction of the predominant weather patterns of the first decade of the 14th century, based on both documentary and proxy data, identifies multiple European precipitation seesaw events between 1302 and 1307, with similarities to the seesaw conditions that prevailed over continental Europe in 2018. It remains open to debate to what extent the 1302-1307 period can be compared to current discussions of the influence of Arctic amplification on the increasing frequency of persistent, stable weather patterns observed since the late 1980s. Additionally, this paper deals with socioeconomic and cultural responses to drought risks in the Middle Ages as outlined in contemporary sources and provides evidence of a significant correlation between pronounced dry seasons and fires that devastated cities. © 2020 Copernicus GmbH. All rights reserved.

    Intrinsic colors and ages of extremely red elliptical galaxies at high redshift

    To determine the formation epoch of the oldest elliptical galaxies as a function of mass and observed redshift, we carry out a statistical analysis of 333 extremely red objects (EROs) classified as old galaxies (OGs) at 0.8<z<2.3. Once we obtain the rest-frame M_V and (B-V) for each galaxy, we calculate the average variation of this intrinsic color with redshift and derive the average age through a synthesis model (the code for the age calculation has been made publicly available). The average gradient of the rest-frame (B-V) color of EROs/OGs is 0.07-0.10 Gyr^{-1} at fixed luminosity. The stars in these extremely red elliptical galaxies were formed when the Universe was ~2 Gyr old on average. We have not found a sufficiently significant dependence on the observed redshift or stellar mass: dt_{formation}/dt_{observed} = -0.46 +/- 0.32 and dt_{formation}/(d log_10 M_*) = -0.81 +/- 0.98 Gyr. This fits a scenario in which the star formation of the objects we denominate EROs-OGs is more intense at higher redshift, where the stellar populations of the most massive galaxies form earlier than or at the same time as those of less massive galaxies. Comment: accepted to be published in A
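    One way to read the quoted gradients is as the slopes of a linear model for the mean formation epoch. The parameterization below is illustrative only (the reference constants t_0, t_{obs,0}, and M_0 are placeholders), not necessarily the authors' exact fit.

```latex
% Illustrative linear model behind the quoted slopes:
t_{\mathrm{form}}(t_{\mathrm{obs}}, M_*) \simeq t_0
  + \frac{\partial t_{\mathrm{form}}}{\partial t_{\mathrm{obs}}}\,
    \bigl(t_{\mathrm{obs}} - t_{\mathrm{obs},0}\bigr)
  + \frac{\partial t_{\mathrm{form}}}{\partial \log_{10} M_*}\,
    \log_{10}\!\frac{M_*}{M_0},
\qquad
\frac{\partial t_{\mathrm{form}}}{\partial t_{\mathrm{obs}}} = -0.46 \pm 0.32,
\quad
\frac{\partial t_{\mathrm{form}}}{\partial \log_{10} M_*} = -0.81 \pm 0.98~\mathrm{Gyr}.
```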

    Heterotrimeric G protein signaling functions with dynein to promote spindle positioning in C. elegans

    Proper orientation and positioning of the mitotic spindle are essential for the correct segregation of fate determinants during asymmetric cell division. Although heterotrimeric G proteins and their regulators are essential for spindle positioning in many cell types, their mechanism of action remains unclear. In this study, we show that dyrb-1, which encodes a dynein light chain, provides a functional link between heterotrimeric G protein signaling and dynein activity during spindle positioning in Caenorhabditis elegans. Embryos depleted of dyrb-1 display phenotypes similar to a weak loss of function of dynein activity, indicating that DYRB-1 is a positive regulator of dynein. We find that the depletion of dyrb-1 enhances the spindle positioning defect of weak loss-of-function alleles of two regulators of G protein signaling, LIN-5 and GPR-1/2, and that DYRB-1 physically associates with these two proteins. These results indicate that dynein activity functions with regulators of G protein signaling to regulate common downstream effectors during spindle positioning in the early C. elegans embryo.

    There's No Free Lunch: On the Hardness of Choosing a Correct Big-M in Bilevel Optimization

    One of the most frequently used approaches to solve linear bilevel optimization problems consists in replacing the lower-level problem with its Karush-Kuhn-Tucker (KKT) conditions and reformulating the KKT complementarity conditions using techniques from mixed-integer linear optimization. The latter step requires determining a big-M constant to bound the lower level's dual feasible set such that no bilevel-optimal solution is cut off. In practice, heuristics are often used to find a big-M, although it is known that these approaches may fail. In this paper, we consider the hardness of two proxies for the above-mentioned concept of a bilevel-correct big-M. First, we prove that verifying that a given big-M does not cut off any feasible vertex of the lower level's dual polyhedron cannot be done in polynomial time unless P = NP. Second, we show that verifying that a given big-M does not cut off any optimal point of the lower level's dual problem (for any point in the projection of the high-point relaxation onto the leader's decision space) is as hard as solving the original bilevel problem.
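    For context, the big-M in question appears when the lower-level KKT complementarity conditions are linearized with binary variables. The sketch below shows the standard textbook form of this reformulation for a generic lower level; the notation is illustrative and not taken from the paper.

```latex
% Lower level (parameterized by the leader's decision x):
%   \min_y \{ d^\top y : C y \ge b - A x, \; y \ge 0 \},
% with dual variables \lambda \ge 0 satisfying C^\top \lambda \le d.
% One complementarity pair, \lambda_i (C y + A x - b)_i = 0, is linearized as
\lambda_i \le M^{\mathrm{d}} z_i, \qquad
(C y + A x - b)_i \le M^{\mathrm{p}} (1 - z_i), \qquad
z_i \in \{0,1\},
% and analogously for y_j (d - C^\top \lambda)_j = 0. The dual bound
% M^{\mathrm{d}} is the critical one: if chosen too small, bilevel-optimal
% solutions may be cut off, and verifying that a given choice is safe is
% precisely what the paper shows to be hard.
```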

    Closing the gap in linear bilevel optimization: a new valid primal-dual inequality

    Linear bilevel optimization problems are often tackled by replacing the linear lower-level problem with its Karush-Kuhn-Tucker conditions. The resulting single-level problem can be solved in a branch-and-bound fashion by branching on the complementarity constraints of the lower-level problem's optimality conditions. While branch-and-cut has proven to be a powerful extension of branch-and-bound in mixed-integer single-level optimization, only a few bilevel-tailored valid inequalities exist for linear bilevel optimization. In this paper, we briefly review existing cuts for linear bilevel problems and introduce a new valid inequality that exploits the strong duality condition of the lower level. We further discuss strengthened variants of the inequality that can be derived from McCormick envelopes. In a computational study, we show that the new valid inequalities can help to close the optimality gap very effectively on a large test set of linear bilevel instances.
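    To make the ingredients concrete: for a generic parameterized lower level, the strong-duality condition referred to above bounds the lower-level objective by the dual objective, and the resulting bilinear leader-times-dual terms can be relaxed with McCormick envelopes. The formulation below is the generic textbook construction, not the specific valid inequality introduced in the paper.

```latex
% Lower level  \min_y \{ d^\top y : A x + B y \ge b, \; y \ge 0 \}  with dual
%              \max_\lambda \{ (b - A x)^\top \lambda : B^\top \lambda \le d,
%                              \; \lambda \ge 0 \}.
% Strong duality gives the (bilinear) primal-dual inequality
d^\top y \;\le\; (b - A x)^\top \lambda \;=\; b^\top \lambda - x^\top A^\top \lambda .
% Each product w = u\,v of a leader variable u \in [u^L, u^U] and a dual
% variable v \in [v^L, v^U] can then be replaced by its McCormick envelope:
w \ge u^L v + u v^L - u^L v^L, \quad
w \ge u^U v + u v^U - u^U v^U, \quad
w \le u^U v + u v^L - u^U v^L, \quad
w \le u^L v + u v^U - u^L v^U .
```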

    Forty Years of Erratic Insecticide Resistance Evolution in the Mosquito Culex pipiens

    One view of adaptation is that it proceeds by the slow and steady accumulation of beneficial mutations with small effects. This model is difficult to test, since in most cases the genetic basis of adaptation can only be studied a posteriori, with traits that have evolved over a long period of time through an unknown sequence of steps. In this paper, we show how ace-1, a gene involved in resistance to organophosphorus insecticides in the mosquito Culex pipiens, has evolved during 40 years of an insecticide control program. Initially, a major resistance allele with strong deleterious side effects spread through the population. Later, a duplication combining a susceptible and a resistant ace-1 allele began to spread but did not replace the original resistance allele, as it is sublethal when homozygous. Last, a second duplication (also sublethal when homozygous) began to spread because heterozygotes for the two duplications do not exhibit deleterious pleiotropic effects. Double overdominance now maintains these four alleles across treated and nontreated areas. Thus, ace-1 evolution does not proceed via the steady accumulation of beneficial mutations. Instead, resistance evolution has been an erratic combination of mutation, positive selection, and the rearrangement of existing variation, leading to a complex genetic architecture.