Listening to features
This work explores nonparametric methods that aim to synthesize audio from
low-dimensional acoustic features typically used in MIR frameworks. Several
issues prevent this task from being achieved straightforwardly. Such features
are designed for analysis and not for synthesis, thus favoring high-level
description over easily inverted acoustic representation. Whereas some previous
studies already considered the problem of synthesizing audio from features such
as Mel-Frequency Cepstral Coefficients, they mainly relied on the explicit
formula used to compute those features in order to invert them. Here, we
instead adopt a simple blind approach, where arbitrary sets of features can be
used during synthesis and where reconstruction is exemplar-based. After testing
the approach on the problem of synthesizing speech from well-known features, we
apply it to the more complex task of inverting songs from the Million Song
Dataset. Two things make this task harder. First, the features are irregularly
spaced in time, following an onset-based segmentation. Second,
the exact method used to compute these features is unknown, although the
features for new audio can be computed using their API as a black box. In this
paper, we detail these difficulties and present a framework that nonetheless
attempts such synthesis by concatenating audio samples from a training
dataset, whose features have been computed beforehand. Samples are selected at
the segment level, in the feature space, with a simple nearest-neighbor search.
Additional constraints can then be defined to improve the relevance of the
synthesis. Preliminary experiments are presented using the RWC and GTZAN audio
datasets to synthesize tracks from the Million Song Dataset. Comment: Technical Report
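The segment-level selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance, and the toy data are assumptions made for illustration, and the additional constraints mentioned in the abstract are left out.

```python
import numpy as np

def nearest_segments(target_feats, train_feats):
    """For each target segment, return the index of the closest training segment."""
    # Squared Euclidean distances, shape (n_target, n_train). The metric is an assumption.
    diff = target_feats[:, None, :] - train_feats[None, :, :]
    dist2 = np.sum(diff ** 2, axis=2)
    return np.argmin(dist2, axis=1)

def concatenate_exemplars(indices, train_audio):
    """Concatenate the audio snippets of the selected training segments."""
    return np.concatenate([train_audio[i] for i in indices])

# Toy data (hypothetical): 3 training segments with 2-D features and short snippets.
train_feats = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
train_audio = [np.zeros(4), np.ones(3), np.full(2, 0.5)]
target_feats = np.array([[0.9, 1.2], [4.0, 6.0], [0.1, -0.2]])

indices = nearest_segments(target_feats, train_feats)
synth = concatenate_exemplars(indices, train_audio)
```

Because the training snippets have different lengths, the output length depends on which exemplars are picked. This mirrors the irregular, onset-based segmentation mentioned in the abstract.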
Hydrogen emissions from Erebus volcano, Antarctica
The continuous measurement of molecular hydrogen (H2) emissions from passively degassing volcanoes has recently been made possible using a new generation of low-cost electrochemical sensors. We have used such sensors to measure H2, along with SO2, H2O and CO2, in the gas and aerosol plume emitted from the phonolite lava lake at Erebus volcano, Antarctica. The measurements were made at the crater rim between December 2010 and January 2011. Combined with measurements of the long-term SO2 emission rate for Erebus, they indicate a characteristic H2 flux of 0.03 kg s⁻¹ (2.8 Mg day⁻¹). The observed H2 content in the plume is consistent with previous estimates of redox conditions in the lava lake inferred from mineral compositions and the observed CO2/CO ratio in the gas plume (∼0.9 log units below the quartz-fayalite-magnetite buffer). These measurements suggest that H2 does not combust at the surface of the lake, and that H2 is kinetically inert in the gas/aerosol plume, retaining the signature of the high-temperature chemical equilibrium reached in the lava lake. We also observe a cyclical variation in the H2/SO2 ratio with a period of ∼10 min. These cycles correspond to oscillatory patterns of surface motion of the lava lake that have been interpreted as signs of a pulsatory magma supply at the top of the magmatic conduit.
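The two flux figures quoted in this abstract use different units, and a short conversion shows how they relate. The sketch below is plain unit arithmetic with hypothetical helper names. It suggests that 0.03 kg per second is a rounded form of a value near 0.032 kg per second.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 s

def kg_per_s_to_mg_per_day(flux_kg_s):
    """Convert a mass flux from kg/s to Mg/day (1 Mg = 1000 kg)."""
    return flux_kg_s * SECONDS_PER_DAY / 1000.0

def mg_per_day_to_kg_per_s(flux_mg_day):
    """Convert a mass flux from Mg/day to kg/s."""
    return flux_mg_day * 1000.0 / SECONDS_PER_DAY

rounded = kg_per_s_to_mg_per_day(0.03)  # about 2.59 Mg/day
implied = mg_per_day_to_kg_per_s(2.8)   # about 0.0324 kg/s
```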
Matching Pursuits with Random Sequential Subdictionaries
Matching pursuits are a class of greedy algorithms commonly used in signal
processing for solving the sparse approximation problem. They rely on an atom
selection step that requires the calculation of numerous projections, which can
be computationally costly for large dictionaries and hinders their
competitiveness in coding applications. We propose using a non-adaptive random
sequence of subdictionaries in the decomposition process, thus parsing a large
dictionary in a probabilistic fashion with no additional projection cost or
parameter estimation. A theoretical model based on order statistics is
provided, along with experimental evidence showing that the novel algorithm can
be efficiently used on sparse approximation problems. An application to audio
signal compression with multiscale time-frequency dictionaries is presented,
along with a discussion of the complexity and practical implementations. Comment: 20 pages - accepted 2nd April 2012 at Elsevier Signal Processing
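A minimal sketch of the idea of searching a different random subdictionary at each iteration is given below. It is an illustration under stated assumptions, not the algorithm from the paper: subdictionaries are drawn here as uniform random subsets of the columns of a generic unit-norm dictionary, and the function name and parameters are hypothetical. The paper's multiscale time-frequency dictionaries are not reproduced.

```python
import numpy as np

def random_subdict_mp(signal, dictionary, n_iter, subdict_size, seed=0):
    """Matching pursuit that searches a fresh random subset of atoms per iteration.

    dictionary: array of shape (n_samples, n_atoms) with unit-norm columns.
    Returns (coefficients, residual).
    """
    rng = np.random.default_rng(seed)
    n_atoms = dictionary.shape[1]
    coeffs = np.zeros(n_atoms)
    residual = signal.astype(float).copy()
    for _ in range(n_iter):
        # Non-adaptive draw: the subset does not depend on the residual.
        subset = rng.choice(n_atoms, size=subdict_size, replace=False)
        # Project the residual onto the atoms of this subdictionary only.
        projections = dictionary[:, subset].T @ residual
        best = np.argmax(np.abs(projections))
        atom = subset[best]
        # Standard matching pursuit update for a unit-norm atom.
        coeffs[atom] += projections[best]
        residual -= projections[best] * dictionary[:, atom]
    return coeffs, residual

# Toy problem (hypothetical): random unit-norm dictionary, 2-sparse signal.
rng = np.random.default_rng(1)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
x = 2.0 * D[:, 10] - 1.5 * D[:, 200]
coeffs, residual = random_subdict_mp(x, D, n_iter=50, subdict_size=32)
```

Each iteration computes only `subdict_size` projections instead of one per atom. Because the subsets change from one iteration to the next, the whole dictionary can still be visited over the course of the decomposition.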
Audio Signal Representations for Factorization in the sparse domain
In this paper, a new class of audio representations is introduced, together with a corresponding fast decomposition algorithm. The main feature of these representations is that they are both sparse and approximately shift-invariant, which allows similarity search in a sparse domain. The common sparse support of detected similar patterns is then used to factorize their representations. The potential of this method for simultaneous structural analysis and compression tasks is illustrated by preliminary experiments on simple musical data.
FastGAE: Scalable Graph Autoencoders with Stochastic Subgraph Decoding
Graph autoencoders (AE) and variational autoencoders (VAE) are powerful node
embedding methods, but suffer from scalability issues. In this paper, we
introduce FastGAE, a general framework to scale graph AE and VAE to large
graphs with millions of nodes and edges. Our strategy, based on stochastic
subgraph decoding, significantly speeds up the training of graph AE and VAE
while preserving or even improving performance. We demonstrate the
effectiveness of FastGAE on various real-world graphs, outperforming the few
existing approaches to scale graph AE and VAE by a wide margin.
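The idea of decoding only a random subgraph at each training step can be sketched as follows. This is an assumption-laden illustration, not the FastGAE implementation: it uses uniform node sampling, an inner-product decoder with a sigmoid, and a mean binary cross-entropy loss, and the function name is hypothetical. No encoder or gradient step is shown.

```python
import numpy as np

def subgraph_reconstruction_loss(Z, A, n_sample, rng):
    """Reconstruction loss evaluated on a randomly sampled node-induced subgraph.

    Z: node embeddings, shape (n_nodes, dim).
    A: binary adjacency matrix, shape (n_nodes, n_nodes).
    """
    # Uniform sampling without replacement (an assumption; other schemes are possible).
    nodes = rng.choice(A.shape[0], size=n_sample, replace=False)
    Z_s = Z[nodes]
    A_s = A[np.ix_(nodes, nodes)]
    # Inner-product decoder restricted to the sampled nodes.
    probs = 1.0 / (1.0 + np.exp(-(Z_s @ Z_s.T)))
    eps = 1e-12  # avoids log(0)
    bce = -(A_s * np.log(probs + eps) + (1.0 - A_s) * np.log(1.0 - probs + eps))
    return bce.mean(), nodes

# Toy graph (hypothetical): 100 nodes, sparse symmetric adjacency, small embeddings.
rng = np.random.default_rng(0)
n = 100
upper = np.triu(rng.random((n, n)) < 0.05, 1)
A = (upper | upper.T).astype(float)
Z = 0.1 * rng.standard_normal((n, 8))
loss, nodes = subgraph_reconstruction_loss(Z, A, n_sample=20, rng=rng)
```

In this toy setting the decoder evaluates 20 × 20 node pairs per step instead of 100 × 100, which is where the savings come from.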
Redundant and hierarchical representations for the archiving and compression of audio scenes
The subject of this thesis is the automatic analysis and processing of large volumes of audio data. More specifically, we are interested in archiving, a task that combines at least two problems: compressing the data and indexing its content. Each of these problems defines objectives that sometimes compete, which makes it difficult to address them simultaneously. At the heart of this thesis is therefore the aim of building a coherent framework for both the compression and the indexing of audio archives. Sparse representations of signals in redundant dictionaries have recently shown that they can fulfil this role. Their properties, together with the methods and algorithms used to obtain them, are therefore studied in the first part of this thesis. The relatively constraining application setting (the volume of data) leads us to choose, among these, iterative algorithms, also known as greedy algorithms. A first contribution of this thesis is the proposal of variants of the well-known Matching Pursuit based on random, dynamic subsampling of dictionaries. Adapting them to structured time-frequency dictionaries (unions of local cosine bases) gives hope of a significant improvement in compression performance for audio scenes. These new algorithms come with an original statistical model of their convergence properties, using tools borrowed from extreme value theory. The other contributions of this thesis tackle the second part of the archiving problem: indexing. The same framework is considered this time to uncover the different levels of structure in the data. First and foremost is the detection of redundancies and repetitions. At large scale, a robust system for detecting recurring patterns in a radio broadcast stream by fingerprint comparison is proposed. Its comparative performance in an evaluation campaign of the QUAERO project confirms the relevance of this approach. Exploiting these structures in a context other than compression is also considered. In particular, we propose an application to redundancy-informed source separation to illustrate the variety of processing that the chosen framework allows. Bringing these elements together then makes it possible to envisage an archiving system that meets the constraints by organizing objectives and processing steps hierarchically.
The main goal of this work is the automated processing of large volumes of audio data. More specifically, we are interested in archiving, a process that encompasses at least two distinct problems: data compression and data indexing. Jointly addressing these problems is a difficult task since many of their objectives may compete. Building a consistent framework for audio archival is therefore the subject of this thesis. Sparse representations of signals in redundant dictionaries have recently been found of interest for many sub-problems of the archival task. Sparsity is a desirable property both for compression and for indexing. Methods and algorithms to build such representations are the first topic of this thesis. Given the dimensionality of the considered data, greedy algorithms are studied in particular. A first contribution of this thesis is the proposal of a variant of the well-known Matching Pursuit algorithm that exploits randomness and sub-sampling of very large time-frequency dictionaries. We show that audio compression (especially at low bit-rate) can be improved using this method. This new algorithm comes with an original model of asymptotic pursuit behaviors, using order statistics and tools from extreme value theory. Other contributions deal with the second part of the archival problem: indexing. The same framework is used and applied to different layers of signal structures. First, the detection of redundancies and musical repetitions is addressed. At larger scale, we investigate audio fingerprinting schemes and apply them to on-line segmentation of radio broadcasts. Performance was evaluated during an international campaign within the QUAERO project. Finally, the same framework is used to perform source separation informed by the redundancy. All these elements validate the proposed framework for the audio archiving task. The layered structures of audio data are accessed hierarchically by greedy decomposition algorithms and allow processing the different objectives of archival at different steps, thus addressing them within the same framework.
PARIS-Télécom ParisTech (751132302) / Sudoc / France
Explainability in Music Recommender Systems
The most common way to listen to recorded music nowadays is via streaming
platforms which provide access to tens of millions of tracks. To assist users
in effectively browsing these large catalogs, the integration of Music
Recommender Systems (MRSs) has become essential. Current real-world MRSs are
often quite complex and optimized for recommendation accuracy. They combine
several building blocks based on collaborative filtering and content-based
recommendation. This complexity can hinder the ability to explain
recommendations to end users, which is particularly important for
recommendations perceived as unexpected or inappropriate. While pure
recommendation performance often correlates with user satisfaction,
explainability has a positive impact on other factors such as trust and
forgiveness, which are ultimately essential to maintain user loyalty.
In this article, we discuss how explainability can be addressed in the
context of MRSs. We provide perspectives on how explainability could improve
music recommendation algorithms and enhance user experience. First, we review
common dimensions and goals of recommenders' explainability and in general of
eXplainable Artificial Intelligence (XAI), and elaborate on the extent to which
these apply -- or need to be adapted -- to the specific characteristics of
music consumption and recommendation. Then, we show how explainability
components can be integrated within an MRS and in what form explanations can be
provided. Since the evaluation of explanation quality is decoupled from pure
accuracy-based evaluation criteria, we also discuss requirements and strategies
for evaluating explanations of music recommendations. Finally, we describe the
current challenges for introducing explainability within a large-scale
industrial music recommender system and provide research perspectives. Comment: To appear in AI Magazine, Special Topic on Recommender Systems 202
WASABI: a Two Million Song Database Project with Audio and Cultural Metadata plus WebAudio enhanced Client Applications
This paper presents the WASABI project, started in 2017, which aims at (1) the construction of a 2 million song knowledge base that combines metadata collected from music databases on the Web, metadata resulting from the analysis of song lyrics, and metadata resulting from the audio analysis, and (2) the development of semantic applications with high added value to exploit this semantic database. A preliminary version of the WASABI database is already online and will be enriched throughout the project. The main originality of this project is the collaboration between the algorithms that will extract semantic metadata from the web and from song lyrics and the algorithms that will work on the audio. The following WebAudio-enhanced applications are planned as companions for the WASABI database and will be associated with each song: an online mixing table, guitar amp simulations with a virtual pedal-board, audio analysis visualization tools, annotation tools, and a similarity search tool that works by uploading audio extracts or playing a melody on a MIDI device.
QCD and strongly coupled gauge theories : challenges and perspectives
We highlight the progress, current status, and open challenges of QCD-driven physics, in theory and in experiment. We discuss how the strong interaction is intimately connected to a broad sweep of physical problems, in settings ranging from astrophysics and cosmology to strongly coupled, complex systems in particle and condensed-matter physics, as well as to searches for physics beyond the Standard Model. We also discuss how success in describing the strong interaction impacts other fields, and, in turn, how such subjects can impact studies of the strong interaction. In the course of the work we offer a perspective on the many research streams which flow into and out of QCD, as well as a vision for future developments. Peer reviewed