
    Final Research Report for Sound Design and Audio Player

    This deliverable describes the work on Task 4.3, Algorithms for Sound Design and Feature Developments for the Audio Player. The audio player runs on the in-store player (ISP) and renders the music playlists via beat-synchronous automatic DJ mixing, exploiting the rich musical content description extracted in T4.2 (beat markers, structural segmentation into intro and outro, and musical and sound content classification). The deliverable covers prototypes and final results on: (1) automatic beat-synchronous mixing by beat alignment and time stretching, for which we developed an algorithm for beat alignment and scheduling of time-stretched tracks; (2) compensation of the play-duration changes introduced by time stretching: to keep the playlist generator independent of beat mixing, we readjust the tempo of played tracks so that their stretched duration equals their original duration; (3) prospective research on extracting data from DJ mixes: to alleviate the lack of extensive ground-truth databases of DJ mixing practices, we propose steps towards extracting this data from existing mixes by aligning and unmixing the tracks in a mix. We also show how these methods can be evaluated even without labelled test data, and propose an open dataset for further research; (4) a description of the software player module, a GUI-less application running on the ISP that streams tracks from disk and performs beat-synchronous mixing. The estimation of cue points, where tracks should cross-fade, is now described in D4.7, Final Research Report on Auto-Tagging of Music.
    EC/H2020/688122/EU/Artist-to-Business-to-Business-to-Consumer Audio Branding System/ABC D
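    The tempo bookkeeping behind items (1) and (2) can be illustrated with a minimal Python sketch. It is not the deliverable's algorithm: the `Segment` type, the function names and the assumption that a track splits into a stretched intro, an adjustable body and a stretched outro are hypothetical. The stretch factor is taken as output duration over input duration, so a 126 BPM track rendered at 128 BPM gets a factor of 126/128, below 1.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A portion of a track rendered with a single time-stretch factor (hypothetical model)."""
    source_seconds: float  # length of the portion in the original, unstretched file
    stretch: float         # output duration / source duration; < 1 means faster


def stretch_factor(source_bpm: float, target_bpm: float) -> float:
    """Stretch factor that makes a track at `source_bpm` play at `target_bpm`."""
    return source_bpm / target_bpm


def incoming_start_time(outgoing_beat: float, incoming_first_beat: float,
                        stretch: float) -> float:
    """Mix time (s) at which to start the incoming track so that its first beat,
    located at `incoming_first_beat` s in the original file, is rendered exactly
    on the outgoing track's beat at `outgoing_beat` s in the mix."""
    return outgoing_beat - incoming_first_beat * stretch


def compensating_stretch(intro: Segment, outro: Segment, body_seconds: float) -> float:
    """Stretch factor for the body so that the whole stretched track keeps its
    original duration despite the stretched intro and outro."""
    original = intro.source_seconds + body_seconds + outro.source_seconds
    edges = intro.source_seconds * intro.stretch + outro.source_seconds * outro.stretch
    remaining = original - edges
    if body_seconds <= 0 or remaining <= 0:
        raise ValueError("the body cannot absorb the intro/outro duration change")
    return remaining / body_seconds
```

    For example, a 126 BPM track mixed in from a 128 BPM track and out into a 124 BPM track, with 30 s intro and outro and a 180 s body, gets a body factor very close to 1: the body tempo barely changes, while the overall play duration stays exactly that of the original track.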

    Informed Multiple-F0 Estimation Applied to Monaural Audio Source Separation

    This paper proposes a new informed source separation technique that combines music transcription with source separation. The system is based on a coder/decoder configuration in which a classic (non-informed) multiple-F0 estimator is applied to each source signal, which is assumed to be known at the coder before the mixing process. The extra information required to recover the reference transcription of each isolated instrument is then computed and inaudibly embedded into the mixture using a watermarking technique. At the decoder, where the original source signals are unknown, the instruments are separated from the mixture using the informed transcription of each source signal. We show that a classic (non-informed) F0 estimator can be used to reduce the number of bits needed to transmit the exact transcription of each isolated instrument.
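    Why an estimator saves bits can be sketched in Python, assuming (hypothetically) that coder and decoder obtain the same multiple-F0 estimate, so that only the estimator's mistakes must be transmitted. Frames are modelled as sets of MIDI pitches; the function names and the symbol count used as a stand-in for bitrate are illustrative, not the paper's actual encoding.

```python
from typing import FrozenSet, List, Tuple

Frame = FrozenSet[int]            # active MIDI pitches in one analysis frame
Correction = Tuple[Frame, Frame]  # (pitches to add, pitches to remove)


def encode_corrections(reference: List[Frame], estimate: List[Frame]) -> List[Correction]:
    """Coder side: keep only what the estimator got wrong in each frame."""
    return [(ref - est, est - ref) for ref, est in zip(reference, estimate)]


def apply_corrections(estimate: List[Frame], corrections: List[Correction]) -> List[Frame]:
    """Decoder side: repair the estimator output with the side information."""
    return [(est | add) - remove for est, (add, remove) in zip(estimate, corrections)]


def symbol_count(corrections: List[Correction]) -> int:
    """Number of transmitted pitch symbols (a crude proxy for bitrate)."""
    return sum(len(add) + len(remove) for add, remove in corrections)
```

    With a reference transcription of 7 notes over three frames and an estimator that misses one note and adds one spurious note, only 2 pitch symbols need to be sent instead of all 7.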

    Methods and Datasets for DJ-Mix Reverse Engineering

    DJ techniques are an important part of popular music culture. However, they remain under-investigated by researchers because annotated datasets of DJ mixes are lacking. This paper aims to fill this gap by introducing novel methods to automatically deconstruct and annotate recorded mixes whose constituent tracks are known. First, a rough alignment estimates where in the mix each track starts and which time-stretching factor was applied. Second, a sample-precise alignment determines the exact offset of each track in the mix. Third, we propose a new method to estimate the cue points and the fade curves, which operates in the time-frequency domain to increase its robustness to interference from other tracks. The proposed methods are evaluated on our new, publicly available DJ-mix dataset, which contains automatically generated beat-synchronous mixes based on freely available music tracks, together with the ground truth for the placement of each track in a mix.
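    The sample-precise alignment and fade-curve steps can be sketched with NumPy. This is an illustration under simplifying assumptions, not the paper's method: the track is assumed to appear in the mix without time stretching (as if the rough alignment had already been compensated), the offset is the peak of an FFT-based cross-correlation, and the per-frame gain is the median magnitude ratio across frequency bins, a robust choice that limits the influence of bins dominated by other tracks. Frame size, hop and silence threshold are arbitrary.

```python
import numpy as np


def sample_offset(mix: np.ndarray, track: np.ndarray) -> int:
    """Lag (in samples) maximizing the cross-correlation of `track` with `mix`."""
    n = len(mix) + len(track) - 1
    nfft = 1 << (n - 1).bit_length()          # power of two >= n: no circular wrap
    spec = np.fft.rfft(mix, nfft) * np.conj(np.fft.rfft(track, nfft))
    xcorr = np.fft.irfft(spec, nfft)
    return int(np.argmax(xcorr[:len(mix)]))   # non-negative lags only


def frame_gains(mix, track, offset, frame=1024, hop=512):
    """Per-frame gain of `track` inside `mix`: median magnitude ratio over bins."""
    segment = mix[offset:offset + len(track)]
    window = np.hanning(frame)
    gains = []
    for start in range(0, len(segment) - frame + 1, hop):
        m = np.abs(np.fft.rfft(segment[start:start + frame] * window))
        t = np.abs(np.fft.rfft(track[start:start + frame] * window))
        active = t > 1e-3 * t.max()           # skip bins where the track is silent
        gains.append(float(np.median(m[active] / t[active])) if active.any() else 0.0)
    return np.array(gains)
```

    The sequence of frame gains approximates the fade curve; under these assumptions, cue points could then be read off where the gain rises from, or falls back to, zero.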

    Informed Approach for Sound and Music Analysis (Approche informée pour l'analyse du son et de la musique)

    In audio signal processing, analysis is an essential step for understanding and interacting with existing signals: the quality of signals obtained by transformation, or by synthesis from the estimated parameters, depends on the accuracy of the estimators used. However, theoretical limits show that the best accuracy reachable with a classic estimator can be insufficient for the most demanding applications (e.g. active listening of music). The work developed in this thesis revisits well-known audio analysis problems, such as spectral analysis, automatic music transcription and audio source separation, using a novel "informed" approach. This approach takes advantage of the configuration of modern music studios, which control the processing chain before the mix is created, so that the parameters of the elementary signals composing a mixture are known before the mixing process. Using the tools proposed in this thesis, the minimal side information is computed and transmitted with the mixture signal, which allows transformations of the mixture while guaranteeing the resulting quality level. When compatibility with existing audio formats is required, the side information is inaudibly embedded directly into the mixture signal using audio watermarking. This work describes several theoretical and practical aspects of audio signal processing, and shows that a classic estimator combined with sufficient side information can outperform the usual approaches (non-informed estimation or pure coding).
    BORDEAUX1-Bib.electronique (335229901) / Sudoc
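    Hiding side information inside the mixture can be illustrated with scalar quantization index modulation (QIM), one common watermarking technique. The abstract does not say which watermarking scheme the thesis uses, so QIM is an assumed stand-in here: each bit moves one transform coefficient (for example an MDCT coefficient of the mixture) onto one of two interleaved quantization grids. The step size, and how it would be tied to a perceptual model to keep the mark inaudible, are left as assumptions.

```python
import numpy as np


def qim_embed(coeffs, bits, step):
    """Embed one bit per leading coefficient: bit 0 -> nearest multiple of `step`,
    bit 1 -> nearest multiple of `step` shifted by step/2."""
    out = np.asarray(coeffs, dtype=float).copy()
    bits = np.asarray(bits, dtype=int)
    k = len(bits)
    shift = bits * step / 2.0
    out[:k] = step * np.round((out[:k] - shift) / step) + shift
    return out


def qim_extract(coeffs, n_bits, step):
    """Decide each bit by which of the two shifted grids lies closer."""
    c = np.asarray(coeffs[:n_bits], dtype=float)
    dist0 = np.abs(c - step * np.round(c / step))
    dist1 = np.abs(c - (step * np.round((c - step / 2.0) / step) + step / 2.0))
    return (dist1 < dist0).astype(int)
```

    Decoding survives any perturbation smaller than a quarter of the step, since a coefficient then stays closer to its own grid than to the other one; a larger step is more robust but introduces more distortion.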

    Going ba-na-nas: Prosodic analysis of spoken Japanese attitudes

    The aim of this paper is to examine cues for the prosodic characterization of attitudes in Japanese. This work builds on previous studies in which 16 communicative social affects were defined. The audio signal parameters (fundamental frequency, amplitude and duration) of previously recorded Japanese attitudes are statistically analyzed. We found notable interactions among these parameters, the speaker's gender and the expression of specific attitudes (e.g. politeness), and we report which parameters most significantly characterize each attitude. Index Terms: speech, prosody, attitude, social affect, emotional speech, Japanese language
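    As an illustration of the parameters involved, the following Python sketch computes utterance-level F0, level and duration summaries from a mono signal. The autocorrelation pitch tracker, the 75 to 400 Hz search range, the 0.5 voicing threshold and the 40 ms / 10 ms framing are assumptions; the abstract does not say how the parameters were measured.

```python
import numpy as np


def frame_f0(frame, sr, fmin=75.0, fmax=400.0, voicing=0.5):
    """Autocorrelation pitch estimate in Hz, or None if the frame looks unvoiced."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if ac[0] <= 0:
        return None
    ac = ac / ac[0]
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag if ac[lag] >= voicing else None


def prosodic_features(signal, sr, frame_s=0.04, hop_s=0.01):
    """Utterance-level summaries of F0, level (dB) and duration."""
    n, hop = int(frame_s * sr), int(hop_s * sr)
    f0s, levels = [], []
    for start in range(0, len(signal) - n + 1, hop):
        frame = signal[start:start + n]
        levels.append(20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-12))
        f0 = frame_f0(frame, sr)
        if f0 is not None:
            f0s.append(f0)
    f0s = np.array(f0s)
    return {
        "duration_s": len(signal) / sr,
        "f0_mean_hz": float(f0s.mean()) if f0s.size else float("nan"),
        # F0 range in semitones: 12 * log2(max / min)
        "f0_range_st": float(12 * np.log2(f0s.max() / f0s.min())) if f0s.size else float("nan"),
        "level_mean_db": float(np.mean(levels)),
    }
```

    Per-utterance dictionaries like these could then be grouped by attitude and speaker gender for the kind of statistical comparison the paper reports.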

    Prosodic Analysis of Social Affects in Face-to-Face Interaction in Japanese (Analyse prosodique des affects sociaux dans l'interaction face à face en japonais)

    The aim of this article is to characterize attitudinal prosody in Japanese. This work builds on studies describing 16 attitudes corresponding to different communication situations, which may or may not be conventionalized in the Japanese language. Estimated fundamental frequency, amplitude and duration parameters were extracted from utterances expressing these 16 attitudes in Japanese. In this study, we present the effects of the speaker's sex and of the attitudinal expression on these parameters. We also analyze which of these prosodic parameters are the most discriminative for acoustically characterizing each attitude.

    Final Research Report on Auto-Tagging of Music

    Deliverable D4.7 covers the work achieved by IRCAM up to M36 on the auto-tagging of music. It is a research report; the software libraries resulting from the research have been integrated into the Fincons/HearDis! Music Library Manager or are used by TU Berlin, and the final software libraries are described in D4.5. The research on auto-tagging has concentrated on four aspects: 1) Further improving IRCAM's machine-learning system ircamclass, by developing the new MASSS audio features and by integrating audio augmentation and audio segmentation into ircamclass. The system has then been applied to train the HearDis! "soft" features (Vocals-1, Vocals-2, Pop-Appeal, Intensity, Instrumentation, Timbre, Genre, Style). This is described in Part 3. 2) Developing two sets of "hard" features (i.e. related to musical or musicological concepts) as specified by HearDis! (for integration into the Fincons/HearDis! Music Library Manager) and by TU Berlin (as input to the prediction model of the GMBI attributes). Such features are either derived from previously estimated higher-level concepts (such as structure, key or chord succession) or obtained from newly developed signal processing algorithms (such as HPSS or main melody estimation). This is described in Part 4. 3) Developing audio features that characterize the audio quality of a music track independently of its apparent encoding. These features are used to estimate audio degradation or the decade of the music, so that playlists contain tracks of similar audio quality. This is described in Part 5. 4) Developing innovative algorithms that extract specific audio features to improve music mixes. So far, innovative techniques (based on various blind audio source separation algorithms and convolutional neural networks) have been developed for singing voice separation, singing voice segmentation, music structure boundary estimation and DJ cue-region estimation. This is described in Part 6.
    EC/H2020/688122/EU/Artist-to-Business-to-Business-to-Consumer Audio Branding System/ABC D
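    HPSS, mentioned under aspect 2), can be sketched with the median-filtering formulation: in a magnitude spectrogram, harmonic components are smooth along time and percussive components are smooth along frequency, so two median filters and a comparison yield binary masks. The deliverable does not state which HPSS variant was used; the filter lengths and function names below are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(x, size, axis):
    """Running median of odd length `size` along one axis (edge-padded)."""
    pad = [(0, 0)] * x.ndim
    pad[axis] = (size // 2, size // 2)
    padded = np.pad(x, pad, mode="edge")
    # The window axis is appended last; taking its median keeps x's shape.
    return np.median(sliding_window_view(padded, size, axis=axis), axis=-1)


def hpss_masks(mag, harm_len=17, perc_len=17):
    """Binary harmonic/percussive masks for a magnitude spectrogram
    shaped (frequency_bins, frames)."""
    harmonic = median_filter(mag, harm_len, axis=1)    # smooth along time
    percussive = median_filter(mag, perc_len, axis=0)  # smooth along frequency
    mask_h = harmonic >= percussive
    return mask_h, ~mask_h
```

    In practice the masks would be applied to the complex STFT of the track and inverted to obtain harmonic and percussive signals, from which further "hard" features can be computed.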

    Informed Spectral Analysis: audio signal parameters estimation using side information

    Parametric models are of great interest for representing and manipulating sounds. However, the quality of the resulting signals depends on the precision of the parameters. When the signals are available, these parameters can be estimated, but the presence of noise decreases the precision of the estimation. Furthermore, the Cramér-Rao bound gives the minimal error variance reachable by the best (unbiased) estimator, which can be insufficient for demanding applications. These limitations can be overcome by the coding approach, which directly transmits the parameters with the best precision using the minimal bitrate. However, this approach does not take advantage of the information that can be estimated from the signal, and may require a larger bitrate and break compatibility with existing file formats. The purpose of this article is to propose a compromise, called the "informed approach", which combines analysis with (coded) side information in order to increase the precision of parameter estimation at a lower bitrate than pure coding, given that the audio signal is known at the coder. The analysis problem is thus formulated in a coder/decoder configuration where the side information is computed and inaudibly embedded into the mixture signal at the coder. At the decoder, the extra information is extracted and used to assist the analysis process. This study applies the approach to audio spectral analysis using sinusoidal modeling, a well-known model with practical applications for which theoretical bounds have been derived. This work aims at uncovering new approaches for audio quality-based applications, and provides a solution for challenging problems such as active listening of music, source separation and realistic sound transformations.
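    The trade-off between estimation and coding can be made concrete with a toy Python sketch, which is not the article's scheme. It assumes that the decoder computes the same estimate as the coder and that the estimation error is known to stay within ±`delta` (for instance a few standard deviations derived from the Cramér-Rao bound); the coder then only needs `bits` bits to signal which of 2^`bits` sub-intervals around the estimate contains the true value. All names are hypothetical.

```python
import math


def informed_encode(theta, estimate, delta, bits):
    """Coder: index of the sub-interval of [estimate - delta, estimate + delta]
    that contains the true parameter `theta`."""
    if abs(theta - estimate) > delta:
        raise ValueError("estimation error exceeds the assumed bound delta")
    cells = 2 ** bits
    width = 2 * delta / cells
    return min(int((theta - (estimate - delta)) / width), cells - 1)


def informed_decode(index, estimate, delta, bits):
    """Decoder: centre of the signalled sub-interval (error <= delta / 2**bits)."""
    width = 2 * delta / 2 ** bits
    return estimate - delta + (index + 0.5) * width


def pure_coding_bits(lo, hi, max_error):
    """Bits a uniform quantizer over [lo, hi] needs to keep the error <= max_error."""
    return math.ceil(math.log2((hi - lo) / (2 * max_error)))
```

    With `delta` = 1 Hz, 4 bits of side information bound the error on a 440 Hz frequency estimate by 1/16 Hz, whereas a plain uniform quantizer over 0 to 8000 Hz needs 16 bits for the same worst-case error.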