
    Data-Driven Sound Track Generation

    Background music is often used to create a specific atmosphere or to draw our attention to specific events. For example, in movies or computer games it is often the accompanying music that conveys the emotional state of a scene and plays an important role in immersing the viewer or player in the virtual environment. For home-made videos, slide shows, and other consumer-generated visual media streams, there is a need for computer-assisted tools that allow users to generate aesthetically appealing music tracks in an easy and intuitive way. In this contribution, we consider a data-driven scenario where the musical raw material is given in the form of a database containing a variety of audio recordings. For a given visual media stream, the task then consists in identifying, manipulating, overlaying, concatenating, and blending suitable music clips to generate a music stream that satisfies certain constraints imposed by the visual data stream and by user specifications. Our main goal is to give an overview of the various content-based music processing and retrieval techniques that become important in data-driven sound track generation. In particular, we sketch a general pipeline that highlights how the various techniques act together when generating musically plausible transitions between subsequent music clips.
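    One of the blending steps mentioned above can be illustrated in a few lines: an equal-power crossfade between two music clips. This is only a minimal sketch of one pipeline component; clip selection, beat alignment, and key matching are out of scope, and the file names are illustrative assumptions, not part of the described system.

```python
# Minimal equal-power crossfade between two mono clips (a sketch, not the
# paper's pipeline). Assumes both clips share the same sample rate and are
# longer than the fade; "clip_a.wav" / "clip_b.wav" are hypothetical files.
import numpy as np
import soundfile as sf

def crossfade(a, b, sr, seconds=2.0):
    n = int(sr * seconds)
    t = np.linspace(0.0, np.pi / 2, n)
    fade_out, fade_in = np.cos(t), np.sin(t)   # cos^2 + sin^2 = 1: constant power
    tail = a[-n:] * fade_out + b[:n] * fade_in
    return np.concatenate([a[:-n], tail, b[n:]])

a, sr = sf.read("clip_a.wav")
b, _ = sf.read("clip_b.wav")
sf.write("transition.wav", crossfade(a, b, sr), sr)
```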

    A Review of Time-Scale Modification of Music Signals

    Time-scale modification (TSM) is the task of speeding up or slowing down an audio signal without changing its pitch. In digital music production, TSM has become an indispensable tool that is nowadays integrated in a wide range of music production software. Music signals are diverse: they comprise harmonic, percussive, and transient components, among others. Because of this wide range of acoustic and musical characteristics, there is no single TSM method that can cope with all kinds of audio signals equally well. Our main objective is to foster a better understanding of the capabilities and limitations of TSM procedures. To this end, we review fundamental TSM methods, discuss typical challenges, and indicate potential solutions that combine different strategies. In particular, we discuss a fusion approach that involves recent techniques for harmonic-percussive separation along with time-domain and frequency-domain TSM procedures.
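    The fusion idea can be sketched as follows: separate the signal into harmonic and percussive parts, stretch the harmonic part with a frequency-domain method (phase vocoder) and the percussive part with a transient-friendly time-domain method, then mix. A minimal sketch, assuming librosa for loading, separation, and phase-vocoder stretching; the naive OLA routine, the file name, and the stretch factor are illustrative assumptions, not the procedure from the review.

```python
# Fusion TSM sketch: phase vocoder for the harmonic part, plain overlap-add
# (OLA) for the percussive part. "example.wav" is a hypothetical input file.
import numpy as np
import librosa

def ola_stretch(y, rate, frame=2048, hop=512):
    """Naive time-domain OLA stretch: read frames at a hop of rate * hop
    samples, write them at a hop of `hop` samples, Hann-windowed."""
    win = np.hanning(frame)
    n_out = int(len(y) / rate) + frame
    out, norm = np.zeros(n_out), np.zeros(n_out)
    t_out = 0
    while int(t_out * rate) + frame <= len(y):
        t_in = int(t_out * rate)
        out[t_out:t_out + frame] += y[t_in:t_in + frame] * win
        norm[t_out:t_out + frame] += win
        t_out += hop
    norm[norm < 1e-8] = 1.0                      # avoid division by zero
    return out / norm

y, sr = librosa.load("example.wav", sr=None)
rate = 0.8                                       # < 1 slows the signal down
y_h, y_p = librosa.effects.hpss(y)               # harmonic / percussive split
y_h_s = librosa.effects.time_stretch(y_h, rate=rate)  # phase-vocoder TSM
y_p_s = ola_stretch(y_p, rate)                   # time-domain TSM
n = min(len(y_h_s), len(y_p_s))
y_out = y_h_s[:n] + y_p_s[:n]                    # recombine the components
```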

    Verarbeitung von Musiksignalen unter Verwendung von Zerlegungstechniken für Audiodaten (Processing of Music Signals Using Decomposition Techniques for Audio Data)

    Music signals are complex. When musicians play together, their instruments' sounds superimpose and form a single complex sound mixture. Furthermore, even the sound of a single instrument may already comprise sound components of harmonic, percussive, noise-like, and transient nature, among others. The complexity of music signal processing tasks such as time-scale modification (the task of stretching or compressing the duration of a music signal) or music source separation (the task of separating a music recording into signals that correspond to the individual instruments) is therefore often directly derived from the complexity of music signals themselves. In this thesis, our goal is to explore novel ways of approaching music signal processing tasks. One of our core ideas is to reduce a task's complexity by decomposing a given music signal into a set of two or more mid-level components and then processing these components individually. Depending on the audio decomposition technique, a mid-level component may reflect certain aspects of the music signal, such as its harmonic or percussive sounds. This explicit interpretation often allows us to apply more specialized methods for processing the mid-level components. In a final step, the processed component signals are recombined to form a global result. As part of our contributions, we propose various novel audio decomposition techniques for splitting a music signal into mid-level components. For example, we present a method for decomposing a signal into three components that contain the signal's harmonic, percussive, and noise-like sounds, respectively. Furthermore, we apply this general strategy to different tasks in the fields of digital signal processing and music information retrieval. In particular, we propose novel procedures for time-scale modification, singing voice separation, vibrato analysis, and audio mosaicing. Built upon these methods, we additionally present various prototype user interfaces and tools for analyzing, modifying, editing, and synthesizing music signals.
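    The kind of three-way decomposition described above (harmonic, percussive, and noise-like components) can be approximated with off-the-shelf tools. A minimal sketch, assuming librosa's margin-based harmonic-percussive separation; the margin value and file name are assumptions, and this is not the thesis's exact procedure.

```python
# Three-component decomposition sketch: with margin > 1, a spectrogram bin
# must be clearly harmonic or clearly percussive to be assigned; everything
# else remains in a noise-like residual. "example.wav" is hypothetical.
import librosa

y, sr = librosa.load("example.wav", sr=None)
D = librosa.stft(y)
H, P = librosa.decompose.hpss(D, margin=3.0)
R = D - (H + P)                  # residual: neither harmonic nor percussive
y_harm = librosa.istft(H)        # each component can now be processed
y_perc = librosa.istft(P)        # with specialized methods and later
y_res = librosa.istft(R)         # recombined into a global result
```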

    Towards Evaluating Multiple Predominant Melody Annotations in Jazz Recordings

    Melody estimation algorithms are typically evaluated by separately assessing the tasks of voice activity detection and fundamental frequency estimation. For both subtasks, computed results are typically compared to a single human reference annotation. This is problematic since different human experts may differ in how they specify a predominant melody, leading to a pool of equally valid reference annotations. In this paper, we address the problem of evaluating melody extraction algorithms within a jazz music scenario. Using four human and two automatically computed annotations, we discuss the limitations of standard evaluation measures and introduce an adaptation of Fleiss' kappa that can better account for multiple reference annotations. Our experiments not only highlight the behavior of the different evaluation measures, but also give deeper insights into the melody extraction task.
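    For reference, standard Fleiss' kappa (the starting point for the paper's adaptation, which is not reproduced here) can be written compactly. A minimal sketch, assuming frame-wise annotations have already been quantized into discrete categories, e.g. pitch bins plus an unvoiced class; that category design and the toy numbers are assumptions.

```python
# Fleiss' kappa over frame-wise annotations: ratings[i, j] counts how many
# annotators assigned frame i to category j (constant annotators per frame).
import numpy as np

def fleiss_kappa(ratings):
    N, _ = ratings.shape
    n = ratings.sum(axis=1)[0]              # annotators per frame
    p_j = ratings.sum(axis=0) / (N * n)     # overall category proportions
    P_i = ((ratings ** 2).sum(axis=1) - n) / (n * (n - 1))  # per-frame agreement
    P_bar, P_e = P_i.mean(), (p_j ** 2).sum()
    return (P_bar - P_e) / (1 - P_e)        # chance-corrected agreement

# Toy example: 4 annotators, 3 frames, 3 categories
counts = np.array([[4, 0, 0],
                   [2, 2, 0],
                   [1, 1, 2]])
print(fleiss_kappa(counts))                 # ~0.12: weak agreement
```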

    Apparatus and method for harmonic-percussive-residual sound separation using a structure tensor on spectrograms

    Apparatus and method for analysing a magnitude spectrogram of an audio signal for harmonic-percussive-residual sound separation, comprising: determining a change of frequency for each time-frequency bin of a plurality of time-frequency bins of the magnitude spectrogram; and classifying each time-frequency bin into a signal component group depending on that change of frequency. A structure tensor is applied to the spectrogram image for preprocessing or feature extraction by edge and corner detection, in particular by calculating predominant orientation angles in the spectrogram. The structure tensor can be considered a black box whose input is a grayscale image and whose outputs are, for each pixel, an angle corresponding to the direction of lowest change and a certainty (anisotropy) measure for this direction. A local frequency change is extracted from the angles: it can then be determined whether a time-frequency bin in the spectrogram belongs to a harmonic component (low local frequency change) or to a percussive component (high or infinite local frequency change). Examples of application:
    - (Figure 1) Distinguish between harmonic, percussive, and residual signal components by employing this orientation information.
    - (Figure 5) Analyse an audio signal for upmixing to five output channels (front left, center, right, left surround, and right surround): the harmonic weighting factor may be greater for the left, center, and right output channels than for the left surround and right surround channels, while the percussive weighting factor may be smaller for the left, center, and right channels than for the surround channels.
    - (Figure 6) Compute source separation metrics (source-to-distortion ratio SDR, source-to-interference ratio SIR, and source-to-artifacts ratio SAR) on a recorded audio signal. For example, a vibrato in a singing voice has a high instantaneous frequency change rate; the assignment of a bin in the spectrogram to the residual depends on the bin's anisotropy.
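    The black-box behaviour described above can be approximated in a few lines: build the 2x2 structure tensor from smoothed spectrogram gradients, read orientation angles and an anisotropy measure off its closed-form eigen-decomposition, and threshold them into harmonic, percussive, and residual groups. A rough sketch; the smoothing width and thresholds are illustrative assumptions, not the patent's parameters.

```python
# Structure-tensor-based harmonic/percussive/residual bin classification.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def structure_tensor_hpr(S, sigma=2.0, aniso_min=0.5, angle_tol=0.2):
    """Classify bins of a magnitude spectrogram S with shape (freq, time)."""
    Sf = sobel(S, axis=0)                  # gradient along frequency
    St = sobel(S, axis=1)                  # gradient along time
    # Smoothed 2x2 structure tensor per bin: [[Jff, Jft], [Jft, Jtt]]
    Jff = gaussian_filter(Sf * Sf, sigma)
    Jtt = gaussian_filter(St * St, sigma)
    Jft = gaussian_filter(Sf * St, sigma)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    tr, rt = Jff + Jtt, np.sqrt((Jff - Jtt) ** 2 + 4 * Jft ** 2)
    lam1, lam2 = (tr + rt) / 2, (tr - rt) / 2
    anisotropy = (lam1 - lam2) / (lam1 + lam2 + 1e-12)   # certainty measure
    # Orientation of the dominant gradient, measured from the frequency axis;
    # the direction of lowest change is perpendicular to it.
    angle = 0.5 * np.arctan2(2 * Jft, Jff - Jtt)
    confident = anisotropy > aniso_min
    # Harmonic ridges run along time, so their gradient points along
    # frequency (angle near 0); percussive ridges are the perpendicular case.
    harmonic = confident & (np.abs(angle) < angle_tol)
    percussive = confident & (np.abs(np.abs(angle) - np.pi / 2) < angle_tol)
    residual = ~(harmonic | percussive)    # low-anisotropy or ambiguous bins
    return harmonic, percussive, residual
```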

    Automated modelling of spatially-distributed glacier ice thickness and volume

    Ice thickness distribution and volume are both key parameters for glaciological and hydrological applications. This study presents VOLTA (Volume and Topography Automation), a Python script tool for ArcGIS that requires just a digital elevation model (DEM) and glacier outline(s) to model distributed ice thickness, volume, and bed topography. Ice thickness is initially estimated at points along an automatically generated centreline network based on the perfect-plasticity rheology assumption, taking into account a valley side-drag component of the force-balance equation. Distributed ice thickness is subsequently interpolated using a glaciologically correct algorithm. For five glaciers with independent field-measured bed topography, VOLTA-modelled volumes deviated from the field-derived volumes by between 26.5% (underestimate) and 16.6% (overestimate). The greatest differences occurred where the valley cross-section was asymmetric or where significant valley infill had taken place. Compared with other methods of modelling ice thickness and volume, the key advantages of VOLTA are its fully automated approach, a user-friendly graphical user interface (GUI), GIS-consistent geometry, fully automated centreline generation, inclusion of a side-drag component in the force-balance equation, estimation of basal shear stress for each individual glacier, fully distributed ice thickness output, and the ability to process multiple glaciers rapidly. VOLTA is thus capable of regional-scale ice volume assessment, a key parameter for exploring glacier response to climate change. VOLTA also permits subtraction of modelled ice thickness from the input surface elevation to produce an ice-free DEM, a key input for reconstructing former glaciers. VOLTA could therefore assist with predicting future glacier geometry changes and hence with projecting future meltwater fluxes.
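    The centreline step rests on the classical perfect-plasticity relation H = tau_b / (f * rho * g * sin(alpha)), where tau_b is basal shear stress, f a valley side-drag shape factor, and alpha the surface slope. A minimal sketch of that relation with illustrative parameter values; this is not VOLTA's calibrated implementation.

```python
# Perfect-plasticity ice thickness at centreline points (a sketch; tau_b, f,
# and the slope floor below are illustrative assumptions, not VOLTA's values).
import numpy as np

RHO_ICE = 917.0   # ice density, kg m^-3
G = 9.81          # gravitational acceleration, m s^-2

def plastic_thickness(slope_deg, tau_b=1.0e5, f=0.8, min_slope_deg=1.5):
    """Ice thickness (m) from surface slope (degrees); slopes are floored
    to avoid unphysical thicknesses on nearly flat reaches."""
    slope = np.maximum(np.radians(slope_deg), np.radians(min_slope_deg))
    return tau_b / (f * RHO_ICE * G * np.sin(slope))

print(plastic_thickness(np.array([2.0, 5.0, 10.0])))  # thins as slope steepens
```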