
21 research outputs found

Informed Separation of Spatial Images of Stereo Music Recordings Using Second-Order Statistics

Author: Gorlow Stanislaw
Marchand Sylvain
Publication venue: HAL CCSD
Publication date: 22/09/2013

International audience. In this work we address a reverse audio engineering problem, i.e. the separation of stereo tracks of professionally produced music recordings. More precisely, we apply a spatial filtering approach with a quadratic constraint using an explicit source-image-mixture model. The model parameters are "learned" from a given set of original stereo tracks, reduced in size, and used afterwards to demix the desired tracks in the best possible quality from a preexisting mixture. Our approach requires a side-information rate of 10 kbps per source or channel and has a low computational complexity. The results obtained for the SiSEC 2013 dataset are intended to serve as a reference for comparison with unpublished approaches.

HAL-Université de Bretagne Occidentale
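The abstract above separates stereo source images by spatial filtering with parameters sent as side information. The sketch below shows the general idea of per-bin spatial filtering under stated assumptions; it is not the authors' quadratically constrained filter. It swaps in the standard multichannel Wiener filter, assumes the 2x2 spatial covariance matrix of every source image is already known in each time-frequency bin (e.g. decoded from side information), and assumes the sources are mutually uncorrelated. The function name `demix_spatial_images` is hypothetical.

```python
import numpy as np


def demix_spatial_images(X, R):
    """Estimate stereo source images from a stereo STFT mixture.

    Hypothetical helper: multichannel Wiener filtering, not the paper's method.
    X: complex array (F, T, 2), mixture STFT.
    R: complex array (J, F, T, 2, 2), assumed-known spatial covariance
       matrices of the J source images in every bin.
    Returns a complex array (J, F, T, 2) of source-image estimates.
    """
    # Mixture covariance = sum of image covariances (uncorrelated sources),
    # with light diagonal loading so the 2x2 inversion stays well-conditioned.
    Rx = R.sum(axis=0) + 1e-9 * np.eye(2)
    Rx_inv = np.linalg.inv(Rx)          # stacked inverse, shape (F, T, 2, 2)
    # Wiener filter per source and bin: W_j = R_j @ Rx^{-1}
    W = R @ Rx_inv                      # broadcasts over the source axis J
    # Apply each 2x2 filter to the stereo mixture vector of its bin.
    return np.einsum("jftab,ftb->jfta", W, X)
```

Because the filters of all sources sum to (almost exactly) the identity, the estimated images add back up to the mixture, which is a convenient property for remixing.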

On the Informed Source Separation Approach for Interactive Remixing in Stereo

Author: Gorlow Stanislaw
Marchand Sylvain
Publication venue: HAL CCSD
Publication date: 04/05/2013

International audience. Informed source separation (ISS) has become a popular trend in the audio signal processing community over the past few years. Its purpose is to decompose a mixture signal into its constituent parts at the desired or best possible quality level, given some metadata. In this paper we present a comparison between two ISS systems and relate the ISS approach in various configurations to conventional coding of separate tracks for interactive remixing in stereo. The compared systems are Underdetermined Source Signal Recovery (USSR) and Enhanced Audio Object Separation (EAOS). The latter forms part of MPEG's Spatial Audio Object Coding (SAOC) technology. The performance is evaluated using objective difference grades computed with PEMO-Q. The results suggest that USSR performs perceptually better than EAOS and has a lower computational complexity.

HAL-Université de Bretagne Occidentale

Frequency-domain bandwidth extension for low-delay audio coding applications

Author: Gorlow Stanislaw
Publication date: 12/05/2014

MPEG-4 Spectral Band Replication (SBR) is a sophisticated high-frequency reconstruction (HFR) tool for speech and natural audio which, when used in conjunction with an audio codec, delivers a high-quality broadband signal at a bit rate of 48 kbps or even below. The major drawback of this technique is that it significantly increases the delay of the underlying core codec. The idea of synthetic signal reconstruction is also of particular interest in real-time communications, where an HFR method can be employed to further loosen the channel capacity requirements. In this thesis a delay-optimized derivative of SBR is elaborated, which can be used together with a low-delay speech and audio coder such as the Fraunhofer ULD. The presented approach is based on a short-time subband representation of an acoustic signal of natural or artificial origin, and as such it uses a filter bank to extract and manipulate sound characteristics. The system delay for a combination of the ULD coder with the proposed low-delay bandwidth extension (LD-BWE) tool adds up to 12 ms at a sampling rate of 48 kHz. According to a subjective listening test, LD-BWE in its present stage generates an excellent-quality highband replica at a simulated mean data rate of 12.8 kbps.
Ilmenau, Techn. Univ., Master's thesis, 201

Digitale Bibliothek Thüringen
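The thesis above builds on SBR-style high-frequency reconstruction: the decoder regenerates the high band from the transmitted low band and shapes it with a coarse spectral envelope sent as side information. The sketch below illustrates that principle on a single spectral frame under simplifying assumptions; it is not LD-BWE or the MPEG-4 SBR algorithm. Real SBR works on QMF subband samples and adds tonality and noise-floor controls. The function `extend_bandwidth` and its parameters `crossover`, `band_edges` and `target_energy` are hypothetical.

```python
import numpy as np


def extend_bandwidth(low_spec, crossover, band_edges, target_energy):
    """Regenerate the high band of one spectral frame (illustrative sketch).

    low_spec: 1-D complex spectrum whose bins >= crossover are empty.
    crossover: first bin of the high band (assumed >= 2).
    band_edges: increasing bin indices delimiting the envelope bands,
                starting at crossover and ending at len(low_spec).
    target_energy: transmitted energy per envelope band (side information).
    """
    spec = np.array(low_spec, dtype=complex)  # work on a copy
    n_high = len(spec) - crossover
    # Copy-up patch: translate the upper half of the low band into the
    # high band, repeating it as often as needed to fill all high bins.
    source = spec[crossover // 2:crossover]
    reps = int(np.ceil(n_high / len(source)))
    spec[crossover:] = np.tile(source, reps)[:n_high]
    # Envelope adjustment: scale each band so its energy equals the
    # transmitted target (bands with no patched energy are left silent).
    for b, (lo, hi) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        energy = np.sum(np.abs(spec[lo:hi]) ** 2)
        gain = np.sqrt(target_energy[b] / energy) if energy > 0 else 0.0
        spec[lo:hi] *= gain
    return spec
```

Only the band energies travel as side information, which is why such schemes can reach data rates of a few kbps for the high band.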

Coding Backward Compatible Audio Objects with Predictable Quality in a Very Spatial Way

Author: Gorlow Stanislaw
Publication venue: HAL CCSD
Publication date: 29/10/2015

International audience. A gradual transition from channel-based to object-based audio can currently be observed throughout the film and broadcast industries. One paramount example of this trend is the new MPEG-H 3D Audio standard, which is under development. Other object-based standards in the marketplace are DTS:X and Dolby Atmos. In this engineering brief, a newly developed prototype of an object-based audio coding system is introduced and discussed in terms of its technical characteristics. The codec can be used wherever a given sound scene is to be re-rendered, in a backward-compatible manner, according to the listener's preference or environment. The areas of application cover not only interactive music listening or remixing, but also location-dependent, immersive, and 3D audio rendering.

Frequency-domain bandwidth extension for low-delay audio coding applications: Appendix C: Source MATLAB® Code

Author: Gorlow Stanislaw
Publication date: 12/05/2014

Source MATLAB® code for the floating-point implementation of the LD-BWE coder. SBR and LD-SBR modes are also supported.

Digitale Bibliothek Thüringen

Reverse Engineering Stereo Music Recordings Pursuing an Informed Two-Stage Approach

Author: Stanislaw Gorlow
Sylvain Marchand
Publication date: 02/09/2013

A cascade reverse engineering approach is presented that uses an explicit model of the music production chain. The model considers both the mixing and the mastering stages and incorporates a parametric signal model. The approach is further pursued in an informed scenario: the model parameters are attached in the form of auxiliary data to the mastered mix and are used afterwards to undo the mastering and the mixing. The validity of the approach is demonstrated on a stereo mixture.

CiteSeerX

HAL-Université de Bretagne Occidentale
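The cascade idea in the abstract above (first undo mastering, then undo mixing, using transmitted model parameters) can be illustrated with a deliberately simplified toy model. The abstract does not specify the models, so everything below is an assumption: mixing is taken as an instantaneous, determined 2x2 gain matrix, and mastering as a static, memoryless compressor whose `threshold` and `ratio` are sent as auxiliary data. Real mixes are usually underdetermined and real mastering is time-varying, so this is a sketch of the two-stage inversion principle, not the paper's method. All function names are hypothetical.

```python
import numpy as np


def master(y, threshold, ratio):
    """Hypothetical mastering stage: static, memoryless compression of
    sample magnitudes above `threshold` by `ratio` (no attack/release)."""
    mag = np.abs(y)
    out = np.where(mag > threshold, threshold + (mag - threshold) / ratio, mag)
    return np.sign(y) * out


def unmaster(z, threshold, ratio):
    """Exact inverse of `master`, given the same transmitted parameters."""
    mag = np.abs(z)
    out = np.where(mag > threshold, threshold + (mag - threshold) * ratio, mag)
    return np.sign(z) * out


def unmix(y, mixing_matrix):
    """Undo a determined (2 sources, 2 channels) instantaneous mix y = A s."""
    return np.linalg.solve(mixing_matrix, y)
```

Usage follows the cascade in reverse order: `unmix(unmaster(mastered, threshold, ratio), A)`. The inversion is exact here only because both toy stages are invertible; with more sources than channels the second stage needs a separation filter instead of a matrix solve.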

Informed audio source separation using linearly constrained spatial filters

Author: Stanislaw Gorlow
Sylvain Marchand
Publication date: 01/01/2013

In this work we readdress the issue of audio source separation in an informed scenario, where certain information about the sound sources is embedded into their mixture as an imperceptible watermark. In doing so, we describe an improved algorithm that follows the linearly constrained minimum-variance filtering approach in the subband domain in order to obtain perceptually better estimates of the source signals than other published approaches. Like its predecessor, the algorithm imposes no restrictions on the number of simultaneously active sources, nor on their spectral overlap. Rather, it adapts to a given signal constellation and provides the best possible estimates under the given constraints in linearithmic time. The validity of the approach is demonstrated on a stereo mixture with two levels of sound complexity. Both objective and subjective evaluation show that the proposed algorithm outperforms a reference algorithm by at least one grade. Bearing high perceptual resemblance to the original signals at a fairly tolerable data rate of 10–20 kbps per source, the algorithm thus seems well suited for active listening applications such as remixing or re-spatialization in real time.

CiteSeerX

HAL-Université de Bretagne Occidentale
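The abstract above relies on linearly constrained minimum-variance (LCMV) filtering in the subband domain. The textbook LCMV filter minimizes the output power w^H R w subject to the linear constraints C^H w = f, with closed-form solution w = R^{-1} C (C^H R^{-1} C)^{-1} f. The sketch below computes that solution for one subband. How the paper builds R, C and f from the watermarked side information is not stated here, so the example assumes C holds stereo panning vectors of the sources and f asks for unit gain on the target and a null on an interferer. The function name `lcmv_weights` is hypothetical.

```python
import numpy as np


def lcmv_weights(R, C, f):
    """Textbook LCMV filter: minimise w^H R w subject to C^H w = f.

    R: (M, M) Hermitian positive-definite mixture covariance of one subband.
    C: (M, K) constraint matrix, e.g. assumed panning vectors of sources.
    f: (K,) desired responses, e.g. 1 for the target, 0 for interferers.
    """
    Rinv_C = np.linalg.solve(R, C)          # R^{-1} C without explicit inverse
    G = C.conj().T @ Rinv_C                 # C^H R^{-1} C, a K x K matrix
    return Rinv_C @ np.linalg.solve(G, f)   # R^{-1} C (C^H R^{-1} C)^{-1} f


# Usage: two sources panned at assumed angles in a stereo subband.
a1 = np.array([np.cos(0.3), np.sin(0.3)])   # target panning vector
a2 = np.array([np.cos(1.2), np.sin(1.2)])   # interferer panning vector
R = np.array([[1.0, 0.2], [0.2, 0.8]])      # assumed subband covariance
w = lcmv_weights(R, np.column_stack([a1, a2]), np.array([1.0, 0.0]))
# The target estimate in each subband sample x would be w.conj() @ x.
```

Note that with two channels, two constraints already fix w completely; a single distortionless constraint (K = 1, the MVDR special case) leaves one degree of freedom for minimizing the variance of residual interference.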