24 research outputs found

    Informed Separation of Spatial Images of Stereo Music Recordings Using Second-Order Statistics

    In this work we address a reverse audio engineering problem, namely the separation of the stereo tracks of professionally produced music recordings. More precisely, we apply a spatial filtering approach with a quadratic constraint, based on an explicit source-image mixture model. The model parameters are "learned" from a given set of original stereo tracks, reduced in size, and then used to demix the desired tracks from a preexisting mixture in the best possible quality. Our approach requires a side-information rate of 10 kbps per source or channel and has low computational complexity. The results obtained for the SiSEC 2013 dataset are intended to serve as a reference for comparison with unpublished approaches.
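
    The abstract does not spell out the filter, so the following Python sketch only illustrates the general idea of separating spatial images with second-order statistics. It swaps in the standard multichannel Wiener filter (not the paper's quadratically constrained filter) and assumes that each source's 2x2 spatial covariance in a time-frequency bin has already been rebuilt from side information; all names and values are hypothetical.

```python
# Hypothetical sketch: multichannel Wiener filter for stereo source images in
# one time-frequency bin. This is NOT the quadratically constrained filter of
# the paper; source covariances are assumed to be rebuilt from side information.

Matrix = list[list[complex]]
Vector = list[complex]


def rank1_cov(a: Vector, power: float) -> Matrix:
    """Spatial covariance power * a a^H of a point source with mixing vector a."""
    return [[power * a[i] * a[j].conjugate() for j in range(2)] for i in range(2)]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_vec(a: Matrix, v: Vector) -> Vector:
    return [a[i][0] * v[0] + a[i][1] * v[1] for i in range(2)]


def mat_inv(a: Matrix) -> Matrix:
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("mixture covariance is singular")
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def wiener_images(covs: list[Matrix], x: Vector,
                  loading: float = 1e-9) -> list[Vector]:
    """Estimate each source image as R_j (sum_k R_k + loading * I)^-1 x."""
    total = [[sum(c[i][j] for c in covs) + (loading if i == j else 0.0)
              for j in range(2)] for i in range(2)]
    inv_total = mat_inv(total)
    return [mat_vec(mat_mul(c, inv_total), x) for c in covs]


# Two panned point sources (illustrative values) mixed into one stereo bin.
a1, a2 = [1.0, 0.2], [0.3, 1.0]
s1, s2 = 1 + 1j, -0.5j
x = [a1[i] * s1 + a2[i] * s2 for i in range(2)]
est = wiener_images([rank1_cov(a1, 2.0), rank1_cov(a2, 1.0)], x)
```

    With rank-one covariances and linearly independent mixing vectors, the estimates match a1 * s1 and a2 * s2 up to the small diagonal loading, and they sum back to the mixture x whenever the summed covariance is invertible. With more sources than channels, the filter can only deliver a minimum-mean-square-error approximation.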

    Frequency-domain bandwidth extension for low-delay audio coding applications

    MPEG-4 Spectral Band Replication (SBR) is a sophisticated high-frequency reconstruction (HFR) tool for speech and natural audio which, when used in conjunction with an audio codec, delivers a high-quality broadband signal at bit rates of 48 kbps or below. Its major drawback is that it significantly increases the delay of the underlying core codec. Synthetic signal reconstruction is also of particular interest in real-time communications, where an HFR method can be employed to further relax channel capacity requirements. In this thesis, a delay-optimized derivative of SBR is developed that can be used together with a low-delay speech and audio coder such as the Fraunhofer ULD. The presented approach is based on a short-time subband representation of an acoustic signal of natural or artificial origin, and as such it uses a filter bank to extract and manipulate sound characteristics. The total system delay of the ULD coder combined with the proposed low-delay bandwidth extension (LD-BWE) tool is 12 ms at a sampling rate of 48 kHz. At its present stage, LD-BWE generates a highband replica of excellent quality, as confirmed by a subjective listening test, at a simulated mean data rate of 12.8 kbps.
    Ilmenau, Techn. Univ., Master's thesis, 201
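
    The copy-up-and-adjust principle shared by SBR-style tools can be sketched on a single frame of subband magnitudes. This is a simplified illustration under stated assumptions: uniform subbands, a plain lowband-to-highband copy, and hypothetical band edges and target energies. It does not reproduce the LD-BWE filter bank, patching rules, or parameter coding.

```python
# Simplified sketch of SBR-style bandwidth extension on ONE frame:
# copy the lowband upwards, then rescale each highband envelope band to a
# transmitted target energy. Band layout and targets are hypothetical.
import math


def band_energy(mags: list[float], lo: int, hi: int) -> float:
    """Energy of subbands lo..hi-1."""
    return sum(m * m for m in mags[lo:hi])


def extend_bandwidth(low: list[float], target_energies: list[float],
                     band_edges: list[int]) -> list[float]:
    """Return lowband magnitudes followed by the reconstructed highband.

    low: decoded lowband subband magnitudes (crossover at len(low))
    band_edges: highband envelope bands, relative to the crossover, e.g. [0, 4, 8]
    target_energies: transmitted energy per envelope band (side information)
    """
    k_c = len(low)
    n_high = band_edges[-1]
    # Patching: replicate lowband subbands upwards (wrapping if needed).
    high = [low[k % k_c] for k in range(n_high)]
    # Envelope adjustment: scale each band so its energy equals the target.
    for b, target in enumerate(target_energies):
        lo, hi = band_edges[b], band_edges[b + 1]
        current = band_energy(high, lo, hi)
        gain = math.sqrt(target / current) if current > 0 else 0.0
        for k in range(lo, hi):
            high[k] *= gain
    return low + high


low = [1.0, 0.8, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]  # decoded lowband magnitudes
edges = [0, 4, 8]                               # two highband envelope bands
spectrum = extend_bandwidth(low, [0.2, 0.05], edges)
```

    Only the per-band target energies would need to be transmitted here; the fine structure of the highband is borrowed from the lowband, which is why such schemes can run at a few kbps of side information.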

    On the Informed Source Separation Approach for Interactive Remixing in Stereo

    Informed source separation (ISS) has become a popular trend in the audio signal processing community over the past few years. Its purpose is to decompose a mixture signal into its constituent parts at the desired or the best possible quality level, given some metadata. In this paper we compare two ISS systems and relate the ISS approach, in various configurations, to conventional coding of separate tracks for interactive remixing in stereo. The compared systems are Underdetermined Source Signal Recovery (USSR) and Enhanced Audio Object Separation (EAOS); the latter forms part of MPEG's Spatial Audio Object Coding technology. Performance is evaluated using objective difference grades computed with PEMO-Q. The results suggest that USSR performs perceptually better than EAOS and has lower computational complexity.

    Coding Backward Compatible Audio Objects with Predictable Quality in a Very Spatial Way

    A gradual transition from channel-based to object-based audio can currently be observed throughout the film and broadcast industries. A prime example of this trend is the new MPEG-H 3D Audio standard, which is currently under development. Other object-based systems on the market are DTS:X and Dolby Atmos. In this engineering brief, a newly developed prototype of an object-based audio coding system is introduced and its technical characteristics are discussed. The codec can be used wherever a given sound scene is to be re-rendered, in a backward-compatible manner, according to the listener's preference or environment. Its areas of application cover not only interactive music listening and remixing, but also location-dependent, immersive, and 3D audio rendering.

    Frequency-domain bandwidth extension for low-delay audio coding applications: Appendix C: Source MATLAB® Code

    MATLAB® source code for the floating-point implementation of the LD-BWE coder. SBR and LD-SBR modes are also supported.

    Informed source separation: Underdetermined source signal recovery from an instantaneous stereo mixture

    This paper presents a new technique for solving an ill-posed source separation problem encountered in stereo mixtures. The proposed method is realized in an encoder-decoder framework. On the encoder side, where the original tracks are known, a set of spectral envelopes is extracted from them. These envelopes are passed to the decoder along with the stereo mixture; their frequency resolution is adapted to the critical bands, and their magnitudes are logarithmically quantized. On the decoder side, the mixture signal is decomposed by time-frequency selective iterative spatial filtering guided by a source activity index, which is derived from the spectral envelope values. A comparison with a similar algorithm reveals that the novel approach yields higher perceptual audio quality at a much lower data rate.
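
    A minimal sketch of the two side-information steps described above: per-band source energies are quantized uniformly on a dB scale at the encoder, and the decoder derives a per-band activity measure from the dequantized envelopes. The band edges, the 1.5 dB step, and the activity definition (each source's share of the band energy) are hypothetical; the abstract does not define the paper's source activity index.

```python
import math

STEP_DB = 1.5      # hypothetical quantizer step in dB
FLOOR_DB = -100.0  # hypothetical floor for (near-)silent bands


def band_energies(power: list[float], edges: list[int]) -> list[float]:
    """Sum a power spectrum over bands [edges[b], edges[b + 1])."""
    return [sum(power[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]


def quantize_db(energy: float) -> int:
    """Logarithmic (uniform-in-dB) quantization of one band energy."""
    db = 10.0 * math.log10(energy) if energy > 0 else FLOOR_DB
    return round(max(db, FLOOR_DB) / STEP_DB)


def dequantize(index: int) -> float:
    return 10.0 ** (index * STEP_DB / 10.0)


def activity(envelopes: list[list[float]]) -> list[list[float]]:
    """Hypothetical activity measure: each source's share of the band energy."""
    n_bands = len(envelopes[0])
    totals = [sum(env[b] for env in envelopes) for b in range(n_bands)]
    return [[env[b] / totals[b] for b in range(n_bands)] for env in envelopes]


# Encoder: two known source power spectra (illustrative); bands widen upwards.
edges = [0, 2, 4, 8]
src1 = [4.0, 3.0, 1.0, 0.5, 0.1, 0.1, 0.1, 0.1]
src2 = [0.1, 0.2, 0.5, 1.0, 2.0, 2.0, 1.0, 1.0]
side_info = [[quantize_db(e) for e in band_energies(s, edges)] for s in (src1, src2)]

# Decoder: dequantize the envelopes and derive per-band activity shares.
envelopes = [[dequantize(q) for q in qs] for qs in side_info]
share = activity(envelopes)
```

    Quantizing in dB keeps the relative error per band bounded by half a step regardless of the band's level, which suits envelopes whose dynamic range spans many decades.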

    Informed audio source separation using linearly constrained spatial filters

    In this work we readdress the issue of audio source separation in an informed scenario, where certain information about the sound sources is embedded into their mixture as an imperceptible watermark. We describe an improved algorithm that follows the linearly constrained minimum-variance (LCMV) filtering approach in the subband domain in order to obtain perceptually better estimates of the source signals than other published approaches. Like its predecessor, the algorithm imposes no restrictions on the number of simultaneously active sources or on their spectral overlap. Rather, it adapts to a given signal constellation and provides the best possible estimates under the given constraints in linearithmic time. The validity of the approach is demonstrated on a stereo mixture at two levels of sound complexity. Both objective and subjective evaluations show that the proposed algorithm outperforms a reference algorithm by at least one grade. Since its estimates bear high perceptual resemblance to the original signals at a fairly tolerable data rate of 10–20 kbps per source, the algorithm appears well suited to active listening applications such as real-time remixing or re-spatialization.
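
    The core of the linearly constrained minimum-variance idea can be sketched for one subband of an instantaneous stereo mixture, using only the single-constraint special case (MVDR): minimize the output power w^T R w subject to w^T a = 1 for the target's mixing vector a, which gives w = R^-1 a / (a^T R^-1 a). The sketch assumes real mixing vectors and per-source subband powers recovered from side information; the constraint set, the diagonal loading, and all values are illustrative, not the paper's design.

```python
# Hypothetical sketch: single-constraint LCMV (i.e. MVDR) weights for one
# subband of a real-valued, instantaneous stereo mixture. Mixing vectors and
# per-source subband powers are assumed known from side information.

Vec = list[float]
Mat = list[list[float]]


def mixture_cov(mix_vectors: list[Vec], powers: list[float],
                loading: float = 1e-6) -> Mat:
    """Model R = sum_k p_k a_k a_k^T + loading * I from side information."""
    r = [[loading, 0.0], [0.0, loading]]
    for a, p in zip(mix_vectors, powers):
        for i in range(2):
            for j in range(2):
                r[i][j] += p * a[i] * a[j]
    return r


def solve2(r: Mat, b: Vec) -> Vec:
    """Solve the 2x2 system r y = b by Cramer's rule."""
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    return [(b[0] * r[1][1] - r[0][1] * b[1]) / det,
            (r[0][0] * b[1] - b[0] * r[1][0]) / det]


def lcmv_weights(r: Mat, target: Vec) -> Vec:
    """Minimize w^T R w subject to w^T a = 1: w = R^-1 a / (a^T R^-1 a)."""
    ria = solve2(r, target)
    denom = target[0] * ria[0] + target[1] * ria[1]
    return [ria[0] / denom, ria[1] / denom]


def output_power(r: Mat, w: Vec) -> float:
    return sum(w[i] * r[i][j] * w[j] for i in range(2) for j in range(2))


# Three panned sources in two channels (underdetermined); illustrative values.
mix = [[1.0, 0.2], [0.6, 0.8], [0.1, 1.0]]
powers = [1.0, 0.5, 2.0]
r = mixture_cov(mix, powers)
w = lcmv_weights(r, mix[0])  # estimate of source 0 in this subband: w[0]*xL + w[1]*xR
```

    Because the plain unit-gain filter a / |a|^2 also satisfies the constraint, the MVDR output power can never exceed it. With three sources in two channels, however, the interferers cannot in general be nulled, only traded off against each other.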

    Peak-to-Average Power Ratio Reduction for OFDM Based on Dynamic Range Compression

    We present a peak-to-average power ratio (PAPR) reduction method for orthogonal frequency-division multiplexing (OFDM) and similar modulation schemes, based on dynamic range compression and decompression. The decompressor was originally developed for compressed audio signals. For OFDM, the main benefit of the method is that it can be easily adjusted to the system requirements, allowing a tradeoff between PAPR gain and signal distortion. In practice, it requires no additional side information at the receiver. In a pilot experiment, we evaluate the method using four different metrics and briefly interpret the results.
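
    To make the PAPR tradeoff concrete, the following sketch computes the PAPR of one OFDM symbol and applies an envelope compressor and matching decompressor. It swaps in textbook mu-law companding rather than the paper's dynamic range compressor; MU, A_REF, and the QPSK test signal are assumptions, with A_REF fixed at both ends so that no per-symbol side information is needed. PAPR is computed on Nyquist-rate samples without oversampling, which underestimates the analog peak.

```python
import cmath
import math
import random

MU = 8.0     # hypothetical compression strength
A_REF = 1.0  # hypothetical fixed reference amplitude known to both ends


def ofdm_symbol(n_sub: int, seed: int = 0) -> list[complex]:
    """One OFDM symbol: random QPSK on n_sub subcarriers, unitary inverse DFT."""
    rng = random.Random(seed)
    qpsk = [complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
            for _ in range(n_sub)]
    return [sum(qpsk[k] * cmath.exp(2j * math.pi * k * n / n_sub)
                for k in range(n_sub)) / math.sqrt(n_sub)
            for n in range(n_sub)]


def papr_db(x: list[complex]) -> float:
    """Peak power over mean power, in dB (no oversampling)."""
    powers = [abs(s) ** 2 for s in x]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))


def compress(x: list[complex], mu: float = MU, a: float = A_REF) -> list[complex]:
    """mu-law compression of the envelope; the phase is left untouched."""
    return [cmath.rect(a * math.log1p(mu * abs(s) / a) / math.log1p(mu),
                       cmath.phase(s)) for s in x]


def expand(y: list[complex], mu: float = MU, a: float = A_REF) -> list[complex]:
    """Exact inverse of compress(), applied at the receiver."""
    return [cmath.rect(a / mu * ((1.0 + mu) ** (abs(s) / a) - 1.0),
                       cmath.phase(s)) for s in y]


x = ofdm_symbol(64)
y = compress(x)
x_hat = expand(y)
```

    Because the mu-law curve is concave and passes through the origin, it boosts small amplitudes relative to large ones, so the compressed PAPR cannot exceed the original. A larger MU lowers the PAPR further, at the cost of greater sensitivity to channel noise after expansion.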