
    An Application of Spectral Translation and Spectral Envelope Extrapolation for High-frequency Bandwidth Extension of Generic Audio Signals

    The scope of this work is to introduce a conceptually simple yet effective algorithm for blind high-frequency bandwidth extension of audio signals, a means of improving the perceptual quality of sound that has previously been low-pass filtered or downsampled (typically due to storage considerations). The algorithm combines an application of the modulation theorem for the discrete Fourier transform, which regenerates the missing high-frequency end of the signal spectrum, with a linear-regression-driven approach that shapes the spectral envelope of the regenerated band. The results are compared graphically and acoustically with those obtained from existing audio restoration software for a variety of input signals. The source code and Windows binaries of the resulting implementation are also included.
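
    As a rough illustration of the two ingredients named above, the sketch below translates the low band upward via the modulation theorem (multiplying the time-domain frame by a complex exponential shifts its spectrum) and shapes the regenerated band with a straight-line extrapolation of the log-magnitude envelope. The frame size, cutoff, and regression range are assumptions for illustration, not the parameters of the published algorithm.

        import numpy as np

        def extend_frame(frame, fs, f_cut):
            """Regenerate spectral content above f_cut for one analysis frame (illustrative)."""
            n = len(frame)
            spec = np.fft.rfft(frame * np.hanning(n))
            k_cut = int(f_cut * n / fs)

            # Modulation theorem: x[t] * exp(j*2*pi*f_cut*t/fs) shifts the spectrum up by
            # f_cut, so the existing low band reappears in the empty high band.
            t = np.arange(n)
            shifted = frame * np.exp(2j * np.pi * f_cut * t / fs)
            hi_spec = np.fft.fft(shifted * np.hanning(n))[:len(spec)]

            # Envelope extrapolation: fit a line to the log-magnitude of the upper half
            # of the low band and extend it into the regenerated band.
            bins = np.arange(k_cut // 2, k_cut)
            logmag = np.log(np.abs(spec[bins]) + 1e-12)
            slope, intercept = np.polyfit(bins, logmag, 1)

            out = spec.copy()
            for k in range(k_cut, len(spec)):
                target = np.exp(slope * k + intercept)          # extrapolated envelope
                out[k] = hi_spec[k] * target / (np.abs(hi_spec[k]) + 1e-12)
            return np.fft.irfft(out, n)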

    Application of sound source separation methods to advanced spatial audio systems

    This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats such as WFS. This is due to the fact that WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area; thus, an object-based mix is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field, and its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast, unsupervised separation of sound sources in the time-frequency domain. Although both techniques rely on the same clustering type, the features considered by each are related to different localization cues, enabling separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is then evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented.
    Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
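
    To make the time-frequency masking idea concrete, the sketch below clusters the bins of a stereo mixture by a simple panning cue using multi-level (Otsu) thresholding and rebuilds one signal per cluster with binary masks. It illustrates the general approach only; the cue, thresholding scheme, and parameters are assumptions, not the thesis's actual methods.

        import numpy as np
        from scipy.signal import stft, istft
        from skimage.filters import threshold_multiotsu

        def separate_stereo(left, right, fs, n_sources=3, nperseg=2048):
            """Split a stereo mixture into n_sources signals via a panning cue (illustrative)."""
            _, _, L = stft(left, fs, nperseg=nperseg)
            _, _, R = stft(right, fs, nperseg=nperseg)

            # Localization cue per time-frequency bin: relative level between channels in [0, 1].
            pan = np.abs(R) / (np.abs(L) + np.abs(R) + 1e-12)

            # Multi-level thresholding of the cue assigns each bin to one of n_sources classes.
            thresholds = threshold_multiotsu(pan.ravel(), classes=n_sources)
            labels = np.digitize(pan, thresholds)

            sources = []
            for s in range(n_sources):
                mask = (labels == s)                      # binary time-frequency mask
                _, y = istft(0.5 * (L + R) * mask, fs, nperseg=nperseg)
                sources.append(y)
            return sources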

    Frequency-domain bandwidth extension for low-delay audio coding applications

    MPEG-4 Spectral Band Replication (SBR) is a sophisticated high-frequency reconstruction (HFR) tool for speech and natural audio which, when used in conjunction with an audio codec, delivers a broadband high-quality signal at a bit rate of 48 kbps or even below. The major drawback of this technique is that it significantly increases the delay of the underlying core codec. The idea of synthetic signal reconstruction is also of particular interest in real-time communications, where an HFR method can be employed to further loosen the channel capacity requirements. In this thesis a delay-optimized derivative of SBR is elaborated, which can be used together with a low-delay speech and audio coder such as the Fraunhofer ULD. The presented approach is based on a short-time subband representation of an acoustic signal of natural or artificial origin, and as such it utilizes a filter bank for the extraction and manipulation of sound characteristics. The system delay for a combination of the ULD coder with the proposed low-delay bandwidth extension (LD-BWE) tool adds up to 12 ms at a sampling rate of 48 kHz. At the present stage, LD-BWE generates a high-band replica of subjectively confirmed excellent quality at a simulated mean data rate of 12.8 kbps.
    Ilmenau, Techn. Univ., Masterarbeit, 201
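
    As a rough sketch of filter-bank-based high-frequency reconstruction in the spirit of SBR, the snippet below copies the upper part of the low band into the empty high subbands and scales each regenerated band to a target envelope. An STFT stands in for the low-delay filter bank, and the cutoff and envelope targets are illustrative assumptions rather than parameters of the LD-BWE tool described above.

        import numpy as np
        from scipy.signal import stft, istft

        def hfr_copy_up(x, fs, f_cut, env_targets, nperseg=128):
            """Regenerate the band above f_cut by subband copy-up (illustrative; assumes f_cut >= fs/4)."""
            f, _, X = stft(x, fs, nperseg=nperseg)
            k_cut = int(np.searchsorted(f, f_cut))
            width = X.shape[0] - k_cut

            # Patching: mirror the top 'width' low subbands into the empty high subbands.
            X[k_cut:, :] = X[k_cut - width:k_cut, :]

            # Envelope adjustment: scale each regenerated subband to its target RMS level.
            for k in range(k_cut, X.shape[0]):
                current = np.sqrt(np.mean(np.abs(X[k]) ** 2)) + 1e-12
                X[k] *= env_targets[k - k_cut] / current
            _, y = istft(X, fs, nperseg=nperseg)
            return y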

    Differentiable Artificial Reverberation

    Artificial reverberation (AR) models play a central role in various audio applications. Therefore, estimating the AR model parameters (ARPs) of a target reverberation is a crucial task. Although a few recent deep-learning-based approaches have shown promising performance, their non-end-to-end training scheme prevents them from fully exploiting the potential of deep neural networks. This motivates the introduction of differentiable artificial reverberation (DAR) models, which allow loss gradients to be back-propagated end-to-end. However, implementing the AR models with their difference equations "as is" in the deep-learning framework severely bottlenecks the training speed when executed on a parallel processor such as a GPU, due to their infinite impulse response (IIR) components. We tackle this problem by replacing the IIR filters with finite impulse response (FIR) approximations obtained with the frequency-sampling method (FSM). Using the FSM, we implement three DAR models -- differentiable Filtered Velvet Noise (FVN), Advanced Filtered Velvet Noise (AFVN), and Feedback Delay Network (FDN). For each AR model, we train its ARP estimation networks for the analysis-synthesis (RIR-to-ARP) and blind estimation (reverberant-speech-to-ARP) tasks in an end-to-end manner with its DAR model counterpart. Experimental results show that the proposed method achieves consistent performance improvement over the non-end-to-end approaches in both objective metrics and subjective listening test results.
    Comment: Manuscript submitted to TASL
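
    The frequency-sampling step mentioned above can be illustrated with a short sketch: the IIR transfer function H(z) = B(z)/A(z) is evaluated on a dense grid of points on the unit circle, and the inverse FFT of the sampled response yields an FIR approximation that can be applied as an ordinary (parallelizable) convolution. The coefficients and FIR length below are illustrative assumptions, not values from the paper.

        import numpy as np

        def fir_from_iir(b, a, n_fir=4096):
            """Approximate the IIR filter (b, a) by an n_fir-tap FIR filter (frequency sampling)."""
            # Sample H(e^{jw}) at n_fir equally spaced frequencies.
            z1 = np.exp(-2j * np.pi * np.arange(n_fir) / n_fir)     # e^{-jw_k}
            H = np.polyval(b[::-1], z1) / np.polyval(a[::-1], z1)
            # The inverse DFT of the sampled response is the (time-aliased) impulse
            # response; with n_fir much longer than the filter's decay, the aliasing
            # error is small.
            return np.real(np.fft.ifft(H))

        # Example: a one-pole lowpass y[n] = x[n] + 0.99 * y[n-1].
        h = fir_from_iir(b=[1.0], a=[1.0, -0.99], n_fir=4096)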

    Surface Electromyographic (sEMG) Transduction of Hand Joint Angles for Human Interfacing Devices (HID)

    This is an investigation of the use of surface electromyography (sEMG) as a tool to improve the information bandwidth of human interfacing devices (HID) through the transduction of the fingertip workspace. It combines the work of Merletti et al. and Jarque-Bou et al. to design an open-source framework for fingertip-workspace-based HIDs. In this framework, the fingertip workspace is defined as the system of forearm and hand muscle forces expressed through a tensor that describes hand anthropometry. The thesis discusses the electrophysiology of muscle tissue along with the anatomy and physiology of the arm in pursuit of optimizing sensor location, muscle force measurements, and viable command gestures. Algorithms for correlating sEMG with hand joint angles are investigated using MATLAB for both static and moving gestures. Seven sEMG spots and fingertip joint angles recorded by Jarque-Bou et al. are investigated for the application of sEMG to HIDs. Such technology is termed Gesture Computer Interfacing (GCI) and has been shown to be feasible through devices such as the CTRL Labs interface and models such as those of Sartori, Merletti, and Zhao. The muscles under the sEMG spots in this dataset and the actions related to them are discussed, along with the muscles and hand actions that are not visible within it. Viable gestures for detection algorithms are discussed based on the muscles discerned to be visible in the dataset through intensity, spectral moment, power spectra, and coherence. Detection and isolation of such viable actions is fundamental to designing the EMG-driven musculoskeletal model of the hand needed to facilitate GCI. Enveloping, spectral moment, power spectrum, and coherence analyses are applied to a Sollerman Hand Function Test sEMG dataset of 22 subjects performing 26 activities of daily living to differentiate pinching and grasping tasks. Pinches and grasps were found to cause very different activation patterns in sEMG spot 3, which relates to flexion of digits I-V. The spectral moment was found to be less useful for differentiation but provided information about the degree of object manipulation performed and the extent of fatigue during each task. Coherence between flexors and extensors was shown to increase with task intensity but was found to be corrupted by crosstalk as muscular activation increased. Some correlations between finger flexor and extensor power spectra showed anticipatory coherence between the muscle groups at the end of object manipulation. An sEMG amplification system capable of capturing HD-sEMG with bandwidths of 300 and 500 Hz at a sampling frequency of 2 kHz was designed for future work. The system was designed in accordance with current IEEE research on sensor-electrode characteristics. Furthermore, a discussion of solutions to open issues in HD-sEMG is provided. This work did not implement the designed wristband but serves as a literature review and open-source design using commercially available technologies.
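
    For a concrete view of two of the features used above, the sketch below computes a moving RMS envelope and the mean (centroid) frequency of the power spectrum for a single sEMG channel. The window length and band limits are illustrative assumptions and not taken from the thesis.

        import numpy as np
        from scipy.signal import welch

        def rms_envelope(emg, win=200):
            """Moving RMS envelope with a rectangular window of `win` samples."""
            power = np.convolve(emg ** 2, np.ones(win) / win, mode="same")
            return np.sqrt(power)

        def mean_frequency(emg, fs, band=(20.0, 450.0)):
            """Mean (centroid) frequency of the power spectrum within `band` (Hz)."""
            f, pxx = welch(emg, fs, nperseg=1024)
            sel = (f >= band[0]) & (f <= band[1])
            return np.sum(f[sel] * pxx[sel]) / np.sum(pxx[sel])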

    Developing a flexible and expressive realtime polyphonic wave terrain synthesis instrument based on a visual and multidimensional methodology

    The Jitter extended library for Max/MSP is distributed with a gamut of tools for the generation, processing, storage, and visual display of multidimensional data structures. With additional support for a wide range of media types, and for interaction between these media, the environment is well suited to Wave Terrain Synthesis. This research details the practical development of a realtime Wave Terrain Synthesis instrument within the Max/MSP programming environment utilizing the Jitter extended library. Various graphical processing routines are explored in relation to their potential use for Wave Terrain Synthesis.
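
    The core idea of Wave Terrain Synthesis can be sketched outside Max/MSP in a few lines: an audio signal is produced by reading a two-dimensional terrain function z = f(x, y) along a time-varying trajectory. The terrain, the elliptical orbit, and the rates below are illustrative assumptions, not components of the instrument described in the thesis.

        import numpy as np

        def wave_terrain(duration=1.0, fs=48000, fx=220.0, fy=331.0):
            """Generate audio by scanning a 2D terrain along an elliptical orbit (illustrative)."""
            t = np.arange(int(duration * fs)) / fs
            # Elliptical trajectory over the terrain's domain.
            x = 0.8 * np.cos(2 * np.pi * fx * t)
            y = 0.6 * np.sin(2 * np.pi * fy * t)
            # Terrain function: a smooth bivariate surface sampled along the orbit.
            z = np.sin(3 * x) * np.cos(2 * y) + 0.5 * np.sin(x * y)
            return z / np.max(np.abs(z))          # normalized audio-rate signal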

    Multiresolution models in image restoration and reconstruction with medical and other applications
