
    A Phase Vocoder based on Nonstationary Gabor Frames

    We propose a new algorithm for time-stretching music signals based on the theory of nonstationary Gabor frames (NSGFs). The algorithm extends the classical phase vocoder (PV) by incorporating adaptive time-frequency (TF) representations and adaptive phase locking. The adaptive TF representations provide good time resolution at the onsets of attack transients and good frequency resolution for the sinusoidal components. We estimate phase values only at peak channels; the remaining phases are then locked to the values of the peaks in an adaptive manner. During attack transients we keep the stretch factor equal to one, and we propose a new strategy for determining which channels are relevant for reinitializing the corresponding phase values. In contrast to previously published algorithms, we use a non-uniform NSGF to obtain a low redundancy of the corresponding TF representation. We show that with just three times as many TF coefficients as signal samples, artifacts such as phasiness and transient smearing can be greatly reduced compared with the classical PV. The proposed algorithm is tested on both synthetic and real-world signals and compared with state-of-the-art algorithms in a reproducible manner. Comment: 10 pages, 6 figures
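
    The abstract builds on peak-based phase locking. As a rough point of reference only, here is a minimal Python/NumPy sketch of a classical, uniform-STFT phase vocoder with identity phase locking: phases are propagated only at spectral peaks, and every other bin keeps its analysis phase offset relative to the peak whose region it falls in. It is not the NSGF algorithm described above (fixed window, no adaptive resolution, no transient handling), and the function names, window length (1024) and analysis hop (256) are illustrative assumptions.

```python
import numpy as np


def stft_frames(x, win, hop):
    """Windowed rfft of every full frame of x (uniform STFT)."""
    n = len(win)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.fft.rfft(win * x[s:s + n]) for s in starts])


def local_peaks(mag):
    """Bins larger than the left neighbour and not smaller than the right one."""
    k = np.where((mag[1:-1] > mag[:-2]) & (mag[1:-1] >= mag[2:]))[0] + 1
    return k if len(k) else np.array([int(np.argmax(mag))])


def pv_identity_locking(x, stretch, n_fft=1024, hop_a=256):
    """Time-stretch x by `stretch` with a uniform-STFT phase vocoder and
    identity phase locking (non-peak bins follow the phase of their peak)."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(n_fft)
    hop_s = int(round(hop_a * stretch))  # synthesis hop
    X = stft_frames(x, win, hop_a)
    n_bins = X.shape[1]
    # nominal phase advance of each bin centre over one analysis hop
    omega = 2 * np.pi * np.arange(n_bins) * hop_a / n_fft
    out = np.zeros(hop_s * (len(X) - 1) + n_fft)
    norm = np.zeros_like(out)
    prev_phase = np.angle(X[0])
    syn_phase = prev_phase.copy()
    for m, frame in enumerate(X):
        mag, phase = np.abs(frame), np.angle(frame)
        if m > 0:
            peaks = local_peaks(mag)
            # heterodyned phase increment at the peaks, wrapped to [-pi, pi)
            dphi = phase[peaks] - prev_phase[peaks] - omega[peaks]
            dphi = np.mod(dphi + np.pi, 2 * np.pi) - np.pi
            inst_freq = (omega[peaks] + dphi) / hop_a  # radians per sample
            peak_syn = syn_phase[peaks] + inst_freq * hop_s
            # each peak's region of influence ends halfway to the next peak
            bounds = np.concatenate(([0], (peaks[:-1] + peaks[1:]) // 2 + 1, [n_bins]))
            new_phase = np.empty(n_bins)
            for j, p in enumerate(peaks):
                lo, hi = bounds[j], bounds[j + 1]
                # identity locking: keep the analysis phase offset to the peak
                new_phase[lo:hi] = peak_syn[j] + phase[lo:hi] - phase[p]
            syn_phase = new_phase
        prev_phase = phase
        grain = np.fft.irfft(mag * np.exp(1j * syn_phase), n_fft) * win
        s = m * hop_s
        out[s:s + n_fft] += grain
        norm[s:s + n_fft] += win ** 2
    nz = norm > 1e-8  # weighted overlap-add normalisation
    out[nz] /= norm[nz]
    return out
```

    With `stretch=1.0` the synthesis phases equal the analysis phases modulo 2π, so the interior of the output reproduces the input; that is a convenient sanity check before trying other stretch factors.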

    High capacity data embedding schemes for digital media

    High-capacity image data hiding methods and robust, high-capacity digital audio watermarking algorithms are studied in this thesis. The main results of this work are novel algorithms with state-of-the-art performance: high capacity and transparency for image data hiding, and robustness, high capacity and low distortion for audio watermarking.

    Multi-Channel Audio Time-Scale Modification

    Phase vocoder based approaches to audio time-scale modification introduce a reverberant artefact into the time-scaled output. Recent techniques have been developed to reduce the presence of this artefact; however, these techniques introduce additional issues when applied to multi-channel recordings. This paper addresses these issues by collectively analysing all channels prior to time-scaling each individual channel.
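
    The abstract does not say how the channels are analysed collectively. One plausible reading, shown here purely as a hypothetical Python/NumPy sketch, is to pick spectral peaks once from the summed magnitude spectra of all channels and reuse that single peak set for every channel, so that peak-locked phase regions line up across channels. The function name `shared_peaks` and the summing rule are assumptions, not the paper's method.

```python
import numpy as np


def shared_peaks(channel_mags):
    """Hypothetical joint peak picking for one STFT frame.

    channel_mags: array of shape (n_channels, n_bins) with magnitudes.
    Returns bin indices that are local maxima of the summed spectrum,
    to be used as the peak set for every channel.
    """
    combined = np.sum(channel_mags, axis=0)  # joint analysis across channels
    is_peak = (combined[1:-1] > combined[:-2]) & (combined[1:-1] >= combined[2:])
    return np.where(is_peak)[0] + 1


# toy frame: channel 0 has energy in bin 1, channel 1 in bin 3
mags = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0, 3.0, 0.0]])
print(shared_peaks(mags))  # [1 3]
```

    Picked per channel, the two channels in this toy frame would get different peak sets ([1] and [3]); the joint analysis gives both channels [1, 3], so any phase locking applied afterwards treats them consistently.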

    An Efficient Phasiness Reduction Technique for Moderate Audio Time-scale Modification

    Phase vocoder approaches to time-scale modification of audio introduce a reverberant/phasy artifact into the time-scaled output due to a loss of phase coherence between short-time Fourier transform (STFT) bins. Recent improvements to the phase vocoder have reduced the presence of this artifact; however, it remains a problem. A time-scaling method is presented that further reduces phasiness for moderate time-scale factors by exploiting some flexibility that exists in the choice of phase required to maintain horizontal phase coherence between related STFT bins. Furthermore, the approach reduces the computational load within the range of time-scaling factors for which phasiness is reduced.

    Effects of acoustic features modifications on the perception of dysarthric speech - preliminary study (pitch, intensity and duration modifications)

    Marking stress is important in conveying meaning and drawing listeners’ attention to specific parts of a message. Extensive research has shown that healthy speakers mark stress using three main acoustic cues: pitch, intensity, and duration. The relationship between acoustic and perceptual cues is vital to the development of a computer-based tool that helps therapists provide effective treatment to people with dysarthria. It is therefore important to investigate the deficiency of acoustic cues in dysarthric speech and the compensatory techniques that effective treatment may require. In this paper, the relationship between acoustic and perceptual cues in dysarthric speech is investigated. This is achieved by modifying stress-marked sentences from 10 speakers with ataxic dysarthria. Each speaker produced 30 sentences, using the 10 Subject-Verb-Object-Adjective (SVOA) structured sentences across three stress conditions: stress on the initial (S), medial (O) and final (A) target words, respectively. To measure the deficiencies in dysarthric speech, the acoustic features (pitch, intensity, and duration) are modified incrementally. The paper presents the techniques involved in modifying these acoustic features. The effects of these modifications are analysed in increments of 25% in pitch, intensity and duration. For robustness and validation, 50 untrained listeners participated in the listening experiment. The results, and the relationship between acoustic modifications (what is measured) and perception (what is heard) in dysarthric speech, are discussed.
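
    To make the "25% increments" concrete, here is a small, hypothetical Python/NumPy sketch. It assumes each step multiplies the original feature value (F0 for pitch, segment length for duration) by 1.00, 1.25, 1.50, and so on, and it assumes intensity steps are applied as amplitude scale factors on the target-word region; if the steps were meant as power (intensity) ratios, the dB change would be 10·log10 rather than 20·log10 of the factor. The function names and the target-word boundaries are illustrative assumptions, not the paper's procedure.

```python
import numpy as np


def increment_factors(n_steps, step=0.25):
    """Multipliers 1.00, 1.25, 1.50, ... relative to the unmodified value."""
    return [1.0 + i * step for i in range(n_steps + 1)]


def scale_intensity(x, sr, t_start, t_end, factor):
    """Scale the amplitude of the target-word region [t_start, t_end) seconds by `factor`."""
    y = np.array(x, dtype=float, copy=True)
    a, b = int(round(t_start * sr)), int(round(t_end * sr))
    y[a:b] *= factor
    return y


def gain_db(factor):
    """Level change in dB for an amplitude scale factor (assumption: amplitude, not power)."""
    return 20.0 * np.log10(factor)


for f in increment_factors(3):
    print(f, round(float(gain_db(f)), 2))  # 1.25 corresponds to about +1.94 dB
```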

    Low bandwidth, image transmission amateur microsatellites

    Some recent amateur packet satellites carry open-access digital store-and-forward transponders that implement a common set of communication protocols known as the PACSAT Protocol Suite. These standard protocols have made interaction between different users of packet satellites throughout the world more "friendly", making packet satellites a more realistic means of communication. Application development using packet satellites has produced an interesting electronic-mail network for medical applications, the Health-Net, through which medical professionals in developing countries exchange information with their counterparts. The introduction of a higher data rate of 9600 baud, compared with the traditional 1200 baud, has improved the performance of these satellites; however, this new rate requires some modifications to the widely used standard radio receivers and transmitters. Because digital image technology has transformed microcomputers into powerful visual communication tools, this type of network can also be used for visual communication. Unfortunately, owing to the orbital mechanics of the satellites involved, the nature of the communication protocols and the data rates currently available, transmitting image data through such networks takes a long time. This thesis describes the development of an application for transmitting continuous-tone still images through such networks for visual communication. It focuses on how to set up a packet satellite ground station and on minimising the transmission time required for uploading and downloading image data by compressing the images to remove visually insignificant data. Two image compression techniques known to achieve high compression ratios, the internationally recognised JPEG standard and a novel technique based on fractal compression, are used and compared in this work. Although expensive, fractal compression has many advantages over JPEG. However, owing to the cost-effectiveness of JPEG, it is recommended in this thesis for image compression over the Health-Net communication network.
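
    The motivation for compression is simple arithmetic: airtime grows linearly with the number of bits. The following Python sketch estimates raw transmission time; it assumes one bit per symbol and no protocol overhead, both simplifications, since real AX.25/PACSAT links add framing, headers and retransmissions. The 256×256 8-bit image and the 10:1 compression ratio are hypothetical illustration values, not figures from the thesis.

```python
def transmission_time_s(n_bytes, baud, bits_per_symbol=1.0, overhead=0.0):
    """Seconds needed to send n_bytes at `baud` symbols per second.

    bits_per_symbol=1 and overhead=0 are simplifying assumptions;
    `overhead` is an optional fractional allowance for framing/headers.
    """
    bits = n_bytes * 8 * (1.0 + overhead)
    return bits / (baud * bits_per_symbol)


raw = 256 * 256  # hypothetical 256x256, 8-bit greyscale image, in bytes
compressed = raw // 10  # hypothetical 10:1 compression ratio
for baud in (1200, 9600):
    print(baud,
          round(transmission_time_s(raw, baud), 1),
          round(transmission_time_s(compressed, baud), 1))
# 1200 436.9 43.7
# 9600 54.6 5.5
```

    Moving from 1200 to 9600 baud saves a factor of eight, and compression multiplies that saving, which matters when a low-orbit satellite is only in view for a few minutes per pass.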

    Frequency-warped autoregressive modeling and filtering

    This thesis consists of an introduction and nine articles. The articles concern the application of frequency-warping techniques to audio signal processing, in particular predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping (or simply warping) techniques modify a conventional signal processing system so that its inherent frequency representation is changed. It is demonstrated that this can be done for essentially all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial on the use of warping techniques in audio applications. The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that WLP would be a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction, in one of the articles, of a class of new implementation techniques for recursive filters. The proposed implementation algorithm for recursive filters with delay-free loops is a generic technique. This inspired an article that introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
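
    The core idea of warping, replacing each unit delay with a first-order allpass section, can be sketched for WLP in a few lines. The Python/NumPy example below is a minimal sketch under that assumption: it uses the common allpass D(z) = (z^-1 - λ)/(1 - λz^-1), computes a warped autocorrelation by repeatedly filtering the signal through D(z), and solves for the predictor with Levinson-Durbin. The function names and the parameter `lam` (λ) are illustrative; this is not the thesis's implementation, and it omits the warped synthesis filter, which needs the delay-free-loop techniques mentioned above.

```python
import numpy as np


def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1)."""
    y = np.zeros(len(x))
    x_prev = y_prev = 0.0
    for n, xn in enumerate(x):
        yn = -lam * xn + x_prev + lam * y_prev
        y[n] = yn
        x_prev, y_prev = xn, yn
    return y


def warped_autocorrelation(x, order, lam):
    """r[k] = <x, D^k x>: autocorrelation with each unit delay replaced by D(z)."""
    x = np.asarray(x, dtype=float)
    r = np.empty(order + 1)
    r[0] = np.dot(x, x)
    y = x
    for k in range(1, order + 1):
        y = allpass(y, lam)  # one more allpass stage = one more "warped delay"
        r[k] = np.dot(x, y)
    return r


def levinson_durbin(r, order):
    """Solve the normal equations; returns predictor a (a[0] = 1) and residual energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

    With `lam=0` the allpass reduces to a unit delay and the sketch becomes ordinary autocorrelation LPC; a positive `lam` allocates more frequency resolution to low frequencies, which is how warping is used to approach an auditory frequency scale.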

    Onset detection by means of transient peak classification in harmonic bands

    IRCAM internal reference: Roebel09a. The extended abstract describes an onset detection algorithm based on a classification of spectral peaks into transient and non-transient peaks, together with a statistical model of the classification results that prevents the detection of random transient peaks caused by noise. Compared with the version used for MIREX 2007, this algorithm focuses on improving the detection of onsets of pitched notes.

    Audiovisual preservation strategies, data models and value-chains

    This is a report on preservation strategies, models and value-chains for digital file-based audiovisual content. The report includes: (a) current and emerging value-chains and business models for audiovisual preservation; (b) a comparison of preservation strategies for audiovisual content, including their strengths and weaknesses; and (c) a review of current preservation metadata models and the requirements for extending them to support audiovisual files.