    The Pole Behaviour of the Phase Derivative of the Short-Time Fourier Transform

    The short-time Fourier transform (STFT) is a time-frequency representation widely used in applications, for example in audio signal processing. Recently it has been shown that not only the amplitude but also the phase of this representation can be successfully exploited for improved analysis and processing. In this paper we describe a rather peculiar pole phenomenon in the phase derivative: a recurring pattern, a negative peak followed by a positive one, that appears in a characteristic way in the neighborhood of any of the zeros of the STFT. We describe this phenomenon numerically and provide a complete analytical explanation. Comment: 15 pages, 4 figures; Applied and Computational Harmonic Analysis (in press), available online 22 October 201

    Adjusting the Spectral Envelope Evolution of Transposed Sounds with Gabor Mask Prototypes

    Audio samplers often need to modify the pitch of recorded sounds in order to generate scales or chords. This article tackles the use of Gabor masks and their capacity to improve the perceptual realism of transposed notes obtained through the classical phase-vocoder algorithm. Gabor masks can be seen as operators that modify the time-dependent spectral content of sounds by modifying their time-frequency representation. The goal here is to restore a distribution of energy that is more in line with the physics of the structure that generated the original sound. The Gabor mask is built from an estimate of the spectral envelope evolution in the time-frequency plane and is then applied to the modified Gabor transform. This operation turns the modified Gabor transform into another one that respects the estimated spectral envelope evolution, and therefore leads to a note that is more perceptually convincing.

    Statistical Spectral Parameter Estimation of Acoustic Signals with Applications to Byzantine Music

    Digitized acoustical signals of Byzantine music performed by Iakovos Nafpliotis are used to extract the fundamental frequency of each note of the diatonic scale. These empirical results are then contrasted to the theoretical suggestions and previous empirical findings. Several parametric and non-parametric spectral parameter estimation methods are implemented. These include: (1) Phase vocoder method, (2) McAulay-Quatieri method, (3) Levinson-Durbin algorithm, (4) YIN, (5) Quinn & Fernandes Estimator, (6) Pisarenko Frequency Estimator, (7) MUltiple SIgnal Characterization (MUSIC) algorithm, (8) Periodogram method, (9) Quinn & Fernandes Filtered Periodogram, (10) Rife & Vincent Estimator, and (11) the Fourier transform. Algorithm performance was very precise. The psychophysical aspect of human pitch discrimination is explored. The results of eight (8) psychoacoustical experiments were used to determine the aural just noticeable difference (jnd) in pitch and deduce patterns utilized to customize acceptable performable pitch deviation to the application at hand. These customizations [Acceptable Performance Difference (a new measure of frequency differential acceptability), Perceptual Confidence Intervals (a new concept of confidence intervals based on psychophysical experiment rather than statistics of performance data), and one based purely on music-theoretical asymphony] are proposed, discussed, and used in interpretation of results. The results suggest that Nafpliotis' intervals are closer to just intonation than Byzantine theory (with minor exceptions), something not generally found in Thrasivoulos Stanitsas' data. Nafpliotis' perfect fifth is identical to the just intonation, even though he overstretches his octave by fifteen (15) cents. His perfect fourth is also more just, as opposed to Stanitsas' fourth which is directionally opposite. Stanitsas' tendency to exaggerate the major third interval A4-F4 is still seen in Nafpliotis, but curbed. This is the only noteworthy departure from just intonation, with Nafpliotis being exactly Chrysanthian (the most exaggerated theoretical suggestion of all) and Stanitsas overstretching it even more than Nafpliotis and Chrysanth. Nafpliotis ascends in the second tetrachord more robustly diatonically than Stanitsas. The results are reported and interpreted within the framework of Acceptable Performance Differences.
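
The interval comparisons in the abstract above are expressed in cents. As a rough illustration only, the sketch below uses the standard definition of the cent (1200 cents per octave) to compute just-intonation reference intervals and the frequency ratio of an octave widened by a given number of cents. The function names are hypothetical and no data from the study is reproduced.

```python
import math


def cents(f_low, f_high):
    """Size of the interval from f_low to f_high in cents (1200 per octave)."""
    return 1200.0 * math.log2(f_high / f_low)


def stretched_octave_ratio(extra_cents):
    """Frequency ratio of an octave widened by extra_cents cents."""
    return 2.0 * 2.0 ** (extra_cents / 1200.0)


# Just-intonation reference intervals, defined by their frequency ratios.
JUST_FIFTH = cents(2.0, 3.0)        # ratio 3:2
JUST_FOURTH = cents(3.0, 4.0)       # ratio 4:3
JUST_MAJOR_THIRD = cents(4.0, 5.0)  # ratio 5:4

if __name__ == "__main__":
    print(round(JUST_FIFTH, 3), round(JUST_FOURTH, 3), round(JUST_MAJOR_THIRD, 3))
    # Ratio of an octave overstretched by 15 cents, as mentioned in the abstract.
    print(stretched_octave_ratio(15.0))
```

A measured pair of fundamental frequencies can be passed to `cents` and compared against these reference values; the tolerance used for "identical" is left to the reader, since the abstract defines its own acceptability measures.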

    Designing Gabor windows using convex optimization

    Redundant Gabor frames admit an infinite number of dual frames, yet only the canonical dual Gabor system, constructed from the minimal l2-norm dual window, is widely used. This window function, however, might lack desirable properties, e.g. good time-frequency concentration, small support or smoothness. We employ convex optimization methods to design dual windows satisfying the Wexler-Raz equations and optimizing various constraints. Numerical experiments suggest that alternate dual windows with considerably improved features can be found.

    Improving Time-Scale Modification of Music Signals Using Harmonic-Percussive Separation

    A major problem in time-scale modification (TSM) of music signals is that percussive transients are often perceptually degraded. To prevent this degradation, some TSM approaches try to explicitly identify transients in the input signal and to handle them in a special way. However, such approaches are problematic for two reasons. First, errors in the transient detection have an immediate influence on the final TSM result and, second, a perceptually transparent preservation of transients is far from a trivial task. In this paper we present a TSM approach that handles transients implicitly by first separating the signal into a harmonic component and a percussive component, the latter typically containing the transients. While the harmonic component is modified with a phase vocoder approach using a large frame size, the noise-like percussive component is modified with a simple time-domain overlap-add technique using a short frame size, which preserves the transients to a high degree without any explicit transient detection.
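
The time-domain overlap-add step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the harmonic-percussive separation and the phase vocoder branch are not shown, and the Hann window, the 50% synthesis overlap, the default frame length, and the window-sum normalization are all assumptions made for this example.

```python
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def ola_stretch(x, factor, frame_len=64):
    """Stretch x in time by `factor` using plain overlap-add.

    Frames are read every `ana_hop` samples and written every `syn_hop`
    samples, so factor > 1 lengthens the signal. A short frame_len keeps
    each transient inside a single frame (assumed default, not from the paper).
    """
    syn_hop = frame_len // 2
    ana_hop = max(1, round(syn_hop / factor))
    win = hann(frame_len)
    n_frames = max(1, (len(x) - frame_len) // ana_hop + 1)
    out_len = (n_frames - 1) * syn_hop + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for m in range(n_frames):
        a, s = m * ana_hop, m * syn_hop
        for i in range(frame_len):
            if a + i < len(x):
                out[s + i] += win[i] * x[a + i]
                norm[s + i] += win[i]
    # Divide by the accumulated window to undo the amplitude modulation.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

With `factor=1.0` the analysis and synthesis hops coincide and the input is reproduced (apart from the first sample, where the window is zero). In a full system along the lines the abstract describes, this function would be applied to the percussive component only.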

    Application of the SABER method for improved spectral analysis of noisy speech

    Technical report. A stand-alone noise suppression algorithm is described for reducing the spectral effects of acoustically added noise in speech. A fundamental result is developed which shows that the spectral magnitude of speech plus noise can be effectively approximated as the sum of the magnitudes of speech and noise. Using this simple phase-independent additive model, the noise bias present in the short-time spectrum is reduced by subtracting the expected noise spectrum calculated during non-speech activity. After bias removal, the time waveform is recalculated from the modified magnitude and the saved phase. This Spectral Averaging for Bias Estimation and Removal (SABER) method requires only one FFT per time window for analysis and synthesis.
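
The processing chain described in the abstract above (average the noise magnitude during non-speech activity, subtract it from each frame's magnitude, resynthesize with the saved phase) can be sketched for a single frame as below. This is an illustration under stated assumptions rather than the report's algorithm: a naive DFT stands in for the FFT, windowing and overlap between frames are omitted, negative magnitudes are floored at zero, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def average_noise_magnitude(noise_frames):
    """Mean spectral magnitude per bin over frames known to contain only noise."""
    mags = [[abs(v) for v in dft(frame)] for frame in noise_frames]
    return [sum(col) / len(mags) for col in zip(*mags)]


def subtract_noise_bias(frame, noise_mag):
    """Subtract the noise magnitude per bin and keep the original phase."""
    cleaned = []
    for value, bias in zip(dft(frame), noise_mag):
        mag = max(abs(value) - bias, 0.0)  # flooring at zero is an assumption
        cleaned.append(cmath.rect(mag, cmath.phase(value)))
    return idft(cleaned)
```

In use, `average_noise_magnitude` would be fed frames taken from pauses in the recording, and its result passed to `subtract_noise_bias` for every subsequent frame.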

    Novel Pitch Detection Algorithm With Application to Speech Coding

    This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions
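
For orientation, the autocorrelation building block named in the abstract above can be sketched as a basic pitch period estimator. This is not MAWT: the sketch works on the raw signal rather than on LPC residuals, has no feature extraction or wavelet refinement, and makes no voicing decision. The function names and the lag-range parameters are hypothetical.

```python
def autocorr_pitch_period(x, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation.

    Searches lags in [min_lag, max_lag]. Returns None for a silent frame or
    when no lag in range has positive correlation.
    """
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    energy = sum(v * v for v in centered)
    if energy == 0.0:
        return None
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, min(max_lag, len(centered) - 1) + 1):
        # Biased estimate: fewer terms at longer lags, which favours the
        # fundamental period over its multiples.
        r = sum(centered[i] * centered[i + lag]
                for i in range(len(centered) - lag)) / energy
        if r > best_value:
            best_lag, best_value = lag, r
    return best_lag


def period_to_hz(lag, sample_rate):
    """Convert a pitch period in samples to a frequency in Hz."""
    return sample_rate / lag
```

The lag range would normally be chosen from the expected pitch range of the speaker and the sample rate, for example `min_lag = sample_rate // max_pitch_hz`.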