195 research outputs found

    Towards Real-Time Non-Stationary Sinusoidal Modelling of Kick and Bass Sounds for Audio Analysis and Modification

    Sinusoidal modelling is a powerful and flexible parametric method for analysing and processing audio signals. These signals have an underlying structure that modern spectral models aim to exploit by separating the signal into sinusoidal, transient, and noise components. Each of these can then be modelled in a manner most appropriate to that component's inherent structure. The accuracy of the estimated parameters is directly related to the quality of the model's representation of the signal, and the assumptions made about its underlying structure. For sinusoidal models, these assumptions generally affect the non-stationary estimates related to amplitude and frequency modulations, and the type of amplitude change curve. This is especially true when using a single analysis frame in a non-overlapping framework, where biased estimates can result in discontinuities at frame boundaries. It is therefore desirable for such a model to distinguish between the shapes of different amplitude changes and adapt its estimation accordingly. Intra-frame amplitude change can be interpreted as a change in the windowing function applied to a stationary sinusoid, which can be estimated from the derivative of the phase with respect to frequency at magnitude peaks in the DFT spectrum. A method for measuring monotonic linear amplitude change from single-frame estimates using the first-order derivative of the phase with respect to frequency (approximated by the first-order difference) is presented, along with a method of distinguishing between linear and exponential amplitude change. An adaptation of the popular matching pursuit algorithm for refining model parameters in a segmented framework has been investigated using a dictionary composed of sinusoids with parameters varying slightly from model estimates, based on Modelled Pursuit (MoP). Modelling of the residual signal using a segmented undecimated Wavelet Transform (segUWT) is presented. A generalisation of both the forward and inverse transforms is also presented, covering delay compensation and overlap extension for different wavelet lengths and numbers of decomposition levels in an Overlap Save (OLS) implementation that deals with block-based convolution artefacts. This shift-invariant implementation of the DWT is a popular tool for de-noising and shows promising results for the separation of transients from noise.

    Real-time Sound Source Separation For Music Applications

    Sound source separation refers to the task of extracting individual sound sources from some number of mixtures of those sound sources. In this thesis, a novel sound source separation algorithm for musical applications is presented. It leverages the fact that the vast majority of commercially recorded music since the 1950s has been mixed down for two-channel reproduction, more commonly known as stereo. The algorithm presented in Chapter 3 of this thesis requires no prior knowledge or learning and performs the task of separation based purely on azimuth discrimination within the stereo field. The algorithm exploits the use of the pan pot as a means to achieve image localisation within stereophonic recordings. As such, only an interaural intensity difference exists between left and right channels for a single source. We use gain scaling and phase cancellation techniques to expose frequency-dependent nulls across the azimuth domain, from which source separation and resynthesis are carried out. The algorithm is demonstrated to be not only state of the art in the field of sound source separation but also a useful pre-processing step for other tasks such as music segmentation and surround sound upmixing.
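
    The gain scaling and phase cancellation idea in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the function name `azimuth_null`, the synthetic 440 Hz test tone, the Hann window, and the restriction to a single frequency bin and to sources panned towards the right channel are all assumptions made for illustration.

```python
import numpy as np

def azimuth_null(left, right, num_gains=101):
    """Return the candidate gain that best cancels the dominant source.

    Assumes (for illustration) that the source is panned towards the
    right channel, so that left = g * right for some g in [0, 1].
    """
    window = np.hanning(len(left))
    L = np.fft.rfft(left * window)
    R = np.fft.rfft(right * window)
    k = np.argmax(np.abs(L) + np.abs(R))       # strongest frequency bin
    gains = np.linspace(0.0, 1.0, num_gains)   # candidate pan positions
    residual = np.abs(L[k] - gains * R[k])     # gain scaling + cancellation
    return gains[np.argmin(residual)], residual

# Synthetic stereo frame: one 440 Hz tone panned with intensity ratio 0.6.
fs, n = 8000, 1024
t = np.arange(n) / fs
source = np.sin(2 * np.pi * 440 * t)
right, left = source, 0.6 * source
gain_estimate, residual = azimuth_null(left, right)
print(f"estimated intensity ratio: {gain_estimate:.2f}")
```

    The residual has a null at the gain matching the source's pan position. A full separator would presumably repeat this scan for every bin of every frame and for both panning directions.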

    Traitement paramétrique des signaux audio dans le contexte des prothèses auditives (Parametric processing of audio signals in the context of hearing aids)

    Moving-average model -- Autoregressive model -- Autoregressive moving-average model -- Remark on the relationship between AR, MA and ARMA -- Estimating the parameters of an AR(p) process -- Order-selection criteria for an AR(p) model -- Notion of spectral envelope -- Methods developed in the frequency domain -- Methods developed in the correlation domain -- Noise reduction in the frequency domain -- A two-microphone algorithm for speech enhancement -- State of the art -- Zelinski's approach in the case of two-microphone arrangement -- Two-microphone speech enhancement system -- Performance evaluation and results -- Noise reduction in the correlation domain -- Estimating the noise power -- Compensating for the effects of noise -- Improving the compensation procedure -- Prospects for development -- Parametric processing in the presence of noise -- Layout of the combined processing -- Improving the accuracy of the noise variance estimator

    Frequency-warped autoregressive modeling and filtering

    This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping, or simply warping, techniques are based on a modification of a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for basically all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications. The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction would be a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction of a class of new implementation techniques for recursive filters in one of the articles. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article which introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
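
    As a minimal sketch of what "changing the inherent frequency representation" can mean, the snippet below assumes the common construction in which each unit delay is replaced by a first-order allpass section; the abstract does not state this construction, and the parameter name `lam` and the value 0.7 are illustrative. It evaluates the resulting frequency map and cross-checks it against the phase response of the allpass section.

```python
import numpy as np

def warped_frequency(omega, lam):
    """Frequency map of the first-order allpass (z^-1 - lam) / (1 - lam z^-1)."""
    return omega + 2.0 * np.arctan(lam * np.sin(omega) / (1.0 - lam * np.cos(omega)))

lam = 0.7  # illustrative warping parameter, not taken from the thesis
omega = np.linspace(0.01, np.pi - 0.01, 512)
nu = warped_frequency(omega, lam)

# Cross-check against the phase response of the allpass section itself.
z_inv = np.exp(-1j * omega)
allpass = (z_inv - lam) / (1.0 - lam * z_inv)
phase = -np.unwrap(np.angle(allpass))
print(np.allclose(nu, phase))
```

    For a positive `lam` the map stretches low frequencies and compresses high ones, which is the direction needed to approximate an auditory frequency scale.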

    Separation of musical sources and structure from single-channel polyphonic recordings

    EThOS - Electronic Theses Online Service (GB, United Kingdom)

    New Strategies for Single-channel Speech Separation


    Modelling and detection of faults in axial-flux permanent magnet machines

    The development of various topologies and configurations of the axial-flux permanent magnet machine has spurred its use for electromechanical energy conversion in several applications. As it becomes increasingly deployed, effective condition monitoring built on reliable and accurate fault detection techniques is needed to ensure its engineering integrity. Unlike the induction machine, which has been rigorously investigated for faults, the axial-flux permanent magnet machine has not. Thus, in this thesis, the axial-flux permanent magnet machine is investigated under faulty conditions. Common faults associated with it, namely static eccentricity and interturn short circuit, are modelled, and detection techniques are established. The modelling forms a basis for developing a platform for precise fault replication on a purpose-built experimental test rig, and for predicting and analysing fault signatures using both finite element analysis and experimental analysis. In the detection, motor current signature analysis, vibration analysis and electrical impedance spectroscopy are applied. Attention is paid to fault-feature extraction and fault discrimination. Using frequency and time-frequency techniques, features are tracked in the line current under steady-state and transient conditions respectively. The results obtained provide rich information on the pattern of fault harmonics. Parametric spectral estimation is also explored as an alternative to the Fourier transform in the steady-state analysis of faulty conditions. It is found to be as effective as the Fourier transform and more amenable to short signal-measurement durations. Vibration analysis is applied in the detection of eccentricities; its efficacy in fault detection hinges on proper determination of vibratory frequencies and quantification of the corresponding tones. This is achieved using analytical formulations and signal processing techniques. Furthermore, the developed fault model is used to assess the influence of cogging torque minimization techniques and rotor topologies in the axial-flux permanent magnet machine on the current signal in the presence of static eccentricity. The double-sided topology is found to be tolerant to the presence of static eccentricity, unlike the single-sided topology, owing to the opposing effect of the resulting asymmetrical properties of the airgap. The cogging torque minimization techniques do not impair the established fault detection technique in the single-sided topology. By applying electrical broadband impedance spectroscopy, interturn faults are diagnosed; a high-frequency winding model is developed to analyse the impedance-frequency response obtained.

    Dispersion, Controlled Dispersion, and Three Applications

    Over the past 15 years, several groups have engineered media that are both strongly dispersive and roughly transparent for some finite bandwidth. Relationships and intuitive models that are satisfactory when it is reasonable to neglect dispersion may then fail. We analyze three such cases of failure. First, a simple generalization of the Abraham and Minkowski momenta to dispersive media entails multiplying each per-photon momentum by $n/n_g$, where $n$ is the refractive index and $n_g$ is the group index. The resulting forms are experimentally relevant for the case of the Abraham momentum, but not for the Minkowski momentum. We show how dispersion modulates the displacement of a sphere embedded in a dispersive medium by a pulse. Second, pulse transformation in a nonstationary medium is modulated by the presence of dispersion. Using an explicit description of the kinetics of dispersive nonstationary inhomogeneous media, we show how the group velocity can modulate pulse response to a change in the refractive index and how Doppler shifts may become large in a dispersive medium as the velocity of the Doppler shifting surface approaches the group velocity. We explain a simple way to use existing technology to either compress or decompress a given pulse, changing its bandwidth and spatial extent by several orders of magnitude while otherwise preserving its envelope shape. Finally, we note that the nature of a single optical cavity quasimode depends on intracavity dispersion. We show that the quantum field noise associated with a single cavity mode may be modulated by dispersion. For a well-chosen mode in a high-Q cavity, this can amount to either an increase or a decrease in total vacuum field energy by several orders of magnitude. We focus on the "white light cavity," showing that the quantum noise of an ideal white light cavity diverges as the cavity finesse improves. Comment: 154 pages (inclusive), 12 figures, Dissertation.
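
    The rescaling by the ratio of refractive index to group index described in this abstract can be written out as a short worked step. The starting expressions are the usual textbook per-photon momenta for a nondispersive medium, which the abstract does not state and which are assumed here; $\hbar\omega$ is the photon energy and $c$ the vacuum speed of light.

```latex
\begin{align}
p_{\mathrm{A}} &= \frac{\hbar\omega}{n c}
  \;\longrightarrow\; \frac{n}{n_g}\,\frac{\hbar\omega}{n c} = \frac{\hbar\omega}{n_g c}, \\
p_{\mathrm{M}} &= \frac{n\hbar\omega}{c}
  \;\longrightarrow\; \frac{n}{n_g}\,\frac{n\hbar\omega}{c} = \frac{n^{2}\hbar\omega}{n_g c}.
\end{align}
```

    Under this assumption, the generalized Abraham form depends only on the group index, while the generalized Minkowski form retains a dependence on both indices.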

    Trennung und Schätzung der Anzahl von Audiosignalquellen mit Zeit- und Frequenzüberlappung (Separation and count estimation of audio signal sources with time and frequency overlap)

    Everyday audio recordings involve mixture signals: music contains a mixture of instruments; in a meeting or conference, there is a mixture of human voices. For these mixtures, automatically separating or estimating the number of sources is a challenging task. A common assumption when processing mixtures in the time-frequency domain is that sources are not fully overlapped. However, in this work we consider some cases where the overlap is severe, for instance when instruments play the same note (unison) or when many people speak concurrently ("cocktail party"), highlighting the need for new representations and more powerful models. To address the problems of source separation and count estimation, we use conventional signal processing techniques as well as deep neural networks (DNN). We first address the source separation problem for unison instrument mixtures, studying the distinct spectro-temporal modulations caused by vibrato. To exploit these modulations, we developed a method based on time warping, informed by an estimate of the fundamental frequency. For cases where such estimates are not available, we present an unsupervised model, inspired by the way humans group time-varying sources (common fate). This contribution comes with a novel representation that improves separation for overlapped and modulated sources on unison mixtures but also improves vocal and accompaniment separation when used as an input for a DNN model. Then, we focus on estimating the number of sources in a mixture, which is important for real-world scenarios. Our work on count estimation was motivated by a study on how humans can address this task, which led us to conduct listening experiments, confirming that humans can correctly estimate the number of sources only up to four. To answer the question of whether machines can perform similarly, we present a DNN architecture, trained to estimate the number of concurrent speakers. Our results show improvements compared to other methods, and the model even outperformed humans on the same task. In both the source separation and source count estimation tasks, the key contribution of this thesis is the concept of "modulation", which is important to computationally mimic human performance. Our proposed Common Fate Transform is an adequate representation to disentangle overlapping signals for separation, and an inspection of our DNN count estimation model revealed that it proceeds to find modulation-like intermediate features.