154 research outputs found

    Prediction-driven computational auditory scene analysis

    The sound of a busy environment, such as a city street, gives rise to a perception of numerous distinct events in a human listener--the 'auditory scene analysis' of the acoustic information. Recent advances in the understanding of this process from experimental psychoacoustics have led to several efforts to build a computer model capable of the same function. This work is known as 'computational auditory scene analysis'. The dominant approach to this problem has been as a sequence of modules, the output of one forming the input to the next. Sound is converted to its spectrum, cues are picked out, and representations of the cues are grouped into an abstract description of the initial input. This 'data-driven' approach has some specific weaknesses in comparison to the auditory system: it will interpret a given sound in the same way regardless of its context, and it cannot 'infer' the presence of a sound for which direct evidence is hidden by other components. The 'prediction-driven' approach is presented as an alternative, in which analysis is a process of reconciliation between the observed acoustic features and the predictions of an internal model of the sound-producing entities in the environment. In this way, predicted sound events will form part of the scene interpretation as long as they are consistent with the input sound, regardless of whether direct evidence is found. A blackboard-based implementation of this approach is described which analyzes dense, ambient sound examples into a vocabulary of noise clouds, transient clicks, and a correlogram-based representation of wide-band periodic energy called the weft. The system is assessed through experiments that firstly investigate subjects' perception of distinct events in ambient sound examples, and secondly collect quality judgments for sound events resynthesized by the system. Although the resyntheses were rated as far from perfect, there was good agreement between the events detected by the model and by the listeners. In addition, the experimental procedure does not depend on special aspects of the algorithm (other than the generation of resyntheses), and is applicable to the assessment and comparison of other models of human auditory organization.

    Iterative Separation of Note Events from Single-Channel Polyphonic Recordings

    This thesis is concerned with the separation of audio sources from single-channel polyphonic musical recordings using the iterative estimation and separation of note events. Each event is defined as a section of audio containing largely harmonic energy identified as coming from a single sound source. Multiple events can be clustered to form separated sources. This solution is a model-based algorithm that can be applied to a large variety of audio recordings without requiring previous training stages. The proposed system embraces two principal stages. The first one considers the iterative detection and separation of note events from within the input mixture. In every iteration, the pitch trajectory of the predominant note event is automatically selected from an array of fundamental frequency estimates and used to guide the separation of the event's spectral content using two different methods: time-frequency masking and time-domain subtraction. A residual signal is then generated and used as the input mixture for the next iteration. After convergence, the second stage considers the clustering of all detected note events into individual audio sources. Performance evaluation is carried out at three different levels. Firstly, the accuracy of the note-event-based multipitch estimator is compared with that of the baseline algorithm used in every iteration to generate the initial set of pitch estimates. Secondly, the performance of the semi-supervised source separation process is compared with that of another semi-automatic algorithm. Finally, a listening test is conducted to assess the audio quality and naturalness of the separated sources when they are used to create stereo mixes from monaural recordings. Future directions for this research focus on the application of the proposed system to other music-related tasks. 
Also, a preliminary optimisation-based approach is presented as an alternative method for the separation of overlapping partials, and as a high resolution time-frequency representation for digital signals.

    Multichannel source separation and tracking with phase differences by random sample consensus

    Blind audio source separation (BASS) is a fascinating problem that has been tackled from many different angles. The use case of interest in this thesis is that of multiple moving and simultaneously-active speakers in a reverberant room. This is a common situation, for example, in social gatherings. We human beings have the remarkable ability to focus attention on a particular speaker while effectively ignoring the rest. This is referred to as the "cocktail party effect" and has been the holy grail of source separation for many decades. Replicating this feat in real-time with a machine is the goal of BASS. Single-channel methods attempt to identify the individual speakers from a single recording. However, with the advent of hand-held consumer electronics, techniques based on microphone array processing are becoming increasingly popular. Multichannel methods record a sound field from various locations to incorporate spatial information. If the speakers move over time, we need an algorithm capable of tracking their positions in the room. For compact arrays with 1-10 cm of separation between the microphones, this can be accomplished by applying a temporal filter on estimates of the directions-of-arrival (DOA) of the speakers. In this thesis, we review recent work on blind source separation (BSS) with inter-channel phase difference (IPD) features and provide extensions to the case of moving speakers. It is shown that IPD features compose a noisy circular-linear dataset. This data is clustered with the RANdom SAmple Consensus (RANSAC) algorithm in the presence of strong reverberation to simultaneously localize and separate speakers. The remarkable performance of RANSAC is due to its natural tendency to reject outliers. To handle the case of non-stationary speakers, a factorial wrapped Kalman filter (FWKF) and a factorial von Mises-Fisher particle filter (FvMFPF) are proposed that track source DOAs directly on the unit circle and unit sphere, respectively. These algorithms combine directional statistics, Bayesian filtering theory, and probabilistic data association techniques to track the speakers with mixtures of directional distributions.
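
    The idea of fitting IPD features with RANSAC, as summarised in the abstract above, can be illustrated with a small sketch. It is not the thesis's implementation. It assumes two microphones, one static source, and a simple model in which the phase difference at frequency f is 2*pi*f*tau wrapped to [-pi, pi), where tau is the inter-microphone delay. The names wrap, ransac_delay, max_delay, tol and iters are hypothetical.

```python
import math
import random


def wrap(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def ransac_delay(freqs, phases, max_delay, tol=0.3, iters=200, seed=0):
    """Estimate a delay tau (seconds) such that phase ~ wrap(2*pi*f*tau).

    Assumed model, for illustration only: one source, two microphones.
    Returns (best_tau, inlier_count).
    """
    rng = random.Random(seed)
    best_tau, best_inliers = 0.0, -1
    for _ in range(iters):
        # Minimal sample: one (frequency, phase) point fixes a line through the origin.
        i = rng.randrange(len(freqs))
        f, phi = freqs[i], phases[i]
        if f <= 0.0:
            continue
        # The phase is only known modulo 2*pi, so one point is consistent
        # with several delays; try every one inside the allowed range.
        k_max = int(math.ceil(f * max_delay))
        for k in range(-k_max, k_max + 1):
            tau = (phi + 2.0 * math.pi * k) / (2.0 * math.pi * f)
            if abs(tau) > max_delay:
                continue
            # Consensus: count points whose wrapped residual is within tol radians.
            inliers = sum(
                1
                for fj, pj in zip(freqs, phases)
                if abs(wrap(pj - 2.0 * math.pi * fj * tau)) < tol
            )
            if inliers > best_inliers:
                best_tau, best_inliers = tau, inliers
    return best_tau, best_inliers
```

    Because the score is a count of points inside a tolerance band rather than a sum of squared errors, points far from the fitted line (for example, bins dominated by reverberation) do not pull the estimate, which is the outlier-rejection property the abstract refers to.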

    Pertanika Journal of Science & Technology


    Virtual metrology for plasma etch processes.

    Plasma processes can present difficult control challenges due to time-varying dynamics and a lack of relevant and/or regular measurements. Virtual metrology (VM) is the use of mathematical models with accessible measurements from an operating process to estimate variables of interest. This thesis addresses the challenge of virtual metrology for plasma processes, with a particular focus on semiconductor plasma etch. Introductory material covering the essentials of plasma physics, plasma etching, plasma measurement techniques, and black-box modelling techniques is first presented for readers not familiar with these subjects. A comprehensive literature review is then completed to detail the state of the art in modelling and VM research for plasma etch processes. To demonstrate the versatility of VM, a temperature monitoring system utilising a state-space model and Luenberger observer is designed for the variable specific impulse magnetoplasma rocket (VASIMR) engine, a plasma-based space propulsion system. The temperature monitoring system uses optical emission spectroscopy (OES) measurements from the VASIMR engine plasma to correct temperature estimates in the presence of modelling error and inaccurate initial conditions. Temperature estimates within 2% of the real values are achieved using this scheme. An extensive examination of the implementation of a wafer-to-wafer VM scheme to estimate plasma etch rate for an industrial plasma etch process is presented. The VM models estimate etch rate using measurements from the processing tool and a plasma impedance monitor (PIM). A selection of modelling techniques are considered for VM modelling, and Gaussian process regression (GPR) is applied for the first time for VM of plasma etch rate. Models with global and local scope are compared, and modelling schemes that attempt to cater for the etch process dynamics are proposed. GPR-based windowed models produce the most accurate estimates, achieving mean absolute percentage errors (MAPEs) of approximately 1.15%. The consistency of the results presented suggests that this level of accuracy represents the best accuracy achievable for the plasma etch system at the current frequency of metrology. Finally, a real-time VM and model predictive control (MPC) scheme for control of plasma electron density in an industrial etch chamber is designed and tested. The VM scheme uses PIM measurements to estimate electron density in real time. A predictive functional control (PFC) scheme is implemented to cater for a time delay in the VM system. The controller achieves time constants of less than one second, no overshoot, and excellent disturbance rejection properties. The PFC scheme is further expanded by adapting the internal model in the controller in real time in response to changes in the process operating point.
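
    For readers unfamiliar with the accuracy metric quoted in the abstract above, the mean absolute percentage error can be computed as in the sketch below. This is the standard definition and is not taken from the thesis; the function name mape and the example numbers in the usage note are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Standard definition: 100/n * sum(|(a_i - p_i) / a_i|).
    Every value in `actual` must be non-zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)
```

    With made-up measured values of 100 and 200 and estimates of 101 and 198, each estimate is off by 1% of the measured value, so the MAPE is 1%.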
