194 research outputs found
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010
Single channel overlapped-speech detection and separation of spontaneous conversations
PhD Thesis. This thesis considers spontaneous conversation containing both speech mixture and speech dialogue. Speech mixture refers to speakers speaking simultaneously (i.e. overlapped speech). Speech dialogue refers to segments in which only one speaker is actively speaking while the other is silent. The input conversation is first processed by overlapped-speech detection, which segregates it into two output signals, one in dialogue format and one in mixture format. The dialogue is processed by speaker diarization, whose outputs are the individual speech of each speaker. The mixture is processed by speech separation, whose outputs are independent separated speech signals of the speakers. When the separation input contains only the mixture, a blind speech separation approach is used. When the separation is assisted by the outputs of the speaker diarization, it is informed speech separation. The research presents a novel overlapped-speech detection algorithm and two novel speech separation algorithms.
The proposed overlapped-speech detection algorithm estimates the switching instants of the input. An optimization loop is adapted to adopt the best capsulated audio features and to avoid the worst. The optimization depends on principles of pattern recognition and k-means clustering. Over 300 simulated conversations, the average False-Alarm Error is 1.9%, the average Missed-Speech Error is 0.4%, and the average Overlap-Speaker Error is 1%. These errors are approximately equal to the errors of the best recent reliable speaker diarization corpora.
The proposed blind speech separation algorithm consists of four sequential techniques: filter-bank analysis, Non-negative Matrix Factorization (NMF), speaker clustering and filter-bank synthesis. In place of the speaker segmentation normally required, an effective standard framing is contributed. The average objective scores (SAR, SDR and SIR) over 51 simulated conversations are 5.06 dB, 4.87 dB and 12.47 dB respectively.
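The NMF stage named above can be illustrated with a minimal sketch. It uses the standard multiplicative updates for the squared-error objective, which is an assumption made here; the abstract does not say which NMF variant, cost function or rank the thesis uses. The function names and the toy matrix are hypothetical, and the filter-bank and speaker-clustering stages are not shown.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v into w (rows x rank) and h (rank x cols).

    Assumption: squared-error multiplicative updates, chosen for illustration.
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


# Toy "spectrogram": 4 frequency bands x 3 frames, built from two sources.
v = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [2.0, 0.0, 4.0],
     [0.0, 3.0, 3.0]]
w, h = nmf(v, rank=2)
approx = matmul(w, h)
error = sum((v[i][j] - approx[i][j]) ** 2 for i in range(4) for j in range(3))
```

In a separation setting the columns of `w` would play the role of spectral templates and the rows of `h` their activations over time; assigning templates to speakers is the job of the clustering stage.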
For the proposed informed speech separation algorithm, the outputs of the speaker diarization form a generated database. The database assists the speech separation by creating virtual targeted-speech and mixture signals. These contributed virtual signals are trained to facilitate the separation by homogenising them with the NMF-matrix elements of the real mixture. A contributed masking step optimizes the resulting speech. The average SAR, SDR and SIR over 341 simulated conversations are 9.55 dB, 1.12 dB and 2.97 dB respectively.
According to the objective tests, the two speech separation algorithms fall in the mid-range of well-known NMF-based audio and speech separation methods.
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that Gestalt grouping is not used as a strategy in these tasks, and it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
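As a quick arithmetic check on the reported statistic, the p-value for F(1,4)=2.565 can be recomputed from the identity that an F variable with (1, v) degrees of freedom is the square of a Student t variable with v degrees of freedom. The sketch below is an illustration only and is not the authors' analysis; the function names and the use of Simpson's rule are assumptions introduced here.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def f_test_p_value(f_stat, df_den, steps=2000):
    """Upper-tail p-value for an F(1, df_den) statistic.

    Uses F(1, v) = t(v)^2, so the p-value is the two-sided t tail beyond
    sqrt(f_stat). The integral is evaluated with composite Simpson's rule;
    steps must be even.
    """
    t = math.sqrt(f_stat)
    step = t / steps
    total = t_pdf(0.0, df_den) + t_pdf(t, df_den)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * step, df_den)
    area_zero_to_t = total * step / 3
    return 1 - 2 * area_zero_to_t


p = f_test_p_value(2.565, 4)
print(round(p, 3))
```

The result agrees with the reported p=0.185 to within rounding of the F statistic.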
Hidden states, hidden structures: Bayesian learning in time series models
This thesis presents methods for the inference of system state and the learning of model structure for a number of hidden-state time series models, within a Bayesian probabilistic framework. Motivating examples are taken from application areas including finance, physical object tracking and audio restoration. The work in this thesis can be broadly divided into three themes: system and parameter estimation in linear jump-diffusion systems, non-parametric model (system) estimation and batch audio restoration.
For linear jump-diffusion systems, efficient state estimation methods based on the variable rate particle filter are presented for the general linear case (chapter 3) and a new method of parameter estimation based on Particle MCMC methods is introduced and tested against an alternative method using reversible-jump MCMC (chapter 4).
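For readers unfamiliar with particle filtering, the basic propagate, weight and resample cycle can be sketched as follows. This is a plain bootstrap particle filter on a one-dimensional Gaussian random walk, which is an assumption made for illustration; it is not the variable rate particle filter or the Particle MCMC scheme developed in the thesis, and all names and parameter values are hypothetical.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=1.0, obs_std=1.0, seed=0):
    """Return the filtered mean of a 1-D random-walk state after each observation."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate each particle through the (assumed) random-walk dynamics.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Weight by the Gaussian likelihood of the observation.
        weights = [math.exp(-0.5 * ((y - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed: fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Twenty identical observations; the estimate should settle near 5.
estimates = bootstrap_particle_filter([5.0] * 20)
```

A variable rate filter differs in that the state includes random jump times rather than evolving on a fixed time grid, so this sketch only conveys the shared sequential Monte Carlo skeleton.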
Non-parametric model estimation is examined in two settings: the estimation of non-parametric environment models in a SLAM-style problem, and the estimation of the network structure and forms of linkage between multiple objects. In the former case, a non-parametric Gaussian process prior model is used to learn a potential field model of the environment in which a target moves. Efficient solution methods based on Rao-Blackwellized particle filters are given (chapter 5). In the latter case, a new way of learning non-linear inter-object relationships in multi-object systems is developed, allowing complicated inter-object dynamics to be learnt and causality between objects to be inferred. Again based on Gaussian process prior assumptions, the method allows the identification of a wide range of relationships between objects with minimal assumptions and admits efficient solution, albeit in batch form at present (chapter 6).
Finally, the thesis presents some new results in the restoration of audio signals, in particular the removal of impulse noise (pops and clicks) from audio recordings (chapter 7). This work was supported by the Engineering and Physical Sciences Research Council (EPSRC
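To make the impulse-noise problem concrete, the sketch below flags samples that sit far from the local median and replaces them with that median. This simple detector is a stand-in chosen for illustration and is an assumption; the thesis works within a Bayesian framework, and its actual restoration method is not described in this abstract. The function name, window length and threshold are hypothetical.

```python
import statistics


def remove_clicks(samples, window=5, threshold=3.0):
    """Replace outlying samples with the median of their neighbourhood.

    A sample is treated as a click when its distance from the local median
    exceeds `threshold` times the local median absolute deviation (MAD).
    """
    half = window // 2
    cleaned = list(samples)
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        neighbourhood = samples[lo:hi]
        med = statistics.median(neighbourhood)
        mad = statistics.median([abs(s - med) for s in neighbourhood])
        # The small constant keeps the test meaningful when the MAD is zero.
        if abs(samples[i] - med) > threshold * (mad + 1e-9):
            cleaned[i] = med
    return cleaned


# A slow ramp with one click at index 4.
samples = [0.0, 0.1, 0.2, 0.3, 9.0, 0.5, 0.6, 0.7, 0.8]
cleaned = remove_clicks(samples)
```

Only the click at index 4 is altered in this example; the surrounding ramp passes through unchanged.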
Exploring Animal Behavior Through Sound: Volume 1
This open-access book empowers its readers to explore the acoustic world of animals. By listening to the sounds of nature, we can study animal behavior, distribution, and demographics; their habitat characteristics and needs; and the effects of noise. Sound recording is an efficient and affordable tool, independent of daylight and weather; and recorders may be left in place for many months at a time, continuously collecting data on animals and their environment. This book builds the skills and knowledge necessary to collect and interpret acoustic data from terrestrial and marine environments. Beginning with a history of sound recording, the chapters provide an overview of off-the-shelf recording equipment and analysis tools (including automated signal detectors and statistical methods); audiometric methods; acoustic terminology, quantities, and units; sound propagation in air and under water; soundscapes of terrestrial and marine habitats; animal acoustic and vibrational communication; echolocation; and the effects of noise. This book will be useful to students and researchers of animal ecology who wish to add acoustics to their toolbox, as well as to environmental managers in industry and government
Magnetic Tape Recording for the Eighties
The practical and theoretical aspects of state-of-the-art magnetic tape recording technology are reviewed. Topics covered include the following: (1) analog and digital magnetic tape recording, (2) tape and head wear, (3) wear testing, (4) magnetic tape certification, (5) care, handling, and management of magnetic tape, (6) cleaning, packing, and winding of magnetic tape, (7) tape reels, bands, and packaging, (8) coding techniques for high-density digital recording, and (9) tradeoffs of coding techniques
On the Recognition of Emotion from Physiological Data
This work encompasses several objectives, but is primarily concerned with an experiment where 33 participants were shown 32 slides in order to create 'weakly induced emotions'. Recordings of the participants' physiological state were taken as well as a self-report of their emotional state. We then used an assortment of classifiers to predict emotional state from the recorded physiological signals, a process known as Physiological Pattern Recognition (PPR). We investigated techniques for recording, processing and extracting features from six different physiological signals: Electrocardiogram (ECG), Blood Volume Pulse (BVP), Galvanic Skin Response (GSR), Electromyography (EMG) for the corrugator muscle, skin temperature for the finger, and respiratory rate. Improvements to the state of PPR emotion detection were made by allowing for 9 different weakly induced emotional states to be detected at nearly 65% accuracy. This is an improvement in the number of states readily detectable. The work presents many investigations into numerical feature extraction from physiological signals and has a chapter dedicated to collating and trialing facial electromyography techniques. There is also a hardware device we created to collect participant self-reported emotional states, which showed several improvements to experimental procedure