194 research outputs found

Proceedings of the 7th Sound and Music Computing Conference

Author: Emilia Gómez; Perfecto Herrera; Rafael Ramirez
Publication venue: SMC Network
Publication date: 25/07/2010

Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

ZENODO

Single channel overlapped-speech detection and separation of spontaneous conversations

Author: Kadhim Hasan Mohammad-Ali
Publication venue
Publication date: 01/01/2018

PhD Thesis. In the thesis, spontaneous conversation containing both speech mixture and speech dialogue is considered. The speech mixture refers to speakers speaking simultaneously (i.e. overlapped speech). The speech dialogue refers to segments in which only one speaker is actively speaking and the other is silent. The input conversation is first processed by overlapped-speech detection, which segregates it into two output signals: dialogue and mixture. The dialogue is processed by speaker diarization, whose outputs are the individual speech of each speaker. The mixture is processed by speech separation, whose outputs are independent separated speech signals of the speakers. When the separation input contains only the mixture, a blind speech separation approach is used. When the separation is assisted by the outputs of the speaker diarization, it is informed speech separation. The research presents a novel overlapped-speech detection algorithm and two novel speech separation algorithms. The proposed overlapped-speech detection algorithm estimates the switching instants of the input. An optimization loop is adapted to adopt the best capsulated audio features and to avoid the worst. The optimization depends on principles of pattern recognition and k-means clustering. For 300 simulated conversations, the average False-Alarm Error is 1.9%, Missed-Speech Error is 0.4%, and Overlap-Speaker Error is 1%. These errors are approximately equal to those of the best recent reliable speaker diarization corpora. The proposed blind speech separation algorithm consists of four sequential techniques: filter-bank analysis, Non-negative Matrix Factorization (NMF), speaker clustering and filter-bank synthesis. Effective standard framing is contributed in place of the required speaker segmentation. The average objective test scores (SAR, SDR and SIR) obtained for 51 simulated conversations are 5.06 dB, 4.87 dB and 12.47 dB respectively.
For the proposed informed speech separation algorithm, the outputs of the speaker diarization form a generated database. The database assists the speech separation by creating virtual targeted speech and mixture. The contributed virtual signals are trained to facilitate the separation by homogenising them with the NMF-matrix elements of the real mixture. Contributed masking optimizes the resulting speech. The average SAR, SDR and SIR obtained for 341 simulated conversations are 9.55 dB, 1.12 dB and 2.97 dB respectively. According to the objective tests, the two speech separation algorithms fall in the mid-range of well-known NMF-based audio and speech separation methods.

Newcastle University eTheses

Grafting Acoustic Instruments and Signal Processing: Creative Control and Augmented Expressivity

Author: Freed Adrian; Overholt Daniel
Publication venue
Publication date: 01/01/2013

VBN

Change blindness: eradication of gestalt strategies

Author: Goddard Paul; Wilson Steve
Publication venue: Pion Ltd
Publication date: 01/08/2011

Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that Gestalt grouping is not used as a strategy in these tasks, and it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.

University of Lincoln Institutional Repository

Hidden states, hidden structures: Bayesian learning in time series models

Author: Murphy James Kevin
Publication venue: University of Cambridge
Publication date: 10/06/2014

This thesis presents methods for the inference of system state and the learning of model structure for a number of hidden-state time series models, within a Bayesian probabilistic framework. Motivating examples are taken from application areas including finance, physical object tracking and audio restoration. The work in this thesis can be broadly divided into three themes: system and parameter estimation in linear jump-diffusion systems, non-parametric model (system) estimation and batch audio restoration. For linear jump-diffusion systems, efficient state estimation methods based on the variable rate particle filter are presented for the general linear case (chapter 3) and a new method of parameter estimation based on Particle MCMC methods is introduced and tested against an alternative method using reversible-jump MCMC (chapter 4). Non-parametric model estimation is examined in two settings: the estimation of non-parametric environment models in a SLAM-style problem, and the estimation of the network structure and forms of linkage between multiple objects. In the former case, a non-parametric Gaussian process prior model is used to learn a potential field model of the environment in which a target moves. Efficient solution methods based on Rao-Blackwellized particle filters are given (chapter 5). In the latter case, a new way of learning non-linear inter-object relationships in multi-object systems is developed, allowing complicated inter-object dynamics to be learnt and causality between objects to be inferred. Again based on Gaussian process prior assumptions, the method allows the identification of a wide range of relationships between objects with minimal assumptions and admits efficient solution, albeit in batch form at present (chapter 6). 
Finally, the thesis presents some new results in the restoration of audio signals, in particular the removal of impulse noise (pops and clicks) from audio recordings (chapter 7). This work was supported by the Engineering and Physical Sciences Research Council (EPSRC).

Apollo (Cambridge)

Exploring Animal Behavior Through Sound: Volume 1

Author
Publication venue: Springer Science and Business Media LLC
Publication date: 20/01/2023

This open-access book empowers its readers to explore the acoustic world of animals. By listening to the sounds of nature, we can study animal behavior, distribution, and demographics; their habitat characteristics and needs; and the effects of noise. Sound recording is an efficient and affordable tool, independent of daylight and weather; and recorders may be left in place for many months at a time, continuously collecting data on animals and their environment. This book builds the skills and knowledge necessary to collect and interpret acoustic data from terrestrial and marine environments. Beginning with a history of sound recording, the chapters provide an overview of off-the-shelf recording equipment and analysis tools (including automated signal detectors and statistical methods); audiometric methods; acoustic terminology, quantities, and units; sound propagation in air and under water; soundscapes of terrestrial and marine habitats; animal acoustic and vibrational communication; echolocation; and the effects of noise. This book will be useful to students and researchers of animal ecology who wish to add acoustics to their toolbox, as well as to environmental managers in industry and government

Directory of Open Access Books (DOAB)

Magnetic Tape Recording for the Eighties

Author: Kalil Ford
Publication venue
Publication date

The practical and theoretical aspects of state-of-the-art magnetic tape recording technology are reviewed. Topics covered include the following: (1) analog and digital magnetic tape recording, (2) tape and head wear, (3) wear testing, (4) magnetic tape certification, (5) care, handling, and management of magnetic tape, (6) cleaning, packing, and winding of magnetic tape, (7) tape reels, bands, and packaging, (8) coding techniques for high-density digital recording, and (9) tradeoffs of coding techniques

NASA Technical Reports Server

Proceedings of the Sixteenth Australasian International Conference on Speech Science and Technology

Author
Publication venue: ASSTA
Publication date: 31/12/2016

UCL Discovery

On the Recognition of Emotion from Physiological Data

Author: Creemers Warren
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2013

This work encompasses several objectives, but is primarily concerned with an experiment where 33 participants were shown 32 slides in order to create ‘weakly induced emotions’. Recordings of the participants’ physiological state were taken as well as a self-report of their emotional state. We then used an assortment of classifiers to predict emotional state from the recorded physiological signals, a process known as Physiological Pattern Recognition (PPR). We investigated techniques for recording, processing and extracting features from six different physiological signals: Electrocardiogram (ECG), Blood Volume Pulse (BVP), Galvanic Skin Response (GSR), Electromyography (EMG) for the corrugator muscle, skin temperature for the finger, and respiratory rate. Improvements to the state of PPR emotion detection were made by allowing for 9 different weakly induced emotional states to be detected at nearly 65% accuracy. This is an improvement in the number of states readily detectable. The work presents many investigations into numerical feature extraction from physiological signals and has a chapter dedicated to collating and trialing facial electromyography techniques. There is also a hardware device we created to collect participant self-reported emotional states, which showed several improvements to experimental procedure.

Research Online @ ECU

Exploring Animal Behavior Through Sound: Volume 1

Author
Publication venue: Springer Science and Business Media LLC
Publication date

OAPEN Library