641 research outputs found
Statistical models for natural sounds
It is important to understand the rich structure of natural sounds in order to solve important
tasks, like automatic speech recognition, and to understand auditory processing
in the brain. This thesis takes a step in this direction by characterising the statistics of
simple natural sounds. We focus on the statistics because perception often appears to
depend on them, rather than on the raw waveform. For example the perception of auditory
textures, like running water, wind, fire and rain, depends on summary-statistics,
like the rate of falling rain droplets, rather than on the exact details of the physical
source.
In order to analyse the statistics of sounds accurately it is necessary to improve a
number of traditional signal processing methods, including those for amplitude demodulation,
time-frequency analysis, and sub-band demodulation. These estimation tasks
are ill-posed and therefore it is natural to treat them as Bayesian inference problems.
The new probabilistic versions of these methods have several advantages. For example,
they perform more accurately on natural signals and are more robust to noise,
they can also fill-in missing sections of data, and provide error-bars. Furthermore,
free-parameters can be learned from the signal. Using these new algorithms we demonstrate
that the energy, sparsity, modulation depth and modulation time-scale in each
sub-band of a signal are critical statistics, together with the dependencies between the
sub-band modulators. In order to validate this claim, a model containing co-modulated
coloured noise carriers is shown to be capable of generating a range of realistic sounding
auditory textures.
Finally, we explored the connection between the statistics of natural sounds and perception.
We demonstrate that inference in the model for auditory textures qualitatively
replicates the primitive grouping rules that listeners use to understand simple acoustic
scenes. This suggests that the auditory system is optimised for the statistics of natural
sounds
Recommended from our members
Time-Frequency Analysis as Probabilistic Inference
This is the final published version. It was originally published by IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6918491.This paper proposes a new view of time-frequency analysis framed in terms of probabilistic inference. Natural signals are assumed to be formed by the superposition of distinct time-frequency components, with the analytic goal being to infer these components by application of Bayes' rule. The framework serves to unify various existing models for natural time-series; it relates to both the Wiener and Kalman filters, and with suitable assumptions yields inferential interpretations of the short-time Fourier transform, spectrogram, filter bank, and wavelet representations. Value is gained by placing time-frequency analysis on the same probabilistic basis as is often employed in applications such as denoising, source separation, or recognition. Uncertainty in the time-frequency representation can be propagated correctly to application-specific stages, improving the handing of noise and missing data. Probabilistic learning allows modules to be co-adapted; thus, the time-frequency representation can be adapted to both the demands of the application and the time-varying statistics of the signal at hand. Similarly, the application module can be adapted to fine properties of the signal propagated by the initial time-frequency processing. We demonstrate these benefits by combining probabilistic time-frequency representations with non-negative matrix factorization, finding benefits in audio denoising and inpainting tasks, albeit with higher computational cost than incurred by the standard approach.Funding was provided by EPSRC (grant numbers EP/G050821/1 and
EP/L000776/1) and Google (R.E.T.) and by the Gatsby Charitable Foundation
(M.S.)
A generative model for natural sounds based on latent force modelling
Generative models based on subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitude modulation to be a crucial component of auditory perception. Probabilistic latent variable analysis can be particularly insightful, but existing approaches don’t incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay or feedback. We use latent force modelling, a probabilistic learning paradigm that encodes physical knowledge into Gaussian process regression, to model correlation across spectral subband envelopes. We augment the standard latent force model approach by explicitly modelling dependencies across multiple time steps. Incorporating this prior knowledge strengthens the interpretation of the latent functions as the source that generated the signal. We examine this interpretation via an experiment showing that sounds generated by sampling from our probabilistic model are perceived to be more realistic than those generated by comparative models based on nonnegative matrix factorisation, even in cases where our model is outperformed from a reconstruction error perspective
Probabilistic generative modeling of speech
Speech processing refers to a set of tasks that involve speech analysis and synthesis. Most speech processing algorithms model a subset of speech parameters of interest and blur the rest using signal processing techniques and feature extraction. However, evidence shows that many speech parameters can be more accurately estimated if they are modeled jointly; speech synthesis also benefits from joint modeling.
This thesis proposes a probabilistic generative model for speech called the Probabilistic Acoustic Tube (PAT). The highlights of the model are threefold. First, it is among the very first works to build a complete probabilistic model for speech. Second, it has a well-designed model for the phase spectrum of speech, which has been hard to model and often neglected. Third, it models the AM-FM effects in speech, which are perceptually significant but often ignored in frame-based speech processing algorithms. Experiment shows that the proposed model has good potential for a number of speech processing tasks
Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia
Dyslexia is associated with impaired neural representation of the sound structure of words (phonology). The “phonological deficit” in dyslexia may arise in part from impaired speech rhythm perception, thought to depend on neural oscillatory phase-locking to slow amplitude modulation (AM) patterns in the speech envelope. Speech contains AM patterns at multiple temporal rates, and these different AM rates are associated with phonological units of different grain sizes, e.g., related to stress, syllables or phonemes. Here, we assess the ability of adults with dyslexia to use speech AMs to identify rhythm patterns (RPs). We study 3 important temporal rates: “Stress” (~2 Hz), “Syllable” (~4 Hz) and “Sub-beat” (reduced syllables, ~14 Hz). 21 dyslexics and 21 controls listened to nursery rhyme sentences that had been tone-vocoded using either single AM rates from the speech envelope (Stress only, Syllable only, Sub-beat only) or pairs of AM rates (Stress + Syllable, Syllable + Sub-beat). They were asked to use the acoustic rhythm of the stimulus to identity the original nursery rhyme sentence. The data showed that dyslexics were significantly poorer at detecting rhythm compared to controls when they had to utilize multi-rate temporal information from pairs of AMs (Stress + Syllable or Syllable + Sub-beat). These data suggest that dyslexia is associated with a reduced ability to utilize AMs <20 Hz for rhythm recognition. This perceptual deficit in utilizing AM patterns in speech could be underpinned by less efficient neuronal phase alignment and cross-frequency neuronal oscillatory synchronization in dyslexia. Dyslexics' perceptual difficulties in capturing the full spectro-temporal complexity of speech over multiple timescales could contribute to the development of impaired phonological representations for words, the cognitive hallmark of dyslexia across languages
Doppler Radar Techniques for Distinct Respiratory Pattern Recognition and Subject Identification.
Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017
On-line health monitoring of passive electronic components using digitally controlled power converter
This thesis presents System Identification based On-Line Health Monitoring to analyse the dynamic behaviour of the Switch-Mode Power Converter (SMPC), detect, and diagnose anomalies in passive electronic components. The anomaly detection in this research is determined by examining the change in passive component values due to degradation. Degradation, which is a long-term process, however, is characterised by inserting different component values in the power converter. The novel health-monitoring capability enables accurate detection of passive electronic components despite component variations and uncertainties and is valid for different topologies of the switch-mode power converter.
The need for a novel on-line health-monitoring capability is driven by the need to improve unscheduled in-service, logistics, and engineering costs, including the requirement of Integrated Vehicle Health Management (IVHM) for electronic systems and components. The detection and diagnosis of degradations and failures within power converters is of great importance for aircraft electronic manufacturers, such as Thales, where component failures result in equipment downtime and large maintenance costs. The fact that existing techniques, including built-in-self test, use of dedicated sensors, physics-of-failure, and data-driven based health-monitoring, have yet to deliver extensive application in IVHM, provides the motivation for this research ... [cont.]
An Overview on Application of Machine Learning Techniques in Optical Networks
Today's telecommunication networks have become sources of enormous amounts of
widely heterogeneous data. This information can be retrieved from network
traffic traces, network alarms, signal quality indicators, users' behavioral
data, etc. Advanced mathematical tools are required to extract meaningful
information from these data and take decisions pertaining to the proper
functioning of the networks from the network-generated data. Among these
mathematical tools, Machine Learning (ML) is regarded as one of the most
promising methodological approaches to perform network-data analysis and enable
automated network self-configuration and fault management. The adoption of ML
techniques in the field of optical communication networks is motivated by the
unprecedented growth of network complexity faced by optical networks in the
last few years. Such complexity increase is due to the introduction of a huge
number of adjustable and interdependent system parameters (e.g., routing
configurations, modulation format, symbol rate, coding schemes, etc.) that are
enabled by the usage of coherent transmission/reception technologies, advanced
digital signal processing and compensation of nonlinear effects in optical
fiber propagation. In this paper we provide an overview of the application of
ML to optical communications and networking. We classify and survey relevant
literature dealing with the topic, and we also provide an introductory tutorial
on ML for researchers and practitioners interested in this field. Although a
good number of research papers have recently appeared, the application of ML to
optical networks is still in its infancy: to stimulate further work in this
area, we conclude the paper proposing new possible research directions
Extending Critical Infrastructure Element Longevity using Constellation-based ID Verification
This work supports a technical cradle-to-grave protection strategy aimed at extending the useful lifespan of Critical Infrastructure (CI) elements. This is done by improving mid-life operational protection measures through integration of reliable physical (PHY) layer security mechanisms. The goal is to improve existing protection that is heavily reliant on higher-layer mechanisms that are commonly targeted by cyberattack. Relative to prior device ID discrimination works, results herein reinforce the exploitability of constellation-based PHY layer features and the ability for those features to be practically implemented to enhance CI security. Prior work is extended by formalizing a device ID verification process that enables rogue device detection demonstration under physical access attack conditions that include unauthorized devices mimicking bit-level credentials of authorized network devices. The work transitions from distance-based to probability-based measures of similarity derived from empirical Multivariate Normal Probability Density Function (MVNPDF) statistics of multiple discriminant analysis radio frequency fingerprint projections. Demonstration results for Constellation-Based Distinct Native Attribute (CB-DNA) fingerprinting of WirelessHART adapters from two manufacturers includes 1) average cross-class percent correct classification of %C \u3e 90% across 28 different networks comprised of six authorized devices, and 2) average rogue rejection rate of 83.4% ≤ RRR ≤ 99.9% based on two held-out devices serving as attacking rogue devices for each network (a total of 120 individual rogue attacks). Using the MVNPDF measure proved most effective and yielded nearly 12% RRR improvement over a Euclidean distance measure
- …