Search CORE

641 research outputs found

Statistical models for natural sounds

Author: Turner R.E.
Publication venue: UCL (University College London)
Publication date: 01/01/2010
Field of study

It is important to understand the rich structure of natural sounds in order to solve important tasks, like automatic speech recognition, and to understand auditory processing in the brain. This thesis takes a step in this direction by characterising the statistics of simple natural sounds. We focus on the statistics because perception often appears to depend on them, rather than on the raw waveform. For example the perception of auditory textures, like running water, wind, fire and rain, depends on summary-statistics, like the rate of falling rain droplets, rather than on the exact details of the physical source. In order to analyse the statistics of sounds accurately it is necessary to improve a number of traditional signal processing methods, including those for amplitude demodulation, time-frequency analysis, and sub-band demodulation. These estimation tasks are ill-posed and therefore it is natural to treat them as Bayesian inference problems. The new probabilistic versions of these methods have several advantages. For example, they perform more accurately on natural signals and are more robust to noise, they can also fill-in missing sections of data, and provide error-bars. Furthermore, free-parameters can be learned from the signal. Using these new algorithms we demonstrate that the energy, sparsity, modulation depth and modulation time-scale in each sub-band of a signal are critical statistics, together with the dependencies between the sub-band modulators. In order to validate this claim, a model containing co-modulated coloured noise carriers is shown to be capable of generating a range of realistic sounding auditory textures. Finally, we explored the connection between the statistics of natural sounds and perception. We demonstrate that inference in the model for auditory textures qualitatively replicates the primitive grouping rules that listeners use to understand simple acoustic scenes. This suggests that the auditory system is optimised for the statistics of natural sounds

CiteSeerX

UCL Discovery

Recommended from our members

Time-Frequency Analysis as Probabilistic Inference

Author: Sahani Maneesh
Turner Richard E
Publication venue: IEEE Transactions on Signal Processing
Publication date: 01/12/2014
Field of study

This is the final published version. It was originally published by IEEE at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6918491.This paper proposes a new view of time-frequency analysis framed in terms of probabilistic inference. Natural signals are assumed to be formed by the superposition of distinct time-frequency components, with the analytic goal being to infer these components by application of Bayes' rule. The framework serves to unify various existing models for natural time-series; it relates to both the Wiener and Kalman filters, and with suitable assumptions yields inferential interpretations of the short-time Fourier transform, spectrogram, filter bank, and wavelet representations. Value is gained by placing time-frequency analysis on the same probabilistic basis as is often employed in applications such as denoising, source separation, or recognition. Uncertainty in the time-frequency representation can be propagated correctly to application-specific stages, improving the handing of noise and missing data. Probabilistic learning allows modules to be co-adapted; thus, the time-frequency representation can be adapted to both the demands of the application and the time-varying statistics of the signal at hand. Similarly, the application module can be adapted to fine properties of the signal propagated by the initial time-frequency processing. We demonstrate these benefits by combining probabilistic time-frequency representations with non-negative matrix factorization, finding benefits in audio denoising and inpainting tasks, albeit with higher computational cost than incurred by the standard approach.Funding was provided by EPSRC (grant numbers EP/G050821/1 and EP/L000776/1) and Google (R.E.T.) and by the Gatsby Charitable Foundation (M.S.)

Apollo (Cambridge)

A generative model for natural sounds based on latent force modelling

Author: C Févotte
CE Rasmussen
GJ Mysore
JH McDermott
MA Alvarez
RE Turner
RE Turner
S Särkkä
T Virtanen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/07/2018
Field of study

Generative models based on subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitude modulation to be a crucial component of auditory perception. Probabilistic latent variable analysis can be particularly insightful, but existing approaches don’t incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay or feedback. We use latent force modelling, a probabilistic learning paradigm that encodes physical knowledge into Gaussian process regression, to model correlation across spectral subband envelopes. We augment the standard latent force model approach by explicitly modelling dependencies across multiple time steps. Incorporating this prior knowledge strengthens the interpretation of the latent functions as the source that generated the signal. We examine this interpretation via an experiment showing that sounds generated by sampling from our probabilistic model are perceived to be more realistic than those generated by comparative models based on nonnegative matrix factorisation, even in cases where our model is outperformed from a reconstruction error perspective

arXiv.org e-Print Archive

Crossref

Queen Mary Research Online

Probabilistic generative modeling of speech

Author: Zhang Yang
Publication venue
Publication date: 01/12/2015
Field of study

Speech processing refers to a set of tasks that involve speech analysis and synthesis. Most speech processing algorithms model a subset of speech parameters of interest and blur the rest using signal processing techniques and feature extraction. However, evidence shows that many speech parameters can be more accurately estimated if they are modeled jointly; speech synthesis also benefits from joint modeling. This thesis proposes a probabilistic generative model for speech called the Probabilistic Acoustic Tube (PAT). The highlights of the model are threefold. First, it is among the very first works to build a complete probabilistic model for speech. Second, it has a well-designed model for the phase spectrum of speech, which has been hard to model and often neglected. Third, it models the AM-FM effects in speech, which are perceptually significant but often ignored in frame-based speech processing algorithms. Experiment shows that the proposed model has good potential for a number of speech processing tasks

Illinois Digital Environment for Access to Learning and Scholarship Repository

Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia

Author: Usha Goswami
Victoria Leong
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2014
Field of study

Dyslexia is associated with impaired neural representation of the sound structure of words (phonology). The “phonological deficit” in dyslexia may arise in part from impaired speech rhythm perception, thought to depend on neural oscillatory phase-locking to slow amplitude modulation (AM) patterns in the speech envelope. Speech contains AM patterns at multiple temporal rates, and these different AM rates are associated with phonological units of different grain sizes, e.g., related to stress, syllables or phonemes. Here, we assess the ability of adults with dyslexia to use speech AMs to identify rhythm patterns (RPs). We study 3 important temporal rates: “Stress” (~2 Hz), “Syllable” (~4 Hz) and “Sub-beat” (reduced syllables, ~14 Hz). 21 dyslexics and 21 controls listened to nursery rhyme sentences that had been tone-vocoded using either single AM rates from the speech envelope (Stress only, Syllable only, Sub-beat only) or pairs of AM rates (Stress + Syllable, Syllable + Sub-beat). They were asked to use the acoustic rhythm of the stimulus to identity the original nursery rhyme sentence. The data showed that dyslexics were significantly poorer at detecting rhythm compared to controls when they had to utilize multi-rate temporal information from pairs of AMs (Stress + Syllable or Syllable + Sub-beat). These data suggest that dyslexia is associated with a reduced ability to utilize AMs <20 Hz for rhythm recognition. This perceptual deficit in utilizing AM patterns in speech could be underpinned by less efficient neuronal phase alignment and cross-frequency neuronal oscillatory synchronization in dyslexia. Dyslexics' perceptual difficulties in capturing the full spectro-temporal complexity of speech over multiple timescales could contribute to the development of impaired phonological representations for words, the cognitive hallmark of dyslexia across languages

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Doppler Radar Techniques for Distinct Respiratory Pattern Recognition and Subject Identification.

Author: Rahman Ashikur
Publication venue: University of Hawaiʻi at Mānoa
Publication date: 01/08/2017
Field of study

Ph.D. Thesis. University of Hawaiʻi at Mānoa 2017

ScholarSpace at University of Hawai'i at Manoa

On-line health monitoring of passive electronic components using digitally controlled power converter

Author: Mann Jaspreet Kaur
Publication venue: Cranfield University
Publication date: 01/07/2016
Field of study

This thesis presents System Identification based On-Line Health Monitoring to analyse the dynamic behaviour of the Switch-Mode Power Converter (SMPC), detect, and diagnose anomalies in passive electronic components. The anomaly detection in this research is determined by examining the change in passive component values due to degradation. Degradation, which is a long-term process, however, is characterised by inserting different component values in the power converter. The novel health-monitoring capability enables accurate detection of passive electronic components despite component variations and uncertainties and is valid for different topologies of the switch-mode power converter. The need for a novel on-line health-monitoring capability is driven by the need to improve unscheduled in-service, logistics, and engineering costs, including the requirement of Integrated Vehicle Health Management (IVHM) for electronic systems and components. The detection and diagnosis of degradations and failures within power converters is of great importance for aircraft electronic manufacturers, such as Thales, where component failures result in equipment downtime and large maintenance costs. The fact that existing techniques, including built-in-self test, use of dedicated sensors, physics-of-failure, and data-driven based health-monitoring, have yet to deliver extensive application in IVHM, provides the motivation for this research ... [cont.]

Cranfield CERES

An Overview on Application of Machine Learning Techniques in Optical Networks

Author: Macaluso Irene
Musumeci Francesco
Nag Avishek
Rottondi Cristina
Ruffini Marco
Tornatore Massimo
Zibar Darko
Publication venue
Publication date: 01/12/2018
Field of study

Today's telecommunication networks have become sources of enormous amounts of widely heterogeneous data. This information can be retrieved from network traffic traces, network alarms, signal quality indicators, users' behavioral data, etc. Advanced mathematical tools are required to extract meaningful information from these data and take decisions pertaining to the proper functioning of the networks from the network-generated data. Among these mathematical tools, Machine Learning (ML) is regarded as one of the most promising methodological approaches to perform network-data analysis and enable automated network self-configuration and fault management. The adoption of ML techniques in the field of optical communication networks is motivated by the unprecedented growth of network complexity faced by optical networks in the last few years. Such complexity increase is due to the introduction of a huge number of adjustable and interdependent system parameters (e.g., routing configurations, modulation format, symbol rate, coding schemes, etc.) that are enabled by the usage of coherent transmission/reception technologies, advanced digital signal processing and compensation of nonlinear effects in optical fiber propagation. In this paper we provide an overview of the application of ML to optical communications and networking. We classify and survey relevant literature dealing with the topic, and we also provide an introductory tutorial on ML for researchers and practitioners interested in this field. Although a good number of research papers have recently appeared, the application of ML to optical networks is still in its infancy: to stimulate further work in this area, we conclude the paper proposing new possible research directions

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Online Research Database In Technology

Extending Critical Infrastructure Element Longevity using Constellation-based ID Verification

Author: Betances J. Addison
Rondeau Christopher M.
Schubert Kabban Christine M.
Temple Michael A.
Publication venue: AFIT Scholar
Publication date: 01/01/2021
Field of study

This work supports a technical cradle-to-grave protection strategy aimed at extending the useful lifespan of Critical Infrastructure (CI) elements. This is done by improving mid-life operational protection measures through integration of reliable physical (PHY) layer security mechanisms. The goal is to improve existing protection that is heavily reliant on higher-layer mechanisms that are commonly targeted by cyberattack. Relative to prior device ID discrimination works, results herein reinforce the exploitability of constellation-based PHY layer features and the ability for those features to be practically implemented to enhance CI security. Prior work is extended by formalizing a device ID verification process that enables rogue device detection demonstration under physical access attack conditions that include unauthorized devices mimicking bit-level credentials of authorized network devices. The work transitions from distance-based to probability-based measures of similarity derived from empirical Multivariate Normal Probability Density Function (MVNPDF) statistics of multiple discriminant analysis radio frequency fingerprint projections. Demonstration results for Constellation-Based Distinct Native Attribute (CB-DNA) fingerprinting of WirelessHART adapters from two manufacturers includes 1) average cross-class percent correct classification of %C \u3e 90% across 28 different networks comprised of six authorized devices, and 2) average rogue rejection rate of 83.4% ≤ RRR ≤ 99.9% based on two held-out devices serving as attacking rogue devices for each network (a total of 120 individual rogue attacks). Using the MVNPDF measure proved most effective and yielded nearly 12% RRR improvement over a Euclidean distance measure

AFTI Scholar (Air Force Institute of Technology)