Tomograms and other transforms: a unified view
A general framework is presented which unifies the treatment of wavelet-like,
quasidistribution, and tomographic transforms. Explicit formulas relating the
three types of transforms are obtained. The case of transforms associated to
the symplectic and affine groups is treated in some detail. Special emphasis is
given to the properties of the scale-time and scale-frequency tomograms.
Tomograms are interpreted as a tool to sample the signal space by a family of
curves or as the matrix element of a projector. Comment: 19 pages, LaTeX; submitted to J. Phys. A: Math. Gen.
Hybrid Transforms
Hybrid transforms are constructed by associating the Wigner-Ville distribution (WVD) with widely used signal processing tools, such as the fractional Fourier transform, the linear canonical transform, the offset linear canonical transform (OLCT), and their quaternion-valued versions. They are called hybrid transforms because they combine the advantages of both constituent transforms, and compared to the classical transforms they show better results in applications. The WVD associated with the OLCT (WVD-OLCT) is a class of hybrid transform that generalizes most other hybrid transforms. This chapter summarizes research on hybrid transforms by reviewing a computationally efficient variant of the WVD-OLCT, which has simpler marginal properties than both the WVD-OLCT and the WVD.
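For readers unfamiliar with the common ingredient of these hybrids, the discrete Wigner-Ville distribution itself can be sketched in a few lines. This is an illustrative implementation from the textbook definition, not code from the chapter; the test signal and sizes are arbitrary choices:

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of an analytic signal x.

    For each time index n, form the instantaneous autocorrelation
    r[m] = x[n+m] * conj(x[n-m]) and take its FFT over the lag m.
    Returns an (N, N) real array: rows = time, columns = frequency bins.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    W = np.zeros((N, N), dtype=complex)
    for n in range(N):
        # largest lag that keeps both n+m and n-m inside the signal
        mmax = min(n, N - 1 - n)
        lags = np.arange(-mmax, mmax + 1)
        r = np.zeros(N, dtype=complex)
        r[lags % N] = x[n + lags] * np.conj(x[n - lags])
        W[n] = np.fft.fft(r)  # r is conjugate-symmetric, so W[n] is real
    return W.real

# A pure tone concentrates along a single frequency ridge.
fs = 128
t = np.arange(fs) / fs
tone = np.exp(2j * np.pi * 16 * t)   # analytic signal at 16 Hz
W = wigner_ville(tone)
print(W.shape)
```

Note the familiar factor-of-two frequency scaling of the discrete WVD: the 16 Hz tone peaks at FFT bin 32 here, which is one reason the hybrid variants adjust the kernel.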
Compositional nonlinear audio signal processing with Volterra series
We develop a compositional theory of nonlinear audio signal processing based
on a categorification of the Volterra series. We augment the classical
definition of the Volterra series to be functorial with respect to a base
category whose objects are temperate distributions and whose morphisms are
certain linear transformations. This leads to formulae describing how the
outcomes of nonlinear transformations are affected if their input signals are
first linearly processed. We then consider how nonlinear audio systems change,
and introduce as a model thereof the notion of morphism of Volterra series. We
show how morphisms can be parameterized and used to generate indexed families
of Volterra series, which are well-suited to model nonstationary or
time-varying nonlinear phenomena. We describe how Volterra series and their
morphisms organize into a functor category, Volt, whose objects are Volterra
series and whose morphisms are natural transformations. We exhibit the
operations of sum, product, and series composition of Volterra series as
monoidal products on Volt and identify, for each in turn, its corresponding
universal property. We show, in particular, that the series composition of
Volterra series is associative. We then bridge between our framework and a
subject at the heart of audio signal processing: time-frequency analysis.
Specifically, we show that an equivalence between a certain class of
second-order Volterra series and the bilinear time-frequency distributions
(TFDs) can be extended to one between certain higher-order Volterra series and
the so-called polynomial TFDs. We end with prospects for future work, including
the incorporation of nonlinear system identification techniques and the
extension of our theory to the settings of compositional graph and topological
audio signal processing. Comment: Master's thesis.
Characterization and processing of atrial fibrillation episodes by convolutive blind source separation algorithms and nonlinear analysis of spectral features
Supraventricular arrhythmias, in particular atrial fibrillation (AF), are the cardiac diseases most commonly encountered in routine clinical practice. The prevalence of AF is below 1% in the population under 60 years of age, but it rises significantly from age 70 onward, approaching 10% in those over 80. Suffering a sustained AF episode, besides being linked to a higher mortality rate, increases the probability of thromboembolism, myocardial infarction, and stroke. Moreover, episodes of paroxysmal AF, the form that terminates spontaneously, are the precursors of sustained AF, which has generated great interest in the scientific community in understanding the mechanisms responsible for perpetuating AF episodes or driving their spontaneous termination.
Analysis of the surface ECG is the most widespread noninvasive technique in the medical diagnosis of cardiac pathologies. To use the ECG as a tool for studying AF, the atrial activity (AA) must be separated from the other cardioelectric signals. To this end, Blind Source Separation (BSS) techniques can perform a multi-lead statistical analysis aimed at recovering a set of independent cardioelectric sources, among which is the AA. When tackling a BSS problem, it is necessary to adopt a source mixing model that matches reality as closely as possible, so that mathematical algorithms can be developed to solve it. One viable model assumes linear mixtures. Within the linear mixing model, one can further impose the restriction that the mixtures be instantaneous. This instantaneous linear mixing model is the one used in Independent Component Analysis (ICA). Vayá Salort, C. (2010). Characterization and processing of atrial fibrillation episodes by convolutive blind source separation algorithms and nonlinear analysis of spectral features [unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8416
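The instantaneous linear mixing model behind ICA can be illustrated with a toy sketch: two synthetic independent sources are mixed by a fixed matrix and then recovered by whitening followed by a symmetric FastICA iteration. This is a generic illustration, not the thesis's convolutive BSS algorithms; the sources, mixing matrix, and nonlinearity are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent toy sources: a sinusoid and super-Gaussian noise
n = 5000
t = np.arange(n)
s1 = np.sin(2 * np.pi * t / 50)
s2 = rng.laplace(size=n)
S = np.vstack([s1, s2])
S = (S - S.mean(axis=1, keepdims=True)) / S.std(axis=1, keepdims=True)

# Instantaneous linear mixing model: X = A @ S
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Whitening: decorrelate the mixtures and normalize their variance
X = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Z = np.diag(d ** -0.5) @ E.T @ X

# Symmetric FastICA with a tanh nonlinearity
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Z)
    Wn = G @ Z.T / n - np.diag((1 - G ** 2).mean(axis=1)) @ W
    u, _, vt = np.linalg.svd(Wn)
    W = u @ vt                     # symmetric decorrelation (W W^T)^{-1/2} W

Y = W @ Z                          # recovered sources, up to order and sign
C = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
print(C)
```

Each true source should correlate strongly with one recovered component; the remaining permutation and sign ambiguity is intrinsic to ICA.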
Audio Signal Processing Using Time-Frequency Approaches: Coding, Classification, Fingerprinting, and Watermarking
Audio signals are information-rich nonstationary signals that play an important role in our day-to-day communication, perception of the environment, and entertainment. Due to their non-stationary nature, time-only or frequency-only approaches are inadequate for analyzing these signals; a joint time-frequency (TF) approach is a better choice for processing them efficiently. In this digital era, compression, intelligent indexing for content-based retrieval, classification, and protection of digital audio content are a few of the areas that encapsulate a majority of audio signal processing applications. In this paper, we present a comprehensive array of TF methodologies that successfully address applications in all of the above-mentioned areas. A TF-based audio coding scheme with a novel psychoacoustics model, music classification, audio classification of environmental sounds, audio fingerprinting, and audio watermarking are presented to demonstrate the advantages of using time-frequency approaches in analyzing and extracting information from audio signals.
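A minimal example of the joint TF representation that these applications start from is the short-time Fourier magnitude spectrogram. The sketch below is generic, not any of the paper's coding or classification schemes; window, hop, and the chirp test signal are arbitrary choices:

```python
import numpy as np

def stft_mag(x, win=256, hop=128):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier
    transform: slide a window along the signal and FFT each frame.
    Returns an array of shape (num_frames, win // 2 + 1).
    """
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

fs = 8000
t = np.arange(fs) / fs
# A chirp: its frequency rises over time, which a single global FFT
# cannot localize but the spectrogram shows as a rising ridge.
x = np.sin(2 * np.pi * (200 + 400 * t) * t)
S = stft_mag(x)
print(S.shape)
```

The dominant bin of the first frame sits well below that of the last frame, which is exactly the nonstationarity the abstract argues requires joint TF analysis.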
Optophone design: optical-to-auditory vision substitution for the blind
An optophone is a device that turns light into sound for the benefit of blind people. The present project is intended to produce a general-purpose optophone to be worn on the head, about the house and in the street, to give the wearer a detailed description in sound of the scene he is facing. The device will therefore consist of an electronic camera, some signal-processing electronics, earphones, and a battery. The two major problems are the derivation of (a) the most suitable mapping from images to sounds, and (b) an algorithm to perform the mapping in real time on existing electronic components. This thesis concerns problem (a). Chapter 2 goes into the general scene-to-sound mapping problem in some detail and presents the work of earlier investigators. Chapter 3 discusses the design of tests to evaluate the performance of candidate mappings; a theoretical performance test (TPT) is derived. Chapter 4 applies the TPT to the most obvious mapping, the cartesian piano transform. Chapter 5 applies the TPT to a mapping based on the cosine transform. Chapter 6 attempts to derive a mapping by principal component analysis, using the inaccuracies of human sight and hearing and the statistical properties of real scenes and sounds. Chapter 7 presents a complete scheme, implemented in software, for representing digitised colour scenes by audible digitised stereo sound. Chapter 8 tries to decide how many numbers are required to specify a steady spectrum with no noticeable degradation. Chapter 9 looks at a scheme designed to produce more natural-sounding sounds related to more meaningful portions of the scene. This scheme maps windows in the scene to steady spectral patterns of short duration, the location of the window being conveyed by simulated free-field listening. Chapter 10 gives detailed recommendations as to further work.
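The row-to-pitch, column-to-time idea behind mappings like the cartesian piano transform can be sketched as follows. The thesis does not specify these parameters, so the frequency range, spacing, and column duration here are purely illustrative assumptions:

```python
import numpy as np

def piano_transform(image, fs=8000, col_dur=0.05, f0=200.0, fmax=4000.0):
    """Illustrative image-to-sound mapping in the spirit of a cartesian
    piano transform: each image row is assigned a fixed pitch (top row
    highest) and each column becomes a short chord whose partials are
    that column's bright pixels, played left to right.
    """
    rows, cols = image.shape
    freqs = np.geomspace(f0, fmax, rows)[::-1]      # top row = highest pitch
    n = int(fs * col_dur)
    t = np.arange(n) / fs
    out = []
    for c in range(cols):
        tones = image[:, c:c + 1] * np.sin(2 * np.pi * np.outer(freqs, t))
        out.append(tones.sum(axis=0))
    return np.concatenate(out)

img = np.zeros((8, 4))
img[2, 1] = 1.0                # a single bright pixel in column 1
audio = piano_transform(img)   # 4 columns x 0.05 s at 8 kHz
print(audio.shape)
```

A dark column maps to silence and a single bright pixel to a brief pure tone, so the listener can in principle read off both position and brightness.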
Global and Local Uncertainty Principles for Signals on Graphs
Uncertainty principles such as Heisenberg's provide limits on the time-frequency concentration of a signal, and constitute an important theoretical tool for designing and evaluating linear signal transforms. Generalizations of such principles to the graph setting can inform dictionary design for graph signals, lead to algorithms for reconstructing missing information from graph signals via sparse representations, and yield new graph analysis tools. While previous work has focused on generalizing notions of spreads of a graph signal in the vertex and graph spectral domains, our approach is to generalize the methods of Lieb in order to develop uncertainty principles that provide limits on the concentration of the analysis coefficients of any graph signal under a dictionary transform whose atoms are jointly localized in the vertex and graph spectral domains. One challenge we highlight is that due to the inhomogeneity of the underlying graph data domain, the local structure in a single small region of the graph can drastically affect the uncertainty bounds for signals concentrated in different regions of the graph, limiting the information provided by global uncertainty principles. Accordingly, we suggest a new way to incorporate a notion of locality, and develop local uncertainty principles that bound the concentration of the analysis coefficients of each atom of a localized graph spectral filter frame in terms of quantities that depend on the local structure of the graph around the center vertex of the given atom. Finally, we demonstrate how our proposed local uncertainty measures can improve the random sampling of graph signals
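As a toy illustration of vertex- versus graph-spectral-domain concentration (not the paper's Lieb-based bounds), one can eigendecompose a small path-graph Laplacian and measure how a signal's graph Fourier coefficients spread. The 4-norm concentration measure below is a crude stand-in for the concentration measures the paper develops:

```python
import numpy as np

# Path graph on N vertices: combinatorial Laplacian L = D - A
N = 20
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Graph Fourier basis = Laplacian eigenvectors (graph spectral domain)
lam, U = np.linalg.eigh(L)

def spectral_concentration(f):
    """Sum of fourth powers of the unit-norm GFT coefficients: equals 1
    only when all energy sits in a single graph frequency, and shrinks
    as the energy spreads across many frequencies."""
    c = U.T @ f
    c = c / np.linalg.norm(c)
    return float(np.sum(c ** 4))

delta = np.zeros(N)
delta[N // 2] = 1.0                     # perfectly vertex-localized signal
print(spectral_concentration(delta))     # small: spread over many frequencies
print(spectral_concentration(U[:, 3]))   # one graph frequency: exactly 1
```

The vertex-localized delta is maximally spread in the spectral domain and vice versa, the trade-off that graph uncertainty principles quantify.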
Bayesian Modeling and Estimation Techniques for the Analysis of Neuroimaging Data
Brain function is hallmarked by its adaptivity and robustness, arising from underlying neural activity that admits well-structured representations in the temporal, spatial, or spectral domains. While neuroimaging techniques such as Electroencephalography (EEG) and magnetoencephalography (MEG) can record rapid neural dynamics at high temporal resolutions, they face several signal processing challenges that hinder their full utilization in capturing these characteristics of neural activity. The objective of this dissertation is to devise statistical modeling and estimation methodologies that account for the dynamic and structured representations of neural activity and to demonstrate their utility in application to experimentally-recorded data.
The first part of this dissertation concerns spectral analysis of neural data. In order to capture the non-stationarities involved in neural oscillations, we integrate multitaper spectral analysis and state-space modeling in a Bayesian estimation setting. We also present a multitaper spectral analysis method tailored for spike trains that captures the non-linearities involved in neuronal spiking. We apply our proposed algorithms to both EEG and spike recordings, which reveal significant gains in spectral resolution and noise reduction.
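The multitaper averaging principle underlying the first part can be sketched without the Bayesian state-space machinery. For self-containedness this uses the simple sine-taper family rather than Thomson's Slepian (DPSS) tapers, and the test signal is arbitrary:

```python
import numpy as np

def multitaper_psd(x, k=7):
    """Multitaper spectral estimate: average the periodograms of the
    signal under k orthogonal tapers to trade a little resolution for
    much lower variance.  This sketch uses the sine-taper family
    (Riedel & Sidorenko) for simplicity; the averaging principle is
    the same as with Slepian tapers, and the dissertation's Bayesian
    state-space extension is not attempted here."""
    n = len(x)
    m = np.arange(1, n + 1)
    tapers = np.sqrt(2.0 / (n + 1)) * np.sin(
        np.pi * np.outer(np.arange(1, k + 1), m) / (n + 1))   # (k, n)
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return spectra.mean(axis=0)

fs = 256
t = np.arange(4 * fs) / fs
# A 10 Hz "alpha-like" oscillation in white noise
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.default_rng(1).standard_normal(len(t))
psd = multitaper_psd(x)
freqs = np.fft.rfftfreq(len(x), 1 / fs)
print(freqs[np.argmax(psd)])   # peaks near 10 Hz
```

Averaging over tapers smooths the noise floor at the cost of slightly widening the 10 Hz peak, the bias-variance trade-off that motivates the state-space refinement.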
In the second part, we investigate cortical encoding of speech as manifested in MEG responses. These responses are often modeled via a linear filter, referred to as the temporal response function (TRF). While the TRFs estimated from the sensor-level MEG data have been widely studied, their cortical origins are not fully understood. We define the new notion of Neuro-Current Response Functions (NCRFs) for simultaneously determining the TRFs and their cortical distribution. We develop an efficient algorithm for NCRF estimation and apply it to MEG data, which provides new insights into the cortical dynamics underlying speech processing.
Finally, in the third part, we consider the inference of Granger causal (GC) influences in high-dimensional time series models with sparse coupling. We consider a canonical sparse bivariate autoregressive model and define a new statistic for inferring GC influences, which we refer to as the LASSO-based Granger Causal (LGC) statistic. We establish non-asymptotic guarantees for robust identification of GC influences via the LGC statistic. Applications to simulated and real data demonstrate the utility of the LGC statistic in robust GC identification
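A toy version of the LASSO-fit step behind an LGC-style statistic (the dissertation's non-asymptotic thresholding rule is not reproduced here): fit a sparse lag regression to a simulated bivariate autoregression in which series 2 drives series 1, and read off which cross-lag coefficients survive the L1 penalty. The model coefficients, penalty, and solver below are illustrative choices:

```python
import numpy as np

def lasso_ista(X, y, lam=0.05, iters=500):
    """Minimal ISTA solver for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n = X.shape[0]
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        g = b + step * X.T @ (y - X @ b) / n          # gradient step
        b = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # shrink
    return b

# Bivariate AR(1): series 2 Granger-causes series 1, not the reverse
rng = np.random.default_rng(3)
T = 2000
x = np.zeros((T, 2))
for t in range(1, T):
    x[t, 0] = 0.5 * x[t - 1, 0] + 0.4 * x[t - 1, 1] + rng.standard_normal()
    x[t, 1] = 0.5 * x[t - 1, 1] + rng.standard_normal()

lags, targets = x[:-1], x[1:]
b1 = lasso_ista(lags, targets[:, 0])   # predicting series 1
b2 = lasso_ista(lags, targets[:, 1])   # predicting series 2
print(b1, b2)   # the 2->1 cross-lag stays large; the 1->2 one shrinks to ~0
```

The surviving cross-lag coefficient flags the true GC direction; the LGC framework adds the statistical guarantees for when this identification is reliable.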
Image processing methods to segment speech spectrograms for word level recognition
The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained to learn an individual speaker's voice and larger vocabularies statistically, but accuracy is not ideal. A small gap between actual speech and its acoustic representation in the statistical mapping causes Hidden Markov Model (HMM) methods to fail to match the acoustic speech signals, and consequently leads to classification errors. Certainly, these errors in the low-level recognition stage of ASR produce unavoidable errors at the higher levels. Therefore, it seems that ASR requires additional research ideas to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach for speech recognition, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations. The study focuses on applying image processing to speech spectrogram images (SSIs). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMMs) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments.
Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low-level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM-based high-level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR based on SSI matching alone.
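The dynamic time warping suggested for SIR-CODE segments can be illustrated with the classic DTW recurrence; real SSI segments would be per-frame feature vectors with a suitable frame distance, but a 1-D sketch shows the alignment principle:

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-time-warping distance between two sequences.
    D[i, j] = local cost + min of the three predecessor cells, so the
    warping path can stretch or compress time to align the patterns.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-stretched copy of a pattern aligns perfectly (distance 0)...
pattern = [0, 1, 2, 3, 2, 1, 0]
stretched = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]
print(dtw_distance(pattern, stretched))     # 0.0
# ...while a genuinely different pattern does not
print(dtw_distance(pattern, [3, 2, 1, 0, 1, 2, 3]))
```

This invariance to local time stretching is what makes DTW a natural matcher for spoken words of varying duration.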