12,783 research outputs found
Graph Signal Processing: Overview, Challenges and Applications
Research in Graph Signal Processing (GSP) aims to develop tools for
processing data defined on irregular graph domains. In this paper we first
provide an overview of core ideas in GSP and their connection to conventional
digital signal processing. We then summarize recent developments in developing
basic GSP tools, including methods for sampling, filtering or graph learning.
Next, we review progress in several application areas using GSP, including
processing and analysis of sensor network data, biological data, and
applications to image processing and machine learning. We finish by providing a
brief historical perspective to highlight how concepts recently developed in
GSP build on top of prior research in other areas.Comment: To appear, Proceedings of the IEE
Speech Processing in Computer Vision Applications
Deep learning has been recently proven to be a viable asset in determining features in the field of Speech Analysis. Deep learning methods like Convolutional Neural Networks facilitate the expansion of specific feature information in waveforms, allowing networks to create more feature dense representations of data. Our work attempts to address the problem of re-creating a face given a speaker\u27s voice and speaker identification using deep learning methods. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods to speech feature analysis. Finally, we will present our deep learning approaches to speaker identification and speech to face synthesis. The presented method can convert a speaker audio sample to an image of their predicted face. This framework is composed of several chained together networks, each with an essential step in the conversion process. These include Audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain features can map to the face and that with a speaker\u27s voice, DNNs can create their face and that a GUI could be used in conjunction to display a speaker recognition network\u27s data
Construction of Hilbert Transform Pairs of Wavelet Bases and Gabor-like Transforms
We propose a novel method for constructing Hilbert transform (HT) pairs of
wavelet bases based on a fundamental approximation-theoretic characterization
of scaling functions--the B-spline factorization theorem. In particular,
starting from well-localized scaling functions, we construct HT pairs of
biorthogonal wavelet bases of L^2(R) by relating the corresponding wavelet
filters via a discrete form of the continuous HT filter. As a concrete
application of this methodology, we identify HT pairs of spline wavelets of a
specific flavor, which are then combined to realize a family of complex
wavelets that resemble the optimally-localized Gabor function for sufficiently
large orders.
Analytic wavelets, derived from the complexification of HT wavelet pairs,
exhibit a one-sided spectrum. Based on the tensor-product of such analytic
wavelets, and, in effect, by appropriately combining four separable
biorthogonal wavelet bases of L^2(R^2), we then discuss a methodology for
constructing 2D directional-selective complex wavelets. In particular,
analogous to the HT correspondence between the components of the 1D
counterpart, we relate the real and imaginary components of these complex
wavelets using a multi-dimensional extension of the HT--the directional HT.
Next, we construct a family of complex spline wavelets that resemble the
directional Gabor functions proposed by Daugman. Finally, we present an
efficient FFT-based filterbank algorithm for implementing the associated
complex wavelet transform.Comment: 36 pages, 8 figure
Statistically optimum pre- and postfiltering in quantization
We consider the optimization of pre- and postfilters surrounding a quantization system. The goal is to optimize the filters such that the mean square error is minimized under the key constraint that the quantization noise variance is directly proportional to the variance of the quantization system input. Unlike some previous work, the postfilter is not restricted to be the inverse of the prefilter. With no order constraint on the filters, we present closed-form solutions for the optimum pre- and postfilters when the quantization system is a uniform quantizer. Using these optimum solutions, we obtain a coding gain expression for the system under study. The coding gain expression clearly indicates that, at high bit rates, there is no loss in generality in restricting the postfilter to be the inverse of the prefilter. We then repeat the same analysis with first-order pre- and postfilters in the form 1+αz-1 and 1/(1+γz^-1 ). In specific, we study two cases: 1) FIR prefilter, IIR postfilter and 2) IIR prefilter, FIR postfilter. For each case, we obtain a mean square error expression, optimize the coefficients α and γ and provide some examples where we compare the coding gain performance with the case of α=γ. In the last section, we assume that the quantization system is an orthonormal perfect reconstruction filter bank. To apply the optimum preand postfilters derived earlier, the output of the filter bank must be wide-sense stationary WSS which, in general, is not true. We provide two theorems, each under a different set of assumptions, that guarantee the wide sense stationarity of the filter bank output. We then propose a suboptimum procedure to increase the coding gain of the orthonormal filter bank
Digital Signal Processing
Contains summary of research and reports on sixteen research projects.U.S. Navy - Office of Naval Research (Contract N00014-75-C-0852)National Science Foundation FellowshipNATO FellowshipU.S. Navy - Office of Naval Research (Contract N00014-75-C-0951)National Science Foundation (Grant ECS79-15226)U.S. Navy - Office of Naval Research (Contract N00014-77-C-0257)Bell LaboratoriesNational Science Foundation (Grant ECS80-07102)Schlumberger-Doll Research Center FellowshipHertz Foundation FellowshipGovernment of Pakistan ScholarshipU.S. Navy - Office of Naval Research (Contract N00014-77-C-0196)U.S. Air Force (Contract F19628-81-C-0002)Hughes Aircraft Company Fellowshi
Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals
In this paper, we propose a new method for the accurate estimation and
tracking of formants in speech signals using time-varying quasi-closed-phase
(TVQCP) analysis. Conventional formant tracking methods typically adopt a
two-stage estimate-and-track strategy wherein an initial set of formant
candidates are estimated using short-time analysis (e.g., 10--50 ms), followed
by a tracking stage based on dynamic programming or a linear state-space model.
One of the main disadvantages of these approaches is that the tracking stage,
however good it may be, cannot improve upon the formant estimation accuracy of
the first stage. The proposed TVQCP method provides a single-stage formant
tracking that combines the estimation and tracking stages into one. TVQCP
analysis combines three approaches to improve formant estimation and tracking:
(1) it uses temporally weighted quasi-closed-phase analysis to derive
closed-phase estimates of the vocal tract with reduced interference from the
excitation source, (2) it increases the residual sparsity by using the
optimization and (3) it uses time-varying linear prediction analysis over long
time windows (e.g., 100--200 ms) to impose a continuity constraint on the vocal
tract model and hence on the formant trajectories. Formant tracking experiments
with a wide variety of synthetic and natural speech signals show that the
proposed TVQCP method performs better than conventional and popular formant
tracking tools, such as Wavesurfer and Praat (based on dynamic programming),
the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on
deep neural networks trained in a supervised manner). Matlab scripts for the
proposed method can be found at: https://github.com/njaygowda/ftrac
Shift Estimation Algorithm for Dynamic Sensors With Frame-to-Frame Variation in Their Spectral Response
This study is motivated by the emergence of a new class of tunable infrared spectral-imaging sensors that offer the ability to dynamically vary the sensor\u27s intrinsic spectral response from frame to frame in an electronically controlled fashion. A manifestation of this is when a sequence of dissimilar spectral responses is periodically realized, whereby in every period of acquired imagery, each frame is associated with a distinct spectral band. Traditional scene-based global shift estimation algorithms are not applicable to such spectrally heterogeneous video sequences, as a pixel value may change from frame to frame as a result of both global motion and varying spectral response. In this paper, a novel algorithm is proposed and examined to fuse a series of coarse global shift estimates between periodically sampled pairs of nonadjacent frames to estimate motion between consecutive frames; each pair corresponds to two nonadjacent frames of the same spectral band. The proposed algorithm outperforms three alternative methods, with the average error being one half of that obtained by using an equal weights version of the proposed algorithm, one-fourth of that obtained by using a simple linear interpolation method, and one-twentieth of that obtained by using a naiÂżve correlation-based direct method
- …