Search CORE

12,783 research outputs found

Graph Signal Processing: Overview, Challenges and Applications

Author: Frossard Pascal
Kovačević Jelena
Moura José M. F.
Ortega Antonio
Vandergheynst Pierre
Publication venue
Publication date: 26/03/2018
Field of study

Research in Graph Signal Processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing. We then summarize recent developments in developing basic GSP tools, including methods for sampling, filtering or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning. We finish by providing a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas.Comment: To appear, Proceedings of the IEE

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Speech Processing in Computer Vision Applications

Author: Waterworth Nicholas
Publication venue: ScholarWorks@UARK
Publication date: 01/05/2020
Field of study

Deep learning has been recently proven to be a viable asset in determining features in the field of Speech Analysis. Deep learning methods like Convolutional Neural Networks facilitate the expansion of specific feature information in waveforms, allowing networks to create more feature dense representations of data. Our work attempts to address the problem of re-creating a face given a speaker\u27s voice and speaker identification using deep learning methods. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods to speech feature analysis. Finally, we will present our deep learning approaches to speaker identification and speech to face synthesis. The presented method can convert a speaker audio sample to an image of their predicted face. This framework is composed of several chained together networks, each with an essential step in the conversion process. These include Audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain features can map to the face and that with a speaker\u27s voice, DNNs can create their face and that a GUI could be used in conjunction to display a speaker recognition network\u27s data

ScholarWorks@UARK

UARK (University of Arkansas )

Construction of Hilbert Transform Pairs of Wavelet Bases and Gabor-like Transforms

Author: Chaudhury Kunal Narayan
Unser Michael
Publication venue
Publication date: 01/01/2009
Field of study

We propose a novel method for constructing Hilbert transform (HT) pairs of wavelet bases based on a fundamental approximation-theoretic characterization of scaling functions--the B-spline factorization theorem. In particular, starting from well-localized scaling functions, we construct HT pairs of biorthogonal wavelet bases of L^2(R) by relating the corresponding wavelet filters via a discrete form of the continuous HT filter. As a concrete application of this methodology, we identify HT pairs of spline wavelets of a specific flavor, which are then combined to realize a family of complex wavelets that resemble the optimally-localized Gabor function for sufficiently large orders. Analytic wavelets, derived from the complexification of HT wavelet pairs, exhibit a one-sided spectrum. Based on the tensor-product of such analytic wavelets, and, in effect, by appropriately combining four separable biorthogonal wavelet bases of L^2(R^2), we then discuss a methodology for constructing 2D directional-selective complex wavelets. In particular, analogous to the HT correspondence between the components of the 1D counterpart, we relate the real and imaginary components of these complex wavelets using a multi-dimensional extension of the HT--the directional HT. Next, we construct a family of complex spline wavelets that resemble the directional Gabor functions proposed by Daugman. Finally, we present an efficient FFT-based filterbank algorithm for implementing the associated complex wavelet transform.Comment: 36 pages, 8 figure

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Statistically optimum pre- and postfiltering in quantization

Author: Tuqan Jamal
Vaidyanathan P. P.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/1997
Field of study

We consider the optimization of pre- and postfilters surrounding a quantization system. The goal is to optimize the filters such that the mean square error is minimized under the key constraint that the quantization noise variance is directly proportional to the variance of the quantization system input. Unlike some previous work, the postfilter is not restricted to be the inverse of the prefilter. With no order constraint on the filters, we present closed-form solutions for the optimum pre- and postfilters when the quantization system is a uniform quantizer. Using these optimum solutions, we obtain a coding gain expression for the system under study. The coding gain expression clearly indicates that, at high bit rates, there is no loss in generality in restricting the postfilter to be the inverse of the prefilter. We then repeat the same analysis with first-order pre- and postfilters in the form 1+αz-1 and 1/(1+γz^-1 ). In specific, we study two cases: 1) FIR prefilter, IIR postfilter and 2) IIR prefilter, FIR postfilter. For each case, we obtain a mean square error expression, optimize the coefficients α and γ and provide some examples where we compare the coding gain performance with the case of α=γ. In the last section, we assume that the quantization system is an orthonormal perfect reconstruction filter bank. To apply the optimum preand postfilters derived earlier, the output of the filter bank must be wide-sense stationary WSS which, in general, is not true. We provide two theorems, each under a different set of assumptions, that guarantee the wide sense stationarity of the filter bank output. We then propose a suboptimum procedure to increase the coding gain of the orthonormal filter bank

Caltech Authors

Digital Signal Processing

Author: Baggeroer Arthur B.
Bordley Thomas E.
Braccini Carlo
Dove Webster P.
Duckworth Gregory L.
Esmersoy Cengiz
Espy-Wilson Carol Y.
Frisk George V.
Hayes Monson H.
Kurkjian Andrew L.
Lang Stephen W.
Li Yen-ta
Lim Jae S.
Malik Naveed A.
McClellan James H.
Mook Douglas R.
Musicus Bruce R.
Myers Cory
Nawab Syed H.
Oppenheim Alan V.
Wang David L.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)
Publication date: 01/01/1981
Field of study

Contains summary of research and reports on sixteen research projects.U.S. Navy - Office of Naval Research (Contract N00014-75-C-0852)National Science Foundation FellowshipNATO FellowshipU.S. Navy - Office of Naval Research (Contract N00014-75-C-0951)National Science Foundation (Grant ECS79-15226)U.S. Navy - Office of Naval Research (Contract N00014-77-C-0257)Bell LaboratoriesNational Science Foundation (Grant ECS80-07102)Schlumberger-Doll Research Center FellowshipHertz Foundation FellowshipGovernment of Pakistan ScholarshipU.S. Navy - Office of Naval Research (Contract N00014-77-C-0196)U.S. Air Force (Contract F19628-81-C-0002)Hughes Aircraft Company Fellowshi

DSpace@MIT

Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals

Author: Alku Paavo
Gowda Dhananjaya
Kadiri Sudarsana Reddy
Story Brad
Publication venue
Publication date: 31/08/2023
Field of study

In this paper, we propose a new method for the accurate estimation and tracking of formants in speech signals using time-varying quasi-closed-phase (TVQCP) analysis. Conventional formant tracking methods typically adopt a two-stage estimate-and-track strategy wherein an initial set of formant candidates are estimated using short-time analysis (e.g., 10--50 ms), followed by a tracking stage based on dynamic programming or a linear state-space model. One of the main disadvantages of these approaches is that the tracking stage, however good it may be, cannot improve upon the formant estimation accuracy of the first stage. The proposed TVQCP method provides a single-stage formant tracking that combines the estimation and tracking stages into one. TVQCP analysis combines three approaches to improve formant estimation and tracking: (1) it uses temporally weighted quasi-closed-phase analysis to derive closed-phase estimates of the vocal tract with reduced interference from the excitation source, (2) it increases the residual sparsity by using the

L_1

optimization and (3) it uses time-varying linear prediction analysis over long time windows (e.g., 100--200 ms) to impose a continuity constraint on the vocal tract model and hence on the formant trajectories. Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner). Matlab scripts for the proposed method can be found at: https://github.com/njaygowda/ftrac

arXiv.org e-Print Archive

Shift Estimation Algorithm for Dynamic Sensors With Frame-to-Frame Variation in Their Spectral Response

Author: Giles Todd M.
Hayat Majeed M.
Krishna Sanjay
Publication venue: e-Publications@Marquette
Publication date: 01/03/2010
Field of study

This study is motivated by the emergence of a new class of tunable infrared spectral-imaging sensors that offer the ability to dynamically vary the sensor\u27s intrinsic spectral response from frame to frame in an electronically controlled fashion. A manifestation of this is when a sequence of dissimilar spectral responses is periodically realized, whereby in every period of acquired imagery, each frame is associated with a distinct spectral band. Traditional scene-based global shift estimation algorithms are not applicable to such spectrally heterogeneous video sequences, as a pixel value may change from frame to frame as a result of both global motion and varying spectral response. In this paper, a novel algorithm is proposed and examined to fuse a series of coarse global shift estimates between periodically sampled pairs of nonadjacent frames to estimate motion between consecutive frames; each pair corresponds to two nonadjacent frames of the same spectral band. The proposed algorithm outperforms three alternative methods, with the average error being one half of that obtained by using an equal weights version of the proposed algorithm, one-fourth of that obtained by using a simple linear interpolation method, and one-twentieth of that obtained by using a nai¿ve correlation-based direct method

epublications@Marquette