Search CORE

5,290 research outputs found

Methods for Focal Plane Array Resolution Estimation Using Random Laser Speckle in Non-paraxial Geometries

Author: Plummer Phillip J.
Publication venue: AFIT Scholar
Publication date: 01/06/2022
Field of study

The infrared (IR) imaging community has a need for direct IR detector evaluation due to the continued demand for small pixel pitch detectors, the emergence of strained-layer-super-lattice devices, and the associated lateral carrier diffusion issues. Conventional laser speckle-based modulation transfer function (MTF) estimation is dependent on Fresnel propagation and a wide-sense-stationary input random process, limiting the use of this approach for lambda (wavelength)-scale IR devices. This dissertation develops two alternative methodologies for speckle-based resolution evaluation of IR focal plane arrays (FPAs). Both techniques are formulated using Rayleigh-Sommerfield electric field propagation, making them valid in the non-paraxial geometries dictated for resolution estimation of lambda-scale devices. The generalized FPA MTF estimation approach numerically evaluates Rayleigh-Sommerfeld speckle irradiance autocorrelation functions (ACFs) to indirectly compute the power spectral density (PSD) of a non-wide-sense-stationary (WSS) speckle irradiance random process. The experimental error incurred by making WSS assumptions regarding the associated laser speckle random process is quantified utilizing the Wigner distribution function. This method is experimentally demonstrated on a lambda-scale longwave IR FPA, showing a 27% spatial frequency range improvement over established estimation methodology. Additionally, a resolution estimation approach, which utilizes an iterative maximum likelihood estimation approach and speckle irradiance ACFs to solve for a system impulse response, is developed and demonstrated with simulated speckle imagery

AFTI Scholar (Air Force Institute of Technology)

HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering

Author: Alku P.
Nurminen J.
Pulakka H.
Raitio T.
Suni A.
Vainio M.
Yamagishi J.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

Edinburgh Research Explorer

A versatile pitch tracking algorithm : from human speech to killer whale vocalizations

Author: Shapiro Ari D.
Wang Chao
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2009
Field of study

Author Posting. © Acoustical Society of America, 2009. This article is posted here by permission of Acoustical Society of America for personal use, not for redistribution. The definitive version was published in Journal of the Acoustical Society of America 126 (2009): 451-459, doi:10.1121/1.3132525.In this article, a pitch tracking algorithm [named discrete logarithmic Fourier transformation-pitch detection algorithm (DLFT-PDA)], originally designed for human telephone speech, was modified for killer whale vocalizations. The multiple frequency components of some of these vocalizations demand a spectral (rather than temporal) approach to pitch tracking. The DLFT-PDA algorithm derives reliable estimations of pitch and the temporal change of pitch from the harmonic structure of the vocal signal. Scores from both estimations are combined in a dynamic programming search to find a smooth pitch track. The algorithm is capable of tracking killer whale calls that contain simultaneous low and high frequency components and compares favorably across most signal to noise ratio ranges to the peak-picking and sidewinder algorithms that have been used for tracking killer whale vocalizations previously.C.W. was supported by DARPA under Contract No. N66001-96-C-8526, monitored through Naval Command, Control, and Ocean Surveillance Center and by the National Science Foundation under Grant No. IRI-9618731. A.D.S. was supported by a National Defense Science and Engineering Graduate Fellowship

Crossref

Woods Hole Open Access Server

Recommended from our members

Modelling and extraction of fundamental frequency in speech signals

Author: Pawi Alipah
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2014
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis makes contributions to the development of accurate pitch estimation research in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of the higher order moments and (3) an investigation of an analysis-synthesis method for selection of the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criteria and the order of the statistical moment a window length of 37 to 80 ms gives the least error. In order to avoid excessive delay as a consequence of using a longer window, a method is proposed ii where the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. The use of second order and higher order moments, and the magnitude difference function, as the similarity criteria were explored and compared. A novel method of calculation of moments is introduced where the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined. The new method of calculation of moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criteria. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with the increasing error in the estimate of the fundamental frequency. To this end a new method of spectral synthesis is proposed using an estimate of the spectral envelop and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in the pitch accuracy and outperform the benchmark YIN method

Brunel University Research Archive

Informed algorithms for sound source separation in enclosed reverberant environments

Author: Muhammad Salman Khan (7202543)
Publication venue
Publication date: 01/01/2013
Field of study

While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving performance of audio separation algorithms when they are informed i.e. have access to source location information. These locations are assumed to be known a priori in this work, for example by video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, which is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework that encodes the spatial characteristics of the enclosure and further improves the separation performance in challenging scenarios i.e. when sources are in close proximity and when the level of reverberation is high. Finally, new dereverberation based pre-processing is proposed based on the cascade of three dereverberation stages where each enhances the twomicrophone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation based pre-processing and use of soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed for example from speech signals from the TIMIT database and measured room impulse responses

Loughborough University Institutional Repository