Modelling and extraction of fundamental frequency in speech signals
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. One of the most important parameters of speech is the fundamental frequency of vibration of voiced sounds. The audio sensation of the fundamental frequency is known as the pitch. Depending on the tonal/non-tonal category of language, the fundamental frequency conveys intonation, pragmatics and meaning. In addition, the fundamental frequency and intonation carry speaker gender, age, identity, speaking style and emotional state. Accurate estimation of the fundamental frequency is critically important for the functioning of speech processing applications such as speech coding, speech recognition, speech synthesis and voice morphing. This thesis contributes to accurate pitch estimation in three distinct ways: (1) an investigation of the impact of the window length on pitch estimation error, (2) an investigation of the use of higher order moments and (3) an investigation of an analysis-synthesis method for selecting the best pitch value among N proposed candidates. Experimental evaluations show that the length of the speech window has a major impact on the accuracy of pitch estimation. Depending on the similarity criterion and the order of the statistical moment, a window length of 37 to 80 ms gives the least error. To avoid the excessive delay that a longer window would cause, a method is proposed in which the current short window is concatenated with the previous frames to form a longer signal window for pitch extraction. Second order moments, higher order moments and the magnitude difference function were explored and compared as similarity criteria. A novel method of calculating moments is introduced in which the signal is split, i.e. rectified, into positive and negative valued samples. The moments for the positive and negative parts of the signal are computed separately and combined. The new method of calculating moments from positive and negative parts and the higher order criteria provide competitive results. A challenging issue in pitch estimation is the determination of the best candidate from N extrema of the similarity criterion. The analysis-synthesis method proposed in this thesis selects the pitch candidate that provides the best reproduction (synthesis) of the harmonic spectrum of the original speech. The synthesis method must be such that the distortion increases with increasing error in the estimate of the fundamental frequency. To this end, a new method of spectral synthesis is proposed that uses an estimate of the spectral envelope and harmonically spaced asymmetric Gaussian pulses as excitation. The N-best method provides a consistent reduction in pitch estimation error. The methods described in this thesis result in a significant improvement in pitch accuracy and outperform the benchmark YIN method.
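The magnitude difference function mentioned in the abstract can be illustrated with a minimal sketch. This is a textbook average magnitude difference function (AMDF) search over one frame, not the thesis's own algorithm; the function name `amdf_pitch`, the parameters `f_min` and `f_max`, and their default values are assumptions made for illustration.

```python
import math


def amdf_pitch(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame with a plain AMDF.

    Hypothetical helper for illustration only: it returns the lag with the
    smallest average magnitude difference inside the search range.
    """
    lag_min = int(sample_rate / f_max)  # shortest period considered
    lag_max = int(sample_rate / f_min)  # longest period considered
    n = len(signal) - lag_max           # samples compared at every lag
    if n <= 0:
        raise ValueError("frame is too short for the requested f_min")

    best_lag, best_score = lag_min, float("inf")
    for lag in range(lag_min, lag_max + 1):
        # Average absolute difference between the frame and its shifted copy.
        score = sum(abs(signal[i] - signal[i + lag]) for i in range(n)) / n
        if score < best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Usage: a 60 ms frame of a 200 Hz sine sampled at 8 kHz (synthetic test data).
frame = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(480)]
estimate = amdf_pitch(frame, 8000, f_min=120.0, f_max=400.0)
```

The search range in the usage example is deliberately narrow. With a wider range, lags at integer multiples of the true period score almost as low as the true period, which is one form of the candidate-selection problem the abstract describes.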
Characterizing Multiple Wireless Sensor Networks for Large-Scale Radio Tomography
Radio Tomographic Imaging (RTI) is an emerging Device-Free Passive Localization (DFPL) technology that uses a collection of cheap wireless transceivers to form a Wireless Sensor Network (WSN). Unlike device-based active localization, DFPL does not require a target of interest to wear any kind of device. The basic concept of RTI uses the changes in Received Signal Strength (RSS) on the links between transceivers to create an attenuation image of the area. This image can then be used for target detection, tracking, and localization. Each transceiver in the WSN must transmit sequentially to prevent collisions. This is not a problem when the number of transceivers in the WSN is small. However, large-scale RTI with a large number of transceivers suffers from high computational complexity, low frame rates, and physical distance limitations on the range of the transceivers. The goal of this research is to determine the applicability and characterize the feasibility of using multiple WSNs to address the limitations of a large-scale RTI network. The concept behind this new variant of RTI, called Multiple-Networks RTI (mnRTI), is to divide the transceivers into multiple WSNs as opposed to using one WSN. Analytical, simulated, and experimental data are computed, collected, and compared between an RTI network with one WSN and an mnRTI network with two WSNs. The WSN(s) comprise a total of 70 wireless transceivers covering an area of no more than 19 ft x 16 ft. Simulated and experimental results are presented from a series of stationary and moving target data collections. Preliminary results demonstrate that multiple WSNs can potentially provide similar or better results than the traditional RTI method with one WSN. Multiple WSNs have higher frame rates and lower computational complexity. Also, position estimation accuracy is comparable to, if not better than, that of the traditional RTI method with one WSN.
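The frame-rate and complexity argument can be made concrete with a back-of-the-envelope sketch. It rests on assumptions that the abstract does not state: the transceivers are split evenly, the networks transmit concurrently without interfering, every transceiver takes one fixed-length transmit slot per frame, and links between different networks are not measured. The function names and the slot duration are hypothetical.

```python
def link_count(n_nodes):
    """Number of unordered transceiver pairs (links) in one fully connected WSN."""
    return n_nodes * (n_nodes - 1) // 2


def frame_rate(n_nodes, slot_s):
    """Frames per second when each node transmits once per frame, one at a time."""
    return 1.0 / (n_nodes * slot_s)


def compare(total_nodes, n_networks, slot_s):
    """Compare one large WSN with an even split into n_networks smaller WSNs."""
    if total_nodes % n_networks != 0:
        raise ValueError("this sketch assumes an even split")
    per_net = total_nodes // n_networks
    return {
        "single_links": link_count(total_nodes),
        "multi_links": n_networks * link_count(per_net),
        "single_fps": frame_rate(total_nodes, slot_s),
        "multi_fps": frame_rate(per_net, slot_s),  # networks run concurrently
    }


# Usage: 70 transceivers as in the abstract; the 5 ms slot is an assumed value.
result = compare(70, 2, 0.005)
```

Under these assumptions, 70 transceivers in one WSN give 2415 links, while two WSNs of 35 give 1190 links in total and twice the frame rate. The trade-off is that cross-network links no longer contribute to the attenuation image.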