480 research outputs found
Recommended from our members
A Novel Approach for Continuous Speech Tracking and Dynamic Time Warping. Adaptive Framing Based Continuous Speech Similarity Measure and Dynamic Time Warping using Kalman Filter and Dynamic State Model
Dynamic speech properties such as time warping, silence removal and background noise interference are the most challenging issues in continuous speech signal matching. Among all of them, the time warped speech signal matching is of great interest and has been a tough challenge for the researchers. An adaptive framing based continuous speech tracking and similarity measurement approach is introduced in this work following a comprehensive research conducted in the diverse areas of speech processing. A dynamic state model is introduced based on system of linear motion equations which models the input (test) speech signal frame as a unidirectional moving object along the template speech signal. The most similar corresponding frame position in the template speech is estimated which is fused with a feature based similarity observation and the noise variances using a Kalman filter. The Kalman filter provides the final estimated frame position in the template speech at current time which is further used for prediction of a new frame size for the next step. In addition, a keyword spotting approach is proposed by introducing wavelet decomposition based dynamic noise filter and combination of beliefs. The Dempster’s theory of belief combination is deployed for the first time in relation to keyword spotting task. Performances for both; speech tracking and keyword spotting approaches are evaluated using the statistical metrics and gold standards for the binary classification. Experimental results proved the superiority of the proposed approaches over the existing methods.The appendices files are not available online
Adaptive framing based similarity measurement between time warped speech signals using Kalman filter
Similarity measurement between speech signals aims at calculating the degree of similarity using acoustic features that has been receiving much interest due to the processing of large volume of multimedia information. However, dynamic properties of speech signals such as varying silence segments and time warping factor make it more challenging to measure the similarity between speech signals. This manuscript entails further extension of our research towards the adaptive framing based similarity measurement between speech signals using a Kalman filter. Silence removal is enhanced by integrating multiple features for voiced and unvoiced speech segments detection. The adaptive frame size measurement is improved by using the acceleration/deceleration phenomenon of object linear motion. A dominate feature set is used to represent the speech signals along with the pre-calculated model parameters that are set by the offline tuning of a Kalman filter. Performance is evaluated using additional datasets to evaluate the impact of the proposed model and silence removal approach on the time warped speech similarity measurement. Detailed statistical results are achieved indicating the overall accuracy improvement from 91 to 98% that proves the superiority of the extended approach on our previous research work towards the time warped continuous speech similarity measurement
Frequency-warped autoregressive modeling and filtering
This thesis consists of an introduction and nine articles. The articles are related to the application of frequency-warping techniques to audio signal processing, and in particular, predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles.
Frequency-warping, or simply warping techniques are based on a modification of a conventional signal processing system so that the inherent frequency representation in the system is changed. It is demonstrated that this may be done for basically all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications.
Majority of the articles studies warped linear prediction, WLP, and its use in wideband audio coding. It is proposed that warped linear prediction would be particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by the introduction of a class of new implementation techniques for recursive filters in one of the articles. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired to write an article which introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using almost logarithmic frequency representation.reviewe
Multiresolution techniques for audio signal restoration
This thesis describes a study of techniques for the restoration of musical audio signals using a multiresolution signal representation called the multiresolution Fourier transform (MFT), a time-frequency-scale representation. This representation allows the restoration to adapt to the local signal structure, which typically consists of a set of approximately sinusoidal partials, each consisting of an “onset” of rapid energy variation followed by more slowly varying “sustain” and “decay” phases.
It must be decided what components of a noisy audio signal are to be kept in the restored version and, conversely, which must be removed. A simple filter is introduced that retains only musical signal —that is signal which adheres to the musical model — and rejects everything else. It is shown that this filter used in conjunction with the MIT has a low computational complexity. The MIT is used to capture the transient energy present at the onset of notes by splitting the time axis of a musical signal into steady-state and transient zones using a simple onset detector, which measures the expected energy at a given lime against the actual energy present.
Past audio signal restoration systems have relied on estimating a restored audio signal’s spectrum from the noisy audio signal presented to the algorithm. In this thesis the idea of having more than one version of a recording is used in order to gain further information about the ideal spectrum of the noisy signal. This poses a number of problems with regards to matching the time scales of two versions of the same piece. These are addressed and solutions are offered, based on a novel multiresolution warping algorithm.
Finally, various methods for using the detected signal spectrum of a clean modern signal to restore a noisy signal using the warping techniques and musical event detection filters are shown. These account for variations in scale and input signal to noise ratio (SNR) in the noisy signal. It is also shown how the simple adaptive filter introduced earlier can be used to restore audio signals with impulse noise as well as while additive noise. This filter and the time-warping technique is compared to adaptive Wiener filtering as an audio restoration method
Multiframe Super-Resolution of Color Image Sequences Using a Global Motion Model
The development of efficient software tools capable of super- resolving multi-spectral image sequences on-the-fly is an important step toward the production of imaging systems capable of acquiring vital imagery of hostile environments at an affordable price. A number of image processing tools already available for use in target recognition and identification rely on the availability of high-resolution imagery which cannot be safely acquired at a reasonable price. This thesis investigates the use of multiframe super-resolution as a tool to increase the spatial resolution of image sequences acquired with sensors commonly used in consumer video cameras. Multiframe super-resolution is the branch of imaging science which tries to restore high-resolution estimates of a scene utilizing a sequence of under-sampled images of that scene. Although a number of algorithms have already been developed to deal with this problem, they have unfortunately not been extended to deal with multi-spectral images acquired from moving imaging platforms. This thesis performs such extension for one of the most successful super-resolution algorithm and demonstrates that it can be used to improve the performance of common multi-spectral imaging systems utilizing Color Filter Arrays to acquire spectral data
ECG Biometric Authentication: A Comparative Analysis
Robust authentication and identification methods become an indispensable urgent task to protect the integrity of the devices and the sensitive data. Passwords have provided access control and authentication, but have shown their inherent vulnerabilities. The speed and convenience factor are what makes biometrics the ideal authentication solution as they could have a low probability of circumvention. To overcome the limitations of the traditional biometric systems, electrocardiogram (ECG) has received the most attention from the biometrics community due to the highly individualized nature of the ECG signals and the fact that they are ubiquitous and difficult to counterfeit. However, one of the main challenges in ECG-based biometric development is the lack of large ECG databases. In this paper, we contribute to creating a new large gallery off-the-person ECG datasets that can provide new opportunities for the ECG biometric research community. We explore the impact of filtering type, segmentation, feature extraction, and health status on ECG biometric by using the evaluation metrics. Our results have shown that our ECG biometric authentication outperforms existing methods lacking the ability to efficiently extract features, filtering, segmentation, and matching. This is evident by obtaining 100% accuracy for PTB, MIT-BHI, CEBSDB, CYBHI, ECG-ID, and in-house ECG-BG database in spite of noisy, unhealthy ECG signals while performing five-fold cross-validation. In addition, an average of 2.11% EER among 1,694 subjects is obtained
Computerized Evaluatution of Microsurgery Skills Training
The style of imparting medical training has evolved, over the years. The traditional methods of teaching and practicing basic surgical skills under apprenticeship model, no longer occupy the first place in modern technically demanding advanced surgical disciplines like neurosurgery. Furthermore, the legal and ethical concerns for patient safety as well as cost-effectiveness have forced neurosurgeons to master the necessary microsurgical techniques to accomplish desired results. This has lead to increased emphasis on assessment of clinical and surgical techniques of the neurosurgeons. However, the subjective assessment of microsurgical techniques like micro-suturing under the apprenticeship model cannot be completely unbiased. A few initiatives using computer-based techniques, have been made to introduce objective evaluation of surgical skills. This thesis presents a novel approach involving computerized evaluation of different components of micro-suturing techniques, to eliminate the bias of subjective assessment. The work involved acquisition of cine clips of micro-suturing activity on synthetic material. Image processing and computer vision based techniques were then applied to these videos to assess different characteristics of micro-suturing viz. speed, dexterity and effectualness. In parallel subjective grading on these was done by a senior neurosurgeon. Further correlation and comparative study of both the assessments was done to analyze the efficacy of objective and subjective evaluation
Speech recognition using linear dynamic models.
The majority of automatic speech recognition (ASR) systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models consecutive feature vectors (augmented to include derivative information) as statistically independent. Furthermore, spatial correlations present in speech parameters are frequently ignored through the use of diagonal covariance matrices. This paper continues the work of Digalakis and others who proposed instead a first-order linear state-space model which has the capacity to model underlying dynamics, and furthermore give a model of spatial correlations. This paper examines the assumptions made in applying such a model and shows that the addition of a hidden dynamic state leads to increases in accuracy over otherwise equivalent static models. We also propose a time-asynchronous decoding strategy suited to recognition with segment models. We describe implementation of decoding for linear dynamic models and present TIMIT phone recognition results
Weighted Time Warping for Temporal Segmentation of Multi-Parameter Physiological Signals
We present a novel approach to segmenting a quasiperiodic multi-parameter physiological signal in the presence of noise and transient corruption. We use Weighted Time Warping (WTW), to combine the partially correlated signals. We then use the relationship between the channels and the repetitive morphology of the time series to partition it into quasiperiodic units by matching it against a constantly evolving template. The method can accurately segment a multi-parameter signal, even when all the individual channels are so corrupted that they cannot be individually segmented. Experiments carried out on MIMIC, a multi-parameter physiological dataset recorded on ICU patients, demonstrate the effectiveness of the method. Our method performs as well as a widely used QRS detector on clean raw data, and outperforms it on corrupted data. Under additive noise at SNR 0 dB the average errors were 5:81 ms for our method and 303:48 ms for the QRS detector. Under transient corruption they were 2:89 ms and 387:32 ms respectively
- …