480 research outputs found

    Adaptive framing based similarity measurement between time warped speech signals using Kalman filter

    Similarity measurement between speech signals aims to calculate the degree of similarity from acoustic features, a task that has received much interest because of the need to process large volumes of multimedia information. However, dynamic properties of speech signals, such as varying silence segments and the time warping factor, make it more challenging to measure the similarity between speech signals. This manuscript extends our earlier research on adaptive framing based similarity measurement between speech signals using a Kalman filter. Silence removal is enhanced by integrating multiple features to detect voiced and unvoiced speech segments. The adaptive frame size measurement is improved by using the acceleration/deceleration phenomenon of linear object motion. A dominant feature set is used to represent the speech signals, along with pre-calculated model parameters that are set by offline tuning of the Kalman filter. Performance is evaluated on additional datasets to assess the impact of the proposed model and silence removal approach on time warped speech similarity measurement. Detailed statistical results show an overall accuracy improvement from 91% to 98%, demonstrating the superiority of the extended approach over our previous work on time warped continuous speech similarity measurement.
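    The abstract does not say which features are combined for voiced/unvoiced detection. As a rough illustration only, the sketch below assumes two classic cues, short-time energy and zero-crossing rate, to label frames and drop silent ones. The function names, frame sizes, and thresholds are hypothetical and are not taken from the paper.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into (possibly overlapping) frames."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def classify_frames(x, frame_len=256, hop=256, energy_thresh=1e-3, zcr_thresh=0.3):
    """Label frames 'silence', 'unvoiced' or 'voiced' (hypothetical thresholds)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.mean(frames ** 2, axis=1)                  # short-time energy
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)   # zero-crossing rate
    # Low energy -> silence; otherwise high ZCR -> unvoiced, low ZCR -> voiced.
    return np.where(energy < energy_thresh, "silence",
                    np.where(zcr > zcr_thresh, "unvoiced", "voiced"))

def remove_silence(x, frame_len=256):
    """Concatenate the non-silent, non-overlapping frames of x."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, frame_len)
    labels = classify_frames(x, frame_len, frame_len)
    return frames[labels != "silence"].reshape(-1)

if __name__ == "__main__":
    fs = 8000
    t = np.arange(1024) / fs
    x = np.concatenate([np.zeros(1024),                       # silence
                        0.5 * np.sin(2 * np.pi * 200 * t),    # low ZCR, voiced-like
                        0.5 * (-1.0) ** np.arange(1024)])     # high ZCR, unvoiced-like
    print(list(classify_frames(x)))
    print(len(remove_silence(x)))
```

A fixed-threshold rule like this is only a baseline; a multi-feature detector would typically add further cues or learn the thresholds from data.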

    Frequency-warped autoregressive modeling and filtering

    This thesis consists of an introduction and nine articles. The articles concern the application of frequency-warping techniques to audio signal processing, in particular predictive coding of wideband audio signals. The introduction reviews the literature and summarizes the results of the articles. Frequency-warping, or simply warping, techniques are based on modifying a conventional signal processing system so that its inherent frequency representation is changed. It is demonstrated that this can be done for essentially all traditional signal processing algorithms. In audio applications it is beneficial to modify the system so that the new frequency representation is close to that of human hearing. One of the articles is a tutorial paper on the use of warping techniques in audio applications. The majority of the articles study warped linear prediction (WLP) and its use in wideband audio coding. It is proposed that warped linear prediction would be a particularly attractive method for low-delay wideband audio coding. Warping techniques are also applied to various modifications of classical linear predictive coding techniques. This was made possible partly by a class of new implementation techniques for recursive filters introduced in one of the articles. The proposed implementation algorithm for recursive filters having delay-free loops is a generic technique. This inspired an article that introduces a generalized warped linear predictive coding scheme. One example of the generalized approach is a linear predictive algorithm using an almost logarithmic frequency representation.
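    One common way to compute warped linear prediction is to replace each unit delay in the autocorrelation computation with a first-order allpass section, D(z) = (z^-1 - λ)/(1 - λ z^-1), and then solve the normal equations with the Levinson-Durbin recursion. The sketch below follows that textbook formulation; it is not the thesis's own implementation, and the warping parameter `lam` is left to the caller (λ = 0 reduces to ordinary linear prediction).

```python
import numpy as np

def allpass(x, lam):
    """First-order allpass D(z) = (z^-1 - lam) / (1 - lam z^-1); lam = 0 is a unit delay."""
    y = np.zeros(len(x))
    x_prev = y_prev = 0.0
    for n, xn in enumerate(x):
        y[n] = -lam * xn + x_prev + lam * y_prev
        x_prev, y_prev = xn, y[n]
    return y

def warped_autocorrelation(x, order, lam):
    """r[k] = <x, D^k(z) x>: each unit delay is replaced by an allpass section."""
    x = np.asarray(x, dtype=float)
    r = np.zeros(order + 1)
    xk = x.copy()
    for k in range(order + 1):
        r[k] = np.dot(x, xk)
        xk = allpass(xk, lam)
    return r

def levinson_durbin(r, order):
    """Solve the normal equations; returns [1, a1, ..., ap] of A(z) and the final error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

The coefficients returned for λ ≠ 0 belong to the warped domain, so synthesis or inverse filtering must use the same allpass-based structure rather than an ordinary FIR/IIR filter.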

    Multiresolution techniques for audio signal restoration

    This thesis describes a study of techniques for the restoration of musical audio signals using a multiresolution signal representation called the multiresolution Fourier transform (MFT), a time-frequency-scale representation. This representation allows the restoration to adapt to the local signal structure, which typically consists of a set of approximately sinusoidal partials, each consisting of an “onset” of rapid energy variation followed by more slowly varying “sustain” and “decay” phases. It must be decided which components of a noisy audio signal are to be kept in the restored version and, conversely, which must be removed. A simple filter is introduced that retains only musical signal, that is, signal which adheres to the musical model, and rejects everything else. It is shown that this filter, used in conjunction with the MFT, has a low computational complexity. The MFT is used to capture the transient energy present at the onset of notes by splitting the time axis of a musical signal into steady-state and transient zones using a simple onset detector, which compares the expected energy at a given time with the actual energy present. Past audio signal restoration systems have relied on estimating a restored audio signal’s spectrum from the noisy audio signal presented to the algorithm. In this thesis, the idea of having more than one version of a recording is used to gain further information about the ideal spectrum of the noisy signal. This poses a number of problems with regard to matching the time scales of two versions of the same piece. These are addressed, and solutions are offered based on a novel multiresolution warping algorithm. Finally, various methods are shown for using the detected signal spectrum of a clean modern signal to restore a noisy signal with the warping techniques and musical event detection filters. These account for variations in scale and in input signal-to-noise ratio (SNR) in the noisy signal. It is also shown how the simple adaptive filter introduced earlier can be used to restore audio signals with impulse noise as well as white additive noise. This filter and the time-warping technique are compared to adaptive Wiener filtering as an audio restoration method.
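    The onset detector is described only as comparing the expected energy at a given time with the actual energy present. A minimal frame-based reading of that idea is sketched below, assuming the expected energy is the mean energy of the preceding few frames. The names `history` and `ratio` and the frame length are hypothetical, and the sketch works on plain time-domain frames rather than on the MFT.

```python
import numpy as np

def detect_onsets(x, frame_len=512, history=4, ratio=6.0, floor=1e-8):
    """Return indices of frames whose energy greatly exceeds the recent average.

    Hypothetical rule: onset if energy[i] / mean(energy[i-history:i]) > ratio.
    `floor` avoids division by zero after pure silence.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) // frame_len
    energy = np.array([np.sum(x[i * frame_len:(i + 1) * frame_len] ** 2)
                       for i in range(n)])
    onsets = []
    for i in range(history, n):
        expected = energy[i - history:i].mean() + floor   # predicted energy
        if energy[i] / expected > ratio:                  # actual vs expected
            onsets.append(i)
    return onsets
```

For a sudden step in energy, the frame just after an onset still has only one loud frame in its history, so its ratio is close to `history`; keeping `ratio` above `history` stops that frame from retriggering. Multiplying a returned index by `frame_len` gives the onset position in samples.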

    Multiframe Super-Resolution of Color Image Sequences Using a Global Motion Model

    The development of efficient software tools capable of super-resolving multi-spectral image sequences on the fly is an important step toward imaging systems that can acquire vital imagery of hostile environments at an affordable price. A number of image processing tools already available for target recognition and identification rely on high-resolution imagery, which cannot be safely acquired at a reasonable price. This thesis investigates the use of multiframe super-resolution as a tool to increase the spatial resolution of image sequences acquired with sensors commonly used in consumer video cameras. Multiframe super-resolution is the branch of imaging science that tries to restore high-resolution estimates of a scene from a sequence of under-sampled images of that scene. Although a number of algorithms have already been developed for this problem, they have not been extended to multi-spectral images acquired from moving imaging platforms. This thesis performs such an extension for one of the most successful super-resolution algorithms and demonstrates that it can be used to improve the performance of common multi-spectral imaging systems that use Color Filter Arrays to acquire spectral data.
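    To make the idea of combining under-sampled frames concrete, the sketch below implements plain shift-and-add for a single grayscale channel. It assumes the global motion between frames is a known pure translation of an integer number of high-resolution pixels. This is a textbook baseline, not the algorithm extended in the thesis, and it ignores blur, noise, motion estimation, and Color Filter Array demosaicking.

```python
import numpy as np

def shift_and_add(frames, shifts, factor):
    """Place each low-res frame onto a high-res grid at its known integer offset.

    frames: list of (h, w) arrays; shifts: list of (dy, dx) offsets in high-res
    pixels, each in range(factor). Returns the estimate and a mask of filled pixels.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * factor, w * factor))
    count = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, shifts):
        acc[dy::factor, dx::factor] += frame
        count[dy::factor, dx::factor] += 1
    filled = count > 0
    acc[filled] /= count[filled]   # average where several frames land
    return acc, filled

if __name__ == "__main__":
    hr = np.arange(64, dtype=float).reshape(8, 8)     # synthetic "true" scene
    shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
    frames = [hr[dy::2, dx::2] for dy, dx in shifts]  # simulated 4x4 captures
    est, filled = shift_and_add(frames, shifts, 2)
    print(filled.all(), np.allclose(est, hr))
```

With four frames covering every sub-pixel offset at a factor of 2, the high-resolution grid is filled exactly; with fewer or irregular shifts the unfilled pixels (where `filled` is False) would need interpolation.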

    ECG Biometric Authentication: A Comparative Analysis

    Robust authentication and identification methods have become an urgent necessity for protecting the integrity of devices and sensitive data. Passwords have provided access control and authentication but have shown inherent vulnerabilities. Speed and convenience make biometrics an ideal authentication solution, since biometric traits can offer a low probability of circumvention. To overcome the limitations of traditional biometric systems, the electrocardiogram (ECG) has received the most attention from the biometrics community, owing to the highly individualized nature of ECG signals and the fact that they are ubiquitous and difficult to counterfeit. However, one of the main challenges in developing ECG-based biometrics is the lack of large ECG databases. In this paper, we contribute a new, large off-the-person ECG gallery dataset that can provide new opportunities for the ECG biometric research community. Using evaluation metrics, we explore the impact of filtering type, segmentation, feature extraction, and health status on ECG biometrics. Our results show that our ECG biometric authentication outperforms existing methods that lack efficient feature extraction, filtering, segmentation, and matching. This is evident from the 100% accuracy obtained under five-fold cross-validation on the PTB, MIT-BIH, CEBSDB, CYBHI, ECG-ID, and in-house ECG-BG databases, despite noisy ECG signals and signals from unhealthy subjects. In addition, an average equal error rate (EER) of 2.11% is obtained across 1,694 subjects.
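    The equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of computing it from match scores follows. It assumes higher scores mean a better match and simply picks the candidate threshold where the two rates are closest, so it is an approximation on small score sets rather than the paper's evaluation code.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine and impostor similarity scores.

    Assumes higher score = better match. Every observed score is tried as a
    threshold; the EER is taken where |FAR - FRR| is smallest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.unique(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(genuine < t)     # genuine users wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)

if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]))
    print(equal_error_rate([0.4, 0.6, 0.8, 1.0], [0.0, 0.2, 0.4, 0.6]))
```

Perfectly separated scores give an EER of 0; overlapping score distributions push it upward. Averaging per-subject or per-fold EERs, as in a cross-validation setup, would be done outside this function.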

    Computerized Evaluation of Microsurgery Skills Training

    The style of imparting medical training has evolved over the years. The traditional methods of teaching and practicing basic surgical skills under the apprenticeship model no longer hold first place in modern, technically demanding surgical disciplines such as neurosurgery. Furthermore, legal and ethical concerns for patient safety, as well as cost-effectiveness, have forced neurosurgeons to master the necessary microsurgical techniques to achieve the desired results. This has led to increased emphasis on assessing the clinical and surgical techniques of neurosurgeons. However, subjective assessment of microsurgical techniques such as micro-suturing under the apprenticeship model cannot be completely unbiased. A few initiatives using computer-based techniques have been made to introduce objective evaluation of surgical skills. This thesis presents a novel approach involving computerized evaluation of different components of micro-suturing techniques, to eliminate the bias of subjective assessment. The work involved acquiring cine clips of micro-suturing activity on synthetic material. Image processing and computer vision techniques were then applied to these videos to assess different characteristics of micro-suturing, namely speed, dexterity, and effectualness. In parallel, a senior neurosurgeon graded the same activity subjectively. The two assessments were then correlated and compared to analyze the efficacy of objective and subjective evaluation.
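    The abstract does not describe how speed was measured or how the two assessments were correlated. Purely as an illustration, the sketch below assumes a hypothetical instrument-tip track (pixel coordinates per video frame) from which mean speed is computed, and uses the Pearson correlation coefficient to compare objective scores with a surgeon's grades. Both choices, and the function names, are assumptions.

```python
import numpy as np

def mean_speed(track, fps):
    """Mean speed (pixels/second) of a hypothetical (n_frames, 2) tip track."""
    track = np.asarray(track, dtype=float)
    steps = np.linalg.norm(np.diff(track, axis=0), axis=1)  # per-frame displacement
    duration = (len(track) - 1) / fps                        # seconds elapsed
    return float(steps.sum() / duration)

def pearson_r(x, y):
    """Pearson correlation between objective scores x and subjective grades y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

if __name__ == "__main__":
    print(mean_speed([[0, 0], [3, 4], [6, 8]], fps=1.0))
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

Because surgeon grades are usually ordinal, a rank correlation such as Spearman's may be a better fit in practice; Pearson is used here only for brevity.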

    Speech recognition using linear dynamic models

    The majority of automatic speech recognition (ASR) systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models consecutive feature vectors (augmented to include derivative information) as statistically independent. Furthermore, spatial correlations present in speech parameters are frequently ignored through the use of diagonal covariance matrices. This paper continues the work of Digalakis and others, who instead proposed a first-order linear state-space model that can capture the underlying dynamics and also provides a model of spatial correlations. This paper examines the assumptions made in applying such a model and shows that the addition of a hidden dynamic state leads to increases in accuracy over otherwise equivalent static models. We also propose a time-asynchronous decoding strategy suited to recognition with segment models. We describe the implementation of decoding for linear dynamic models and present TIMIT phone recognition results.
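    In a first-order linear dynamic model, a hidden state evolves as x_t = F x_{t-1} + w_t and is observed as y_t = H x_t + v_t, with Gaussian noises w_t ~ N(0, Q) and v_t ~ N(0, R). A standard way to score an observation sequence under such a model is the Kalman filter's innovation form of the log-likelihood; the sketch below shows that computation. Recognition would then compare scores across phone models. The parameter names follow common state-space notation and are not taken from the paper.

```python
import numpy as np

def ldm_log_likelihood(Y, F, H, Q, R, m0, P0):
    """Log-likelihood of observations Y (T, p) under a linear dynamic model.

    State:       x_t = F x_{t-1} + w_t,  w_t ~ N(0, Q),  x_0 ~ N(m0, P0)
    Observation: y_t = H x_t + v_t,      v_t ~ N(0, R)
    """
    m, P = np.array(m0, dtype=float), np.array(P0, dtype=float)
    ll = 0.0
    for t, y in enumerate(np.asarray(Y, dtype=float)):
        if t > 0:                          # predict step (x_0 prior used at t = 0)
            m = F @ m
            P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                # innovation covariance
        e = y - H @ m                      # innovation
        S_inv = np.linalg.inv(S)
        _, logdet = np.linalg.slogdet(S)
        ll += -0.5 * (len(y) * np.log(2 * np.pi) + logdet + e @ S_inv @ e)
        K = P @ H.T @ S_inv                # Kalman gain
        m = m + K @ e                      # update step
        P = (np.eye(len(m)) - K @ H) @ P
    return float(ll)
```

Using the innovations keeps the cost linear in the segment length, which is one reason state-space segment models remain tractable to score during decoding.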

    Weighted Time Warping for Temporal Segmentation of Multi-Parameter Physiological Signals

    We present a novel approach to segmenting a quasiperiodic multi-parameter physiological signal in the presence of noise and transient corruption. We use Weighted Time Warping (WTW) to combine the partially correlated signals. We then use the relationship between the channels and the repetitive morphology of the time series to partition it into quasiperiodic units by matching it against a constantly evolving template. The method can accurately segment a multi-parameter signal even when all the individual channels are so corrupted that they cannot be segmented individually. Experiments carried out on MIMIC, a multi-parameter physiological dataset recorded from ICU patients, demonstrate the effectiveness of the method. Our method performs as well as a widely used QRS detector on clean raw data and outperforms it on corrupted data. Under additive noise at an SNR of 0 dB, the average errors were 5.81 ms for our method and 303.48 ms for the QRS detector. Under transient corruption, they were 2.89 ms and 387.32 ms, respectively.
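    The abstract does not specify how WTW weights the channels. As a hedged illustration of the general idea, the sketch below is ordinary dynamic time warping with a per-channel weighted squared-error cost, so that a channel judged unreliable can be down-weighted. The weighting scheme and the function name are assumptions, and this is not the authors' WTW.

```python
import numpy as np

def weighted_dtw(query, template, weights):
    """DTW distance between multi-channel sequences (T, C) with per-channel weights."""
    q = np.asarray(query, dtype=float)
    t = np.asarray(template, dtype=float)
    w = np.asarray(weights, dtype=float)
    n, m = len(q), len(t)
    # Local cost: weighted squared difference summed over channels.
    cost = (((q[:, None, :] - t[None, :, :]) ** 2) * w).sum(axis=2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard step pattern: match, insertion, or deletion.
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

if __name__ == "__main__":
    template = [[0, 0], [1, 1], [2, 2]]
    stretched = [[0, 0], [0, 0], [1, 1], [2, 2], [2, 2]]   # time-warped copy
    corrupted = [[0, 9], [1, -5], [2, 7]]                  # channel 2 corrupted
    print(weighted_dtw(stretched, template, [0.5, 0.5]))
    print(weighted_dtw(corrupted, template, [1.0, 0.0]))
```

A time-stretched copy of the template aligns at zero cost, and setting a corrupted channel's weight to zero removes its influence on the alignment; how such weights would be chosen from signal quality is left open here.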