2,438 research outputs found

    Speaker segmentation and clustering

    This survey focuses on two challenging speech processing topics, namely speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments based on speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. Concerning speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, their advantages and disadvantages are indicated, insight into the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved.

    Joint segmentation of color and depth data based on splitting and merging driven by surface fitting

    This paper proposes a segmentation scheme based on the joint usage of color and depth data together with a 3D surface estimation scheme. First, a set of multi-dimensional vectors is built from color, geometry and surface orientation information. Normalized cuts spectral clustering is then applied to recursively segment the scene into two parts, thus obtaining an over-segmentation. This procedure is followed by a recursive merging stage where close segments belonging to the same object are joined together. At each step of both procedures a NURBS model is fitted to the computed segments, and the accuracy of the fitting is used as a measure of the plausibility that a segment represents a single surface or object. By comparing the accuracy to the one at the previous step, it is possible to determine whether each splitting or merging operation leads to a better scene representation and consequently whether to perform it. Experimental results show that the proposed method provides an accurate and reliable segmentation.

    SVMs for Automatic Speech Recognition: a Survey

    Hidden Markov Models (HMMs) are, undoubtedly, the most employed core technique for Automatic Speech Recognition (ASR). Nevertheless, we are still far from achieving high-performance ASR systems. Some alternative approaches, most of them based on Artificial Neural Networks (ANNs), were proposed during the late eighties and early nineties. Some of them tackled the ASR problem using predictive ANNs, while others proposed hybrid HMM/ANN systems. However, despite some achievements, the preponderance of Markov Models remains a fact. During the last decade, a new tool appeared in the field of machine learning that has proved able to cope with hard classification problems in several fields of application: the Support Vector Machine (SVM). SVMs are effective discriminative classifiers with several outstanding characteristics, namely: their solution is the one with maximum margin; they are capable of dealing with samples of very high dimensionality; and their convergence to the minimum of the associated cost function is guaranteed. These characteristics have made SVMs very popular and successful. In this chapter we discuss their strengths and weaknesses in the ASR context and review the current state-of-the-art techniques. We organize the contributions in two parts: isolated-word recognition and continuous speech recognition. Within the first part we review several techniques to produce the fixed-dimension vectors needed for original SVMs. Afterwards we explore more sophisticated techniques based on the use of kernels capable of dealing with sequences of different length. Among them is the DTAK kernel, simple and effective, which rescues an old technique of speech recognition: Dynamic Time Warping (DTW). Within the second part, we describe some recent approaches to tackle more complex tasks like connected digit recognition or continuous speech recognition using SVMs. Finally we draw some conclusions and outline several ongoing lines of research.
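
The abstract above names Dynamic Time Warping as the technique underlying the DTAK kernel. As a rough illustration only, the following is a minimal sketch of plain DTW between two 1-D sequences of different length. It is not the DTAK kernel and not the authors' implementation; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for this example.

```python
import math

def dtw_distance(a, b):
    """Plain DTW between two 1-D sequences (hypothetical helper, not DTAK).

    The local cost is the absolute difference, an assumption for this sketch.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Sequences of different length that differ only by a repeated sample.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The point of the sketch is that the alignment absorbs differences in length, which is what lets a DTW-based kernel compare utterances of different duration without first forcing them into fixed-dimension vectors.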

    Software and hardware implementation techniques for digital communications-related algorithms

    There are essentially three areas addressed in the body of this thesis. (a) The first is a theoretical investigation into the design and development of a practically realizable implementation of a maximum-likelihood detection process to deal with digital data transmission over HF radio links. These links exhibit multipath properties with delay spreads that can easily extend over 12 to 15 milliseconds. The project was sponsored by the Ministry of Defence through the auspices of the Science and Engineering Research Council. The primary objective was to transmit voice band data at a minimum rate of 2.4 kb/s continuously for long periods of time during the day or night. Computer simulation models of HF propagation channels were created to simulate atmospheric and multipath effects of transmission from London to Washington DC, Ankara, and as far as Melbourne, Australia. Investigations into HF channel estimation are not the subject of this thesis. The detection process assumed accurate knowledge of the channel. [Continues.]

    Support Vector Machines for Speech Recognition

    Hidden Markov models (HMMs) with Gaussian mixture observation densities are the dominant approach in speech recognition. These systems typically use a representational model for acoustic modeling which can often be prone to overfitting and does not translate to improved discrimination. We propose a new paradigm centered on principles of structural risk minimization using a discriminative framework for speech recognition based on support vector machines (SVMs). SVMs have the ability to simultaneously optimize the representational and discriminative ability of the acoustic classifiers. We have developed the first SVM-based large vocabulary speech recognition system that improves performance over traditional HMM-based systems. This hybrid system achieves a state-of-the-art word error rate of 10.6% on a continuous alphadigit task, a 10% improvement relative to an HMM system. On SWITCHBOARD, a large vocabulary task, the system improves performance over a traditional HMM system from 41.6% word error rate to 40.6%. This dissertation discusses several practical issues that arise when SVMs are incorporated into the hybrid system.
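
To put the quoted word error rates on a common scale, the arithmetic can be sketched as below. This is an illustrative calculation, not something taken from the dissertation: it assumes that the "10% improvement" is a relative reduction in word error rate, and the helper name `relative_reduction` is made up for this example. The implied alphadigit baseline is an inference under that assumption, not a figure reported in the abstract.

```python
def relative_reduction(baseline_wer, improved_wer):
    """Relative WER reduction, as a fraction of the baseline (hypothetical helper)."""
    return (baseline_wer - improved_wer) / baseline_wer

# SWITCHBOARD: both figures are quoted in the abstract.
switchboard_gain = relative_reduction(41.6, 40.6)

# Alphadigits: only the hybrid WER (10.6%) and a 10% improvement are quoted.
# ASSUMPTION: the 10% is a relative WER reduction, so baseline = 10.6 / (1 - 0.10).
implied_alphadigit_baseline = 10.6 / (1 - 0.10)

print(f"SWITCHBOARD relative reduction: {switchboard_gain:.1%}")
print(f"Implied alphadigit HMM baseline: {implied_alphadigit_baseline:.1f}%")
```

Under that reading, the one-point absolute gain on SWITCHBOARD corresponds to a much smaller relative reduction than the one reported for the alphadigit task.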

    A New Regularized Adaptive Windowed Lomb Periodogram for Time-Frequency Analysis of Nonstationary Signals With Impulsive Components

    This paper proposes a new class of windowed Lomb periodogram (WLP) for time-frequency analysis of nonstationary signals, which may contain impulsive components and may be nonuniformly sampled. The proposed methods significantly extend the conventional Lomb periodogram in two aspects: 1) The nonstationarity problem is addressed by employing weighted least squares (WLS) to estimate locally the time-varying periodogram and an intersection of confidence interval technique to adaptively select the window sizes of WLS in the time-frequency domain. This yields an adaptive WLP (AWLP) having a better tradeoff between time resolution and frequency resolution. 2) A more general regularized maximum-likelihood-type (M-) estimator is used instead of the LS estimator in estimating the AWLP. This yields a novel M-estimation-based regularized AWLP method which is capable of reducing estimation variance, accentuating predominant time-frequency components, restraining the adverse influence of impulsive components, and separating impulsive components. Simulations were conducted to illustrate the advantages of the proposed method over the conventional Lomb periodogram in adaptive time-frequency resolution, sparse representation for sinusoids, robustness to impulsive components, and applicability to nonuniformly sampled data. Moreover, as the computation of the proposed method at each time sample and frequency is independent of the others, parallel computing can readily be employed to significantly reduce the computational time of the proposed method for real-time applications. The proposed method is expected to find a wide range of applications in instrumentation and measurement and related areas. Its potential applications to power quality analysis and speech signal analysis are also discussed and demonstrated.
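
For orientation, the conventional Lomb periodogram that the abstract takes as its starting point can be sketched as follows. This is only the classical, unwindowed estimator for nonuniformly sampled data; it does not implement the windowing, adaptive window selection, or regularized M-estimation proposed in the paper. The function name `lomb_periodogram`, the test signal, and the frequency grid are assumptions chosen for the example.

```python
import math
import random

def lomb_periodogram(t, y, freqs_hz):
    """Conventional Lomb periodogram of samples y at (possibly irregular) times t."""
    mean = sum(y) / len(y)
    y0 = [v - mean for v in y]  # the classical formula assumes zero-mean data
    power = []
    for f in freqs_hz:
        w = 2.0 * math.pi * f
        # The offset tau makes the sine and cosine terms orthogonal at this frequency.
        s2 = sum(math.sin(2.0 * w * ti) for ti in t)
        c2 = sum(math.cos(2.0 * w * ti) for ti in t)
        tau = math.atan2(s2, c2) / (2.0 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc = sum(v * ci for v, ci in zip(y0, c))
        ys = sum(v * si for v, si in zip(y0, s))
        cc = sum(ci * ci for ci in c)
        ss = sum(si * si for si in s)
        power.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return power

# Illustrative data: a 1.5 Hz sinusoid sampled at 100 random instants over 10 s.
rng = random.Random(0)
t = sorted(rng.uniform(0.0, 10.0) for _ in range(100))
y = [math.sin(2.0 * math.pi * 1.5 * ti) for ti in t]
freqs = [k / 10 for k in range(1, 51)]  # 0.1 Hz to 5.0 Hz
power = lomb_periodogram(t, y, freqs)
peak_freq = freqs[power.index(max(power))]
```

Each frequency is evaluated independently of the others, which is the property the abstract points to when it notes that the computation lends itself to parallel execution.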