3 research outputs found

    Purging of silence for robust speaker identification in colossal database

    The aim of this work is to develop an effective speaker recognition system for large data sets under noisy environments. The important phases involved in typical identification systems are feature extraction, training, and testing. During the feature extraction phase, speaker-specific information is processed based on the characteristics of the voice signal. In this work, effective silence-removal methods are proposed in order to achieve accurate recognition under noisy environments. Pitch and pitch-strength parameters are extracted as distinct features from the input speech spectrum. Multilinear principal component analysis (MPCA) is utilized to minimize the complexity of the parameter matrix. Silence removal using the zero crossing rate (ZCR) and an endpoint detection algorithm (EDA) is applied to the source utterance during the feature extraction phase. These features are used in the later classification phase, where identification is made on the basis of support vector machine (SVM) algorithms. Forward-looking subgradient (FOLOS), an efficient large-scale SVM algorithm, is employed for effective classification among speakers. The evaluation findings indicate that the suggested methods improve performance for large amounts of data in noisy environments.
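    As a minimal sketch of the silence-removal idea, the frame-level filter below combines short-time energy with the zero crossing rate; the frame length and thresholds are illustrative choices, not values from the paper, and the pitch, MPCA, and FOLOS stages are omitted.

    import numpy as np

    def remove_silence(signal, frame_len=256, energy_thresh=1e-4, zcr_thresh=0.25):
        """Drop frames whose statistics suggest silence."""
        frames = [signal[i:i + frame_len]
                  for i in range(0, len(signal) - frame_len + 1, frame_len)]
        kept = []
        for frame in frames:
            energy = np.mean(frame ** 2)                        # short-time energy
            zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)  # zero crossing rate
            # Keep audible frames; the ZCR test retains weak unvoiced speech.
            if energy > energy_thresh or zcr > zcr_thresh:
                kept.append(frame)
        return np.concatenate(kept) if kept else np.empty(0)

    An endpoint detection pass in the spirit of EDA would apply the same frame statistics to trim leading and trailing silence rather than dropping interior frames.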

    Fast Polynomial Kernel Classification for Massive Data

    In the era of big data, it is highly desirable to develop efficient machine learning algorithms to tackle massive data challenges such as storage bottlenecks, algorithmic scalability, and interpretability. In this paper, we develop a novel efficient classification algorithm, called fast polynomial kernel classification (FPC), to conquer the scalability and storage challenges. Our main tools are a suitably selected feature mapping based on polynomial kernels and an alternating direction method of multipliers (ADMM) algorithm for a related non-smooth convex optimization problem. Fast learning rates as well as feasibility verifications, including the convergence of ADMM and the selection of center points, are established to justify the theoretical behavior of FPC. Our theoretical assertions are verified by a series of simulations and real data applications. The numerical results demonstrate that FPC significantly reduces the computational burden and storage memory of existing learning schemes such as support vector machines and boosting, without sacrificing much of their generalization ability.
    Comment: arXiv admin note: text overlap with arXiv:1402.4735 by other authors
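    The construction can be sketched as follows, assuming scikit-learn is available: map each input through a polynomial kernel evaluated at a small set of center points, then fit a linear classifier on those features. In this sketch the centers are chosen by plain random sampling and LinearSVC stands in for the paper's ADMM solver; m, degree, and C are illustrative values.

    import numpy as np
    from sklearn.svm import LinearSVC

    def fit_fpc(X, y, m=200, degree=3, C=1.0, seed=0):
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=m, replace=False)]  # center points
        Phi = (1.0 + X @ centers.T) ** degree  # polynomial-kernel feature map
        return centers, LinearSVC(C=C).fit(Phi, y)

    def predict_fpc(X, centers, clf, degree=3):
        return clf.predict((1.0 + X @ centers.T) ** degree)

    Only the m centers and the linear coefficients need to be kept at prediction time, which is where the storage savings over kernel machines that retain all support vectors come from.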

    Instance selection with threshold clustering for support vector machines

    Doctor of Philosophy, Department of Statistics, Michael J. Higgins
    Tremendous advances in computing power have allowed the size of datasets to grow massively. Many machine learning approaches have been developed to deal with massive data by reducing the number of features, observations, or both. Instance selection (IS) is a data mining process that relies on scaling down the number of observations of a dataset. In this research, we focus on IS methods that rely on clustering algorithms, particularly on threshold clustering (TC). TC is a recent, efficient clustering method. Given a fixed size threshold t*, TC forms clusters of t* or more units while ensuring that the maximum within-cluster dissimilarity is small. Unlike most traditional clustering methods, TC is designed to form many small clusters of units, making it ideal for IS.
    Support vector machines (SVM) is a powerful method for classification. However, training SVM may be computationally infeasible for large datasets: training requires O(N^3) runtime, where N is the size of the training data. In this dissertation, we propose an IS method for training SVM under big data settings, called support vector machines with threshold clustering (SVMTC). Our proposed method begins by clustering each class in the training set separately using TC. The centroids of all clusters then form the reduced set. If the data reduction is insufficient, TC may be repeated. SVM is then applied to the reduced dataset. In this way, our proposed method can reduce the training set for SVM by a factor of (t*)^r or more, where r is the number of iterations of TC, dramatically reducing the runtime required to train SVM (a minimal sketch of this pipeline follows the abstract). Furthermore, we prove, under the Gaussian radial basis kernel, that the maximum distance between the Gram matrix for the original data (which is used to find support vectors) and the Gram matrix for the reduced data is bounded by a function of the maximum within-cluster distance for TC. We then show, via simulation and application to datasets, that SVMTC efficiently reduces the size of training sets without sacrificing the prediction accuracy of SVM. Moreover, it often outperforms competing IS methods in terms of runtime, memory usage, and prediction accuracy.
    Next, we explore best practices for applying feature reduction methods with SVMTC when the number of features is large. We investigate the usefulness of various feature selection and feature extraction methods, including principal component analysis (PCA), linear discriminant analysis (LDA), LASSO, and Fisher scores, as an initial step of SVMTC. For feature reduction methods that select a linear combination of the original features (for example, PCA), we also investigate forming prototypes using the original features or the transformed features. We compare, via application to datasets, the performance of SVMTC under these feature reduction methods. We find that LASSO tends to be an effective feature selection method and, overall, show that SVMTC is improved significantly under the proposed methods.
    Finally, we perform a comparative study of iterative threshold instance selection (ITIS) and other IS methods. ITIS is a recent extension of TC that is used for IS. We use simulation to compare ITIS with competing methods. The results illustrate that ITIS is effective in massive data settings when compared against other instance selection methods such as k-means and its variations. In addition, we demonstrate the efficacy of hybrid clustering algorithms that utilize ITIS as an initial step, and show via a simulation study that these methods outperform other hybrid clustering methods in terms of runtime and memory without sacrificing performance.
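    A minimal sketch of the SVMTC pipeline, assuming scikit-learn: cluster each class separately, keep one centroid prototype per cluster, and train the SVM on the prototypes. KMeans with about N/t* clusters stands in here for threshold clustering, which instead guarantees a minimum cluster size of t*; t_star and the kernel choice are illustrative.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    def svmtc_fit(X, y, t_star=8):
        proto_X, proto_y = [], []
        for label in np.unique(y):
            Xc = X[y == label]
            # Aim for clusters of roughly t_star points, as TC would enforce.
            k = max(1, len(Xc) // t_star)
            km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Xc)
            proto_X.append(km.cluster_centers_)  # one prototype per cluster
            proto_y.append(np.full(k, label))
        # Train on ~N/t_star prototypes instead of all N training points.
        return SVC(kernel="rbf").fit(np.vstack(proto_X), np.concatenate(proto_y))

    Repeating the clustering step on the prototypes corresponds to the r iterations described in the abstract and shrinks the training set by a factor of about (t*)^r.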