
    Robust ASR using Support Vector Machines

    The improved theoretical properties of Support Vector Machines with respect to other machine learning alternatives, owing to their max-margin training paradigm, have led us to propose them as a good technique for robust speech recognition. However, important shortcomings have had to be circumvented, the most important being the normalisation of the time duration of different realisations of the acoustic speech units. In this paper, we compare two approaches in noisy environments: first, a hybrid HMM–SVM solution, where a fixed number of frames is selected by means of an HMM segmentation; and second, a normalisation kernel called the Dynamic Time Alignment Kernel (DTAK), first introduced in Shimodaira et al. [Shimodaira, H., Noma, K., Nakai, M., Sagayama, S., 2001. Support vector machine with dynamic time-alignment kernel for speech recognition. In: Proc. Eurospeech, Aalborg, Denmark, pp. 1841–1844] and based on DTW (Dynamic Time Warping). Special attention has been paid to the adaptation of both alternatives to noisy environments, comparing two types of parameterisation and performing suitable feature normalisation operations. The results show that the DTA Kernel provides important advantages over the baseline HMM system in medium to bad noise conditions, also outperforming the results of the hybrid system.
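    To make the DTAK idea concrete, here is a minimal sketch of a DTW-style alignment kernel between two variable-length sequences of acoustic frames. It assumes an RBF local kernel between frames and the symmetric (1, 2, 1) step weighting, under which every alignment path accumulates the same total weight n + m; the `gamma` parameter and this specific weighting are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def dtak(X, Y, gamma=1.0):
    """Dynamic Time Alignment Kernel sketch: DTW recursion that
    accumulates frame similarities instead of costs.

    X: (n, d) and Y: (m, d) sequences of feature frames.
    """
    # Frame-level RBF similarities between all pairs of frames.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    n, m = len(X), len(Y)
    G = np.full((n + 1, m + 1), -np.inf)  # accumulated similarity
    G[0, 0] = 0.0                         # paths must start at (1, 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k = K[i - 1, j - 1]
            G[i, j] = max(G[i - 1, j] + k,          # vertical step, weight 1
                          G[i - 1, j - 1] + 2 * k,  # diagonal step, weight 2
                          G[i, j - 1] + k)          # horizontal step, weight 1
    # With (1, 2, 1) step weights every path accumulates total weight n + m,
    # so a single final division normalises for sequence length.
    return G[n, m] / (n + m)
```

    The resulting pairwise scores can be fed to any kernel SVM as a precomputed Gram matrix, e.g. scikit-learn's `SVC(kernel='precomputed')`.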

    The Geometry of Nonlinear Embeddings in Kernel Discriminant Analysis

    Fisher's linear discriminant analysis is a classical method for classification, yet it is limited to capturing linear features. Kernel discriminant analysis, as an extension, is known to successfully alleviate this limitation through a nonlinear feature mapping. We study the geometry of nonlinear embeddings in discriminant analysis with polynomial kernels and the Gaussian kernel by identifying the population-level discriminant function, which depends on the data distribution and the kernel. To obtain the discriminant function, we solve a generalized eigenvalue problem with between-class and within-class covariance operators. The polynomial discriminants are shown to capture the class difference explicitly through the population moments. To approximate the Gaussian discriminant, we use a particular representation of the Gaussian kernel that utilizes the exponential generating function for Hermite polynomials. We also show that the Gaussian discriminant can be approximated using randomized projections of the data. Our results illuminate how the data distribution and the kernel interact in determining the nonlinear embedding for discrimination, and provide a guideline for choosing the kernel and its parameters.
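    As a concrete illustration of the generalized eigenvalue problem described above, here is a minimal two-class kernel Fisher discriminant sketch at the sample level. For two classes, the leading generalized eigenvector of the (between-class, within-class) operator pair has the closed form below; the ridge term `reg` and the example kernels are illustrative assumptions.

```python
import numpy as np

def kfda(X, y, kernel, reg=1e-3):
    """Two-class kernel Fisher discriminant, sketch.

    Returns dual coefficients alpha; the discriminant function is
    f(x) = sum_i alpha_i * k(x_i, x).
    """
    K = kernel(X, X)                     # n x n Gram matrix
    n = len(y)
    c0, c1 = np.unique(y)
    m0 = K[:, y == c0].mean(axis=1)      # kernel mean map of class 0
    m1 = K[:, y == c1].mean(axis=1)      # kernel mean map of class 1
    # Within-class covariance operator in dual coordinates.
    N = np.zeros((n, n))
    for c in (c0, c1):
        Kc = K[:, y == c]
        nc = Kc.shape[1]
        N += Kc @ (np.eye(nc) - np.ones((nc, nc)) / nc) @ Kc.T
    # Leading generalized eigenvector reduces to N^{-1}(m1 - m0),
    # ridge-regularised for numerical stability.
    return np.linalg.solve(N + reg * np.eye(n), m1 - m0)

# Illustrative kernels (parameters are assumptions):
poly2 = lambda A, B: (A @ B.T + 1.0) ** 2                                # degree-2 polynomial
rbf = lambda A, B: np.exp(-0.5 * ((A[:, None] - B[None]) ** 2).sum(-1))  # Gaussian
```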

    Doctor of Philosophy

    The goal of machine learning is to develop efficient algorithms that use training data to create models that generalize well to unseen data. Learning algorithms can use labeled data, unlabeled data, or both. Supervised learning algorithms learn a model using labeled data only. Unsupervised learning methods learn the internal structure of a dataset using only unlabeled data. Lastly, semisupervised learning is the task of finding a model using both labeled and unlabeled data. In this research work, we contribute to both supervised and semisupervised learning. We contribute to supervised learning by proposing an efficient high-dimensional space coverage scheme based on the disjunctive normal form: we use conjunctions of a set of half-spaces to create a set of convex polytopes, and the disjunction of these polytopes provides the desired coverage of the space. Unlike traditional methods based on neural networks, we do not initialize the model parameters randomly; as a result, our model reduces the risk of poor local minima, and higher learning rates can be used, which leads to faster convergence. We contribute to semisupervised learning by proposing two unsupervised loss functions that form the basis of a novel semisupervised learning method. The first loss function is called Mutual-Exclusivity. It is motivated by the observation that an optimal decision boundary lies between the manifolds of different classes, where there are no or very few samples; decision boundaries can be pushed away from training samples by maximizing their margin, and the class labels of the samples are not needed to maximize the margin. The second loss is named Transformation/Stability and is based on the fact that the prediction of a classifier for a data sample should not change under transformations and perturbations applied to that sample; in addition, internal variations of a learning system should have little to no effect on the output. The proposed loss minimizes the variation in the network's prediction for a specific data sample. We also show that the same technique can be used to improve the robustness of a learning model with respect to adversarial examples.
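    A minimal sketch of the two unsupervised losses described above, evaluated on a batch of predictions for randomly transformed replicas of one unlabeled sample. The Transformation/Stability term follows the description directly (penalise variation across replicas); for the Mutual-Exclusivity term we use entropy minimisation as an assumed stand-in that likewise pushes the decision boundary into low-density regions, not necessarily the dissertation's exact formula.

```python
import numpy as np

def transformation_stability(preds):
    # preds: (r, k) softmax outputs for r transformed/perturbed replicas
    # of one unlabeled sample. Penalising deviation from the replica mean
    # makes the prediction invariant to the transformations.
    mean = preds.mean(axis=0, keepdims=True)
    return ((preds - mean) ** 2).sum(axis=1).mean()

def margin_push(preds, eps=1e-12):
    # Assumed stand-in for Mutual-Exclusivity: entropy minimisation drives
    # each softmax row toward one-hot, pushing the decision boundary away
    # from the unlabeled sample without needing its label.
    return -(preds * np.log(preds + eps)).sum(axis=1).mean()
```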

    When Kernel Methods meet Feature Learning: Log-Covariance Network for Action Recognition from Skeletal Data

    Human action recognition from skeletal data is a hot research topic, important in many open-domain applications of computer vision thanks to recently introduced 3D sensors. In the literature, naive methods simply transfer off-the-shelf techniques from video to the skeletal representation. However, the current state of the art is contested between two different paradigms: kernel-based methods and feature learning with (recurrent) neural networks. Both approaches show strong performance, yet they exhibit heavy, but complementary, drawbacks. Motivated by this fact, our work aims at combining the best of the two paradigms, proposing an approach where a shallow network is fed with a covariance representation. Our intuition is that, as long as the dynamics are effectively modeled, the classification network need not be deep nor recurrent in order to score favorably. We validate this hypothesis in a broad experimental analysis over 6 publicly available datasets.
    Comment: 2017 IEEE Computer Vision and Pattern Recognition (CVPR) Workshop.
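    A sketch of the covariance-based input such a shallow network could consume. The covariance over time summarises the dynamics of a skeletal sequence, and the matrix logarithm maps it from the SPD manifold to a flat vector space; the `eps` jitter, added to keep the matrix positive definite, is an illustrative assumption.

```python
import numpy as np

def log_covariance(seq, eps=1e-6):
    # seq: (T, d) per-frame skeletal features (e.g. stacked joint coords).
    C = np.cov(seq, rowvar=False) + eps * np.eye(seq.shape[1])  # SPD matrix
    w, V = np.linalg.eigh(C)              # symmetric eigendecomposition
    logC = (V * np.log(w)) @ V.T          # matrix log: V diag(log w) V^T
    iu = np.triu_indices_from(logC)       # symmetric: keep upper triangle
    return logC[iu]                       # flat vector for a shallow network
```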

    What is the Machine Learning?

    Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables, aided by physical intuition, that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable or set of variables. New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable's discriminating power. Planing also allows the investigation of the linear versus non-linear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.
    Comment: 6 pages, 3 figures. Version published in PRD, discussion added.
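    A minimal sketch of the planing step: each event is reweighted inversely to the local density of the chosen variable within its class, so that variable's distribution becomes flat and a network retrained on the weighted data can no longer exploit it. The histogram-based density estimate and the bin count are illustrative assumptions.

```python
import numpy as np

def planing_weights(values, bins=50):
    """Per-event planing weights that flatten one variable's distribution.

    Apply separately to signal and background events so each class becomes
    flat in that variable before retraining.
    """
    hist, edges = np.histogram(values, bins=bins, density=True)
    # Map each event to its bin and weight inversely to the local density.
    idx = np.clip(np.digitize(values, edges) - 1, 0, bins - 1)
    w = 1.0 / np.maximum(hist[idx], 1e-12)
    return w / w.mean()   # unit-mean weights keep the effective sample size
```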

    Estimation of Relevant Variables on High-Dimensional Biological Patterns Using Iterated Weighted Kernel Functions

    BACKGROUND: The analysis of complex proteomic and genomic profiles involves the identification of significant markers within a set of hundreds or even thousands of variables that represent a high-dimensional problem space. The occurrence of noise, redundancy, or combinatorial interactions in the profile makes the selection of relevant variables harder.
    METHODOLOGY/PRINCIPAL FINDINGS: Here we propose a method to select variables based on their estimated relevance to hidden patterns. Our method combines a weighted-kernel discriminant with an iterative stochastic probability estimation algorithm to discover the relevance distribution over the set of variables. We verified the ability of our method to select predefined relevant variables in synthetic proteome-like data and then assessed its performance on biological high-dimensional problems. Experiments were run on serum proteomic datasets of infectious diseases. The resulting variable subsets achieved classification accuracies of 99% on Human African Trypanosomiasis, 91% on Tuberculosis, and 91% on Malaria serum proteomic profiles with fewer than 20% of the variables selected. Our method scaled up to dimensionalities several orders of magnitude higher, as shown with gene expression microarray datasets on which we obtained classification accuracies close to 90% with fewer than 1% of the total number of variables.
    CONCLUSIONS: Our method consistently found relevant variables, attaining high classification accuracies across synthetic and biological datasets. Notably, it yielded very compact subsets compared to the original number of variables, which should simplify downstream biological experimentation.
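    For illustration, a sketch of a feature-weighted kernel of the kind the abstract describes, together with a hypothetical stochastic iteration that estimates a relevance distribution over variables. The cross-entropy-style update and the user-supplied `score_fn` (an accuracy estimate taking a precomputed Gram matrix and labels) are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def weighted_rbf(A, B, w, gamma=1.0):
    # k(x, z) = exp(-gamma * sum_j w_j (x_j - z_j)^2): variables with
    # w_j near 0 drop out of the kernel entirely.
    d2 = (((A[:, None, :] - B[None, :, :]) ** 2) * w).sum(-1)
    return np.exp(-gamma * d2)

def estimate_relevance(X, y, score_fn, n_iter=50, n_cand=40, top=5, seed=0):
    """Hypothetical stochastic relevance estimation in the spirit of the
    abstract: sample sparse variable subsets, score a weighted-kernel
    discriminant on each, and update a per-variable relevance estimate
    from the best candidates."""
    rng = np.random.default_rng(seed)
    p = np.full(X.shape[1], 0.5)          # per-variable relevance estimate
    for _ in range(n_iter):
        masks = rng.random((n_cand, X.shape[1])) < p   # candidate subsets
        scores = [score_fn(weighted_rbf(X, X, m.astype(float)), y)
                  for m in masks]
        elite = masks[np.argsort(scores)[-top:]]       # best candidates
        p = 0.9 * p + 0.1 * elite.mean(axis=0)         # smoothed update
    return p
```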