14 research outputs found

    Learning the kernel with hyperkernels

    No full text
    This paper addresses the problem of choosing a kernel suitable for estimation with a support vector machine, hence further automating machine learning. This goal is achieved by defining a reproducing kernel Hilbert space on the space of kernels itself. Such a formulation leads to a statistical estimation problem similar to the problem of minimizing a regularized risk functional. We state the equivalent representer theorem for the choice of kernels and present a semidefinite programming formulation of the resulting optimization problem. Several recipes for constructing hyperkernels are provided, as well as the details of common machine learning problems. Experimental results for classification, regression and novelty detection on UCI data show the feasibility of our approach.
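
    One of the paper's recipes, the harmonic hyperkernel built from a Gaussian base kernel, can be sketched as follows. The bandwidth `sigma` and decay `lam` are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def gaussian(x, y, sigma=1.0):
    """Base Gaussian kernel between two points."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def harmonic_hyperkernel(pair1, pair2, lam=0.6, sigma=1.0):
    """Harmonic hyperkernel k((x, x'), (y, y')) = (1 - lam) /
    (1 - lam * k(x, y) * k(x', y')): a kernel defined on pairs of
    inputs, i.e. a kernel on the space of kernels' arguments."""
    (x, xp), (y, yp) = pair1, pair2
    prod = gaussian(x, y, sigma) * gaussian(xp, yp, sigma)
    return (1.0 - lam) / (1.0 - lam * prod)  # lam < 1 keeps this finite

# Hyper-Gram matrix over all pairs of training points; this is the matrix
# that enters the regularized risk / SDP formulation.
X = np.array([[0.0], [1.0], [2.0]])
pairs = [(X[i], X[j]) for i in range(len(X)) for j in range(len(X))]
K = np.array([[harmonic_hyperkernel(p, q) for q in pairs] for p in pairs])
print(K.shape)  # (9, 9)
```

    The hyper-Gram matrix is symmetric with unit diagonal here, since the base Gaussian evaluates to 1 on identical arguments.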

    Hierarchic Bayesian models for kernel learning

    Get PDF
    The integration of diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can provide enhanced performance compared to that obtained from any single data source. We present a Bayesian hierarchical model which enables kernel learning, and present effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method.
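
    The combination step can be sketched as below. The two base kernels and the fixed weight vector are hypothetical stand-ins; the paper's contribution is inferring the weights with variational Bayes, which this sketch does not reproduce:

```python
import numpy as np

def gram(X, kernel):
    """Gram matrix of a kernel over a sample X."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

# Two illustrative base kernels (assumed choices, not from the paper):
rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2))
lin = lambda x, y: float(np.dot(x, y))

X = np.random.default_rng(0).normal(size=(5, 2))
K_base = [gram(X, rbf), gram(X, lin)]

# The hierarchical model places a prior over combination weights and infers
# them; here we plug in a fixed vector standing in for a posterior mean.
beta = np.array([0.7, 0.3])
K = sum(b * Km for b, Km in zip(beta, K_base))
print(K.shape)  # (5, 5)
```

    A nonnegative combination of positive semidefinite Gram matrices remains positive semidefinite, so the combined K is itself a valid kernel matrix.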

    Classification and fusion methods for multimodal biometric authentication.

    Get PDF
    Ouyang, Hua. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 81-89). Abstracts in English and Chinese.
    Contents:
    Chapter 1. Introduction: Biometric Authentication; Multimodal Biometric Authentication (Combination of Different Biometric Traits; Multimodal Fusion); Audio-Visual Bi-modal Authentication; Focus of This Research; Organization of This Thesis.
    Chapter 2. Audio-Visual Bi-modal Authentication: Audio-visual Authentication System (Why Audio and Mouth?; System Overview); XM2VTS Database; Visual Feature Extraction (Locating the Mouth; Averaged Mouth Images; Averaged Optical Flow Images); Audio Features; Video Stream Classification; Audio Stream Classification; Simple Fusion.
    Chapter 3. Weighted Sum Rules for Multi-modal Fusion: Measurement-Level Fusion; Product Rule and Sum Rule (Product Rule; Naive Sum Rule (NS); Linear Weighted Sum Rule (WS)); Optimal Weights Selection for WS (Independent Case; Identical Case); Confidence Measure Based Fusion Weights.
    Chapter 4. Regularized k-Nearest Neighbor Classifier: Motivations (Conventional k-NN Classifier; Bayesian Formulation of kNN; Pitfalls and Drawbacks of kNN Classifiers; Metric Learning Methods); Regularized k-Nearest Neighbor Classifier (Metric or Not Metric?; Proposed Classifier: RkNN; Hyperkernels and Hyper-RKHS; Convex Optimization of RkNN; Hyperkernel Construction; Speeding up RkNN); Experimental Evaluation (Synthetic Data Sets; Benchmark Data Sets).
    Chapter 5. Audio-Visual Authentication Experiments: Effectiveness of Visual Features; Performance of Simple Sum Rule; Performances of Individual Modalities; Identification Tasks Using Confidence-based Weighted Sum Rule (Effectiveness of WS_M_C Rule; WS_M_C vs. WS_M); Speaker Identification Using RkNN.
    Chapter 6. Conclusions and Future Work: Conclusions; Important Follow-up Works.
    Bibliography. Appendix A: Proof of Proposition 3.1. Appendix B: Proof of Proposition 3.2.
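
    The thesis's linear weighted sum rule (WS) for measurement-level fusion can be sketched as follows; the modality scores, weight, and acceptance threshold are all hypothetical values for illustration:

```python
import numpy as np

def weighted_sum_fusion(score_audio, score_visual, w):
    """Linear weighted sum rule (WS): fused = w * s_audio + (1 - w) * s_visual.
    w is the relative weight given to the audio modality."""
    return w * score_audio + (1.0 - w) * score_visual

# Hypothetical per-claim match scores from the two modality classifiers.
s_audio = np.array([0.9, 0.2, 0.7])
s_visual = np.array([0.8, 0.4, 0.3])
fused = weighted_sum_fusion(s_audio, s_visual, w=0.6)

# Accept a claimed identity when the fused score clears a threshold.
accept = fused > 0.5
print(accept)  # [ True False  True]
```

    The naive sum rule (NS) is the special case w = 0.5; Chapter 3 derives optimal and confidence-based choices of w, which this sketch does not attempt.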

    Sparse representations in multi-kernel dictionaries for in-situ classification of underwater objects

    Get PDF
    2017 Spring. Includes bibliographical references. The performance of kernel-based pattern classification algorithms depends heavily on the selection of the kernel function and its parameters. Consequently, in recent years there has been growing interest in machine learning algorithms that select kernel functions automatically from a predefined dictionary of kernels. In this work we develop a general mathematical framework for multi-kernel classification that uses sparse representation theory to automatically select the kernel functions and parameters that best represent a set of training samples. We construct a dictionary of different kernel functions with different parametrizations. Using a sparse approximation algorithm, we represent the ideal score of each training sample as a sparse linear combination of the kernel functions in the dictionary evaluated at all training samples. Moreover, we incorporate high-level operator concepts into the learning by using in-situ learning for new unseen samples whose scores cannot be represented suitably by the previously selected representative samples. Finally, we evaluate the viability of this method for in-situ classification of a database of underwater object images. Results are presented in terms of ROC curves, confusion matrices and correct classification rate.
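
    The dictionary construction and sparse selection can be sketched with a plain orthogonal-matching-pursuit loop. The Gaussian widths, toy labels, and the OMP variant are assumptions; the work's actual sparse approximation algorithm may differ:

```python
import numpy as np

def omp(D, y, n_atoms):
    """Greedy orthogonal matching pursuit: pick up to n_atoms columns of D
    that best represent y, refitting coefficients by least squares each step."""
    residual, selected = y.copy(), []
    for _ in range(n_atoms):
        j = int(np.argmax(np.abs(D.T @ residual)))   # most correlated atom
        if j not in selected:
            selected.append(j)
        coef, *_ = np.linalg.lstsq(D[:, selected], y, rcond=None)
        residual = y - D[:, selected] @ coef
    return selected, coef

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = np.sign(X[:, 0])                       # ideal +/-1 scores (toy labels)

# Dictionary: Gaussian kernel columns k_sigma(., x_j) for several widths,
# evaluated at all training samples.
sigmas = [0.5, 1.0, 2.0]                   # assumed parameter grid
cols = [np.exp(-np.sum((X - xj) ** 2, axis=1) / (2 * s ** 2))
        for s in sigmas for xj in X]
D = np.stack(cols, axis=1)                 # shape (20, 60)

selected, coef = omp(D, y, n_atoms=5)
print(len(selected) <= 5)
```

    Each selected atom pins down both a kernel parametrization and a representative training sample, which is what makes the representation usable for in-situ classification of new samples.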

    Regularized Regression Problem in hyper-RKHS for Learning Kernels

    Full text link
    This paper generalizes the two-stage kernel learning framework, illustrates its utility for kernel learning and out-of-sample extensions, and proves asymptotic convergence results for the introduced kernel learning model. Algorithmically, we extend target alignment by hyper-kernels in the two-stage kernel learning framework. The associated kernel learning task is formulated as a regression problem in a hyper-reproducing kernel Hilbert space (hyper-RKHS), i.e., learning on the space of kernels itself. To solve this problem, we present two regression models with bivariate forms in this space: kernel ridge regression (KRR) and support vector regression (SVR) in the hyper-RKHS. This provides significant model flexibility for kernel learning, with strong performance in real-world applications. Specifically, our kernel learning framework is general: the learned underlying kernel can be positive definite or indefinite, which adapts to various requirements in kernel learning. Theoretically, we study the convergence behavior of these learning algorithms in the hyper-RKHS and derive the learning rates. Unlike traditional approximation analysis in an RKHS, our analyses must account for the non-trivial dependence among pairwise samples and the characterization of the hyper-RKHS. To the best of our knowledge, this is the first work in learning theory to study the approximation performance of the regularized regression problem in hyper-RKHS. Comment: 25 pages, 3 figures.
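
    A minimal sketch of the stage-one regression in a hyper-RKHS: the targets are the ideal-kernel entries y_i y_j (target alignment), one per sample pair, and a KRR system is solved with a hyperkernel over pairs. The product-of-Gaussians hyperkernel and the regularization value are assumed choices, not the paper's exact models:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))
y = np.sign(X[:, 0])

# Stage 1: regression targets are the "ideal kernel" entries y_i * y_j,
# one target per sample pair -- note the pairs are not independent,
# which is what complicates the learning-theory analysis.
pairs = [(i, j) for i in range(len(X)) for j in range(len(X))]
t = np.array([y[i] * y[j] for i, j in pairs])

# An assumed hyperkernel on pairs: product of Gaussian kernels.
g = lambda a, b: np.exp(-np.sum((a - b) ** 2))
H = np.array([[g(X[i], X[k]) * g(X[j], X[l]) for (k, l) in pairs]
              for (i, j) in pairs])

# Kernel ridge regression in the hyper-RKHS: solve (H + lam*I) alpha = t.
lam = 0.1
alpha = np.linalg.solve(H + lam * np.eye(len(pairs)), t)

# The learned kernel, evaluated on the training pairs and reshaped.
K_learned = (H @ alpha).reshape(len(X), len(X))
print(K_learned.shape)  # (8, 8)
```

    Stage two would then train an ordinary kernel machine with K_learned; the SVR variant replaces the ridge solve with an epsilon-insensitive loss.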

    A Max-Min Task Offloading Algorithm for Mobile Edge Computing Using Non-Orthogonal Multiple Access

    Full text link
    To mitigate the computational power gap between the network core and edges, mobile edge computing (MEC) is poised to play a fundamental role in future generations of wireless networks. In this letter, we consider a non-orthogonal multiple access (NOMA) transmission model and maximize the worst-case task offloading, among all users, to the network edge server. A provably convergent and efficient algorithm is developed to solve the considered non-convex optimization problem of maximizing the minimum number of offloaded bits in a multi-user NOMA-MEC system. Compared to optimized orthogonal multiple access (OMA), for given MEC delay, power and energy limits, the NOMA-based system considerably outperforms its OMA-based counterpart in MEC settings. Numerical results demonstrate that the proposed algorithm for NOMA-based MEC is particularly useful for delay-sensitive applications. Comment: 5 pages, 5 figures.
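
    The NOMA uplink rates that underlie the max-min objective can be sketched as follows, using standard successive interference cancellation (SIC); the powers, channel gains, decoding order, and offloading time are illustrative assumptions, and the letter's actual algorithm optimizes over these quantities:

```python
import numpy as np

def noma_uplink_rates(p, h, B=1.0, N0=1.0):
    """Uplink NOMA achievable rates with successive interference
    cancellation: users are decoded in descending channel-gain order,
    and each user sees the not-yet-decoded users as interference."""
    order = np.argsort(-h)                 # decode strongest user first
    rates = np.zeros_like(p, dtype=float)
    for idx, u in enumerate(order):
        interference = sum(p[v] * h[v] for v in order[idx + 1:])
        rates[u] = B * np.log2(1 + p[u] * h[u] / (interference + N0))
    return rates

# Hypothetical two-user example: offloaded bits = rate * offloading time.
p = np.array([1.0, 1.0])                   # transmit powers
h = np.array([4.0, 1.0])                   # channel gains
T = 1.0                                    # offloading duration
bits = noma_uplink_rates(p, h) * T
print(bits.min())  # 1.0 -- the worst-user offloaded bits the letter maximizes
```

    The max-min problem then tunes powers and time so that this minimum is as large as possible subject to the delay, power, and energy limits.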

    Feature Scaling via Second-Order Cone Programming

    Get PDF
    Feature scaling has attracted considerable attention over the past several decades because of its important role in feature selection. In this paper, a novel algorithm for learning scaling factors of features is proposed. It first assigns a nonnegative scaling factor to each feature of the data and then adopts a generalized performance measure to learn the optimal scaling factors. Notably, the proposed model can be transformed into a convex optimization problem, a second-order cone program (SOCP), so the scaling factors learned by our method are globally optimal in this sense. Experiments on simulated data, UCI data sets, and a gene data set demonstrate that the proposed method is more effective than previous methods.
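
    The role of the nonnegative scaling factors can be sketched as follows. The factors plugged in here are a hypothetical stand-in for an SOCP solution (the paper obtains them globally by solving the cone program, which this sketch does not do); the toy data and separation measure are also assumptions:

```python
import numpy as np

def scale_features(X, theta):
    """Apply learned nonnegative scaling factors: column j of X is
    multiplied by theta[j]; theta[j] = 0 drops feature j entirely."""
    theta = np.asarray(theta, dtype=float)
    assert np.all(theta >= 0), "scaling factors must be nonnegative"
    return X * theta

# Toy data: feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = np.c_[np.r_[rng.normal(-2, 0.5, 20), rng.normal(2, 0.5, 20)],
          rng.normal(0, 3, 40)]
y = np.r_[-np.ones(20), np.ones(20)]

# Hypothetical scaling factors standing in for the SOCP solution.
theta = [1.0, 0.0]
Xs = scale_features(X, theta)

# A crude class-separation measure: mean gap over total spread.
sep = lambda Z: np.linalg.norm(Z[y > 0].mean(0) - Z[y < 0].mean(0)) / Z.std()
print(sep(Xs) > sep(X))  # True: down-weighting the noise feature helps
```

    Zeroing a factor performs feature selection as a by-product of scaling, which is why the two problems are treated together.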

    Multiple Kernel Clustering

    Full text link

    Isometry and convexity in dimensionality reduction

    Get PDF
    The amount of data generated every year grows exponentially. Both the number of data points and their dimensionality have increased dramatically over the past 15 years, and the gap between the industry's data-processing demands and the solutions provided by the machine learning community keeps widening. Despite the growth in memory and computational power, advanced statistical processing of data on the order of gigabytes remains out of reach: most sophisticated machine learning algorithms require at least quadratic complexity, and with the current computer architecture, algorithms with complexity higher than linear O(N) or O(N log N) are not considered practical. Dimensionality reduction is a challenging problem in machine learning. Data represented as multidimensional points often have high dimensionality, yet the information they carry can be expressed with far fewer dimensions; moreover, the reduced dimensions can be more interpretable than the original ones. There is a great variety of dimensionality reduction algorithms under the theory of Manifold Learning. Most methods, such as Isomap, Local Linear Embedding, Local Tangent Space Alignment and Diffusion Maps, have been extensively studied under the framework of Kernel Principal Component Analysis (KPCA). In this dissertation we study two state-of-the-art dimensionality reduction methods, Maximum Variance Unfolding (MVU) and Non-Negative Matrix Factorization (NMF), neither of which fits under the umbrella of Kernel PCA. MVU is cast as a semidefinite program, a modern convex nonlinear optimization formulation that offers more flexibility and power than KPCA. Although MVU and NMF seem to be two disconnected problems, we show that there is a connection between them: both are special cases of a general nonlinear factorization algorithm that we developed.
    Two aspects of the algorithms are of particular interest: computational complexity and interpretability. Computational complexity answers the question of how fast we can find the best solution of MVU/NMF for large data volumes. Since we are dealing with optimization programs, we need to find the global optimum, and the global optimum is strongly connected with the convexity of the problem. Interpretability is strongly connected with local isometry, which gives meaning to relationships between data points; another aspect of interpretability is the association of data with labeled information. The contributions of this thesis are the following:
    1. MVU is modified so that it scales more efficiently. Results are shown on a 1-million-point speech dataset, and limitations of the method are highlighted.
    2. An algorithm for fast computation of furthest neighbors is presented for the first time in the literature.
    3. Construction of optimal kernels for Kernel Density Estimation with modern convex programming is presented. For the first time we show that the Leave-One-Out Cross-Validation (LOOCV) function is quasi-concave.
    4. For the first time, NMF is formulated as a convex optimization problem.
    5. An algorithm for the problem of Completely Positive Matrix Factorization is presented.
    6. A hybrid algorithm of MVU and NMF, isoNMF, is presented, combining the advantages of both methods.
    7. Isometric Separation Maps (ISM), a variation of MVU that incorporates classification information, is presented.
    8. Large-scale nonlinear dimensionality analysis on the TIMIT speech database is performed.
    9. A general nonlinear factorization algorithm based on sequential convex programming is presented.
    Despite the efforts to scale the proposed methods up to 1 million data points in reasonable time, the gap between industrial demand and the current state of the art remains orders of magnitude wide.
    Ph.D. Committee Chair: David Anderson; Committee Co-Chair: Alexander Gray; Committee Members: Anthony Yezzi, Hongyuan Zha, Justin Romberg, Ronald Schafer.
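
    As a point of reference for the NMF side of the dissertation, the standard multiplicative-update baseline (Lee-Seung) looks like the sketch below; this is the classical non-convex formulation, not the convex reformulation contributed by the thesis:

```python
import numpy as np

def nmf(V, r, iters=500, seed=0):
    """Standard multiplicative-update NMF, minimizing ||V - W H||_F
    with W, H >= 0. Updates preserve nonnegativity because they only
    multiply by nonnegative ratios."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, (n, r))
    H = rng.uniform(0.1, 1.0, (r, m))
    eps = 1e-12                            # guards against division by zero
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# On exactly rank-3 nonnegative data the reconstruction error gets small.
rng = np.random.default_rng(1)
V = rng.uniform(size=(10, 3)) @ rng.uniform(size=(3, 8))
W, H = nmf(V, r=3)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err < 0.1)
```

    These updates only reach a local optimum in general, which is precisely the limitation the thesis's convex formulation (contribution 4) is aimed at.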