
    Healing the Relevance Vector Machine through Augmentation

    The Relevance Vector Machine (RVM) is a sparse approximate Bayesian kernel method. It provides full predictive distributions for test cases. However, the predictive uncertainties have the unintuitive property that they get smaller the further you move away from the training cases. We give a thorough analysis. Inspired by the analogy to non-degenerate Gaussian Processes, we suggest augmentation to solve the problem. The purpose of the resulting model, RVM*, is primarily to corroborate the theoretical and experimental analysis. Although RVM* could be used in practical applications, it is no longer a truly sparse model. Experiments show that sparsity comes at the expense of worse predictive distributions.
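    To make the shrinking-uncertainty effect concrete, here is a minimal numerical sketch (not the paper's code; all values are illustrative). The RVM predictive variance has the form sigma^2 + phi(x)^T Sigma phi(x), where phi stacks localized basis functions centred on the relevance vectors; far from the data, phi(x) vanishes and the variance collapses to the bare noise level.

    ```python
    # Illustrative sketch of why an RVM's predictive variance shrinks
    # far from the training inputs (all quantities below are made up).
    import numpy as np

    def rbf(x, centers, length=1.0):
        # Design matrix of RBF basis functions centred on the relevance vectors.
        return np.exp(-0.5 * (x[:, None] - centers[None, :]) ** 2 / length**2)

    centers = np.array([-1.0, 0.0, 1.0])   # pretend relevance vectors
    Sigma = np.eye(len(centers)) * 0.5     # posterior weight covariance (illustrative)
    noise = 0.1                            # noise variance sigma^2

    x_test = np.array([0.0, 2.0, 5.0, 10.0])
    Phi = rbf(x_test, centers)
    # Predictive variance: sigma^2 + diag(Phi Sigma Phi^T)
    pred_var = noise + np.einsum('ij,jk,ik->i', Phi, Sigma, Phi)
    print(pred_var)  # decays toward the bare noise level 0.1 as x moves away
    ```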

    Understanding and Comparing Scalable Gaussian Process Regression for Big Data

    As a non-parametric Bayesian model that produces informative predictive distributions, the Gaussian process (GP) has been widely used in various fields such as regression, classification and optimization. The cubic complexity of the standard GP, however, leads to poor scalability, which poses challenges in the era of big data. Hence, various scalable GPs have been developed in the literature to improve scalability while retaining desirable prediction accuracy. This paper investigates the methodological characteristics and performance of representative global and local scalable GPs, including sparse approximations and local aggregations, from four main perspectives: scalability, capability, controllability and robustness. Numerical experiments on two toy examples and five real-world datasets with up to 250K points offer the following findings. In terms of scalability, most of the scalable GPs have a time complexity that is linear in the training size. In terms of capability, the sparse approximations capture long-term spatial correlations, while the local aggregations capture local patterns but suffer from over-fitting in some scenarios. In terms of controllability, we can improve the performance of sparse approximations by simply increasing the inducing size, but this is not the case for local aggregations. In terms of robustness, local aggregations are robust to various initializations of hyperparameters thanks to the local attention mechanism. Finally, we highlight that a proper hybrid of global and local scalable GPs may be a promising way to improve both model capability and scalability for big data.
    (25 pages, 15 figures; preprint submitted to KB)
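    As a hedged sketch of the sparse-approximation family the abstract refers to, the following subset-of-regressors style predictor uses m inducing inputs Z so the dominant training cost becomes O(n m^2), i.e. linear in the training size n; the data, kernel, and inducing-point placement below are assumptions for illustration, not the paper's setup.

    ```python
    # Illustrative subset-of-regressors style sparse GP (not the paper's code).
    import numpy as np

    def rbf_kernel(A, B, length=1.0):
        return np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / length**2)

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, 500)            # n = 500 training inputs
    y = np.sin(X) + 0.1 * rng.normal(size=500)
    Z = np.linspace(-3, 3, 20)             # m = 20 inducing inputs
    noise = 0.01

    Kzz = rbf_kernel(Z, Z)
    Kzx = rbf_kernel(Z, X)
    # Solve the m x m system (Kzx Kxz + noise * Kzz) w = Kzx y -- cheap in n.
    A = Kzx @ Kzx.T + noise * Kzz
    w = np.linalg.solve(A, Kzx @ y)

    x_star = np.array([0.5])
    mean = rbf_kernel(x_star, Z) @ w       # sparse predictive mean
    print(mean)                            # roughly sin(0.5)
    ```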

    New Fuzzy Support Vector Machine for the Class Imbalance Problem in Medical Datasets Classification

    In medical dataset classification, the support vector machine (SVM) is considered one of the most successful methods. However, most real-world medical datasets contain outliers/noise, and the data often suffer from class imbalance. In this paper, a fuzzy support vector machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM obtained by extending manifold regularization and assigning two misclassification costs, one per class. The proposed FSVM-CIP can handle the class imbalance problem in the presence of outliers/noise and enhance the locality maximum margin. Five real-world medical datasets from the UCI medical database (breast, heart, hepatitis, BUPA liver, and Pima diabetes) are employed to illustrate the presented method. Experimental results on these datasets show that FSVM-CIP outperforms, or is comparable to, the competing methods.
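    The two ingredients the abstract names, class-dependent misclassification costs and fuzzy per-sample memberships that down-weight likely outliers, can be sketched with scikit-learn's SVC as a stand-in; this is not the authors' FSVM-CIP implementation, and the synthetic data, membership rule, and cost ratio below are assumptions.

    ```python
    # Stand-in sketch of a cost-sensitive fuzzy SVM (not FSVM-CIP itself).
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_major = rng.normal([0, 0], 1.0, size=(200, 2))   # majority class
    X_minor = rng.normal([2, 2], 1.0, size=(20, 2))    # minority class
    X = np.vstack([X_major, X_minor])
    y = np.array([0] * 200 + [1] * 20)

    def memberships(X, y):
        # Fuzzy membership: points nearer their class centroid get weight
        # nearer 1; distant (possibly noisy) points are down-weighted.
        w = np.empty(len(y))
        for c in np.unique(y):
            d = np.linalg.norm(X[y == c] - X[y == c].mean(axis=0), axis=1)
            w[y == c] = 1.0 - d / (d.max() + 1e-9)
        return w

    # Two misclassification costs: the minority class is penalized harder.
    clf = SVC(kernel='rbf', class_weight={0: 1.0, 1: 10.0})
    clf.fit(X, y, sample_weight=memberships(X, y))
    print(clf.predict([[2.0, 2.0]]))  # typically [1]
    ```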

    Toward robust and efficient physically-based rendering

    Physically-based rendering is used for design, illustration and computer animation. It produces photorealistic images by solving the equations that describe how light travels in a scene. Although these equations have been known for a long time, and many light-simulation algorithms have been developed to solve them, none can handle every possible scene efficiently. Instead of trying to develop a new light-simulation algorithm, we propose to enhance the robustness of most methods used today and/or likely to be developed in the years to come. We do this by first identifying the sources of non-robustness in a physically-based rendering engine, and then addressing them with specific algorithms. The result is a set of methods drawing on different mathematical and algorithmic tools, each aiming at improving a different part of a rendering engine. We also investigate how current hardware architectures can be used to their maximum to obtain faster algorithms, without adding approximations. Although the contributions presented in this dissertation are meant to be combined, each of them can be used on its own: they have been designed to be technically independent of each other.
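    For reference, "the equations that describe how light travels" is presumably the rendering equation (Kajiya, 1986), sketched below; the notation (f_r the BRDF, n the surface normal at x) is assumed, as the abstract does not spell it out.

    ```latex
    % Rendering equation: outgoing radiance at point x in direction w_o
    % equals emitted radiance plus all scattered incoming light.
    L_o(x, \omega_o) = L_e(x, \omega_o)
      + \int_{\Omega} f_r(x, \omega_i, \omega_o)\,
        L_i(x, \omega_i)\, (\omega_i \cdot n)\, \mathrm{d}\omega_i
    ```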

    Improved kernel methods for classification

    Ph.D. thesis (Doctor of Philosophy).

    Automação de classificador SVM para aplicação em projetos de consultoria de gestão

    Master's dissertation, Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2019.
    This work proposes a prototype tool to assist the consultants of a business management consultancy in better understanding their clients' problems, as well as in making decisions and proposing solutions. This is done by automating the data mining process, which can then be carried out with little need for user interaction. The tool's design was based on concepts and studies of available automated machine learning tools, supported by a wide bibliographic review. From these studies, it was possible to structure the logic of the tool and its functionalities. This logic follows some of the steps of CRISP-DM, including data understanding, data preparation, modeling and evaluation. The tool's applicability was validated using public databases. The results show that, using the tool, it is possible to build consistent models even with little knowledge of data mining.
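    A hypothetical sketch of the kind of CRISP-DM-style automation the abstract describes, using scikit-learn as a stand-in (not the dissertation's actual tool): preparation, SVM modeling with automated hyperparameter search, and evaluation on a public dataset. The dataset and search grid are assumptions.

    ```python
    # Sketch of an automated data-preparation -> SVM -> evaluation pipeline.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)          # public database
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    pipe = Pipeline([('scale', StandardScaler()),       # data preparation
                     ('svm', SVC())])                   # modeling
    search = GridSearchCV(pipe,                         # automated tuning
                          {'svm__C': [0.1, 1, 10],
                           'svm__gamma': ['scale', 0.01]},
                          cv=5)
    search.fit(X_tr, y_tr)
    print(search.score(X_te, y_te))                     # evaluation
    ```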

    Kernel Methods for Machine Learning with Life Science Applications


    A Practical and Conceptual Framework for Learning in Control

    We propose a fully Bayesian approach for efficient reinforcement learning (RL) in Markov decision processes with continuous-valued state and action spaces when no expert knowledge is available. Our framework is based on well-established ideas from statistics and machine learning and learns fast since it carefully models, quantifies, and incorporates available knowledge when making decisions. The key ingredient of our framework is a probabilistic model, which is implemented using a Gaussian process (GP), a distribution over functions. In the context of dynamic systems, the GP models the transition function. By considering all plausible transition functions simultaneously, we reduce model bias, a problem that frequently occurs when deterministic models are used. Due to its generality and efficiency, our RL framework can be considered a conceptual and practical approach to learning models and controllers when expert knowledge is difficult to obtain or not available.
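    A minimal sketch of the key ingredient, not the authors' implementation: a GP as a probabilistic model of the transition function x' = f(x, u). Predicting with uncertainty (return_std) is what lets such a framework average over plausible dynamics rather than commit to one deterministic model; the toy dynamics and kernel choice are assumptions.

    ```python
    # GP transition model sketch: x' = f(x, u) learned with uncertainty.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    states = rng.uniform(-1, 1, (50, 1))
    actions = rng.uniform(-1, 1, (50, 1))
    next_states = np.sin(states + actions).ravel()      # unknown dynamics

    XU = np.hstack([states, actions])                   # inputs (x, u)
    gp = GaussianProcessRegressor(RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(XU, next_states)

    mean, std = gp.predict([[0.2, 0.1]], return_std=True)
    print(mean, std)   # predicted next state with model uncertainty
    ```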