
    High-probability minimax probability machines

    In this paper we focus on constructing binary classifiers that are built on the premise of minimising an upper bound on their future misclassification rate. We pay particular attention to the approach taken by the minimax probability machine (Lanckriet et al. in J Mach Learn Res 3:555–582, 2003), which directly minimises an upper bound on the future misclassification rate in a worst-case setting: that is, under all possible choices of class-conditional distributions with a given mean and covariance matrix. The validity of these bounds rests on the assumption that the means and covariance matrices are known in advance; however, this is not always the case in practice, and their empirical counterparts must be used instead. This can result in erroneous upper bounds on the future misclassification rate and lead to the formulation of sub-optimal predictors. In this paper we address this oversight and study the influence that uncertainty in the moments, the mean and covariance matrix, has on the construction of predictors under the minimax principle. By using high-probability upper bounds on the deviation between true moments and their empirical counterparts, we can re-formulate the minimax optimisation to incorporate this uncertainty and find the predictor that minimises the high-probability, worst-case misclassification rate. The moment uncertainty introduces a natural regularisation component into the optimisation, where each class is regularised in proportion to the degree of moment uncertainty. Experimental results support the view that, when limited data are available, incorporating moment uncertainty can lead to better predictors.
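    To make the worst-case construction concrete, the sketch below fits a linear MPM from empirical moments. It follows the Lanckriet et al. formulation (minimise the sum of the two class standard deviations along the discriminant, subject to a unit mean gap), but the function names are ours and a general-purpose scipy solver stands in for the paper's second-order cone program.

    # Minimal MPM sketch from empirical moments (illustrative; a dedicated
    # second-order cone solver would be used in practice).
    import numpy as np
    from scipy.optimize import minimize

    def mpm_fit(X1, X2):
        mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
        S1 = np.cov(X1, rowvar=False)
        S2 = np.cov(X2, rowvar=False)

        # min_a sqrt(a'S1 a) + sqrt(a'S2 a)  s.t.  a'(mu1 - mu2) = 1
        def obj(a):
            return np.sqrt(a @ S1 @ a) + np.sqrt(a @ S2 @ a)

        con = {"type": "eq", "fun": lambda a: a @ (mu1 - mu2) - 1.0}
        a0 = (mu1 - mu2) / ((mu1 - mu2) @ (mu1 - mu2))  # feasible start
        res = minimize(obj, a0, constraints=[con])
        a = res.x
        kappa = 1.0 / res.fun                       # optimal margin parameter
        b = a @ mu1 - kappa * np.sqrt(a @ S1 @ a)   # decision threshold
        worst_case_error = 1.0 / (1.0 + kappa**2)   # bound minimised by the MPM
        return a, b, worst_case_error               # predict: sign(a @ x - b)

    The robust variant described in the abstract would, in effect, inflate S1 and S2 (and shrink the usable mean gap) by high-probability deviation terms before solving, which is where the class-wise regularisation comes from.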

    Sparsity in machine learning: theory and practice.

    The thesis explores sparse machine learning algorithms for supervised (classification and regression) and unsupervised (subspace methods) learning. For classification, we review the set covering machine (SCM) and propose new algorithms that directly minimise the SCM's sample compression generalisation error bounds during the training phase. Two of the resulting algorithms are proved to produce optimal or near-optimal solutions with respect to the loss bounds they minimise. One of the SCM loss bounds is shown to be incorrect, and a corrected derivation of the sample compression bound is given, along with a framework for allowing asymmetrical loss in sample compression risk bounds. In regression, we analyse the kernel matching pursuit (KMP) algorithm and derive a loss bound that takes into account the dual sparse basis vectors. We make connections to a sparse kernel principal components analysis (sparse KPCA) algorithm and bound its future loss using a sample compression argument. This investigation suggests that a similar argument applies to kernel canonical correlation analysis (KCCA), and applying the same sparsification strategy gives rise to the sparse KCCA algorithm. We also propose a loss bound for sparse KCCA using the novel technique developed for KMP. All of the algorithms and bounds proposed in the thesis are elucidated with experiments.
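    To give the flavour of the kernel matching pursuit analysis, here is a minimal sketch of KMP-style greedy basis selection with back-fitting under squared loss. The kernel matrix is assumed precomputed, and all names are illustrative rather than taken from the thesis.

    # KMP-style greedy selection sketch (illustrative; assumes n_basis >= 1).
    import numpy as np

    def kmp_fit(K, y, n_basis):
        n = K.shape[0]
        selected = []
        residual = y.astype(float).copy()
        for _ in range(n_basis):
            # Score unused kernel columns by normalised correlation
            # with the current residual.
            scores = [abs(K[:, j] @ residual) / (np.linalg.norm(K[:, j]) + 1e-12)
                      if j not in selected else -np.inf
                      for j in range(n)]
            selected.append(int(np.argmax(scores)))
            # Back-fit: refit coefficients on all selected columns.
            Ks = K[:, selected]
            coef, *_ = np.linalg.lstsq(Ks, y, rcond=None)
            residual = y - Ks @ coef
        return selected, coef  # sparse dual basis indices and their weights

    The indices in selected correspond to the dual sparse basis vectors that the loss bound accounts for: the learned function only ever touches those training points.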

    Integer optimization methods for machine learning

    Thesis (Ph.D.), Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2012, by Allison An Chang. Includes bibliographical references (p. 129-137).
    In this thesis, we propose new mixed integer optimization (MIO) methods to address problems in machine learning. The first part develops methods for supervised bipartite ranking, which arises in prioritization tasks in diverse domains such as information retrieval, recommender systems, natural language processing, bioinformatics, and preventative maintenance. The primary advantage of using MIO for ranking is that it allows for direct optimization of ranking quality measures, as opposed to current state-of-the-art algorithms that use heuristic loss functions. We demonstrate using a number of datasets that our approach can outperform other ranking methods. The second part of the thesis focuses on reverse-engineering ranking models. This is an instance of a more general ranking problem than the bipartite case. Quality rankings affect business for many organizations, and knowing the ranking models would allow these organizations to better understand the standards by which their products are judged and help them to create higher quality products. We introduce an MIO method for reverse-engineering such models and demonstrate its performance in a case study with real data from a major ratings company. We also devise an approach to find the most cost-effective way to increase the rank of a certain product. In the final part of the thesis, we develop MIO methods to first generate association rules and then use the rules to build an interpretable classifier in the form of a decision list, which is an ordered list of rules. These are both combinatorially challenging problems because even a small dataset may yield a large number of rules and a small set of rules may correspond to many different orderings. We show how to use MIO to mine useful rules, as well as to construct a classifier from them. We present results in terms of both classification accuracy and interpretability for a variety of datasets.
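    For a sense of how MIO can optimize a ranking quality measure directly, the toy sketch below encodes pairwise ranking accuracy (the AUC numerator) as a mixed integer program with big-M constraints. It is not the thesis's formulation; the PuLP modelling library, the bound M, and the margin eps are all our own illustrative choices.

    # Toy MIO for bipartite ranking: maximise the number of
    # positive-negative pairs ordered correctly by a linear scorer.
    import pulp

    def mio_rank(X_pos, X_neg, M=100.0, eps=1e-3):
        d = len(X_pos[0])
        prob = pulp.LpProblem("bipartite_ranking", pulp.LpMaximize)
        w = [pulp.LpVariable(f"w{k}", -1, 1) for k in range(d)]
        # z[i][j] = 1 forces positive i to score at least eps above negative j.
        z = [[pulp.LpVariable(f"z_{i}_{j}", cat="Binary")
              for j in range(len(X_neg))] for i in range(len(X_pos))]
        prob += pulp.lpSum(zij for row in z for zij in row)  # pairwise wins
        for i, xp in enumerate(X_pos):
            for j, xn in enumerate(X_neg):
                # Big-M: the pair may only count when the score gap holds.
                prob += (pulp.lpSum(w[k] * (xp[k] - xn[k]) for k in range(d))
                         >= eps - M * (1 - z[i][j]))
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        return [v.value() for v in w]

    Because the objective counts correctly ordered pairs, the solver optimizes the ranking measure itself rather than a convex surrogate, which is the advantage over heuristic loss functions that the abstract highlights.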

    The Decision List Machine

    We introduce a new learning algorithm for decision lists that allows features constructed from the data and supports a trade-off between accuracy and complexity. We bound its generalization error in terms of the number of errors and the size of the classifier it finds on the training data. We also compare its performance on several natural data sets with that of the set covering machine and the support vector machine.
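    The sketch below shows the prediction scheme for a decision list: rules are tried in order, and the first one that fires determines the label. The ball-shaped, data-dependent rule in the example is an illustrative stand-in for the features constructed from the data; it is not the paper's exact construction.

    # Decision-list prediction sketch (rule representation is illustrative).
    import numpy as np

    def predict(decision_list, x, default_label=-1):
        # decision_list: sequence of (rule, label) pairs; rule: x -> bool.
        for rule, label in decision_list:
            if rule(x):
                return label
        return default_label  # fallthrough when no rule fires

    # Example rule built from a training point: a ball around it.
    centre = np.array([0.0, 1.0])
    rules = [(lambda x: np.linalg.norm(x - centre) <= 0.5, +1)]
    print(predict(rules, np.array([0.1, 0.9])))  # -> 1

    The generalization bound mentioned above then depends on how many such rules the list contains and how many training errors it makes, so shorter, more accurate lists come with stronger guarantees.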