Combining MLC and SVM classifiers for learning based decision making: analysis and evaluations
Maximum likelihood classifier (MLC) and support vector machines (SVM) are two commonly used approaches in machine learning. MLC estimates the parameters of a probabilistic model based on Bayesian theory, whilst SVM is an optimization-based nonparametric method in this context. Recently, it has been found that SVM is in some cases equivalent to MLC in probabilistically modeling the learning process. In this paper, MLC and SVM are combined in learning and classification, which helps to yield probabilistic output for SVM and facilitates soft decision making. In total, four groups of data are used for evaluation, covering sonar, vehicle, breast cancer, and DNA sequences. The data samples are characterized as Gaussian/non-Gaussian distributed and balanced/unbalanced, and these characteristics are then used for performance assessment in comparing the SVM and the combined SVM-MLC classifier. Interesting results are reported that indicate how the combined classifier may work under various conditions. Accepted on May 11, 201
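The abstract does not spell out how the probabilistic output is produced. A common baseline for turning SVM decision values into class probabilities is a logistic (Platt-style) calibration, sketched below; the `scores`/`labels` toy data and all names are illustrative assumptions, not the paper's method.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_score_to_prob(scores, labels, lr=0.1, epochs=2000):
    """Fit p(y=1 | score) = sigmoid(a*score + b) by batch gradient descent
    on the logistic log-loss (a Platt-scaling-style calibration)."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            ga += (p - y) * s  # d(log-loss)/da
            gb += (p - y)      # d(log-loss)/db
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b

# Hypothetical SVM decision values with their true labels (toy data).
scores = [2.0, 1.5, 0.8, -0.5, -1.2, -2.0]
labels = [1, 1, 1, 0, 0, 0]
a, b = fit_score_to_prob(scores, labels)
prob = lambda s: sigmoid(a * s + b)  # soft decision for a new score
```

The paper's MLC combination presumably replaces this sigmoid with a fitted class-conditional model, but the calibration step shown is the part that yields soft decisions.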
Global optimization based on active preference learning with radial basis functions
This paper proposes a method for solving optimization problems in which the decision maker cannot evaluate the objective function, but rather can only express a preference such as "this is better than that" between two candidate decision vectors. The algorithm described in this paper aims at reaching the global optimizer by iteratively proposing to the decision maker a new comparison to make, based on actively learning a surrogate of the latent (unknown and perhaps unquantifiable) objective function from past sampled decision vectors and pairwise preferences. A radial basis function surrogate is fit via linear or quadratic programming, satisfying if possible the preferences expressed by the decision maker on existing samples. The surrogate is used to propose a new sample of the decision vector for comparison with the current best candidate based on two possible criteria: minimize a combination of the surrogate and an inverse distance weighting function to balance exploitation of the surrogate against exploration of the decision space, or maximize a function related to the probability that the new candidate will be preferred. Compared to active preference learning based on Bayesian optimization, we show that our approach is competitive in that, within the same number of comparisons, it usually approaches the global optimum more closely and is computationally lighter. Applications of the proposed algorithm to solving a set of benchmark global optimization problems, to multi-objective optimization, and to optimal tuning of a cost-sensitive neural network classifier for object recognition from images are described in the paper. MATLAB and Python implementations of the algorithms described in the paper are available at http://cse.lab.imtlucca.it/~bemporad/glis
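As a rough illustration of the surrogate-fitting step: the paper fits the RBF surrogate by linear or quadratic programming, whereas this sketch uses a hinge penalty with subgradient descent, and the 1-D latent function, centers, and hyperparameters are invented for the example.

```python
import math

def rbf(x, c, eps=0.5):
    """Gaussian radial basis function centred at c (eps is invented)."""
    return math.exp(-((x - c) / eps) ** 2)

def surrogate(w, centers, x):
    return sum(wk * rbf(x, ck) for wk, ck in zip(w, centers))

def fit_preferences(centers, prefs, margin=0.1, lam=1e-3, lr=0.05, iters=3000):
    """Fit RBF weights so that f(x_i) <= f(x_j) - margin whenever the
    decision maker preferred sample i over sample j. This subgradient
    descent on a hinge penalty is only a stand-in for the paper's LP/QP."""
    w = [0.0] * len(centers)
    for _ in range(iters):
        grad = [2.0 * lam * wk for wk in w]           # ridge regulariser
        for i, j in prefs:
            fi = surrogate(w, centers, centers[i])
            fj = surrogate(w, centers, centers[j])
            if fi - fj + margin > 0:                  # preference violated
                for k, ck in enumerate(centers):
                    grad[k] += rbf(centers[i], ck) - rbf(centers[j], ck)
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w

# Invented latent objective the decision maker cannot quantify directly.
latent = lambda x: (x - 0.3) ** 2
xs = [0.0, 0.25, 0.5, 0.75, 1.0]                      # sampled decisions
prefs = [(i, j) for i in range(len(xs)) for j in range(len(xs))
         if latent(xs[i]) < latent(xs[j])]            # pairwise answers
w = fit_preferences(xs, prefs)

# Pure-exploitation acquisition: propose the surrogate minimiser.
# (The paper balances this against an explicit exploration term.)
grid = [g / 100 for g in range(101)]
x_next = min(grid, key=lambda g: surrogate(w, xs, g))
```

After fitting, the surrogate ranks the samples consistently with the expressed preferences, and the proposed `x_next` lies in the region the decision maker has favoured.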
Enhancing Partially Labelled Data: Self Learning and Word Vectors in Natural Language Processing
There has been an explosion in unstructured text data in recent years, with services like Twitter, Facebook and WhatsApp helping drive this growth. Many of these companies face pressure to monitor the content on their platforms, and as such Natural Language Processing (NLP) techniques are more important than ever. There are many applications of NLP, including spam filtering, sentiment analysis of social media, automatic text summarisation, and document classification.
New models and methods for classification and feature selection: a mathematical optimization perspective
The objective of this PhD dissertation is the development of new models for Supervised
Classification and Benchmarking, making use of Mathematical Optimization and Statistical
tools. In particular, we address the fusion of instruments from both disciplines,
with the aim of extracting knowledge from data. In this way, we obtain innovative
methodologies that outperform existing ones, bridging theoretical Mathematics
with real-life problems.
The work developed in this thesis has focused on two fundamental methodologies
in Data Science: support vector machines (SVM) and Benchmarking. Regarding
the first, the SVM classifier is based on the search for the separating hyperplane of
maximum margin and is written as a convex quadratic problem. In the Benchmarking
context, the goal is to calculate the different efficiencies through a non-parametric
deterministic approach. In this thesis we focus on Data Envelopment Analysis
(DEA), which consists of a Linear Programming formulation.
This dissertation is structured as follows. In Chapter 1 we briefly present the
different challenges this thesis addresses, as well as their state of the art. In the same
vein, the different formulations used as base models are presented, together with the
notation used throughout the chapters of this thesis.
In Chapter 2, we tackle the construction of a version of the SVM
that controls misclassification errors. To do this, we incorporate new performance
constraints into the SVM formulation, imposing upper bounds on the misclassification
errors. The resulting formulation is a convex quadratic problem with linear constraints.
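The chapter's exact constraints are not reproduced in this summary; as an illustrative sketch, a soft-margin SVM with hypothetical per-class bounds on the aggregated misclassification errors could read:

```latex
\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
y_i \bigl(w^{\top} x_i + b\bigr) \ge 1 - \xi_i, \qquad
\xi_i \ge 0, \qquad i = 1, \dots, n,
```

together with performance constraints of the form \(\sum_{i : y_i = +1} \xi_i \le \mu_+\) and \(\sum_{i : y_i = -1} \xi_i \le \mu_-\), where the bounds \(\mu_\pm\) are placeholders. All added constraints are linear, so the problem stays a convex quadratic program with linear constraints, as the abstract states.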
Chapter 3 continues with the SVM as the basis and sets out the problem of providing
not only a hard labeling for each individual in the dataset, but also a
class probability estimate. Furthermore, confidence intervals for both the score values
and the posterior class probabilities are provided. In addition, as in the previous
chapter, we carry the obtained results to the setting in which misclassification errors are
considered. For this purpose, we have to solve either a convex quadratic problem
or a convex quadratic problem with linear constraints and integer variables, always
taking advantage of the parameter tuning of the SVM, which is usually discarded.
Based on the results of Chapter 2, in Chapter 4 we handle the problem of feature selection, again taking misclassification errors into account. In order to build this
technique, feature selection is embedded in the classifier model. The process is
divided into two steps. In the first step, feature selection is performed while the
data are simultaneously separated via a hyperplane, or linear classifier, subject to the
performance constraints. In the second step, we build the maximum margin classifier
(SVM) using the features selected in the first step, again taking
the same performance constraints into account.
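A minimal sketch of the two-step idea, with invented toy data and with the chapter's performance constraints replaced by a plain L1 penalty (step 1) and hinge loss (step 2); the thesis's actual formulations are constrained QPs, so this is only a rough analogue.

```python
def train(data, labels, keep, lam=0.0, lr=0.05, epochs=500):
    """Hinge-loss linear classifier by subgradient descent on the features
    in `keep`; lam > 0 adds an L1 penalty that shrinks the weights of
    uninformative features (a stand-in for the thesis's exact model)."""
    w = {k: 0.0 for k in keep}
    for _ in range(epochs):
        grad = {k: lam * (1 if w[k] > 0 else -1 if w[k] < 0 else 0)
                for k in keep}
        for x, y in zip(data, labels):
            if y * sum(w[k] * x[k] for k in keep) < 1:  # margin violated
                for k in keep:
                    grad[k] -= y * x[k]
        for k in keep:
            w[k] -= lr * grad[k] / len(data)
    return w

# Toy data: feature 0 determines the label, features 1-2 are pure noise.
data = [(+1, 0.1, -0.1), (+1, -0.1, 0.1), (-1, 0.1, 0.1), (-1, -0.1, -0.1),
        (+1, 0.1, 0.1), (+1, -0.1, -0.1), (-1, 0.1, -0.1), (-1, -0.1, 0.1)]
labels = [+1, +1, -1, -1, +1, +1, -1, -1]

# Step 1: feature selection with a sparsity-inducing linear classifier.
w1 = train(data, labels, keep=[0, 1, 2], lam=0.5)
selected = [k for k in w1
            if abs(w1[k]) > 0.1 * max(abs(v) for v in w1.values())]

# Step 2: maximum-margin-style classifier on the selected features only.
w2 = train(data, labels, keep=selected)
acc = sum((sum(w2[k] * x[k] for k in selected) > 0) == (y > 0)
          for x, y in zip(data, labels)) / len(data)
```

On this toy data, step 1 keeps only the informative feature and step 2 separates the training set perfectly with it.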
In Chapter 5, we move to the problem of Benchmarking, where the practices of
different entities are compared through the products or services they provide, with
the aim of making changes or improvements in each of them. Concretely,
in this chapter we propose a Mixed Integer Linear Programming formulation based on
Data Envelopment Analysis (DEA) that performs feature selection, improving
the interpretability and comprehension of the obtained model and efficiencies.
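For context (the thesis's MILP is not reproduced in this summary), the standard input-oriented CCR envelopment model underlying DEA can be written as:

```latex
\min_{\theta,\,\lambda} \;\; \theta
\quad \text{s.t.} \quad
\sum_{j=1}^{n} \lambda_j \, x_{ij} \le \theta \, x_{i0}, \;\; i = 1, \dots, m,
\qquad
\sum_{j=1}^{n} \lambda_j \, y_{rj} \ge y_{r0}, \;\; r = 1, \dots, s,
\qquad
\lambda_j \ge 0,
```

where \(x_{ij}\) and \(y_{rj}\) are the inputs and outputs of unit \(j\) and unit \(0\) is the unit under evaluation. A feature-selection variant would add binary variables indicating which inputs and outputs enter the model; the thesis's exact MILP details are not given here.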
Finally, in Chapter 6 we collect the conclusions of this thesis as well as future lines
of research.
PROBABILISTIC AND GEOMETRIC APPROACHES TO THE ANALYSIS OF NON-STANDARD DATA
This dissertation explores topics in machine learning, network analysis, and the foundations of statistics using tools from geometry, probability, and optimization. The rise of machine learning has brought powerful new (and old) algorithms for data analysis. Much of classical statistics research is about understanding how statistical algorithms behave depending on various aspects of the data. The first part of this dissertation examines the support vector machine classifier (SVM). Leveraging Karush-Kuhn-Tucker conditions, we find surprising connections between SVM and several other simple classifiers. We use these connections to explain SVM’s behavior in a variety of data scenarios and demonstrate how these insights are directly relevant to the data analyst. The next part of this dissertation studies networks which evolve over time. We first develop a method to empirically evaluate vertex centrality metrics in an evolving network. We then apply this methodology to investigate the role of precedent in the US legal system. Next, we shift to a probabilistic perspective on temporally evolving networks. We study a general probabilistic model of an evolving network that undergoes an abrupt change in its evolution dynamics. In particular, we examine the effect of such a change on the network’s structural properties. We develop mathematical techniques using continuous time branching processes to derive quantitative error bounds for functionals of a major class of these models about their large network limits. Using these results, we develop general theory to understand the role of abrupt changes in the evolution dynamics of these models. Based on this theory we derive a consistent, non-parametric change point detection estimator. We conclude with a discussion of foundational topics in statistics, commenting on debates both old and new.
First, we examine the false confidence theorem, which raises questions for data practitioners making inferences based on epistemic uncertainty measures such as Bayesian posterior distributions. Second, we give an overview of the rise of “data science” and what it means for statistics (and vice versa), touching on topics such as reproducibility, computation, education, communication, and statistical theory.
Probabilistic methods for high dimensional signal processing
This thesis investigates the use of probabilistic and Bayesian methods for analysing high dimensional signals. The work proceeds in three main parts sharing similar objectives. Throughout, we focus on building data efficient inference mechanisms geared toward high dimensional signal processing. This is achieved by using probabilistic models on top of informative data representation operators. We also improve on the fitting objective to make it better suited to our requirements.
Variational Inference: We introduce a variational approximation framework using direct optimisation of what is known as the scale invariant Alpha-Beta divergence (sAB-divergence). This new objective encompasses most variational objectives that use the Kullback-Leibler, the Rényi or the gamma divergences. It also gives access to objective functions never exploited before in the context of variational inference. This is achieved via two easy to interpret control parameters, which allow for a smooth interpolation over the divergence space while trading off properties such as mass-covering of a target distribution and robustness to outliers in the data. Furthermore, the sAB variational objective can be optimised directly by re-purposing existing methods for Monte Carlo computation of complex variational objectives, leading to estimates of the divergence instead of variational lower bounds. We show the advantages of this objective on Bayesian models for regression problems.
Roof-Edge Hidden Markov Random Field: We propose a method for semi-local Hurst estimation by incorporating a Markov random field model to constrain a wavelet-based pointwise Hurst estimator. This results in an estimator which is able to exploit the spatial regularities of a piecewise parametric varying Hurst parameter. The pointwise estimates are jointly inferred along with the parametric form of the underlying Hurst function, which characterises how the Hurst parameter varies deterministically over the spatial support of the data. Unlike recent Hurst regularisation methods, the proposed approach is flexible in that arbitrary parametric forms can be considered, and is extensible in as much as the associated gradient descent algorithm can accommodate a broad class of distributional assumptions without any significant modifications. The potential benefits of the approach are illustrated with simulations of various first-order polynomial forms.
Scattering Hidden Markov Tree: We combine the rich, over-complete signal representation afforded by the scattering transform with a probabilistic graphical model which captures hierarchical dependencies between coefficients at different layers. The wavelet scattering network results in a high-dimensional representation which is translation invariant and stable to deformations whilst preserving informative content. Such properties are achieved by cascading wavelet transform convolutions with non-linear modulus and averaging operators. The network structure and its distributions are described using a Hidden Markov Tree. This yields a generative model for high dimensional inference and offers a means to perform various inference tasks such as prediction. Our proposed scattering convolutional hidden Markov tree displays promising results on classification tasks of complex images in the challenging case where the number of training examples is extremely small. We also use variational methods on the aforementioned model, leveraging the sAB variational objective defined earlier to improve the quality of the approximation.
Design of Interactive Feature Space Construction Protocol
Machine learning deals with designing systems that learn from data, i.e., automatically improve
with experience. Systems gain experience by detecting patterns or regularities and using them for
making predictions. These predictions are based on the properties that the system learns from the
data. Thus, when we say a machine learns, it means it has changed in a way that allows it to
perform more efficiently than before. Machine learning is emerging as an important technology
for a number of applications, including natural language processing, medical diagnosis,
game playing, and finance, and a wide variety of machine learning approaches
have been developed for such applications.
We first review the work done in the field of machine learning and analyze various concepts
of machine learning that are applicable to the work presented in this thesis. Next, we examine
active machine learning for pipelining an important natural language processing application,
information extraction, in which the task of prediction is carried out in stages and the
output of each stage serves as an input to the next.
A number of machine learning algorithms have been developed for different applications.
However, no single machine learning algorithm is appropriate for all learning
problems: real world datasets are too varied to be handled well by a single
learner. For this reason, an evaluation of machine learning algorithms is needed. We present
an experiment evaluating various state-of-the-art machine learning algorithms using an
interactive machine learning tool called WEKA (Waikato Environment for Knowledge Analysis).
The evaluation is carried out with the purpose of finding an optimal solution for a real world
learning problem: credit approval, as used in banks, which is a classification problem.
Finally, we present an approach of combining various learners with the aim of increasing their
efficiency. We present two experiments that evaluate the machine learning algorithms for
efficiency and compare their performance with the new combined approach, for the same
classification problem. Later we show the effects of feature selection on the efficiency of our
combined approach as well as on other machine learning techniques. The aim of this work is to
analyze the techniques that increase the efficiency of the learners.
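The combination scheme is not detailed in this summary; one standard way to combine learners is majority voting, sketched here with invented toy classifiers for a credit-approval-style problem.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Combine base learners by simple majority voting (one common way to
    combine classifiers; the thesis's exact combination may differ)."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three hypothetical base classifiers over toy applicant features.
conservative = lambda a: 'approve' if a['income'] > 50 and a['debt'] < 10 else 'reject'
income_only  = lambda a: 'approve' if a['income'] > 30 else 'reject'
debt_only    = lambda a: 'approve' if a['debt'] < 20 else 'reject'

applicant = {'income': 40, 'debt': 15}
decision = majority_vote([conservative, income_only, debt_only], applicant)
print(decision)  # two of three vote 'approve'
```

Note that `Counter.most_common` breaks ties by insertion order, so with an even number of learners a deliberate tie-breaking rule would be needed.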
Support Vector Machines as Probabilistic Models
We show how the SVM can be viewed as a maximum likelihood estimate of a class of probabilistic models. This model class can be viewed as a reparametrization of the SVM, in a similar vein to the ν-SVM reparametrizing the classical (C-)SVM. It is not discriminative, but has a non-uniform marginal. We illustrate the benefits of this new view by rederiving and re-investigating two established SVM-related algorithms.
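The paper's full model class (with its non-uniform marginal) is beyond this abstract, but the starting point for results of this kind is a monotone-transform identity: minimizing the regularized hinge loss is the same as maximizing a product of unnormalized exponential "likelihood" terms,

```latex
\hat{w}
= \arg\min_{w} \; \lambda \|w\|^{2}
  + \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, w^{\top} x_i\bigr)
= \arg\max_{w} \; e^{-\lambda \|w\|^{2}}
  \prod_{i=1}^{n} \exp\bigl(-\max\bigl(0,\; 1 - y_i\, w^{\top} x_i\bigr)\bigr),
```

since \(t \mapsto e^{-t}\) is strictly decreasing. Turning the \(\exp(-\text{hinge})\) factors into a properly normalized probabilistic model is precisely where the paper's analysis lies; the identity above is only the illustrative first step.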