On Maximum Margin Hierarchical Classification
We present work in progress towards maximum margin hierarchical classification where the objects are allowed to belong to more than one category at a time. The classification hierarchy is represented as a Markov network equipped with an exponential family defined on the edges. We present a variation of the maximum margin multilabel learning framework that is suited to the hierarchical classification task and allows efficient implementation via gradient-based methods. We compare the behaviour of the proposed method to the recently introduced hierarchical regularized least squares classifier, as well as two SVM variants, on Reuters news article classification.
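For orientation only, a generic margin-rescaled structured max-margin objective of the kind such gradient-based methods optimise is sketched below; the joint feature map φ, the multilabel loss Δ and their decomposition over the hierarchy edges are assumptions, not the paper's exact formulation.

```latex
% Generic margin-rescaled structured max-margin objective (reference sketch only;
% the paper's hierarchical multilabel formulation may differ):
\min_{\mathbf{w}} \; \frac{1}{2}\|\mathbf{w}\|^{2}
  + C \sum_{i=1}^{n} \max_{\mathbf{y} \neq \mathbf{y}_i}
    \Bigl[ \Delta(\mathbf{y}_i, \mathbf{y})
      - \mathbf{w}^{\top}\bigl(\phi(\mathbf{x}_i, \mathbf{y}_i) - \phi(\mathbf{x}_i, \mathbf{y})\bigr) \Bigr]_{+}
```

Here φ would decompose over the edges of the hierarchy Markov network, so the inner maximisation can be handled with message-passing-style computations.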
Improving "bag-of-keypoints" image categorisation: Generative Models and PDF-Kernels
In this paper we propose two distinct enhancements to the basic "bag-of-keypoints" image categorisation scheme proposed in [4]. In this approach images are represented as a variable-sized set of local image features (keypoints). Thus, we require machine learning tools which can operate on sets of vectors. In [4] this is achieved by representing the set as a histogram over bins found by k-means. We show how this approach can be improved and generalised using Gaussian Mixture Models (GMMs). Alternatively, the set of keypoints can be represented directly as a probability density function, over which a kernel can be defined. This approach is shown to give state-of-the-art categorisation performance.
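As a rough illustration of the two representations contrasted above, the sketch below maps a variable-sized set of keypoint descriptors to a fixed-length vector either by hard k-means binning or by GMM soft assignment; scikit-learn is used as a stand-in and the codebook sizes and settings are assumptions.

```python
# Sketch of the two keypoint-set representations discussed above, using
# scikit-learn as a stand-in (the paper's own descriptors and settings are
# not specified here). Each image is a variable-sized array of local
# descriptors; both functions map it to a fixed-length vector.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def hard_histogram(descriptors, kmeans: KMeans):
    """Classic bag-of-keypoints: count descriptors per k-means bin."""
    labels = kmeans.predict(descriptors)
    counts = np.bincount(labels, minlength=kmeans.n_clusters)
    return counts / counts.sum()

def soft_histogram(descriptors, gmm: GaussianMixture):
    """GMM variant: accumulate posterior responsibilities instead of hard counts."""
    resp = gmm.predict_proba(descriptors)      # (n_keypoints, n_components)
    weights = resp.sum(axis=0)
    return weights / weights.sum()

# Usage sketch: fit the codebook on descriptors pooled from training images.
# pooled = np.vstack(train_descriptor_sets)    # hypothetical data
# kmeans = KMeans(n_clusters=256, n_init=10).fit(pooled)
# gmm = GaussianMixture(n_components=256, covariance_type="diag").fit(pooled)
```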
Kernel Ellipsoidal Trimming
Ellipsoid estimation is an issue of primary importance in many practical areas such as control, system identification, visual/audio tracking, experimental design, data mining, robust statistics and novelty/outlier detection. This paper presents a new method of kernel information matrix ellipsoid estimation (KIMEE) that finds an ellipsoid in a kernel-defined feature space based on a centered information matrix. Although the method is very general and can be applied to many of the aforementioned problems, the main focus in this paper is the problem of novelty or outlier detection associated with fault detection. A simple iterative algorithm based on Titterington's minimum volume ellipsoid method is proposed for practical implementation. The KIMEE method demonstrates very good performance on a set of real-life and simulated datasets compared with support vector machine methods.
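For concreteness, below is a minimal input-space sketch of a Titterington-style minimum-volume-ellipsoid iteration of the kind the abstract mentions; KIMEE itself works with a centred information matrix in a kernel-defined feature space, which is not reproduced here, and the tolerances are assumptions.

```python
# Input-space sketch of a minimum-volume-ellipsoid iteration (Titterington /
# Khachiyan style). KIMEE operates in a kernel feature space instead.
import numpy as np

def min_volume_ellipsoid(X, tol=1e-6, max_iter=1000):
    """Return centre c and shape matrix A of an approximate minimum-volume
    ellipsoid {x : (x - c)^T A (x - c) <= 1} covering the rows of X."""
    n, d = X.shape
    Q = np.hstack([X, np.ones((n, 1))]).T          # (d+1, n) lifted points
    u = np.full(n, 1.0 / n)                        # point weights
    for _ in range(max_iter):
        V = Q @ np.diag(u) @ Q.T
        M = np.einsum("ij,ji->i", Q.T @ np.linalg.inv(V), Q)  # leverage scores
        j = int(np.argmax(M))
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        if step <= tol:                            # weights have converged
            break
        u *= (1.0 - step)
        u[j] += step
    c = X.T @ u                                    # ellipsoid centre
    A = np.linalg.inv(X.T @ np.diag(u) @ X - np.outer(c, c)) / d
    return c, A
```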
Complexity of pattern classes and Lipschitz property
Rademacher and Gaussian complexities are successfully used in learning theory for measuring the capacity of the class of functions to be learned. One of the most important properties of these complexities is their Lipschitz property: a composition of a class of functions with a fixed Lipschitz function may increase its complexity by at most a factor of twice the Lipschitz constant. The proof of this property is non-trivial (in contrast to the other properties) and it is believed that the proof in the Gaussian case is conceptually more difficult than the one for the Rademacher case. In this paper we give a detailed proof of the Lipschitz property for the Rademacher case and generalize the same idea to an arbitrary complexity (including the Gaussian). We also discuss a related topic about the Rademacher complexity of a class consisting of all the Lipschitz functions with a given Lipschitz constant. We show that the complexity is surprisingly low in the one-dimensional case. The question for higher dimensions remains open.
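For reference, the quantities involved can be written as follows; the notation is assumed, and the factor 2L matches the "at most twice the Lipschitz constant" phrasing above.

```latex
% One common definition of the empirical Rademacher complexity of a class F
% on a sample x_1,...,x_n (sigma_i independent uniform {-1,+1} variables):
\hat{\mathcal{R}}_n(\mathcal{F})
  = \mathbb{E}_{\sigma}\Bigl[\, \sup_{f \in \mathcal{F}}
      \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i) \Bigr]

% Lipschitz (contraction) property referred to above: if phi is L-Lipschitz,
\hat{\mathcal{R}}_n(\varphi \circ \mathcal{F}) \;\le\; 2L\, \hat{\mathcal{R}}_n(\mathcal{F})
```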
High-probability minimax probability machines
In this paper we focus on constructing binary classifiers that are built on the premise of minimising an upper bound on their future misclassification rate. We pay particular attention to the approach taken by the minimax probability machine (Lanckriet et al. in J Mach Learn Res 3:555–582, 2003), which directly minimises an upper bound on the future misclassification rate in a worst-case setting: that is, under all possible choices of class-conditional distributions with a given mean and covariance matrix. The validity of these bounds rests on the assumption that the means and covariance matrices are known in advance; however, this is not always the case in practice, and their empirical counterparts have to be used instead. This can result in erroneous upper bounds on the future misclassification rate and lead to the formulation of sub-optimal predictors. In this paper we address this oversight and study the influence that uncertainty in the moments, the mean and covariance matrix, has on the construction of predictors under the minimax principle. By using high-probability upper bounds on the deviation between true moments and their empirical counterparts, we can re-formulate the minimax optimisation to incorporate this uncertainty and find the predictor that minimises the high-probability, worst-case misclassification rate. The moment uncertainty introduces a natural regularisation component into the optimisation, where each class is regularised in proportion to the degree of moment uncertainty. Experimental results support the view that, when data availability is limited, the incorporation of moment uncertainty can lead to the formation of better predictors.
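For context, the worst-case formulation of Lanckriet et al. that the paper starts from can be written as below (notation assumed); the contribution described above is to replace the true moments with their empirical counterparts plus high-probability deviation bounds, which acts like a per-class regulariser.

```latex
% Minimax probability machine with class means mu_+, mu_- and covariances
% Sigma_+, Sigma_-: maximise the worst-case correct-classification rate alpha.
\max_{\alpha,\, \mathbf{w} \neq 0,\, b} \; \alpha
\quad \text{s.t.} \quad
\mathbf{w}^{\top}\boldsymbol{\mu}_{+} - b \;\ge\; \kappa(\alpha)\sqrt{\mathbf{w}^{\top}\Sigma_{+}\mathbf{w}},
\qquad
b - \mathbf{w}^{\top}\boldsymbol{\mu}_{-} \;\ge\; \kappa(\alpha)\sqrt{\mathbf{w}^{\top}\Sigma_{-}\mathbf{w}},
\qquad
\kappa(\alpha) = \sqrt{\tfrac{\alpha}{1-\alpha}}
```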
Sparse Semi-supervised Learning Using Conjugate Functions
In this paper, we propose a general framework for sparse semi-supervised learning, which concerns using a small portion of unlabeled data and a few labeled data to represent target functions and thus has the merit of accelerating function evaluations when predicting the output of a new example. This framework makes use of Fenchel-Legendre conjugates to rewrite a convex insensitive loss involving a regularization with unlabeled data, and is applicable to a family of semi-supervised learning methods such as multi-view co-regularized least squares and single-view Laplacian support vector machines (SVMs). As an instantiation of this framework, we propose sparse multi-view SVMs, which use a squared ε-insensitive loss. The resultant optimization is an inf-sup problem, and the optimal solutions arguably have saddle-point properties. We present a globally optimal iterative algorithm to optimize the problem. We give the margin bound on the generalization error of the sparse multi-view SVMs, and derive the empirical Rademacher complexity for the induced function class. Experiments on artificial and real-world data show their effectiveness. We further give a sequential training approach to demonstrate their potential for use in large-scale problems, and provide encouraging experimental results indicating the efficacy of the margin bound and empirical Rademacher complexity in characterizing the roles of unlabeled data for semi-supervised learning.
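For reference, the conjugate machinery mentioned above is the following; rewriting a closed convex loss through its biconjugate is what turns the training problem into an inf-sup problem of the kind described in the abstract.

```latex
% Fenchel-Legendre conjugate of a function f, and the biconjugate identity
% (valid for closed convex f) used to rewrite a convex loss as a supremum
% over dual variables:
f^{*}(\mathbf{u}) = \sup_{\mathbf{v}} \bigl( \mathbf{u}^{\top}\mathbf{v} - f(\mathbf{v}) \bigr),
\qquad
f(\mathbf{v}) = \sup_{\mathbf{u}} \bigl( \mathbf{u}^{\top}\mathbf{v} - f^{*}(\mathbf{u}) \bigr)
```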
Two view learning: SVM-2K, theory and practice
Kernel methods make it relatively easy to define complex high-dimensional feature spaces. This raises the question of how we can identify the relevant subspaces for a particular learning task. When two views of the same phenomenon are available, kernel Canonical Correlation Analysis (KCCA) has been shown to be an effective preprocessing step that can improve the performance of classification algorithms such as the Support Vector Machine (SVM). This paper takes this observation to its logical conclusion and proposes a method that combines this two-stage learning (KCCA followed by SVM) into a single optimisation termed SVM-2K. We present both experimental and theoretical analysis of the approach, showing encouraging results and insights.
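As a point of comparison, the two-stage baseline that SVM-2K fuses into a single optimisation looks roughly like the sketch below; linear CCA from scikit-learn stands in for KCCA, and the data arrays and component count are hypothetical placeholders.

```python
# Two-stage baseline: learn correlated projections of the two views, then
# train an SVM on the projected features. SVM-2K replaces this pipeline with
# a single joint optimisation, which is not reproduced here.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.svm import SVC

def two_stage_cca_svm(X_view1, X_view2, y, n_components=10):
    cca = CCA(n_components=n_components)
    Z1, Z2 = cca.fit_transform(X_view1, X_view2)   # correlated subspaces
    clf = SVC(kernel="rbf").fit(np.hstack([Z1, Z2]), y)
    return cca, clf
```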
Practical Bayesian support vector regression for financial time series prediction and market condition change detection
Support vector regression (SVR) has long been proven to be a successful tool to predict financial time series. The core idea of this study is to outline an automated framework for achieving a faster and easier parameter selection process, and at the same time, generating useful prediction uncertainty estimates in order to effectively tackle flexible real-world financial time series prediction problems. A Bayesian approach to SVR is discussed and implemented. It is found that the direct implementation of the probabilistic framework of Gao et al. returns unsatisfactory results in our experiments. A novel enhancement is proposed by adding a new kernel scaling parameter to overcome the difficulties encountered. In addition, the multi-armed bandit Bayesian optimization technique is applied to automate the parameter selection process. Our framework is then tested on financial time series of various asset classes (i.e. equity index, credit default swaps spread, bond yields, and commodity futures) to ensure its flexibility. It is shown that the generalization performance of this parameter selection process can reach or sometimes surpass the computationally expensive cross-validation procedure. An adaptive calibration process is also described to allow practical use of the prediction uncertainty estimates to assess the quality of predictions. It is shown that the machine-learning approach discussed in this study can be developed as a very useful pricing tool, and potentially a market condition change detector. A further extension is possible by taking the prediction uncertainties into consideration when building a financial portfolio.
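As a rough stand-in for the automated parameter-selection loop described above, the sketch below tunes an RBF support vector regressor with randomised search over time-series-aware validation splits; the paper's Bayesian SVR, its extra kernel scaling parameter and the bandit-based Bayesian optimisation are not reproduced here, and all ranges are assumptions.

```python
# Hypothetical parameter-selection sketch for an SVR on a financial series.
import numpy as np
from scipy.stats import loguniform
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

def fit_svr(X, y, n_iter=50, random_state=0):
    search = RandomizedSearchCV(
        SVR(kernel="rbf"),
        param_distributions={
            "C": loguniform(1e-2, 1e3),
            "epsilon": loguniform(1e-4, 1e0),
            "gamma": loguniform(1e-4, 1e1),
        },
        n_iter=n_iter,
        cv=TimeSeriesSplit(n_splits=5),      # respect temporal ordering
        random_state=random_state,
    )
    return search.fit(X, y)
```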
1-factorisation of the Composition of Regular Graphs
1-factorability of the composition of graphs is studied. The following sufficient conditions are proved: the composition G[H] is 1-factorable if G and H are regular and at least one of the following holds: (i) G and H both contain a 1-factor, (ii) G is 1-factorable, (iii) H is 1-factorable. It is also shown that the tensor product of two graphs is 1-factorable if at least one of them is 1-factorable. This result in turn implies that the strong tensor product is 1-factorable under an analogous condition on its factors.
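A small sanity check of the composition statement, under assumed example graphs: the composition (lexicographic product) of two regular 1-factorable graphs should at least admit a perfect matching, i.e. a single 1-factor, which networkx can verify; extracting a full 1-factorisation is a stronger task not attempted here.

```python
# Verify that the composition of two regular 1-factorable graphs contains
# at least one 1-factor (perfect matching). Example graphs are assumptions.
import networkx as nx

G = nx.cycle_graph(4)            # 2-regular, 1-factorable (even cycle)
H = nx.complete_graph(4)         # 3-regular, 1-factorable (K_4)
comp = nx.lexicographic_product(G, H)   # the composition G[H]

matching = nx.max_weight_matching(comp, maxcardinality=True)
print(nx.is_perfect_matching(comp, matching))   # expected: True
```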