25 research outputs found
Learning to Resolve Natural Language Ambiguities: A Unified Approach
We analyze a few of the commonly used statistics based and machine learning
algorithms for natural language disambiguation tasks and observe that they can
be re-cast as learning linear separators in the feature space. Each of the
methods makes a priori assumptions, which it employs, given the data, when
searching for its hypothesis. Nevertheless, as we show, it searches a space
that is as rich as the space of all linear separators. We use this to build an
argument for a data driven approach which merely searches for a good linear
separator in the feature space, without further assumptions on the domain or a
specific problem.
We present such an approach - a sparse network of linear separators,
utilizing the Winnow learning algorithm - and show how to use it in a variety
of ambiguity resolution problems. The learning approach presented is
attribute-efficient and, therefore, appropriate for domains having very large
number of attributes.
In particular, we present an extensive experimental comparison of our
approach with other methods on several well studied lexical disambiguation
tasks such as context-sensitive spelling correction, prepositional phrase
attachment and part of speech tagging. In all cases we show that our approach
either outperforms other methods tried for these tasks or performs comparably
to the best
Regression depth and support vector machine
The regression depth method (RDM) proposed by Rousseeuw and Hubert [RH99] plays an important role in the area of robust regression for a continuous response variable. Christmann and Rousseeuw [CR01] showed that RDM is also useful for the case of binary regression. Vapnik?s convex risk minimization principle [Vap98] has a dominating role in statistical machine learning theory. Important special cases are the support vector machine (SVM), [epsilon]-support vector regression and kernel logistic regression. In this paper connections between these methods from different disciplines are investigated for the case of pattern recognition. Some results concerning the robustness of the SVM and other kernel based methods are given. --
Counting aggregate classifiers.
There are many methods to design classifiers for the supervised classification problem. In this paper, we study the problem of aggregating classifiers. We construct an algorithm to count the number of distinct aggregate classifiers. This leads to a new way of finding a best aggregate classifier. When there are only two classes, we explore the link between aggregating classifiers and n-bit boolean functions. Further, the sequence of the number of distinct aggregated classifiers appears to be new.Boolean function; Classification; Classifiers; Design; Functions; Methods; Studies; Supervised classification; Weighted majority vote;
On robustness properties of convex risk minimization methods for pattern recognition
The paper brings together methods from two disciplines: machine learning theory and robust statistics. Robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds of the influence function. Kernel logistic regression, support vector machines, least squares and the AdaBoost loss function are treated as special cases. A sensitivity analysis of the support vector machine is given. --AdaBoost loss function,influence function,kernel logistic regression,robustness,sensitivity curve,statistical learning,support vector machine,total variation
Regression Depth and Support Vector Machine
The regression depth method (RDM) proposed by Rousseeuw and Hubert [RH99] plays an important role in the area of robust regression for a continuous response variable. Christmann and Rousseeuw [CR01] showed that RDM is also useful for the case of binary regression. Vapnik’s convex risk minimization principle [Vap98] has a dominating role in statistical machine learning theory. Important special cases are the support vector machine (SVM), epsilon-support vector regression and kernel logistic regression. In this paper connections between these methods from different disciplines are investigated for the case of pattern recognition. Some results concerning the robustness of the SVM and other kernel based methods are given
Kernel-convoluted Deep Neural Networks with Data Augmentation
The Mixup method (Zhang et al. 2018), which uses linearly interpolated data,
has emerged as an effective data augmentation tool to improve generalization
performance and the robustness to adversarial examples. The motivation is to
curtail undesirable oscillations by its implicit model constraint to behave
linearly at in-between observed data points and promote smoothness. In this
work, we formally investigate this premise, propose a way to explicitly impose
smoothness constraints, and extend it to incorporate with implicit model
constraints. First, we derive a new function class composed of
kernel-convoluted models (KCM) where the smoothness constraint is directly
imposed by locally averaging the original functions with a kernel function.
Second, we propose to incorporate the Mixup method into KCM to expand the
domains of smoothness. In both cases of KCM and the KCM adapted with the Mixup,
we provide risk analysis, respectively, under some conditions for kernels. We
show that the upper bound of the excess risk is not slower than that of the
original function class. The upper bound of the KCM with the Mixup remains
dominated by that of the KCM if the perturbation of the Mixup vanishes faster
than where is a sample size. Using CIFAR-10 and CIFAR-100
datasets, our experiments demonstrate that the KCM with the Mixup outperforms
the Mixup method in terms of generalization and robustness to adversarial
examples
Making Risk Minimization Tolerant to Label Noise
In many applications, the training data, from which one needs to learn a
classifier, is corrupted with label noise. Many standard algorithms such as SVM
perform poorly in presence of label noise. In this paper we investigate the
robustness of risk minimization to label noise. We prove a sufficient condition
on a loss function for the risk minimization under that loss to be tolerant to
uniform label noise. We show that the loss, sigmoid loss, ramp loss and
probit loss satisfy this condition though none of the standard convex loss
functions satisfy it. We also prove that, by choosing a sufficiently large
value of a parameter in the loss function, the sigmoid loss, ramp loss and
probit loss can be made tolerant to non-uniform label noise also if we can
assume the classes to be separable under noise-free data distribution. Through
extensive empirical studies, we show that risk minimization under the
loss, the sigmoid loss and the ramp loss has much better robustness to label
noise when compared to the SVM algorithm
Supervised Classification: Quite a Brief Overview
The original problem of supervised classification considers the task of
automatically assigning objects to their respective classes on the basis of
numerical measurements derived from these objects. Classifiers are the tools
that implement the actual functional mapping from these measurements---also
called features or inputs---to the so-called class label---or output. The
fields of pattern recognition and machine learning study ways of constructing
such classifiers. The main idea behind supervised methods is that of learning
from examples: given a number of example input-output relations, to what extent
can the general mapping be learned that takes any new and unseen feature vector
to its correct class? This chapter provides a basic introduction to the
underlying ideas of how to come to a supervised classification problem. In
addition, it provides an overview of some specific classification techniques,
delves into the issues of object representation and classifier evaluation, and
(very) briefly covers some variations on the basic supervised classification
task that may also be of interest to the practitioner