Optimization of distributions differences for classification
In this paper we introduce a new classification algorithm called Optimization
of Distributions Differences (ODD). The algorithm aims to find a transformation
from the feature space to a new space where the instances in the same class are
as close as possible to one another while the gravity centers of these classes
are as far as possible from one another. This aim is formulated as a
multiobjective optimization problem that is solved by a hybrid of an
evolutionary strategy and the Quasi-Newton method. The choice of the
transformation function is flexible and could be any continuous space function.
We experiment with a linear and a non-linear transformation in this paper. We
show that the algorithm can outperform 6 other state-of-the-art classification
methods, namely naive Bayes, support vector machines, linear discriminant
analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, on
12 standard classification datasets. Our results show that the method is less
sensitive to an imbalanced number of instances than these methods. We
also show that ODD maintains its performance better than the other classification
methods on these datasets and hence offers better generalization ability.
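The two competing aims described in this abstract (instances close to their own class center, class centers far apart) can be illustrated with a small sketch. The abstract does not give the exact objective functions, so the definitions below are assumptions: `odd_objectives` and `apply_linear` are hypothetical names, Euclidean distance is assumed, and only the linear transformation is shown. The hybrid evolutionary/Quasi-Newton optimizer itself is not reproduced here.

```python
from collections import defaultdict
from math import dist


def apply_linear(W, x):
    """Map feature vector x to the new space with matrix W (rows = output dims)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def odd_objectives(W, X, y):
    """Return (within, between) for the linear map W (assumed definitions).

    within  : mean distance from each transformed instance to its class centroid
              (to be minimized)
    between : mean pairwise distance between transformed class centroids
              (to be maximized); requires at least two classes
    """
    by_class = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(apply_linear(W, x))

    # Gravity center of each class in the transformed space.
    centroids = {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in by_class.items()
    }
    within = sum(
        dist(p, centroids[label]) for label, pts in by_class.items() for p in pts
    ) / len(X)

    labels = sorted(centroids)
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    between = sum(dist(centroids[a], centroids[b]) for a, b in pairs) / len(pairs)
    return within, between
```

A multiobjective optimizer would search over `W` to lower `within` while raising `between`. For example, projecting two classes onto the axis that separates them can drive `within` to zero without reducing `between`.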
Click-through rate prediction : a comparative study of ensemble techniques in real-time bidding
Dissertation presented as a partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Business Intelligence and Knowledge Management.
Real-Time Bidding is an automated mechanism for buying and selling ads in real time that uses data collected from internet users to match the right audience to the best-suited advertisers. It goes beyond contextual advertising by basing bids on user data, and it differs from the sponsored search auction, where the bid price is associated with keywords. There is extensive literature on the classification and prediction of performance metrics such as click-through rate, impression rate and bidding price. However, there is limited research on applying advanced machine learning techniques, such as ensemble methods, to predicting the click-through rate of real-time bidding campaigns. This paper presents an in-depth analysis of click-through rate prediction in real-time bidding campaigns, comparing the classification results of six traditional classification models (Linear Discriminant Analysis, Logistic Regression, Regularised Regression, Decision Trees, k-Nearest Neighbors and Support Vector Machines) with those of two popular ensemble learning techniques (Voting and Bootstrap Aggregation). The goal of our research is to determine whether ensemble methods can accurately predict click-through rate and how they compare to standard classifiers. Results showed that ensemble techniques outperformed simple classifiers. They also highlight the excellent performance of linear algorithms (Linear Discriminant Analysis and Regularised Regression).
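The two ensemble techniques named in this abstract, Voting and Bootstrap Aggregation, can be sketched in a few lines. This is a generic illustration under stated assumptions, not the dissertation's implementation: all function names are hypothetical, hard (majority) voting is assumed, and a toy 1-nearest-neighbour learner stands in for the base classifiers.

```python
import random
from collections import Counter
from math import dist


def majority_vote(predictions):
    """Hard voting: predictions holds one label list per model; return one label per sample."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


def bootstrap_sample(X, y, rng):
    """Draw len(X) training examples with replacement."""
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]


def bagging_fit(fit, X, y, n_models=25, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit(*bootstrap_sample(X, y, rng)) for _ in range(n_models)]


def bagging_predict(models, X):
    """Aggregate the base learners' predictions by majority vote."""
    return majority_vote([[m(x) for x in X] for m in models])


def fit_1nn(X, y):
    """Hypothetical base learner: 1-nearest-neighbour on the given sample."""
    def predict(x):
        nearest = min(range(len(X)), key=lambda i: dist(X[i], x))
        return y[nearest]
    return predict
```

Voting combines different model types trained on the same data, while bagging combines copies of one model type trained on resampled data; both reduce to `majority_vote` at prediction time in this sketch.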
Support Vector Machines for Credit Scoring and discovery of significant features
The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring for determining likelihood to default based on consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover those features that are most significant in determining risk of default.
Neural Class-Specific Regression for face verification
Face verification is a problem approached in the literature mainly using
nonlinear class-specific subspace learning techniques. While it has been shown
that kernel-based Class-Specific Discriminant Analysis is able to provide
excellent performance in small- and medium-scale face verification problems,
its application in today's large-scale problems is difficult due to its
training space and computational requirements. In this paper, generalizing our
previous work on kernel-based class-specific discriminant analysis, we show
that class-specific subspace learning can be cast as a regression problem. This
allows us to derive linear, (reduced) kernel and neural network-based
class-specific discriminant analysis methods using efficient batch and/or
iterative training schemes, suited for large-scale learning problems. We test
the performance of these methods in two datasets describing medium- and
large-scale face verification problems. Comment: 9 pages, 4 figures
Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data
The recent development of more sophisticated spectroscopic methods allows acquisition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hyperspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches.
- …