Efficient Classification for Metric Data
Recent advances in large-margin classification of data residing in general
metric spaces (rather than Hilbert spaces) enable classification under various
natural metrics, such as string edit and earthmover distance. A general
framework developed for this purpose by von Luxburg and Bousquet [JMLR, 2004]
left open the questions of computational efficiency and of providing direct
bounds on generalization error.
We design a new algorithm for classification in general metric spaces, whose
runtime and accuracy depend on the doubling dimension of the data points, and
can thus achieve superior classification performance in many common scenarios.
The algorithmic core of our approach is an approximate (rather than exact)
solution to the classical problems of Lipschitz extension and of Nearest
Neighbor Search. The algorithm's generalization performance is guaranteed via
the fat-shattering dimension of Lipschitz classifiers, and we present
experimental evidence of its superiority to some common kernel methods. As a
by-product, we offer a new perspective on the nearest neighbor classifier,
which yields significantly sharper risk asymptotics than the classic analysis
of Cover and Hart [IEEE Trans. Info. Theory, 1967].
Comment: This is the full version of an extended abstract that appeared in the
Proceedings of the 23rd COLT, 2010.
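As a rough illustration of the kind of classifier the abstract refers to, the sketch below evaluates a McShane-Whitney-style Lipschitz extension of +1/-1 labels under the string edit metric and predicts with its sign. The function names and the brute-force distance scan are illustrative assumptions; the paper's point is precisely to replace exact nearest neighbor search and exact extension with approximate counterparts for efficiency.

```python
"""Minimal sketch of a Lipschitz-extension classifier on a generic metric space.

The names (edit_distance, lipschitz_classify) and the exhaustive search are
assumptions for illustration, not the paper's algorithm.
"""

def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming (Levenshtein) edit distance: one example
    # of a metric on strings that is not induced by any inner product.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def lipschitz_classify(x, train_points, train_labels, metric, L=1.0):
    # McShane-Whitney extension of the +1/-1 labels with Lipschitz constant L,
    # evaluated by brute force over the training set; the sign of the midpoint
    # of the upper and lower extensions gives the predicted class.
    upper = min(y + L * metric(x, p) for p, y in zip(train_points, train_labels))
    lower = max(y - L * metric(x, p) for p, y in zip(train_points, train_labels))
    return 1 if (upper + lower) / 2.0 >= 0 else -1


if __name__ == "__main__":
    X = ["kitten", "mitten", "banana", "bandana"]
    y = [+1, +1, -1, -1]
    print(lipschitz_classify("bitten", X, y, edit_distance))   # expected: +1
    print(lipschitz_classify("cabana", X, y, edit_distance))   # expected: -1
```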
A review on data stream classification
Data streams have become a significant focus of research in databases, statistics, and computer science. A data stream is an ordered, potentially unbounded sequence of data points generated by a non-stationary information process. The typical data mining tasks have accordingly been adapted to streaming data, including clustering, classification, and frequent pattern mining. This paper presents several density-based data stream clustering approaches and examines how the related algorithms work, covering both semi-supervised and active learning, along with reviews of a number of recent studies.
Fast DD-classification of functional data
A fast nonparametric procedure for classifying functional data is introduced.
It consists of a two-step transformation of the original data plus a classifier
operating on a low-dimensional hypercube. The functional data are first mapped
into a finite-dimensional location-slope space and then transformed by a
multivariate depth function into the DD-plot, which is a subset of the unit
hypercube. This transformation yields a new notion of depth for functional
data. Three alternative depth functions are employed for this, as well as two
rules for the final classification on the DD-plot. The resulting classifier has
to be cross-validated over a small range of parameters only, which is
restricted by a Vapnik-Chervonenkis bound. The entire methodology does not
involve smoothing techniques, is completely nonparametric, and achieves
Bayes optimality under standard distributional settings. It is robust,
efficiently computable, and has been implemented in an R environment.
Applicability of the new approach is demonstrated by simulations as well as a
benchmark study.
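To make the two-step construction concrete, here is a toy sketch under simplifying assumptions: each curve is reduced to an (average level, endpoint slope) pair, a Mahalanobis-type depth is computed with respect to each class to obtain DD-plot coordinates, and a maximum-depth rule decides. The feature map, the depth function, and the decision rule are chosen for brevity and are not the paper's exact choices.

```python
import numpy as np

def location_slope(curves, grid):
    # Map each sampled curve to a 2-D point: its average level and the
    # overall slope between its endpoints ("location-slope space").
    levels = curves.mean(axis=1)
    slopes = (curves[:, -1] - curves[:, 0]) / (grid[-1] - grid[0])
    return np.column_stack([levels, slopes])

def mahalanobis_depth(points, ref):
    # Mahalanobis depth D(x | F) = 1 / (1 + (x - mu)' S^{-1} (x - mu)),
    # taking values in (0, 1]; deeper points are more central in `ref`.
    mu, cov = ref.mean(axis=0), np.cov(ref, rowvar=False)
    diff = points - mu
    inv = np.linalg.inv(cov)
    return 1.0 / (1.0 + np.einsum("ij,jk,ik->i", diff, inv, diff))

def dd_classify(test_curves, class0_curves, class1_curves, grid):
    # DD-plot coordinates: the depth of each test point with respect to each
    # class; the maximum-depth rule then decides on the unit square.
    z = location_slope(test_curves, grid)
    d0 = mahalanobis_depth(z, location_slope(class0_curves, grid))
    d1 = mahalanobis_depth(z, location_slope(class1_curves, grid))
    return (d1 > d0).astype(int)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 50)
    class0 = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.2, (40, t.size))  # oscillating, flat trend
    class1 = 2.0 * t + rng.normal(0.0, 0.2, (40, t.size))                # rising lines
    test = np.vstack([np.sin(2 * np.pi * t) + 0.1, 2.0 * t - 0.1])
    print(dd_classify(test, class0, class1, t))   # expected: [0 1]
```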
Customer profile classification using transactional data
Customer profiles are by definition made up of factual and transactional data. It is often the case that, due to reasons such as the high cost of data acquisition and/or data protection, only the transactional data are available for data mining operations. Transactional data, however, tend to be highly sparse and skewed because a large proportion of customers engage in very few transactions. This can bias the prediction accuracy of classifiers built on such data towards the larger proportion of customers with few transactions. This paper investigates an approach for accurately and confidently grouping and classifying customers in bins on the basis of the number of their transactions. The experiments we conducted on a highly sparse and skewed real-world transactional dataset show that our proposed approach can be used to identify a critical point at which customer profiles can be more confidently distinguished.
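A minimal sketch of the binning idea, assuming hypothetical column names (n_transactions, total_spend, label) and a per-bin logistic regression: scanning how cross-validated accuracy changes across transaction-count bins is one way to look for the critical point mentioned above, not the paper's exact procedure.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def per_bin_accuracy(profiles: pd.DataFrame, feature_cols, bins):
    # Group customers by how many transactions they have, then estimate how
    # confidently each group can be classified; a marked change in accuracy
    # across bins hints at the "critical point" the abstract refers to.
    profiles = profiles.copy()
    profiles["bin"] = pd.cut(profiles["n_transactions"], bins=bins)
    results = {}
    for name, group in profiles.groupby("bin", observed=True):
        if group["label"].nunique() < 2 or len(group) < 20:
            continue  # too few customers (or one class only) to score reliably
        scores = cross_val_score(LogisticRegression(max_iter=1000),
                                 group[feature_cols], group["label"], cv=5)
        results[str(name)] = scores.mean()
    return results

if __name__ == "__main__":
    # Synthetic stand-in for a sparse, skewed transactional dataset.
    rng = np.random.default_rng(1)
    n = 2000
    tx = rng.poisson(3, n) + 1                       # most customers transact rarely
    spend = rng.gamma(2.0, 10.0, n) * tx
    label = (spend + rng.normal(0, 40, n) > 80).astype(int)
    df = pd.DataFrame({"n_transactions": tx, "total_spend": spend, "label": label})
    print(per_bin_accuracy(df, ["total_spend"], bins=[0, 2, 5, 10, np.inf]))
```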
Support vector machine for functional data classification
In many applications, input data are sampled functions taking their values in
infinite dimensional spaces rather than standard vectors. This has important
consequences for data analysis algorithms and motivates adapting them.
Indeed, most of the traditional data analysis tools for regression,
classification and clustering have been adapted to functional inputs under the
general name of Functional Data Analysis (FDA). In this paper, we investigate
the use of Support Vector Machines (SVMs) for functional data analysis and we
focus on the problem of curve discrimination. SVMs are large margin classifiers
based on implicit nonlinear mappings of the considered data into
high-dimensional spaces by means of kernels. We show how to define simple
kernels that take into account the functional nature of the data and lead to
consistent classification. Experiments conducted on real world data emphasize
the benefit of taking into account some functional aspects of the problems.
Comment: 13 pages.
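One way to take the functional nature of the data into account, sketched here under illustrative assumptions: a Gaussian kernel applied to the first differences of the sampled curves (a crude derivative), plugged into a standard SVM. The transform and parameter values are examples, not the specific kernels studied in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def derivative_rbf_kernel(gamma=1.0):
    # Returns a kernel callable usable with sklearn's SVC(kernel=...):
    # curves are first differenced (a crude numerical derivative), then
    # compared with an ordinary Gaussian kernel.
    def kernel(X, Y):
        dX, dY = np.diff(X, axis=1), np.diff(Y, axis=1)
        sq = ((dX[:, None, :] - dY[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * sq)
    return kernel

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    t = np.linspace(0.0, 1.0, 30)
    # Two classes of curves that differ in shape (frequency), not in level.
    shifts = rng.normal(0.0, 1.0, (100, 1))
    X = np.vstack([np.sin(2 * np.pi * t) + shifts[:50],
                   np.sin(4 * np.pi * t) + shifts[50:]])
    y = np.array([0] * 50 + [1] * 50)
    clf = SVC(kernel=derivative_rbf_kernel(gamma=5.0)).fit(X, y)
    print(clf.score(X, y))   # shape information survives the vertical shifts
```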
Mixed Data and Classification of Transit Stops
An analysis of the characteristics and behavior of individual bus stops can
reveal clusters of similar stops, which can be of use in making routing and
scheduling decisions, as well as determining what facilities to provide at each
stop. This paper provides an exploratory analysis, including several possible
clustering results, of a dataset provided by the Regional Transit Service of
Rochester, NY. The dataset describes ridership on public buses, recording the
time, location, and number of entering and exiting passengers each time a bus
stops. A description of the overall behavior of bus ridership is followed by a
stop-level analysis. We compare multiple measures of stop similarity, based on
location, route information, and ridership volume over time.
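As an illustration of a ridership-over-time similarity measure, the sketch below pivots hypothetical stop-level boarding records into hourly profiles, normalizes them so that only the shape of demand matters, and clusters them with k-means. The column names (stop_id, hour, boardings) and the choice of k are assumptions, not the actual schema of the Rochester dataset.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def cluster_stops_by_ridership(events: pd.DataFrame, n_clusters=4):
    # Pivot raw stop events into one hourly boarding profile per stop,
    # L1-normalize so clusters reflect the shape of demand rather than its
    # volume, then run k-means on the resulting profiles.
    profiles = events.pivot_table(index="stop_id", columns="hour",
                                  values="boardings", aggfunc="sum",
                                  fill_value=0)
    shapes = normalize(profiles.values, norm="l1")
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(shapes)
    return pd.Series(labels, index=profiles.index, name="cluster")

if __name__ == "__main__":
    # Tiny synthetic example: two morning-heavy and two evening-heavy stops.
    demo = pd.DataFrame({
        "stop_id":   ["A", "A", "B", "B", "C", "C", "D", "D"],
        "hour":      [8,   17,  8,   17,  8,   17,  8,   17],
        "boardings": [50,  5,   45,  4,   3,   60,  2,   55],
    })
    print(cluster_stops_by_ridership(demo, n_clusters=2))
```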
