30,622 research outputs found
Weighted k-Nearest-Neighbor Techniques and Ordinal Classification
In the field of statistical discrimination k-nearest neighbor classification is a well-known, easy and successful method. In this paper we present an extended version of this technique, where the distances of the nearest neighbors can be taken into account. In this sense there is a close connection to LOESS, a local regression technique. In addition we show possibilities to use nearest neighbor for classification in the case of an ordinal class structure. Empirical studies show the advantages of the new techniques
Nearest Neighbor and Kernel Survival Analysis: Nonasymptotic Error Bounds and Strong Consistency Rates
We establish the first nonasymptotic error bounds for Kaplan-Meier-based
nearest neighbor and kernel survival probability estimators where feature
vectors reside in metric spaces. Our bounds imply rates of strong consistency
for these nonparametric estimators and, up to a log factor, match an existing
lower bound for conditional CDF estimation. Our proof strategy also yields
nonasymptotic guarantees for nearest neighbor and kernel variants of the
Nelson-Aalen cumulative hazards estimator. We experimentally compare these
methods on four datasets. We find that for the kernel survival estimator, a
good choice of kernel is one learned using random survival forests.Comment: International Conference on Machine Learning (ICML 2019
A Novel Clustering Algorithm Based on Quantum Games
Enormous successes have been made by quantum algorithms during the last
decade. In this paper, we combine the quantum game with the problem of data
clustering, and then develop a quantum-game-based clustering algorithm, in
which data points in a dataset are considered as players who can make decisions
and implement quantum strategies in quantum games. After each round of a
quantum game, each player's expected payoff is calculated. Later, he uses a
link-removing-and-rewiring (LRR) function to change his neighbors and adjust
the strength of links connecting to them in order to maximize his payoff.
Further, algorithms are discussed and analyzed in two cases of strategies, two
payoff matrixes and two LRR functions. Consequently, the simulation results
have demonstrated that data points in datasets are clustered reasonably and
efficiently, and the clustering algorithms have fast rates of convergence.
Moreover, the comparison with other algorithms also provides an indication of
the effectiveness of the proposed approach.Comment: 19 pages, 5 figures, 5 table
Methods to integrate a language model with semantic information for a word prediction component
Most current word prediction systems make use of n-gram language models (LM)
to estimate the probability of the following word in a phrase. In the past
years there have been many attempts to enrich such language models with further
syntactic or semantic information. We want to explore the predictive powers of
Latent Semantic Analysis (LSA), a method that has been shown to provide
reliable information on long-distance semantic dependencies between words in a
context. We present and evaluate here several methods that integrate LSA-based
information with a standard language model: a semantic cache, partial
reranking, and different forms of interpolation. We found that all methods show
significant improvements, compared to the 4-gram baseline, and most of them to
a simple cache model as well.Comment: 10 pages ; EMNLP'2007 Conference (Prague
Feature Selection and Weighting by Nearest Neighbor Ensembles
In the field of statistical discrimination nearest neighbor methods are a well known, quite simple but successful nonparametric classification tool. In higher dimensions, however, predictive power normally deteriorates. In general, if some covariates are assumed to be noise variables, variable selection is a promising approach. The paper’s main focus is on the development and evaluation of a nearest neighbor ensemble with implicit variable selection. In contrast to other nearest neighbor approaches we are not primarily interested in classification, but in estimating the (posterior) class probabilities. In simulation studies and for real world data the proposed nearest neighbor ensemble is compared to an extended forward/backward variable selection procedure for nearest neighbor classifiers, and some alternative well established classification tools (that offer probability estimates as well). Despite its simple structure, the proposed method’s performance is quite good - especially if relevant covariates can be separated from noise variables. Another advantage of the presented ensemble is the easy identification of interactions that are usually hard to detect. So not simply variable selection but rather some kind of feature selection is performed.
The paper is a preprint of an article published in Chemometrics and Intelligent Laboratory Systems. Please use the journal version for citation
- …