An adaptive nearest neighbor rule for classification
We introduce a variant of the k-nearest neighbor classifier in which k is
chosen adaptively for each query, rather than supplied as a parameter. The
choice of k depends on properties of each neighborhood, and therefore may
vary significantly between different points. (For example, the algorithm will
use larger k for predicting the labels of points in noisy regions.)
We provide theory and experiments that demonstrate that the algorithm
performs comparably to, and sometimes better than, k-NN with an optimal
choice of k. In particular, we derive bounds on the convergence rates of our
classifier that depend on a local quantity we call the 'advantage', which is
significantly weaker than the Lipschitz conditions used in previous convergence
rate proofs. These generalization bounds hinge on a variant of the seminal
Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant
concerns conditional probabilities and may be of independent interest.
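
The idea of choosing k per query can be illustrated with a minimal sketch. This is not the algorithm from the paper: the function name `adaptive_knn_predict`, the parameter `delta`, and the stopping rule (stop at the first k whose label vote exceeds `delta * sqrt(k)` in absolute value) are assumptions made here for illustration only. It only shows how a neighborhood with mixed labels forces a larger k than a clean one.

```python
import math

def adaptive_knn_predict(train_x, train_y, query, delta=1.5):
    """Predict a +1/-1 label, growing k until the vote is decisive.

    Assumption: the stopping rule |sum of labels| > delta * sqrt(k) is an
    illustrative threshold, not the rule analysed in the paper.
    Returns (label, k_used); label is 0 if no k gives a decisive vote.
    """
    # Order training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    vote = 0
    for k, i in enumerate(order, start=1):
        vote += train_y[i]  # running sum of the k nearest labels
        if abs(vote) > delta * math.sqrt(k):
            return (1 if vote > 0 else -1), k
    return 0, len(order)  # abstain: the vote never became decisive


# Ten points on a line; the query sits at the first one.
points = [(float(i),) for i in range(10)]
clean_labels = [1] * 10
noisy_labels = [1, -1, 1, 1, 1, 1, 1, 1, 1, 1]

print(adaptive_knn_predict(points, clean_labels, (0.0,)))
print(adaptive_knn_predict(points, noisy_labels, (0.0,)))
```

In this toy setup the single flipped label near the query makes the sketch look at more neighbors before committing, which mirrors the behavior described in the abstract for noisy regions.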
A free central-limit theorem for dynamical systems
The free central-limit theorem, a fundamental theorem in free probability,
states that empirical averages of freely independent random variables are
asymptotically semi-circular. We extend this theorem to general dynamical
systems of operators that we define using a free random variable coupled
with a group of *-automorphisms describing its evolution. We introduce
free mixing coefficients that measure how far a dynamical system is from being
freely independent. Under conditions on those coefficients, we prove that the
free central-limit theorem also holds for these processes and provide
Berry-Esseen bounds. We generalize this to triangular arrays and U-statistics.
Finally, we draw connections with classical probability and random matrix theory
through a series of examples.
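
For orientation, the classical (independent, identically distributed) form of the free central-limit theorem that the abstract takes as its starting point can be written as follows. The notation ($\varphi$ for the state, $a_i$ for the variables, $\sigma^2$ for the variance) is chosen here for illustration and is not taken from the paper.

```latex
% Standard statement of the free CLT; notation is illustrative.
Let $a_1, a_2, \dots$ be freely independent, identically distributed,
self-adjoint elements of a noncommutative probability space
$(\mathcal{A}, \varphi)$ with $\varphi(a_i) = 0$ and
$\varphi(a_i^2) = \sigma^2$. Then
\[
  S_n = \frac{a_1 + \cdots + a_n}{\sqrt{n}}
\]
converges in distribution to a semicircular element of variance
$\sigma^2$, whose law has density
\[
  \frac{1}{2\pi\sigma^2}\sqrt{4\sigma^2 - x^2}
  \quad \text{on } [-2\sigma,\, 2\sigma],
\]
with odd moments equal to zero and even moments
$\varphi(s^{2m}) = C_m\,\sigma^{2m}$, where
$C_m = \frac{1}{m+1}\binom{2m}{m}$ is the $m$-th Catalan number.
```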
Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values
This work is motivated by the needs of predictive analytics on healthcare
data as represented by Electronic Medical Records. Such data is invariably
problematic: it is noisy, has missing entries, and has imbalanced classes of
interest, all of which lead to serious bias in predictive modeling. Since
standard data mining methods often perform poorly on such data, we argue for
the development of specialized techniques for data preprocessing and
classification. In this paper, we propose a new method to simultaneously
classify large datasets and reduce the effects of missing values. It is based
on a multilevel framework that combines the cost-sensitive SVM with the
expectation-maximization imputation method for missing values, which relies on
iterated regression analyses. We compare classification results of multilevel
SVM-based algorithms on public benchmark datasets with imbalanced classes and
missing values, as well as on real data from health applications, and show that
our multilevel SVM-based method is fast and produces more accurate and robust
classification results.
Comment: arXiv admin note: substantial text overlap with arXiv:1503.0625
Some invariant biorthogonal sets with an application to coherent states
We show how to construct, out of a certain basis invariant under the action
of one or more unitary operators, a second biorthogonal set with similar
properties. In particular, we discuss conditions for this new set to also be a
basis of the Hilbert space, and we apply the procedure to coherent states. We
conclude the paper by considering a simple application of our construction to
pseudo-Hermitian quantum mechanics.
Comment: in press in Journal of Mathematical Analysis and Applications. arXiv
admin note: text overlap with arXiv:0904.088
- …