Comments on "On Approximating Euclidean Metrics by Weighted t-Cost Distances in Arbitrary Dimension"
Mukherjee (Pattern Recognition Letters, vol. 32, pp. 824-831, 2011) recently introduced a class of distance functions called weighted t-cost distances that generalize m-neighbor, octagonal, and t-cost distances. He proved that weighted t-cost distances form a family of metrics and derived an approximation for the Euclidean norm in $\mathbb{Z}^n$. In this note we compare this approximation to two previously proposed Euclidean norm approximations and demonstrate that the empirical average errors given by Mukherjee are significantly optimistic in $\mathbb{R}^n$. We also propose a simple normalization scheme that improves the accuracy of his approximation substantially with respect to both average and maximum relative errors.
Comment: 7 pages, 1 figure, 3 tables. arXiv admin note: substantial text overlap with arXiv:1008.487
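For concreteness, here is a minimal Python sketch of the weighted t-cost construction behind this approximation: the t-cost distance D_t(x) sums the t largest absolute coordinates, and the Euclidean estimate takes the maximum of w_t * D_t(x) with weights w_t = 1/sqrt(t). The function name and the `scale` parameter standing in for the note's normalization constant are illustrative assumptions, not code from the note.

```python
import numpy as np

def weighted_t_cost_norm(x, scale=1.0):
    """Euclidean norm estimate via the weighted t-cost distance:
    max over t of w_t * D_t(x), where D_t(x) is the sum of the t
    largest |x_i| and w_t = 1/sqrt(t).  `scale` stands in for the
    note's normalization constant (placeholder value)."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # |x_i|, descending
    t = np.arange(1, a.size + 1)
    d_t = np.cumsum(a)                 # D_t(x) for every t at once
    return scale * np.max(d_t / np.sqrt(t))

x = np.random.default_rng(0).standard_normal(8)
print(weighted_t_cost_norm(x), np.linalg.norm(x))  # estimate vs. true norm
```

By Cauchy-Schwarz, each w_t * D_t(x) lower-bounds the Euclidean norm, so the unscaled maximum always underestimates; that is why a normalization constant slightly above one can reduce both the average and the maximum relative error.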
On Euclidean Norm Approximations
Euclidean norm calculations arise frequently in scientific and engineering
applications. Several approximations for this norm with differing complexity
and accuracy have been proposed in the literature. Earlier approaches were
based on minimizing the maximum error. Recently, Seol and Cheun proposed an
approximation based on minimizing the average error. In this paper, we first
examine these approximations in detail, show that they fit into a single
mathematical formulation, and compare their average and maximum errors. We then
show that the maximum errors given by Seol and Cheun are significantly
optimistic.
Comment: 9 pages, 1 figure, Pattern Recognition
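The approximations discussed here share the alpha-max-plus-beta-min template in two dimensions, and the single formulation referred to above amounts to choosing the pair (alpha, beta). As a sketch of how average and maximum relative errors can be measured empirically (the default coefficients below are the textbook minimax pair, roughly 0.9604 and 0.3978, not Seol and Cheun's average-error-optimal values):

```python
import numpy as np

def alpha_max_beta_min(a, b, alpha=0.9604, beta=0.3978):
    """2-D Euclidean norm approximation alpha*max(|a|,|b|) + beta*min(|a|,|b|).
    Defaults are the classic minimax coefficients (~3.96% peak error);
    other (alpha, beta) pairs instead minimize the average error."""
    hi = np.maximum(np.abs(a), np.abs(b))
    lo = np.minimum(np.abs(a), np.abs(b))
    return alpha * hi + beta * lo

# empirical average vs. maximum relative error on random inputs
rng = np.random.default_rng(0)
a, b = rng.standard_normal(100_000), rng.standard_normal(100_000)
rel = np.abs(alpha_max_beta_min(a, b) - np.hypot(a, b)) / np.hypot(a, b)
print(f"avg: {rel.mean():.4f}  max: {rel.max():.4f}")
```

With alpha = 1 and beta = 1/2 the same harness reproduces the familiar ~11.8% worst case of the cheapest variant, which illustrates the max-error versus average-error tension the paper examines.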
Subsampling Algorithms for Semidefinite Programming
We derive a stochastic gradient algorithm for semidefinite optimization using
randomization techniques. The algorithm uses subsampling to reduce the
computational cost of each iteration and the subsampling ratio explicitly
controls granularity, i.e. the tradeoff between cost per iteration and total
number of iterations. Furthermore, the total computational cost is directly
proportional to the complexity (i.e. rank) of the solution. We study numerical
performance on some large-scale problems arising in statistical learning.
Comment: Final version, to appear in Stochastic Systems
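The cost/granularity tradeoff described above can be illustrated with a generic sketch (this is not the paper's algorithm, and all names are ours): a projected stochastic gradient method for a toy PSD-constrained least-squares problem, where a subsampling ratio rho decides what fraction of the data terms each iteration touches.

```python
import numpy as np

def psd_project(X):
    """Project a symmetric matrix onto the PSD cone by eigenvalue clipping."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    return (V * np.maximum(w, 0)) @ V.T

def subsampled_sgd(A, b, rho=0.2, step=5e-3, iters=1000, seed=0):
    """Minimize sum_i (a_i' X a_i - b_i)^2 over PSD X, estimating the
    gradient from a random fraction `rho` of the m terms per iteration.
    Smaller rho means cheaper iterations but typically more of them --
    the granularity that the subsampling ratio controls."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = np.zeros((n, n))
    batch = max(1, int(rho * m))
    for _ in range(iters):
        idx = rng.choice(m, size=batch, replace=False)
        G = np.zeros_like(X)
        for i in idx:                    # mean gradient over the subsample
            r = A[i] @ X @ A[i] - b[i]
            G += 2.0 * r * np.outer(A[i], A[i])
        X = psd_project(X - step * G / batch)
    return X
```

Here the projection is exact (a full eigendecomposition); the sketch only shows how rho trades per-iteration cost against iteration count, not the paper's randomized machinery or its rank-dependent cost analysis.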
Robust Methods for High-Dimensional Linear Learning
We propose statistically robust and computationally efficient linear learning
methods in the high-dimensional batch setting, where the number of features
may exceed the sample size . We employ, in a generic learning setting, two
algorithms depending on whether the considered loss function is
gradient-Lipschitz or not. Then, we instantiate our framework on several
applications including vanilla sparse, group-sparse and low-rank matrix
recovery. This leads, for each application, to efficient and robust learning
algorithms, that reach near-optimal estimation rates under heavy-tailed
distributions and the presence of outliers. For vanilla -sparsity, we are
able to reach the rate under heavy-tails and -corruption,
at a computational cost comparable to that of non-robust analogs. We provide an
efficient implementation of our algorithms in an open-source
library called , by means of which we carry out numerical
experiments which confirm our theoretical findings together with a comparison
to other recent approaches proposed in the literature.Comment: accepted versio
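As a flavour of the robust gradient estimators this line of work relies on, here is a generic median-of-means gradient descent sketch for least squares. It is our own illustration, not linlearn's API, and the block count and step size are arbitrary tuning choices.

```python
import numpy as np

def mom_gradient(X, y, w, n_blocks=11, rng=None):
    """Median-of-means estimate of the least-squares gradient.
    Samples are split into random blocks; per-block mean gradients
    are combined coordinate-wise by the median, which resists both
    heavy-tailed noise and a minority of corrupted samples."""
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(y))
    grads = []
    for blk in np.array_split(perm, n_blocks):
        r = X[blk] @ w - y[blk]            # residuals on this block
        grads.append(X[blk].T @ r / len(blk))
    return np.median(np.stack(grads), axis=0)

def robust_fit(X, y, step=0.1, iters=300, n_blocks=11, seed=0):
    """Gradient descent driven by the robust gradient estimate."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= step * mom_gradient(X, y, w, n_blocks, rng)
    return w
```

The coordinate-wise median keeps each iteration close in cost to a plain gradient step, which is the sense in which such methods remain comparable to their non-robust analogs.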
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kNN
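The paper's Appendix links to its own Python code; purely as a flavour of the basic method, a minimal self-contained k-NN classifier (ours, not the paper's) might look like:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest
    training examples under Euclidean distance."""
    d = np.linalg.norm(X_train - x_query, axis=1)  # distances to all examples
    nearest = np.argsort(d)[:k]                    # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy usage
X = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y = np.array(["a", "a", "b", "b"])
print(knn_predict(X, y, np.array([5.5, 5.0]), k=3))  # -> "b"
```

For large training sets, np.argpartition avoids the full sort used above, one instance of the retrieval speed-ups the paper surveys.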