Comments on "On Approximating Euclidean Metrics by Weighted t-Cost Distances in Arbitrary Dimension"
Mukherjee (Pattern Recognition Letters, vol. 32, pp. 824-831, 2011) recently introduced a class of distance functions called weighted t-cost distances that generalize m-neighbor, octagonal, and t-cost distances. He proved that weighted t-cost distances form a family of metrics and derived an approximation for the Euclidean norm in $\mathbb{Z}^n$. In this note we compare this approximation to two previously proposed Euclidean norm approximations and demonstrate that the empirical average errors given by Mukherjee are significantly optimistic in $\mathbb{R}^n$. We also propose a simple normalization scheme that improves the accuracy of his approximation substantially with respect to both average and maximum relative errors.
Comment: 7 pages, 1 figure, 3 tables. arXiv admin note: substantial text overlap with arXiv:1008.487
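For concreteness, here is a minimal Python sketch of the weighted t-cost construction behind this approximation: the t-cost distance D_t(x) sums the t largest absolute coordinates, and the Euclidean estimate takes the maximum of w_t * D_t(x) with weights w_t = 1/sqrt(t). The function name and the `scale` parameter standing in for the note's normalization constant are illustrative assumptions, not code from the note.

```python
import numpy as np

def weighted_t_cost_norm(x, scale=1.0):
    """Euclidean norm estimate via the weighted t-cost distance:
    max over t of w_t * D_t(x), where D_t(x) is the sum of the t
    largest |x_i| and w_t = 1/sqrt(t).  `scale` stands in for the
    note's normalization constant (placeholder value)."""
    a = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # |x_i|, descending
    t = np.arange(1, a.size + 1)
    d_t = np.cumsum(a)                 # D_t(x) for every t at once
    return scale * np.max(d_t / np.sqrt(t))

x = np.random.default_rng(0).standard_normal(8)
print(weighted_t_cost_norm(x), np.linalg.norm(x))  # estimate vs. true norm
```

By Cauchy-Schwarz, each w_t * D_t(x) lower-bounds the Euclidean norm, so the unscaled maximum always underestimates; that is why a normalization constant slightly above one can reduce both the average and the maximum relative error.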
On Euclidean Norm Approximations
Euclidean norm calculations arise frequently in scientific and engineering
applications. Several approximations for this norm with differing complexity
and accuracy have been proposed in the literature. Earlier approaches were
based on minimizing the maximum error. Recently, Seol and Cheun proposed an
approximation based on minimizing the average error. In this paper, we first
examine these approximations in detail, show that they fit into a single
mathematical formulation, and compare their average and maximum errors. We then
show that the maximum errors given by Seol and Cheun are significantly
optimistic.
Comment: 9 pages, 1 figure, Pattern Recognition
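The approximations discussed here share the alpha-max-plus-beta-min template in two dimensions, and the single formulation referred to above amounts to choosing the pair (alpha, beta). As a sketch of how average and maximum relative errors can be measured empirically (the default coefficients below are the textbook minimax pair, roughly 0.9604 and 0.3978, not Seol and Cheun's average-error-optimal values):

```python
import numpy as np

def alpha_max_beta_min(a, b, alpha=0.9604, beta=0.3978):
    """2-D Euclidean norm approximation alpha*max(|a|,|b|) + beta*min(|a|,|b|).
    Defaults are the classic minimax coefficients (~3.96% peak error);
    other (alpha, beta) pairs instead minimize the average error."""
    hi = np.maximum(np.abs(a), np.abs(b))
    lo = np.minimum(np.abs(a), np.abs(b))
    return alpha * hi + beta * lo

# empirical average vs. maximum relative error on random inputs
rng = np.random.default_rng(0)
a, b = rng.standard_normal(100_000), rng.standard_normal(100_000)
rel = np.abs(alpha_max_beta_min(a, b) - np.hypot(a, b)) / np.hypot(a, b)
print(f"avg: {rel.mean():.4f}  max: {rel.max():.4f}")
```

With alpha = 1 and beta = 1/2 the same harness reproduces the familiar ~11.8% worst case of the cheapest variant, which illustrates the max-error versus average-error tension the paper examines.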
Subsampling Algorithms for Semidefinite Programming
We derive a stochastic gradient algorithm for semidefinite optimization using
randomization techniques. The algorithm uses subsampling to reduce the
computational cost of each iteration and the subsampling ratio explicitly
controls granularity, i.e. the tradeoff between cost per iteration and total
number of iterations. Furthermore, the total computational cost is directly
proportional to the complexity (i.e. rank) of the solution. We study numerical
performance on some large-scale problems arising in statistical learning.
Comment: Final version, to appear in Stochastic Systems
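The cost/granularity tradeoff described above can be illustrated with a generic sketch (this is not the paper's algorithm, and all names are ours): a projected stochastic gradient method for a toy PSD-constrained least-squares problem, where a subsampling ratio rho decides what fraction of the data terms each iteration touches.

```python
import numpy as np

def psd_project(X):
    """Project a symmetric matrix onto the PSD cone by eigenvalue clipping."""
    w, V = np.linalg.eigh((X + X.T) / 2)
    return (V * np.maximum(w, 0)) @ V.T

def subsampled_sgd(A, b, rho=0.2, step=5e-3, iters=1000, seed=0):
    """Minimize sum_i (a_i' X a_i - b_i)^2 over PSD X, estimating the
    gradient from a random fraction `rho` of the m terms per iteration.
    Smaller rho means cheaper iterations but typically more of them --
    the granularity that the subsampling ratio controls."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = np.zeros((n, n))
    batch = max(1, int(rho * m))
    for _ in range(iters):
        idx = rng.choice(m, size=batch, replace=False)
        G = np.zeros_like(X)
        for i in idx:                    # mean gradient over the subsample
            r = A[i] @ X @ A[i] - b[i]
            G += 2.0 * r * np.outer(A[i], A[i])
        X = psd_project(X - step * G / batch)
    return X
```

Here the projection is exact (a full eigendecomposition); the sketch only shows how rho trades per-iteration cost against iteration count, not the paper's randomized machinery or its rank-dependent cost analysis.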
Robust Methods for High-Dimensional Linear Learning
We propose statistically robust and computationally efficient linear learning
methods in the high-dimensional batch setting, where the number of features
may exceed the sample size . We employ, in a generic learning setting, two
algorithms depending on whether the considered loss function is
gradient-Lipschitz or not. Then, we instantiate our framework on several
applications including vanilla sparse, group-sparse and low-rank matrix
recovery. This leads, for each application, to efficient and robust learning
algorithms, that reach near-optimal estimation rates under heavy-tailed
distributions and the presence of outliers. For vanilla -sparsity, we are
able to reach the rate under heavy-tails and -corruption,
at a computational cost comparable to that of non-robust analogs. We provide an
efficient implementation of our algorithms in an open-source
library called , by means of which we carry out numerical
experiments which confirm our theoretical findings together with a comparison
to other recent approaches proposed in the literature.Comment: accepted versio
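As a flavour of the robust gradient estimators this line of work relies on, here is a generic median-of-means gradient descent sketch for least squares. It is our own illustration, not linlearn's API, and the block count and step size are arbitrary tuning choices.

```python
import numpy as np

def mom_gradient(X, y, w, n_blocks=11, rng=None):
    """Median-of-means estimate of the least-squares gradient.
    Samples are split into random blocks; per-block mean gradients
    are combined coordinate-wise by the median, which resists both
    heavy-tailed noise and a minority of corrupted samples."""
    rng = rng or np.random.default_rng()
    perm = rng.permutation(len(y))
    grads = []
    for blk in np.array_split(perm, n_blocks):
        r = X[blk] @ w - y[blk]            # residuals on this block
        grads.append(X[blk].T @ r / len(blk))
    return np.median(np.stack(grads), axis=0)

def robust_fit(X, y, step=0.1, iters=300, n_blocks=11, seed=0):
    """Gradient descent driven by the robust gradient estimate."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= step * mom_gradient(X, y, w, n_blocks, rng)
    return w
```

The coordinate-wise median keeps each iteration close in cost to a plain gradient step, which is the sense in which such methods remain comparable to their non-robust analogs.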
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kNN
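The paper's Appendix links to its own Python code; purely as a flavour of the basic method, a minimal self-contained k-NN classifier (ours, not the paper's) might look like:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest
    training examples under Euclidean distance."""
    d = np.linalg.norm(X_train - x_query, axis=1)  # distances to all examples
    nearest = np.argsort(d)[:k]                    # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# toy usage
X = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y = np.array(["a", "a", "b", "b"])
print(knn_predict(X, y, np.array([5.5, 5.0]), k=3))  # -> "b"
```

For large training sets, np.argpartition avoids the full sort used above, one instance of the retrieval speed-ups the paper surveys.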