Empirical Evaluation of the Difficulty of Finding a Good Value of k for the Nearest Neighbor
As an analysis of the classification accuracy bound for the Nearest Neighbor technique, in this work we have studied whether it is possible to find a good value of the parameter k for each example according to its attribute values, or at least whether there is a pattern for the parameter k in the original search space. We have carried out different approaches based on the Nearest Neighbor technique and calculated the prediction accuracy for a group of databases from the UCI repository. Based on the experimental results of our study, we can state that, in general, it is not possible to know a priori a specific value of k that will correctly classify an unseen example.
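The question studied here can be made concrete with a leave-one-out check. For each training example, you record which values of k make a k-Nearest Neighbor vote (with that example held out) return its true label. The sketch below is a minimal illustration under assumptions that are not from the paper: Euclidean distance, majority voting with ties broken by the closest tied neighbor, and the hypothetical helper names `knn_predict` and `good_k_per_example`. It is not the authors' experimental protocol.

```python
import math
from collections import Counter


def knn_predict(train, query, k, skip_index=None):
    """Majority vote among the k nearest (features, label) pairs in `train`.

    Distance is Euclidean; `skip_index` leaves one example out so the
    function can be used for leave-one-out evaluation.
    """
    dists = []
    for i, (x, y) in enumerate(train):
        if i == skip_index:
            continue
        dists.append((math.dist(x, query), i, y))
    dists.sort()
    nearest = dists[:k]
    votes = Counter(label for _, _, label in nearest)
    best = max(votes.values())
    # Tie-break: the first label (in order of distance) that has the top count.
    for _, _, label in nearest:
        if votes[label] == best:
            return label


def good_k_per_example(train, k_values):
    """For each example, the set of k values whose leave-one-out vote is correct."""
    result = []
    for i, (x, y) in enumerate(train):
        ok = {k for k in k_values if knn_predict(train, x, k, skip_index=i) == y}
        result.append(ok)
    return result


if __name__ == "__main__":
    # Hypothetical toy data: one feature, two classes.
    examples = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"),
                ((1.0,), "b"), ((1.1,), "b"), ((0.55,), "b")]
    print(good_k_per_example(examples, (1, 3, 5)))
```

In this toy set the point at 0.55 has an empty set: no k in {1, 3, 5} classifies it correctly. That is a small instance of the paper's observation that a suitable k cannot, in general, be chosen a priori.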
Practical and Optimal LSH for Angular Distance
We show the existence of a Locality-Sensitive Hashing (LSH) family for the
angular distance that yields an approximate Near Neighbor Search algorithm with
the asymptotically optimal running time exponent. Unlike earlier algorithms
with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn
2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving
upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also
introduce a multiprobe version of this algorithm, and conduct experimental
evaluation on real and synthetic data sets.
We complement the above positive results with a fine-grained lower bound for
the quality of any LSH family for angular distance. Our lower bound implies
that the above LSH family exhibits a trade-off between evaluation time and
quality that is close to optimal for a natural class of LSH functions.
Comment: 22 pages, an extended abstract is to appear in the proceedings of the
29th Annual Conference on Neural Information Processing Systems (NIPS 2015)
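For reference, the baseline that the abstract says the new family improves upon, hyperplane LSH [Charikar, 2002], fits in a few lines. Each hash bit is the sign of the dot product with a random Gaussian direction, and two vectors at angle θ agree on a bit with probability 1 − θ/π. This is a minimal sketch of that baseline only. It is not the new LSH family or its multiprobe variant from the paper, and the helper names (`make_hyperplane_hash`, `build_index`, `candidates`, `collision_rate`) are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hyperplane_hash(dim, num_bits, seed=0):
    """Hyperplane LSH: bit i is 1 iff <r_i, x> >= 0 for a random Gaussian r_i."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def h(x):
        return tuple(1 if sum(r_j * x_j for r_j, x_j in zip(r, x)) >= 0 else 0
                     for r in planes)

    return h


def angle(u, v):
    """Angular distance between two non-zero vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def build_index(points, h):
    """Single hash table: bucket key is the concatenated sign pattern."""
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        buckets[h(p)].append(i)
    return buckets


def candidates(buckets, h, q):
    """Ids in the query's bucket; a real index would use several tables."""
    return buckets.get(h(q), [])


def collision_rate(u, v, trials=20000, seed=1):
    """Empirical per-bit collision probability; should approach 1 - angle/pi."""
    h = make_hyperplane_hash(len(u), trials, seed)
    return sum(a == b for a, b in zip(h(u), h(v))) / trials


if __name__ == "__main__":
    u = (1.0, 0.0)
    v = (math.cos(math.pi / 3), math.sin(math.pi / 3))
    print(collision_rate(u, v), 1 - angle(u, v) / math.pi)
```

Concatenating more bits per table makes buckets more selective, and using more tables recovers recall. Tuning that trade-off is the setting in which the abstract compares evaluation time against hash quality.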
Efficient Estimation of Mutual Information for Strongly Dependent Variables
We demonstrate that a popular class of nonparametric mutual information (MI)
estimators based on k-nearest-neighbor graphs requires a number of samples that
scales exponentially with the true MI. Consequently, accurate estimation of MI
between two strongly dependent variables is possible only for prohibitively
large sample sizes. This important yet overlooked shortcoming of the existing
estimators is due to their implicit reliance on local uniformity of the
underlying joint distribution. We introduce a new estimator that is robust to
local non-uniformity, works well with limited data, and is able to capture
relationship strengths over many orders of magnitude. We demonstrate the
superior performance of the proposed estimator on both synthetic and real-world
data.
Comment: 13 pages, to appear in the International Conference on Artificial
Intelligence and Statistics (AISTATS) 201
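A widely used member of the k-nearest-neighbor class of MI estimators is the Kraskov-Stögbauer-Grassberger (KSG) estimator:

Î(X;Y) = ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩

Here ε is each point's max-norm distance to its k-th joint-space neighbor, and n_x, n_y count the points strictly closer than ε in each marginal. The sketch below implements that baseline in plain Python so the counts can be inspected. We assume this is representative of the "popular class" the abstract criticizes. It is not the new estimator proposed in the paper. `ksg_mi` and `correlated_gaussian` are hypothetical names, and the brute-force O(N²) search is a simplification.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def ksg_mi(xs, ys, k=3):
    """KSG (algorithm 1) estimate of I(X;Y) in nats for scalar samples."""
    n = len(xs)
    if not 0 < k < n:
        raise ValueError("need 0 < k < number of samples")
    # psi[m] = digamma(m) for integer m, using psi(m) = -gamma + H_{m-1}.
    psi = [0.0] * (n + 1)
    psi[1] = -EULER_GAMMA
    for m in range(2, n + 1):
        psi[m] = psi[m - 1] + 1.0 / (m - 1)
    acc = 0.0
    for i in range(n):
        dx = [abs(xs[i] - xs[j]) for j in range(n) if j != i]
        dy = [abs(ys[i] - ys[j]) for j in range(n) if j != i]
        # Max-norm distance in the joint space to the k-th nearest neighbour.
        eps = sorted(max(a, b) for a, b in zip(dx, dy))[k - 1]
        nx = sum(1 for d in dx if d < eps)
        ny = sum(1 for d in dy if d < eps)
        acc += psi[nx + 1] + psi[ny + 1]
    return psi[k] + psi[n] - acc / n


def correlated_gaussian(n, rho, seed=0):
    """Samples of a bivariate standard normal with correlation rho."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(rho * x + math.sqrt(1.0 - rho * rho) * z)
    return xs, ys


if __name__ == "__main__":
    for rho in (0.0, 0.9, 0.999):
        xs, ys = correlated_gaussian(500, rho)
        true_mi = -0.5 * math.log(1.0 - rho * rho)  # exact for Gaussians
        print(rho, round(true_mi, 3), round(ksg_mi(xs, ys, k=3), 3))
```

The exact value −½ ln(1 − ρ²) grows without bound as ρ → 1. Comparing it against the estimate at a fixed N is one simple way to probe the sample-size dependence that the abstract describes.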
Application of Machine Learning Methods to the Open-Loop Control of a Freeform Fabrication System
Freeform fabrication of complete functional devices requires the fabrication system to achieve well-controlled
deposition of many materials with widely varying material properties. In a research setting, material preparation
processes are not highly refined, causing batch-to-batch property variation, and cost and time may prohibit accurate
quantification of the relevant material properties (such as viscosity and elasticity) for each batch. Closed-loop
control based on the deposited material road is problematic due to the difficulty in non-contact measurement of the
road geometry, so a labor-intensive calibration and open-loop control method is typically used. In the present work,
k-Nearest Neighbor and Support Vector Machine (SVM) machine learning algorithms are applied to the problem of
generating open-loop control parameters which produce desired deposited material road geometry from a description
of a given material and tool configuration comprising a set of qualitative and quantitative attributes. Training data
for the algorithms is generated in the course of ordinary use of the solid freeform fabrication (SFF) system from the results of manual calibration of
control parameters. Given the large instance space and the small training data set compiled thus far, the
performance is quite promising, although still insufficient to allow complete automation of the calibration process.
The SVM-based approach produces tolerable results when tested with materials not in the training data set. When
control parameters produced by the learning algorithms are used as a starting point for manual calibration,
significant operator time savings and material waste reduction may be achieved.
Mechanical Engineering
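The k-Nearest Neighbor part of this approach maps a description of a material and tool configuration, which mixes qualitative and quantitative attributes, to open-loop control parameters learned from past manual calibrations. The sketch below is one plausible way to do that. It rests on several assumptions that the abstract does not state. It uses a Gower-style distance (0/1 mismatch for categorical attributes, range-normalized difference for numeric ones) and averages the parameters of the k closest past calibrations. The attribute names (`material`, `tip_mm`), parameter names (`pressure_kpa`, `speed_mm_s`), and all values are hypothetical. The SVM approach is not shown.

```python
def mixed_distance(a, b, categorical, ranges):
    """Gower-style distance averaged over attributes (assumed, not from the paper)."""
    total = 0.0
    for name in a:
        if name in categorical:
            total += 0.0 if a[name] == b[name] else 1.0
        else:
            span = ranges[name]
            total += abs(a[name] - b[name]) / span if span > 0 else 0.0
    return total / len(a)


def knn_control_params(history, query, k, categorical):
    """Average the control parameters of the k most similar past calibrations.

    history: list of (attributes dict, control-parameter dict) pairs.
    """
    numeric = [name for name in query if name not in categorical]
    ranges = {name: max(rec[0][name] for rec in history)
              - min(rec[0][name] for rec in history) for name in numeric}
    nearest = sorted(history,
                     key=lambda rec: mixed_distance(rec[0], query, categorical, ranges))[:k]
    return {p: sum(rec[1][p] for rec in nearest) / len(nearest) for p in nearest[0][1]}


# Hypothetical calibration records (illustrative values only).
history = [
    ({"material": "silicone", "tip_mm": 0.41}, {"pressure_kpa": 200.0, "speed_mm_s": 10.0}),
    ({"material": "silicone", "tip_mm": 0.84}, {"pressure_kpa": 120.0, "speed_mm_s": 15.0}),
    ({"material": "gel", "tip_mm": 0.41}, {"pressure_kpa": 60.0, "speed_mm_s": 20.0}),
    ({"material": "gel", "tip_mm": 0.84}, {"pressure_kpa": 30.0, "speed_mm_s": 25.0}),
]

if __name__ == "__main__":
    print(knn_control_params(history, {"material": "silicone", "tip_mm": 0.60}, 2, {"material"}))
```

Used this way, the prediction is only a starting point that an operator then refines by manual calibration. That matches how the abstract suggests the learned parameters are most useful.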