Analysis of approximate nearest neighbor searching with clustered point sets

Author: Maneewongvatana Songrit
Mount David M.
Publication date: 01/01/1999

We present an empirical analysis of data structures for approximate nearest neighbor searching. We compare the well-known optimized kd-tree splitting method against two alternative splitting methods. The first, called the sliding-midpoint method, attempts to balance the goals of producing subdivision cells of bounded aspect ratio while not producing any empty cells. The second, called the minimum-ambiguity method, is a query-based approach. In addition to the data points, it is also given a training set of query points for preprocessing. It employs a simple greedy algorithm to select the splitting plane that minimizes the average amount of ambiguity in the choice of the nearest neighbor for the training points. We provide an empirical analysis comparing these two methods against the optimized kd-tree construction for a number of synthetically generated data and query sets. We demonstrate that for clustered data and query sets, these algorithms can provide significant improvements over the standard kd-tree construction for approximate nearest neighbor searching. Comment: 20 pages, 8 figures. Presented at ALENEX '99, Baltimore, MD, Jan 15-16, 199

arXiv.org e-Print Archive

CiteSeerX
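
The sliding-midpoint rule summarized in the abstract above can be illustrated with a minimal sketch. It assumes the usual reading of the rule: cut the cell through the midpoint of its longest side and, if every point falls on one side, slide the cut toward the points until it reaches the first one. The function name `sliding_midpoint_split`, the tuple-based point representation, and the tie-breaking choices are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sliding_midpoint_split(
    points: Sequence[Point], lo: Sequence[float], hi: Sequence[float]
) -> Tuple[int, float, List[Point], List[Point]]:
    """Split one cell [lo, hi]; returns (dim, cut, left_points, right_points).

    Illustrative sketch only (assumed details, not the authors' code).
    """
    if len(points) < 2:
        raise ValueError("need at least two points to split a cell")

    # Cut orthogonally to the longest side of the cell, through its midpoint.
    dim = max(range(len(lo)), key=lambda d: hi[d] - lo[d])
    cut = (lo[dim] + hi[dim]) / 2.0
    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] >= cut]

    if not left:
        # All points on the high side: slide the cut up to the lowest point
        # and place that point alone in the otherwise empty low cell.
        nearest = min(points, key=lambda p: p[dim])
        cut = nearest[dim]
        right = list(points)
        right.remove(nearest)
        left = [nearest]
    elif not right:
        # All points on the low side: slide the cut down to the highest point.
        nearest = max(points, key=lambda p: p[dim])
        cut = nearest[dim]
        left = list(points)
        left.remove(nearest)
        right = [nearest]

    return dim, cut, left, right
```

For a long, thin cell whose points cluster near one end, the midpoint cut would leave one child empty; the slide step instead moves the cut to the cluster so that neither child is empty.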

An empirical evaluation of imbalanced data strategies from a practitioner's point of view

Author: Franceschinell Rodrigo A.
Wainer Jacques
Publication date: 16/10/2018

This research tested the following well-known strategies for dealing with binary imbalanced data on 82 different real-life data sets (sampled to imbalance rates of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, underbagging, and a baseline (just the base classifier). As base classifiers we used SVM with RBF kernel, random forests, and gradient boosting machines, and we measured the quality of the resulting classifier using six different metrics (area under the curve, accuracy, F-measure, G-mean, Matthews correlation coefficient, and balanced accuracy). The best strategy strongly depends on the metric used to measure the quality of the classifier. For AUC and accuracy, class weight and the baseline perform better; for F-measure and MCC, SMOTE performs better; and for G-mean and balanced accuracy, underbagging

arXiv.org e-Print Archive
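
To make the metric dependence noted in the abstract above concrete, here is a small sketch that computes five of the six named metrics from confusion-matrix counts using their standard textbook definitions. AUC is omitted because it needs classifier scores rather than counts. The function name `imbalance_metrics` and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration, not details from the paper.

```python
import math
from typing import Dict


def imbalance_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from binary confusion counts (positive = minority class).

    Illustrative sketch; undefined ratios are reported as 0.0 by assumption.
    """
    total = tp + fp + tn + fn
    tpr = tp / (tp + fn) if (tp + fn) else 0.0        # recall / sensitivity
    tnr = tn / (tn + fp) if (tn + fp) else 0.0        # specificity
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # F-measure here means F1: harmonic mean of precision and recall.
    f1 = 2 * precision * tpr / (precision + tpr) if (precision + tpr) else 0.0

    # Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "f_measure": f1,
        "g_mean": math.sqrt(tpr * tnr),               # geometric mean of TPR and TNR
        "mcc": mcc,
        "balanced_accuracy": (tpr + tnr) / 2.0,       # arithmetic mean of TPR and TNR
    }


# A classifier that always predicts the majority class at a 1% imbalance rate.
always_majority = imbalance_metrics(tp=0, fp=0, tn=990, fn=10)
```

In this example the always-majority classifier reaches an accuracy of 0.99, yet its balanced accuracy is 0.5 and its F-measure, G-mean, and MCC are all 0, which shows how a ranking of strategies can change with the metric chosen.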

Experimentation in Psychology--Rationale, Concepts and Issues

Author: Chow Siu L.
Publication date: 01/01/2002

An experiment is made up of two or more data-collection conditions that are identical in all aspects but one. It owes its design to an inductive principle and its hypothesis to deductive logic. It is the most suited for corroborating explanatory theories, ascertaining functional relationships, or assessing the substantive effectiveness of a manipulation. Also discussed are (a) the three meanings of 'control,' (b) the issue of ecological validity, (c) the distinction between theory-corroboration and agricultural-model experiments, and (d) the distinction among the hypotheses at four levels of abstraction that are implicit in an experiment

CogPrints Cognitive Sciences Eprint Archive

Soil nitrogen affects phosphorus recycling: foliar resorption and plant–soil feedbacks in a northern hardwood forest

Author: Fahey Timothy J
Fisk Melany C
Quintero Braulio A.
See Craig R.
Vadeboncoeur Matthew A
Yanai Ruth D
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/09/2015

Previous studies have attempted to link foliar resorption of nitrogen and phosphorus to their respective availabilities in soil, with mixed results. Based on resource optimization theory, we hypothesized that the foliar resorption of one element could be driven by the availability of another element. We tested various measures of soil N and P as predictors of N and P resorption in six tree species in 18 plots across six stands at the Bartlett Experimental Forest, New Hampshire, USA. Phosphorus resorption efficiency (P < 0.01) and proficiency (P = 0.01) increased with soil N content to 30 cm depth, suggesting that trees conserve P based on the availability of soil N. Phosphorus resorption also increased with soil P content, which is difficult to explain based on single-element limitation, but follows from the correlation between soil N and soil P. The expected single-element relationships were evident only in the O horizon: P resorption was high where resin-available P was low in the Oe (P < 0.01 for efficiency, P < 0.001 for proficiency) and N resorption was high where potential N mineralization in the Oa was low (P < 0.01 for efficiency and 0.11 for proficiency). Since leaf litter is a principal source of N and P to the O horizon, low nutrient availability there could be a result rather than a cause of high resorption. The striking effect of soil N content on foliar P resorption is the first evidence of multiple-element control on nutrient resorption to be reported from an unmanipulated ecosystem

UNH Scholars' Repository

Deforestation

Author: G. Cornelis van Kooten
Henk Folmer