Analysis of approximate nearest neighbor searching with clustered point sets
We present an empirical analysis of data structures for approximate nearest
neighbor searching. We compare the well-known optimized kd-tree splitting
method against two alternative splitting methods. The first, called the
sliding-midpoint method, attempts to balance the goals of producing
subdivision cells of bounded aspect ratio while not producing any empty cells.
The second, called the minimum-ambiguity method, is a query-based approach. In
addition to the data points, it is also given a training set of query points
for preprocessing. It employs a simple greedy algorithm to select the splitting
plane that minimizes the average amount of ambiguity in the choice of the
nearest neighbor for the training points. We provide an empirical analysis
comparing these two methods against the optimized kd-tree construction for a
number of synthetically generated data and query sets. We demonstrate that for
clustered data and query sets, these algorithms can provide significant
improvements over the standard kd-tree construction for approximate nearest
neighbor searching.
Comment: 20 pages, 8 figures. Presented at ALENEX '99, Baltimore, MD, Jan
15-16, 199
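The abstract names the sliding-midpoint method without spelling out the rule, so the following is only a minimal sketch of the rule as it is commonly described, not the authors' implementation. It assumes the cell is split at the midpoint of its longest side, and that if all points fall on one side the plane slides toward them until it touches a point. The function name `sliding_midpoint_split`, its arguments, and the tie-handling for points lying on the plane are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sliding_midpoint_split(
    points: Sequence[Point], lo: Sequence[float], hi: Sequence[float]
) -> Tuple[int, float, List[Point], List[Point]]:
    """Split one cell; return (dimension, cut value, left points, right points).

    points: at least two data points inside the cell (assumption of this sketch)
    lo, hi: lower and upper corners of the cell's bounding box
    """
    # Cut orthogonally to the longest side, which keeps the aspect ratio in check.
    dim = max(range(len(lo)), key=lambda d: hi[d] - lo[d])
    cut = (lo[dim] + hi[dim]) / 2.0

    coords = [p[dim] for p in points]
    # If the midpoint cut would leave one side empty, slide the plane
    # toward the points until it touches the nearest one.
    if min(coords) > cut:
        cut = min(coords)
    elif max(coords) < cut:
        cut = max(coords)

    left = [p for p in points if p[dim] < cut]
    right = [p for p in points if p[dim] > cut]
    # Points exactly on the plane go to the currently smaller side,
    # so neither child ends up empty (hypothetical tie-handling).
    for p in (q for q in points if q[dim] == cut):
        (left if len(left) <= len(right) else right).append(p)
    return dim, cut, left, right


# A long, thin cell whose points are all clustered near the left edge.
example_points = [(0.1, 0.2), (0.2, 0.9), (0.3, 0.5)]
dim, cut, left, right = sliding_midpoint_split(example_points, (0.0, 0.0), (4.0, 1.0))
```

In this example the plain midpoint cut at x = 2.0 would leave the right child empty, so the plane slides to x = 0.3 and the point lying on it becomes the right child.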
An empirical evaluation of imbalanced data strategies from a practitioner's point of view
This research tested the following well-known strategies for dealing with binary
imbalanced data on 82 different real-life data sets (sampled to imbalance rates
of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline
(just the base classifier). As base classifiers we used SVM with RBF kernel,
random forests, and gradient boosting machines and we measured the quality of
the resulting classifier using 6 different metrics (area under the curve (AUC),
accuracy, F-measure, G-mean, Matthews correlation coefficient (MCC), and balanced
accuracy). The best strategy strongly depends on the metric used to measure the
quality of the classifier. For AUC and accuracy, class weight and the baseline
perform better; for F-measure and MCC, SMOTE performs better; and for G-mean
and balanced accuracy, underbagging
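To illustrate why the choice of metric can matter so much on imbalanced data, here is a minimal sketch that computes five of the six metrics from a binary confusion matrix. It uses the standard textbook definitions, which are assumed here rather than taken from the paper; AUC is omitted because it needs classifier scores rather than counts. The function name `imbalance_metrics` and the example counts are hypothetical.

```python
import math
from typing import Dict


def imbalance_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes at least one actual positive and one actual negative.
    """
    recall = tp / (tp + fn)          # sensitivity, true positive rate
    specificity = tn / (tn + fp)     # true negative rate
    # Precision is undefined when nothing is predicted positive; use 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    pr_sum = precision + recall
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f_measure": 2 * precision * recall / pr_sum if pr_sum else 0.0,
        "g_mean": math.sqrt(recall * specificity),
        "mcc": (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Made-up example: 1% positives, and a classifier that predicts "negative" for everything.
scores = imbalance_metrics(tp=0, fp=0, tn=990, fn=10)
```

For this degenerate classifier, accuracy comes out at 0.99 while F-measure, G-mean, and MCC are all 0 and balanced accuracy is 0.5, which is one way to see how a strategy can rank well under one metric and poorly under another.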
Experimentation in Psychology--Rationale, Concepts and Issues
An experiment is made up of two or more data-collection conditions that are identical in all aspects but one. It owes its design to an inductive principle and its hypothesis to deductive logic. It is the method best suited for corroborating explanatory theories, ascertaining functional relationships, or assessing the substantive effectiveness of a manipulation. Also discussed are (a) the three meanings of 'control,' (b) the issue of ecological validity, (c) the distinction between theory-corroboration and agricultural-model experiments, and (d) the distinction among the hypotheses at four levels of abstraction that are implicit in an experiment
Soil nitrogen affects phosphorus recycling: foliar resorption and plant–soil feedbacks in a northern hardwood forest
Previous studies have attempted to link foliar resorption of nitrogen and phosphorus to their respective availabilities in soil, with mixed results. Based on resource optimization theory, we hypothesized that the foliar resorption of one element could be driven by the availability of another element. We tested various measures of soil N and P as predictors of N and P resorption in six tree species in 18 plots across six stands at the Bartlett Experimental Forest, New Hampshire, USA. Phosphorus resorption efficiency (P < 0.01) and proficiency (P = 0.01) increased with soil N content to 30 cm depth, suggesting that trees conserve P based on the availability of soil N. Phosphorus resorption also increased with soil P content, which is difficult to explain based on single-element limitation, but follows from the correlation between soil N and soil P. The expected single-element relationships were evident only in the O horizon: P resorption was high where resin-available P was low in the Oe (P < 0.01 for efficiency, P < 0.001 for proficiency) and N resorption was high where potential N mineralization in the Oa was low (P < 0.01 for efficiency and 0.11 for proficiency). Since leaf litter is a principal source of N and P to the O horizon, low nutrient availability there could be a result rather than a cause of high resorption. The striking effect of soil N content on foliar P resorption is the first evidence of multiple-element control on nutrient resorption to be reported from an unmanipulated ecosystem