17,351 research outputs found
Covering rough sets based on neighborhoods: An approach without using neighborhoods
Rough set theory, a mathematical tool to deal with inexact or uncertain
knowledge in information systems, has originally described the indiscernibility
of elements by equivalence relations. Covering rough sets are a natural
extension of classical rough sets by relaxing the partitions arising from
equivalence relations to coverings. Recently, some topological concepts such as
neighborhood have been applied to covering rough sets. In this paper, we
further investigate the covering rough sets based on neighborhoods by
approximation operations. We show that the upper approximation based on
neighborhoods can be defined equivalently without using neighborhoods. To
analyze the coverings themselves, we introduce unary and composition operations
on coverings. A notion of homomorphismis provided to relate two covering
approximation spaces. We also examine the properties of approximations
preserved by the operations and homomorphisms, respectively.Comment: 13 pages; to appear in International Journal of Approximate Reasonin
Analysis of approximate nearest neighbor searching with clustered point sets
We present an empirical analysis of data structures for approximate nearest
neighbor searching. We compare the well-known optimized kd-tree splitting
method against two alternative splitting methods. The first, called the
sliding-midpoint method, which attempts to balance the goals of producing
subdivision cells of bounded aspect ratio, while not producing any empty cells.
The second, called the minimum-ambiguity method is a query-based approach. In
addition to the data points, it is also given a training set of query points
for preprocessing. It employs a simple greedy algorithm to select the splitting
plane that minimizes the average amount of ambiguity in the choice of the
nearest neighbor for the training points. We provide an empirical analysis
comparing these two methods against the optimized kd-tree construction for a
number of synthetically generated data and query sets. We demonstrate that for
clustered data and query sets, these algorithms can provide significant
improvements over the standard kd-tree construction for approximate nearest
neighbor searching.Comment: 20 pages, 8 figures. Presented at ALENEX '99, Baltimore, MD, Jan
15-16, 199
Robust Learning from Bites for Data Mining
Some methods from statistical machine learning and from robust statistics have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets, say with millions of data points. Secondly, robust and non-parametric confidence intervals for the predictions according to the fitted models are often unknown. Here, we propose a simple but general method to overcome these problems in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed on several processors if available, and can help to reduce the computation time substantially. Our main focus is on robust general support vector machines (SVM) based on minimizing regularized risks. The method offers distribution-free confidence intervals for the median of the predictions. The approach can also be helpful to fit robust estimators in parametric models for huge data sets. --Breakdown point,convex risk minimization,data mining,distributed computing,influence function,logistic regression,robustness,scalability
- …