17,351 research outputs found

    Covering rough sets based on neighborhoods: An approach without using neighborhoods

    Get PDF
    Rough set theory, a mathematical tool to deal with inexact or uncertain knowledge in information systems, has originally described the indiscernibility of elements by equivalence relations. Covering rough sets are a natural extension of classical rough sets by relaxing the partitions arising from equivalence relations to coverings. Recently, some topological concepts such as neighborhood have been applied to covering rough sets. In this paper, we further investigate the covering rough sets based on neighborhoods by approximation operations. We show that the upper approximation based on neighborhoods can be defined equivalently without using neighborhoods. To analyze the coverings themselves, we introduce unary and composition operations on coverings. A notion of homomorphismis provided to relate two covering approximation spaces. We also examine the properties of approximations preserved by the operations and homomorphisms, respectively.Comment: 13 pages; to appear in International Journal of Approximate Reasonin

    Analysis of approximate nearest neighbor searching with clustered point sets

    Full text link
    We present an empirical analysis of data structures for approximate nearest neighbor searching. We compare the well-known optimized kd-tree splitting method against two alternative splitting methods. The first, called the sliding-midpoint method, which attempts to balance the goals of producing subdivision cells of bounded aspect ratio, while not producing any empty cells. The second, called the minimum-ambiguity method is a query-based approach. In addition to the data points, it is also given a training set of query points for preprocessing. It employs a simple greedy algorithm to select the splitting plane that minimizes the average amount of ambiguity in the choice of the nearest neighbor for the training points. We provide an empirical analysis comparing these two methods against the optimized kd-tree construction for a number of synthetically generated data and query sets. We demonstrate that for clustered data and query sets, these algorithms can provide significant improvements over the standard kd-tree construction for approximate nearest neighbor searching.Comment: 20 pages, 8 figures. Presented at ALENEX '99, Baltimore, MD, Jan 15-16, 199

    Robust Learning from Bites for Data Mining

    Get PDF
    Some methods from statistical machine learning and from robust statistics have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets, say with millions of data points. Secondly, robust and non-parametric confidence intervals for the predictions according to the fitted models are often unknown. Here, we propose a simple but general method to overcome these problems in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed on several processors if available, and can help to reduce the computation time substantially. Our main focus is on robust general support vector machines (SVM) based on minimizing regularized risks. The method offers distribution-free confidence intervals for the median of the predictions. The approach can also be helpful to fit robust estimators in parametric models for huge data sets. --Breakdown point,convex risk minimization,data mining,distributed computing,influence function,logistic regression,robustness,scalability
    corecore