13,048 research outputs found

    An adaptive nearest neighbor rule for classification

    Full text link
    We introduce a variant of the kk-nearest neighbor classifier in which kk is chosen adaptively for each query, rather than supplied as a parameter. The choice of kk depends on properties of each neighborhood, and therefore may significantly vary between different points. (For example, the algorithm will use larger kk for predicting the labels of points in noisy regions.) We provide theory and experiments that demonstrate that the algorithm performs comparably to, and sometimes better than, kk-NN with an optimal choice of kk. In particular, we derive bounds on the convergence rates of our classifier that depend on a local quantity we call the `advantage' which is significantly weaker than the Lipschitz conditions used in previous convergence rate proofs. These generalization bounds hinge on a variant of the seminal Uniform Convergence Theorem due to Vapnik and Chervonenkis; this variant concerns conditional probabilities and may be of independent interest

    Adaptive kNN using Expected Accuracy for Classification of Geo-Spatial Data

    Full text link
    The k-Nearest Neighbor (kNN) classification approach is conceptually simple - yet widely applied since it often performs well in practical applications. However, using a global constant k does not always provide an optimal solution, e.g., for datasets with an irregular density distribution of data points. This paper proposes an adaptive kNN classifier where k is chosen dynamically for each instance (point) to be classified, such that the expected accuracy of classification is maximized. We define the expected accuracy as the accuracy of a set of structurally similar observations. An arbitrary similarity function can be used to find these observations. We introduce and evaluate different similarity functions. For the evaluation, we use five different classification tasks based on geo-spatial data. Each classification task consists of (tens of) thousands of items. We demonstrate, that the presented expected accuracy measures can be a good estimator for kNN performance, and the proposed adaptive kNN classifier outperforms common kNN and previously introduced adaptive kNN algorithms. Also, we show that the range of considered k can be significantly reduced to speed up the algorithm without negative influence on classification accuracy

    Survey of data mining approaches to user modeling for adaptive hypermedia

    Get PDF
    The ability of an adaptive hypermedia system to create tailored environments depends mainly on the amount and accuracy of information stored in each user model. Some of the difficulties that user modeling faces are the amount of data available to create user models, the adequacy of the data, the noise within that data, and the necessity of capturing the imprecise nature of human behavior. Data mining and machine learning techniques have the ability to handle large amounts of data and to process uncertainty. These characteristics make these techniques suitable for automatic generation of user models that simulate human decision making. This paper surveys different data mining techniques that can be used to efficiently and accurately capture user behavior. The paper also presents guidelines that show which techniques may be used more efficiently according to the task implemented by the applicatio

    Detection of Mines in Acoustic Images using Higher Order Spectral Features

    Get PDF
    A new pattern-recognition algorithm detects approximately 90% of the mines hidden in the Coastal Systems Station Sonar0, 1, and 3 databases of cluttered acoustic images, with about 10% false alarms. Similar to other approaches, the algorithm presented here includes processing the images with an adaptive Wiener filter (the degree of smoothing depends on the signal strength in a local neighborhood) to remove noise without destroying the structural information in the mine shapes, followed by a two-dimensional FIR filter designed to suppress noise and clutter, while enhancing the target signature. A double peak pattern is produced as the FIR filter passes over mine highlight and shadow regions. Although the location, size, and orientation of this pattern within a region of the image can vary, features derived from higher order spectra (HOS) are invariant to translation, rotation, and scaling, while capturing the spatial correlations of mine-like objects. Classification accuracy is improved by combining features based on geometrical properties of the filter output with features based on HOS. The highest accuracy is obtained by fusing classification based on bispectral features with classification based on trispectral features

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Get PDF
    Due to recent technology advances, large volumes of medical data is obtained. These data contain valuable information. Therefore data mining techniques can be used to extract useful patterns. This paper is intended to introduce data mining and its various techniques and a survey of the available literature on medical data mining. We emphasize mainly on the application of data mining on skin diseases. A categorization has been provided based on the different data mining techniques. The utility of the various data mining methodologies is highlighted. Generally association mining is suitable for extracting rules. It has been used especially in cancer diagnosis. Classification is a robust method in medical mining. In this paper, we have summarized the different uses of classification in dermatology. It is one of the most important methods for diagnosis of erythemato-squamous diseases. There are different methods like Neural Networks, Genetic Algorithms and fuzzy classifiaction in this topic. Clustering is a useful method in medical images mining. The purpose of clustering techniques is to find a structure for the given data by finding similarities between data according to data characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we have investigated some challenges which exist in mining skin data

    Futility Analysis in the Cross-Validation of Machine Learning Models

    Full text link
    Many machine learning models have important structural tuning parameters that cannot be directly estimated from the data. The common tactic for setting these parameters is to use resampling methods, such as cross--validation or the bootstrap, to evaluate a candidate set of values and choose the best based on some pre--defined criterion. Unfortunately, this process can be time consuming. However, the model tuning process can be streamlined by adaptively resampling candidate values so that settings that are clearly sub-optimal can be discarded. The notion of futility analysis is introduced in this context. An example is shown that illustrates how adaptive resampling can be used to reduce training time. Simulation studies are used to understand how the potential speed--up is affected by parallel processing techniques.Comment: 22 pages, 5 figure
    corecore