18 research outputs found

    Neighborhood Counting Measure Metric and Minimum Risk Metric: An empirical comparison

    Get PDF
    Wang, in a PAMI paper, proposed the Neighborhood Counting Measure (NCM) as a similarity measure for the k-nearest neighbors classification algorithm. In his paper, Wang mentioned the Minimum Risk Metric (MRM), an earlier method based on minimizing the risk of misclassification. However, Wang did not compare NCM with MRM because of its allegedly excessive computational load. In this letter, we empirically compare NCM against MRM on k-NN with k = 1, 3, 5, 7, and 11, with the decision taken by a voting scheme, and with k = 21, with the decision taken by a weighted voting scheme, on the same datasets used by Wang. Our results show that MRM outperforms NCM for most of the k values tested. Moreover, we show that the MRM computation is not as prohibitive as indicated by Wang. ©2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes, or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE.
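
The abstract above refers to plain and distance-weighted voting in k-NN. The following is a minimal sketch of those two voting schemes only (it implements neither NCM nor MRM); the function name, toy data, and the use of Euclidean distance are illustrative assumptions, not the paper's setup.

```python
from collections import Counter
import math

def knn_predict(train, query, k, weighted=False):
    """Classify `query` by majority (or inverse-distance-weighted) vote
    among its k nearest training points under Euclidean distance.

    `train` is a list of (feature_tuple, label) pairs."""
    # Sort training points by distance to the query and keep the k nearest.
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter()
    for features, label in neighbors:
        if weighted:
            # Weight each vote by inverse distance; the small epsilon
            # avoids division by zero when the query equals a training point.
            votes[label] += 1.0 / (math.dist(features, query) + 1e-9)
        else:
            votes[label] += 1.0
    return votes.most_common(1)[0][0]
```

With k = 1, 3, 5, 7, 11 the unweighted vote would be used; with k = 21 the `weighted=True` variant matches the weighted voting scheme described.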

    A Review on Advanced Decision Trees for Efficient & Effective k-NN Classification

    Get PDF
    K-Nearest Neighbor (KNN) is a well-known classification method in data mining and statistics, owing to its straightforward implementation and strong classification performance. However, conventional KNN methods cannot select a single fixed k value that suits all test samples. Previous approaches assign different k values to different test samples via cross validation, but this is typically time-consuming. This work proposes new KNN methods. The first is a KTree method that learns distinct k values for different test (new) samples by adding a training stage to KNN classification. The work also proposes an improved version of KTree, called K*Tree, which speeds up the test stage by storing extra information about the training samples in the leaf nodes of the KTree, such as the training samples located in each leaf node, their KNNs, and the nearest neighbors of those KNNs. K*Tree can therefore perform KNN classification using only the subset of training samples stored in a leaf node, instead of all training samples as in previous KNN methods, which substantially reduces the cost of the test stage.
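
To make the per-sample-k idea concrete, here is a deliberately simplified sketch (not the authors' KTree): each training point learns, by leave-one-out testing, the smallest candidate k that classifies it correctly, and a test sample borrows the k of its nearest training point. All names and the toy data are illustrative assumptions.

```python
import math
from collections import Counter

def vote(train, query, k):
    # Majority vote among the k nearest training points (Euclidean).
    nn = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nn).most_common(1)[0][0]

def learn_per_sample_k(train, candidate_ks):
    """For each training point, pick the first candidate k that
    classifies it correctly in a leave-one-out test."""
    per_k = []
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        best = candidate_ks[0]
        for k in candidate_ks:
            if vote(rest, x, k) == y:
                best = k
                break
        per_k.append((x, y, best))
    return per_k

def predict(per_k, train, query):
    # Reuse the learned k of the query's nearest training point.
    x, y, k = min(per_k, key=lambda t: math.dist(t[0], query))
    return vote(train, query, k)
```

The actual KTree/K*Tree methods instead learn k with a tree whose leaves also cache neighbor lists, so the test stage touches only a leaf's subset of training samples; the sketch above conveys only the "different k per test sample" principle.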

    A MapReduce-based nearest neighbor approach for big-data-driven traffic flow prediction

    Full text link
    In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system comprising two key modules, i.e., offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method, combining the current data observed in OPP and the classification results obtained from large-scale historical data in ODT, to generate traffic flow predictions in real time. An empirical study on real-world traffic flow big data, using leave-one-out cross validation, shows that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression, in terms of accuracy, improving it by up to 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup properties.
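
The core distributed pattern described, finding nearest neighbors over partitioned historical data and combining them into a prediction, can be sketched on a single machine as a map step (per-partition top-k candidates) followed by a reduce step (global top-k merge). This is a generic MapReduce-style k-NN regression sketch, not the paper's TFPC system; all names and data are assumptions, and the correlation analysis is omitted.

```python
import heapq
import math

def map_topk(partition, query, k):
    # "Mapper": emit the k nearest candidates within one data partition,
    # as (distance, value) pairs.
    return heapq.nsmallest(k, ((math.dist(x, query), v) for x, v in partition))

def reduce_topk(partials, k):
    # "Reducer": merge the per-partition candidate lists into a global top-k.
    return heapq.nsmallest(k, (c for part in partials for c in part))

def parallel_knn_predict(partitions, query, k):
    """Distance-weighted average of the k globally nearest neighbors,
    mimicking a MapReduce top-k merge sequentially."""
    top = reduce_topk([map_topk(p, query, k) for p in partitions], k)
    weights = [1.0 / (d + 1e-9) for d, _ in top]
    return sum(w * v for w, (_, v) in zip(weights, top)) / sum(weights)
```

Because each mapper only ever returns k candidates, the reducer merges at most k × (number of partitions) items, which is what makes the pattern scale to large historical datasets.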

    Port throughput influence factors based on neighborhood rough sets: an exploratory study

    Get PDF
    Purpose: The purpose of this paper is to devise an efficient method for analyzing the importance of port throughput influence factors. Design/methodology/approach: Neighborhood rough sets are applied to the factor selection problem. First, the throughput index system is established. Then, we build an attribute reduction model using a numerical attribute reduction algorithm based on neighborhood rough sets, optimizing the algorithm for high efficiency. Finally, we validate the model empirically using historical data on Guangzhou Port throughput and its influencing factors from 2000 to 2013. Findings: Through the model and algorithm, port enterprises can identify the importance of port throughput factors, which can support their decisions. Research limitations: The empirical data cover only the years 2000 to 2013, so the amount of data is small. Practical implications: The results provide support for port business investment, decisions, and risk control, and also assist port enterprises and other researchers in throughput forecasting. Originality/value: In this paper, we establish a throughput index system and optimize the reduction algorithm for efficiency. Peer Reviewed
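
As background for the approach above: in neighborhood rough sets, an attribute subset is evaluated by its positive region, the set of samples whose δ-neighborhood (measured on those attributes only) contains a single class, and a reduct is grown greedily by adding the attribute that most enlarges that region. The sketch below illustrates this standard formulation under assumed names and toy data; it is not the paper's optimized algorithm.

```python
import math

def neighborhood_positive_region(X, y, attrs, delta):
    """Count samples whose delta-neighborhood, computed on the chosen
    attribute indices `attrs`, is label-pure (the positive region size)."""
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in attrs))
    pure = 0
    for i, xi in enumerate(X):
        if all(y[i] == y[j] for j, xj in enumerate(X) if dist(xi, xj) <= delta):
            pure += 1
    return pure

def greedy_reduct(X, y, n_attrs, delta):
    # Forward selection: repeatedly add the attribute that most increases
    # the positive region, stopping when no attribute improves it.
    chosen, best = [], 0
    improved = True
    while improved:
        improved = False
        for a in range(n_attrs):
            if a in chosen:
                continue
            score = neighborhood_positive_region(X, y, chosen + [a], delta)
            if score > best:
                best, pick = score, a
                improved = True
        if improved:
            chosen.append(pick)
    return chosen
```

In the paper's setting, the attributes would be the throughput influence factors, and the selected subset indicates which factors matter most for explaining throughput classes.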

    Computer-aided diagnosis in neonatal lambs

    Get PDF
    In our country, the number of small ruminant animals is decreasing day by day for various reasons, and animal production is declining in parallel. One way to prevent this reduction is to make successful predictions and analyses related to diagnosis. Thanks to computer-aided diagnostic studies performed with machine learning, the quality of health services increases while the costs of the health sector decrease. The aim of this study is to perform computer-aided diagnosis in neonatal lambs using machine learning methods. Accordingly, decision tree, Naive Bayes, k-nearest neighbors, artificial neural network, and random forest methods were used. The performance of these classification methods was analyzed with accuracy, balanced accuracy, specificity, recall, F-measure, kappa, and area under the ROC curve (AUC) criteria. As a result of the study, the Naive Bayes method produced more successful results than the other methods for computer-aided diagnosis. Notably, the Naive Bayes method is simple and easy to apply, yet achieves better results than more complex methods.
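
Several of the evaluation criteria named above derive directly from the confusion-matrix counts. As a small illustration (the function name, labels, and data are assumptions, not the study's), here is how accuracy, recall (sensitivity), specificity, and balanced accuracy relate for a binary problem:

```python
def binary_metrics(y_true, y_pred, positive):
    """Compute accuracy, recall, specificity, and balanced accuracy
    from true and predicted label sequences for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn)            # sensitivity: true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        # Balanced accuracy averages the two rates, which matters when
        # one class (e.g., sick lambs) is much rarer than the other.
        "balanced_accuracy": (recall + specificity) / 2,
    }
```

F-measure, kappa, and AUC would be computed similarly from the predictions, but require precision, chance agreement, and ranking scores respectively.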

    About Neighborhood Counting Measure Metric and Minimum Risk Metric

    Full text link