7 research outputs found

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures; an updated edition of an older tutorial on kNN.
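
    As a minimal illustration of the nearest-neighbour rule described above, the sketch below implements kNN with plain NumPy, a Euclidean distance and an unweighted majority vote; it is an illustrative sketch, not the code provided in the paper's Appendix.

        # Minimal k-nearest-neighbour classifier (illustrative sketch).
        import numpy as np
        from collections import Counter

        def knn_predict(X_train, y_train, query, k=3):
            # Euclidean distance from the query to every training example.
            dists = np.linalg.norm(X_train - query, axis=1)
            # Indices of the k closest training examples.
            nearest = np.argsort(dists)[:k]
            # Unweighted majority vote over the neighbours' class labels.
            return Counter(y_train[i] for i in nearest).most_common(1)[0][0]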

    Reputation-based maintenance in case-based reasoning

    Case Base Maintenance algorithms update the contents of a case base in order to improve case-based reasoner performance. In this paper, we introduce a new case base maintenance method called Reputation-Based Maintenance (RBM) with the aim of increasing the classification accuracy of a Case-Based Reasoning system while reducing the size of its case base. The proposed RBM algorithm calculates a case property called Reputation for each member of the case base, the value of which reflects the competence of the related case. Based on this case property, several removal policies and maintenance methods have been designed, each focusing on different aspects of case base maintenance. The performance of the RBM method was compared with well-known state-of-the-art algorithms. The tests were performed on 30 datasets selected from the UCI repository. The results show that the RBM method in all its variations achieves greater accuracy than a baseline CBR, while some variations significantly outperform the state-of-the-art methods. We particularly highlight the RBM_ACBR algorithm, which achieves the highest accuracy among the methods in the comparison to a statistically significant degree, and the RBMcr algorithm, which increases the baseline accuracy while removing, on average, over half of the case base. This work has been partially supported by the Spanish Ministry of Science and Innovation with project MISMIS-LANGUAGE (grant number PGC2018-096212-B-C33), by the Catalan Agency of University and Research Grants Management (AGAUR) (grant numbers 2017 SGR 341 and 2017 SGR 574), by the Spanish Network “Learning Machines for Singular Problems and Applications (MAPAS)” (TIN2017-90567-REDT, MINECO/FEDER EU) and by the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 860843.
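
    The abstract does not give the Reputation formula, so the sketch below uses a hypothetical proxy purely for illustration: a case gains credit when it sits among the nearest neighbours of a same-class case that is classified correctly under leave-one-out, and loses credit otherwise; one possible removal policy then keeps the top-reputation half of the case base. The function names, scoring rule and keep_frac parameter are all assumptions, not the paper's RBM definitions.

        import numpy as np

        def reputation_scores(X, y, k=3):
            # Hypothetical reputation proxy (NOT the paper's formula): credit a
            # case when it helps a same-class neighbour get classified correctly
            # under leave-one-out kNN, debit it otherwise. Integer labels assumed.
            n = len(X)
            rep = np.zeros(n)
            for i in range(n):
                dists = np.linalg.norm(X - X[i], axis=1)
                dists[i] = np.inf                      # leave the case itself out
                nn = np.argsort(dists)[:k]
                correct = np.bincount(y[nn], minlength=y.max() + 1).argmax() == y[i]
                for j in nn:
                    if y[j] == y[i]:
                        rep[j] += 1 if correct else -1
            return rep

        def maintain(X, y, k=3, keep_frac=0.5):
            # One possible removal policy: keep only the top half by reputation.
            order = np.argsort(reputation_scores(X, y, k))[::-1]
            keep = np.sort(order[: int(len(X) * keep_frac)])
            return X[keep], y[keep]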

    Profiling Instances in Noise Reduction

    The dependency on the quality of the training data has led to significant work in noise reduction for instance-based learning algorithms. This paper presents an empirical evaluation of current noise reduction techniques, not just from the perspective of their comparative performance, but from the perspective of investigating the types of instances that they focus on for removal. A novel instance profiling technique known as RDCL profiling allows the structure of a training set to be analysed at the instance level, categorising each instance based on modelling their local competence properties. This profiling approach offers the opportunity of investigating the types of instances removed by the noise reduction techniques that are currently in use in instance-based learning. The paper also considers the effect of removing instances with specific profiles from a dataset and shows that a very simple approach of removing instances that are misclassified by the training set and cause other instances in the dataset to be misclassified is an effective noise reduction technique.
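
    The simple rule highlighted at the end of the abstract can be sketched directly: drop an instance only if it is misclassified under leave-one-out and its removal repairs at least one other instance. The kNN classifier, k=3, and the "repairs at least one other instance" reading are assumptions made for this illustration.

        import numpy as np

        def loo_predict(X, y, i, k=3, exclude=()):
            # Leave-one-out kNN prediction for instance i, optionally ignoring
            # the extra instances listed in `exclude`.
            dists = np.linalg.norm(X - X[i], axis=1)
            dists[i] = np.inf
            for j in exclude:
                dists[j] = np.inf
            nn = np.argsort(dists)[:k]
            return np.bincount(y[nn], minlength=y.max() + 1).argmax()

        def simple_noise_reduction(X, y, k=3):
            n = len(X)
            drop = set()
            for i in range(n):
                if loo_predict(X, y, i, k) == y[i]:
                    continue                           # correctly classified: keep
                # i is misclassified; drop it only if removing it flips at least
                # one other instance from wrong to right.
                damages = any(
                    loo_predict(X, y, j, k) != y[j]
                    and loo_predict(X, y, j, k, exclude=(i,)) == y[j]
                    for j in range(n) if j != i
                )
                if damages:
                    drop.add(i)
            keep = [i for i in range(n) if i not in drop]
            return X[keep], y[keep]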

    k-Nearest Neighbour Classifiers - A Tutorial

    Perhaps the most straightforward classifier in the arsenal of Machine Learning techniques is the Nearest Neighbour Classifier – classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.
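
    For the time-series similarity measures mentioned above, dynamic time warping is the standard example; the sketch below is a textbook O(n*m) dynamic-programming DTW, included for illustration rather than taken from the tutorial's code.

        import numpy as np

        def dtw_distance(a, b):
            # Classic dynamic-programming formulation of dynamic time warping.
            n, m = len(a), len(b)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(a[i - 1] - b[j - 1])
                    # Extend the cheapest of the three admissible warping steps.
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return float(D[n, m])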

    Reducing the Memory Size of a Fuzzy Case-Based Reasoning System Applying Rough Set Techniques

    Early work on case-based reasoning (CBR) reported in the literature shows the importance of soft computing techniques applied to different stages of the classical four-step CBR life cycle. This correspondence proposes a reduction technique based on rough set theory capable of minimizing the case memory by analyzing the contribution of each case feature. Inspired by the application of the minimum description length principle, the method uses the granularity of the original data to compute the relevance of each attribute. The rough feature weighting and selection method is applied as a preprocessing step prior to the generation of a fuzzy rule system, which is employed in the revision phase of the proposed CBR system. Experiments using real oceanographic data show that the rough set reduction method maintains the accuracy of the employed fuzzy rules, while reducing the computational effort needed in their generation and increasing the explanatory strength of the fuzzy rules.
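
    The attribute-relevance idea can be illustrated with the standard rough-set dependency degree: an attribute matters to the extent that removing it shrinks the positive region (the objects whose condition-attribute equivalence class is consistent with the decision). This is a generic rough-set sketch assuming discrete attribute values, not the paper's exact weighting method.

        from collections import defaultdict

        def dependency(rows, cond_idx, dec_idx):
            # gamma(C, D): fraction of objects whose block under the condition
            # attributes has a single decision value.
            blocks = defaultdict(list)
            for row in rows:
                blocks[tuple(row[i] for i in cond_idx)].append(row[dec_idx])
            positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
            return positive / len(rows)

        def attribute_relevance(rows, cond_idx, dec_idx):
            # Relevance of each attribute: the drop in dependency when it is removed.
            full = dependency(rows, cond_idx, dec_idx)
            return {a: full - dependency(rows, [i for i in cond_idx if i != a], dec_idx)
                    for a in cond_idx}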

    Retrieval, reuse, revision and retention in case-based reasoning

    The original is available at www.journals.cambridge.org. Case-based reasoning (CBR) is an approach to problem solving that emphasizes the role of prior experience during future problem solving (i.e., new problems are solved by reusing and, if necessary, adapting the solutions to similar problems that were solved in the past). It has enjoyed considerable success in a wide variety of problem-solving tasks and domains. Following a brief overview of the traditional problem-solving cycle in CBR, we examine the cognitive science foundations of CBR and its relationship to analogical reasoning. We then review a representative selection of CBR research in the past few decades on aspects of retrieval, reuse, revision, and retention.
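
    The four-step cycle the survey reviews can be written out as a short skeleton; the similarity, adapt and evaluate helpers are placeholders that a real system would supply with domain-specific logic.

        def cbr_solve(problem, case_base, similarity, adapt, evaluate):
            # Retrieve: the stored case most similar to the new problem.
            retrieved = max(case_base, key=lambda c: similarity(problem, c["problem"]))
            # Reuse: adapt the retrieved solution to the new problem.
            proposed = adapt(retrieved["solution"], problem)
            # Revise: evaluate and, if necessary, repair the proposed solution.
            revised, success = evaluate(problem, proposed)
            # Retain: store the new experience if it proved useful.
            if success:
                case_base.append({"problem": problem, "solution": revised})
            return revised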

    A COLLABORATIVE FILTERING APPROACH TO PREDICT WEB PAGES OF INTEREST FROM NAVIGATION PATTERNS OF PAST USERS WITHIN AN ACADEMIC WEBSITE

    This dissertation is a simulation study of factors and techniques involved in designing hyperlink recommender systems that recommend to users web pages that past users with similar navigation behaviors found interesting. The methodology involves identification of pertinent factors or techniques and, for each one, addresses the following questions: (a) room for improvement; (b) better approach, if any; and (c) performance characteristics of the technique in environments that hyperlink recommender systems operate in. The following four problems are addressed:

    Web Page Classification. A new metric (PageRank × Inverse Links-to-Word count ratio) is proposed for classifying web pages as content or navigation, to help in the discovery of user navigation behaviors from web user access logs. Results of a small user study suggest that this metric leads to desirable results.

    Data Mining. A new apriori algorithm for mining association rules from large databases is proposed. The new algorithm addresses the problem of scaling of the classical apriori algorithm by eliminating an expensive join step and applying the apriori property to every row of the database. In this study, association rules show the correlation relationships between user navigation behaviors and web pages they find interesting. The new algorithm has better space complexity than the classical one, better time efficiency under some conditions, and comparable time efficiency under other conditions.

    Prediction Models for User Interests. We demonstrate that association rules that show the correlation relationships between user navigation patterns and web pages they find interesting can be transformed into collaborative filtering data. We investigate collaborative filtering prediction models based on two approaches for computing prediction scores: simple averages and weighted averages. Our findings suggest that the weighted averages scheme computes predictions of user interests more accurately than the simple averages scheme does.

    Clustering. Clustering techniques are frequently applied in the design of personalization systems. We studied the performance of the CLARANS clustering algorithm in high-dimensional space in relation to the PAM and CLARA clustering algorithms. While CLARA had the best time performance, CLARANS resulted in clusters with the lowest intra-cluster dissimilarities, and so was most effective in this regard.
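
    The two prediction schemes compared in the third part can be sketched as follows; the use of cosine similarity over co-rated items as the weighting is an assumption for illustration, not necessarily the dissertation's exact scheme.

        import numpy as np

        def cosine(u, v):
            # Similarity over co-rated items only (0 means "not rated").
            mask = (u > 0) & (v > 0)
            if not mask.any():
                return 0.0
            return float(np.dot(u[mask], v[mask]) /
                         (np.linalg.norm(u[mask]) * np.linalg.norm(v[mask])))

        def predict(ratings, user, item, weighted=True):
            # ratings: users x items matrix; predict `user`'s interest in `item`
            # from the other users who rated that item.
            raters = [u for u in range(ratings.shape[0])
                      if u != user and ratings[u, item] > 0]
            if not raters:
                return 0.0
            values = np.array([ratings[u, item] for u in raters], dtype=float)
            if not weighted:
                return float(values.mean())            # simple average
            weights = np.array([cosine(ratings[user], ratings[u]) for u in raters])
            if weights.sum() == 0:
                return float(values.mean())
            return float(np.dot(weights, values) / weights.sum())  # weighted average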