4,077 research outputs found

    An Interval Valued K-Nearest Neighbors Classifier

    Get PDF
    The K-Nearest Neighbors (k-NN) classifier has become a well-known, successful method for pattern classification tasks. In recent years, many enhancements to the original algorithm have been proposed. Fuzzy set theory has been the basis of several models proposed to enhance the nearest neighbors rule, with the Fuzzy K-Nearest Neighbors (FuzzyKNN) classifier being the most notable procedure in the field. In this work, we present a new approach to the nearest neighbor classifier based on interval-valued fuzzy sets. Interval values allow the membership of instances and the computation of votes to be handled more flexibly than in the original FuzzyKNN method, improving its adaptability to different supervised learning problems. An experimental study, contrasted by the application of nonparametric statistical procedures, is carried out to ascertain whether the Interval Valued K-Nearest Neighbor (IV-KNN) classifier proposed here is significantly more accurate than k-NN, FuzzyKNN and other fuzzy nearest neighbor classifiers. We conclude that IV-KNN is indeed significantly more accurate than the other classifiers analyzed.
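    To make the voting idea concrete, here is a minimal Python sketch of an interval-valued fuzzy k-NN vote. The interval memberships ([0.8, 1.0] for a neighbor's own class, [0.0, 0.2] otherwise) and the midpoint ranking are illustrative assumptions, not the IV-KNN formulation from the paper.

    ```python
    import numpy as np

    def iv_fuzzy_knn_predict(X_train, y_train, x, k=5, m=2.0):
        """Illustrative interval-valued fuzzy k-NN vote; the interval
        memberships and midpoint ranking are assumptions, not the
        paper's IV-KNN rule."""
        dists = np.linalg.norm(X_train - x, axis=1)
        idx = np.argsort(dists)[:k]
        # Distance-based weights as in the classic FuzzyKNN rule, guarded at d=0.
        w = 1.0 / np.maximum(dists[idx], 1e-12) ** (2.0 / (m - 1.0))
        w /= w.sum()
        scores = {}
        for c in np.unique(y_train):
            # Hypothetical interval membership per neighbor: [0.8, 1.0] if the
            # neighbor carries class c, [0.0, 0.2] otherwise.
            lo = sum(wi * (0.8 if y_train[i] == c else 0.0) for wi, i in zip(w, idx))
            hi = sum(wi * (1.0 if y_train[i] == c else 0.2) for wi, i in zip(w, idx))
            scores[c] = (lo + hi) / 2.0  # rank classes by interval midpoint
        return max(scores, key=scores.get)
    ```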

    A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining

    Full text link
    Big data comes in various ways, types, shapes, forms and sizes. Indeed, almost all areas of science, technology, medicine, public health, economics, business, linguistics and social science are bombarded by ever-increasing flows of data begging to be analyzed efficiently and effectively. In this paper, we propose a rough taxonomy of big data, along with some of the most commonly used tools for handling each particular category of bigness. The dimensionality p of the input space and the sample size n are usually the main ingredients in the characterization of data bigness. The specific statistical machine learning technique used to handle a particular big data set depends on which category of the taxonomy it falls in: large-p, small-n data sets, for instance, require a different set of tools from the large-n, small-p variety. Among other tools, we discuss Preprocessing, Standardization, Imputation, Projection, Regularization, Penalization, Compression, Reduction, Selection, Kernelization, Hybridization, Parallelization, Aggregation, Randomization, Replication and Sequentialization. It is important to emphasize right away that the so-called no-free-lunch theorem applies here, in the sense that there is no universally superior method that outperforms all other methods on all categories of bigness. It is also important to stress that simplicity, in the sense of Ockham's razor principle of parsimony, tends to reign supreme when it comes to massive data. We conclude with a comparison of the predictive performance of some of the most commonly used methods on a few data sets.
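    As a toy illustration of routing by bigness category, the following Python sketch maps (n, p) to candidate tool families from the list above. The thresholds and groupings are assumptions for illustration, not values given in the paper.

    ```python
    def suggest_tools(n, p):
        """Toy dispatcher in the spirit of the n-versus-p taxonomy; the
        one-million-row cutoff is an arbitrary illustrative threshold."""
        if p > n:
            # Large p, small n: shrink or shrink-and-select the input space.
            return ["penalization (lasso, ridge)", "selection", "projection",
                    "kernelization"]
        if n > 1_000_000:
            # Large n: divide, subsample, or stream the computation.
            return ["parallelization", "randomization (subsampling)",
                    "aggregation (bagging)", "sequentialization (online learning)"]
        # Moderate n and p: standard learners after cleaning the inputs.
        return ["preprocessing", "standardization", "imputation"]
    ```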

    Testing Market Response to Auditor Change Filings: a comparison of machine learning classifiers

    Get PDF
    The use of textual information contained in company filings with the Securities and Exchange Commission (SEC), including annual reports on Form 10-K, quarterly reports on Form 10-Q, and current reports on Form 8-K, has gained the increased attention of finance and accounting researchers. In this paper we use a set of machine learning methods to predict the market response to changes in a firm's auditor as reported in public filings. We vectorize the text of 8-K filings to test whether the resulting feature matrix can explain the sign of the market response to the filing. Specifically, using classification algorithms and a sample consisting of the Item 4.01 text of 8-K documents, which provides information on changes in auditors of companies registered with the SEC, we predict the sign of the cumulative abnormal return (CAR) around 8-K filing dates. We report the correct classification performance and time efficiency of the classification algorithms. Our results show some improvement over the naïve classification method.
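    A minimal sketch of the vectorize-then-classify pipeline, assuming hypothetical inputs `texts` (Item 4.01 passages) and `car_sign` (the sign, +1 or -1, of the CAR around each filing date). The paper compares several classifiers; this example uses TF-IDF features with a single logistic regression model.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    def classify_filings(texts, car_sign):
        """Predict the sign of the market response from filing text.
        `texts` and `car_sign` are hypothetical inputs for illustration."""
        X_tr, X_te, y_tr, y_te = train_test_split(
            texts, car_sign, test_size=0.25, random_state=0)
        vec = TfidfVectorizer(max_features=5000, stop_words="english")
        clf = LogisticRegression(max_iter=1000)
        clf.fit(vec.fit_transform(X_tr), y_tr)
        # Correct-classification rate on the held-out filings.
        return accuracy_score(y_te, clf.predict(vec.transform(X_te)))
    ```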

    AMPSO: A new Particle Swarm Method for Nearest Neighborhood Classification

    Get PDF
    Nearest prototype methods can be quite successful on many pattern classification problems. In these methods, a collection of prototypes has to be found that accurately represents the input patterns. The classifier then assigns classes based on the nearest prototype in this collection. In this paper, we first use the standard particle swarm optimization (PSO) algorithm to find those prototypes. Second, we present a new algorithm, called adaptive Michigan PSO (AMPSO), which reduces the dimension of the search space and provides more flexibility than the former in this application. AMPSO is based on a different approach to particle swarms, as each particle in the swarm represents a single prototype in the solution. The swarm does not converge to a single solution; instead, each particle is a local classifier, and the whole swarm is taken as the solution to the problem. It uses modified PSO equations with both particle competition and cooperation and a dynamic neighborhood. As an additional feature, in AMPSO the number of prototypes represented in the swarm is able to adapt to the problem, increasing as needed the number of prototypes and classes of prototypes that make up the solution. We compared the results of the standard PSO and AMPSO on several benchmark problems from the University of California, Irvine (UCI) data sets and found that AMPSO always found a better solution than the standard PSO. We also found that it was able to improve the results of nearest neighbor classifiers, and that it is competitive with some of the algorithms most commonly used for classification. This work was supported by the Spanish-funded research project MSTAR::UC3M (Ref: TIN2008-06491-C04-03) and CAM project CCG06-UC3M/ESP-0774.
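    Once the prototypes are fixed, the classification rule itself is simple. Here is a minimal Python sketch; in AMPSO the `prototypes` array would be the particle positions found by the swarm, but any array works.

    ```python
    import numpy as np

    def nearest_prototype_predict(prototypes, proto_labels, X):
        """Assign each row of X the class of its nearest prototype. In a
        Michigan approach, `prototypes` holds the trained particle positions."""
        # Pairwise distances: X is (n, d), prototypes is (m, d) -> (n, m).
        d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
        return np.asarray(proto_labels)[np.argmin(d, axis=1)]
    ```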

    A Fuzzy k-Nearest Neighbors Classifier to Deal with Imperfect Data

    Get PDF
    © 2018. This manuscript version is made available under the CC BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). This document is the accepted version of a published work that appeared in final form in Soft Computing; the final edited and published work is available at https://doi.org/10.1007/s00500-017-2567-x.
    The k-nearest neighbors method (kNN) is a nonparametric, instance-based method used for regression and classification. To classify a new instance, the kNN method computes its k nearest neighbors and generates a class value from them. Usually, this method requires that the information available in the datasets be precise and accurate, except for the existence of missing values. However, data imperfection is inevitable when dealing with real-world scenarios. In this paper, we present the kNNimp classifier, a k-nearest neighbors method to perform classification from datasets with imperfect values. The weight of each neighbor in the output decision is based on its relative distance and its degree of imperfection. Furthermore, through external parameters, the classifier enables us to define the maximum allowed imperfection and to decide whether the final output should be derived solely from the class with the greatest weight (the best class) or from the best class together with a weighted combination of the classes closest to it. To test the proposed method, we performed several experiments with both synthetic and real-world datasets with imperfect data. The results, validated through statistical tests, show that the kNNimp classifier is robust when working with imperfect data and maintains a good performance when compared with other methods in the literature applied to datasets with or without imperfection.
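    A minimal sketch of imperfection-weighted voting in the spirit of kNNimp. The weighting formula and the per-instance `imperfection` scores in [0, 1] are assumptions for illustration, not the published method.

    ```python
    import numpy as np

    def knn_imperfect_predict(X, y, imperfection, x, k=5, max_imp=0.5):
        """Weighted k-NN vote where a neighbor's vote is scaled down by its
        degree of imperfection; `max_imp` mimics the external parameter
        capping the maximum allowed imperfection."""
        d = np.linalg.norm(X - x, axis=1)
        votes = {}
        for i in np.argsort(d)[:k]:
            if imperfection[i] > max_imp:  # exclude overly imperfect neighbors
                continue
            w = (1.0 / (d[i] + 1e-12)) * (1.0 - imperfection[i])
            votes[y[i]] = votes.get(y[i], 0.0) + w
        # Return the greatest-weight ("best") class, or None if all excluded.
        return max(votes, key=votes.get) if votes else None
    ```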

    An adaptive Michigan approach PSO for nearest prototype classification

    Get PDF
    Proceedings of: Second International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2007, La Manga del Mar Menor, Spain, June 18-21, 2007.
    Nearest prototype methods can be quite successful on many pattern classification problems. In these methods, a collection of prototypes has to be found that accurately represents the input patterns. The classifier then assigns classes based on the nearest prototype in this collection. In this paper we develop a new algorithm (called AMPSO), based on the particle swarm optimization (PSO) algorithm, that can be used to find those prototypes. Each particle in a swarm represents a single prototype in the solution; the swarm evolves using modified PSO equations with both particle competition and cooperation. Experimentation includes an artificial problem and six common application problems from the UCI data sets. The results show that the AMPSO algorithm is able to find solutions with a reduced number of prototypes that classify data with comparable or better accuracy than the 1-NN classifier. The algorithm also matches or improves on the results of many classical algorithms on each of those problems, and performs significantly better than any tested algorithm on one of them. This article was financed by the Spanish-funded MEC research project OPLINK::UC3M (Ref: TIN2005-08818-C04-02) and CAM project UC3M-TEC-05-029.
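    For reference, a canonical PSO velocity and position update is sketched below in Python; AMPSO modifies this scheme with particle competition, cooperation, and a dynamic neighborhood, which are not shown here. The inertia and acceleration constants are common textbook defaults, not the paper's settings.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def pso_step(pos, vel, pbest, gbest, w=0.72, c1=1.49, c2=1.49):
        """One canonical PSO update. Each row of `pos` is a particle; in the
        Michigan approach each particle encodes a single prototype."""
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Inertia term plus cognitive (pbest) and social (gbest) attraction.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        return pos + vel, vel
    ```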