3,232 research outputs found

    Comparison of SVM and some older classification algorithms in text classification tasks

    Get PDF
    Document classification has already been widely studied. In fact, some studies compared feature selection techniques or feature space transformation whereas some others compared the performance of different algorithms. Recently, following the rising interest towards the Support Vector Machine, various studies showed that SVM outperforms other classification algorithms. So should we just not bother about other classification algorithms and opt always for SVM We have decided to investigate this issue and compared SVM to kNN and naive Bayes on binary classification tasks. An important issue is to compare optimized versions of these algorithms, which is what we have done. Our results show all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If a suitable preprocessing is used with kNN, this algorithm continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. As for naive Bayes, it also achieved good performance.IFIP International Conference on Artificial Intelligence in Theory and Practice - Knowledge Acquisition and Data MiningRed de Universidades con Carreras en Informática (RedUNCI

    Fuzzy Least Squares Twin Support Vector Machines

    Full text link
    Least Squares Twin Support Vector Machine (LST-SVM) has been shown to be an efficient and fast algorithm for binary classification. It combines the operating principles of Least Squares SVM (LS-SVM) and Twin SVM (T-SVM); it constructs two non-parallel hyperplanes (as in T-SVM) by solving two systems of linear equations (as in LS-SVM). Despite its efficiency, LST-SVM is still unable to cope with two features of real-world problems. First, in many real-world applications, labels of samples are not deterministic; they come naturally with their associated membership degrees. Second, samples in real-world applications may not be equally important and their importance degrees affect the classification. In this paper, we propose Fuzzy LST-SVM (FLST-SVM) to deal with these two characteristics of real-world data. Two models are introduced for FLST-SVM: the first model builds up crisp hyperplanes using training samples and their corresponding membership degrees. The second model, on the other hand, constructs fuzzy hyperplanes using training samples and their membership degrees. Numerical evaluation of the proposed method with synthetic and real datasets demonstrate significant improvement in the classification accuracy of FLST-SVM when compared to well-known existing versions of SVM

    Support Vector Machines for Credit Scoring and discovery of significant features

    Get PDF
    The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring for determining likelihood to default based on consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover those features that are most significant in determining risk of default. 1

    ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution

    Full text link
    Entity resolution (ER), an important and common data cleaning problem, is about detecting data duplicate representations for the same external entities, and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) Building a classifier for duplicate/non-duplicate record pairs built using machine learning (ML) techniques; (b) Use of MDs for supporting the blocking phase of ML; (c) Record merging on the basis of the classifier results; and (d) The use of the declarative language "LogiQL" -an extended form of Datalog supported by the "LogicBlox" platform- for all activities related to data processing, and the specification and enforcement of MDs.Comment: Final journal version, with some minor technical corrections. Extended version of arXiv:1508.0601
    • …
    corecore