
    A study of classifier ensemble construction methods and applications

    Artificial intelligence is concerned with building computer systems that behave intelligently. Within this area, machine learning studies systems that learn by themselves. One type of machine learning is supervised learning, in which the system is given both the inputs and the expected outputs and learns from these data; a system of this kind is called a classifier. It sometimes happens that, in the set of examples the system learns from, the number of examples of one class is much larger than the number of examples of another; such datasets are called imbalanced. The combination of several classifiers is called an "ensemble", and it often performs better than any of its individual members. One of the keys to an ensemble working well is diversity. This thesis focuses on the development of new ensemble construction algorithms, centred on diversity-increasing techniques and on imbalanced problems. Additionally, these techniques are applied to the solution of several industrial problems. (Ministerio de Economía y Competitividad, project TIN-2011-2404)
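
    As an illustration of how ensemble construction can interact with class imbalance, the sketch below draws a class-balanced bootstrap sample for each ensemble member, so every member trains on an equal number of examples per class while the resampling still provides diversity. This is a generic illustration, not one of the thesis's algorithms; `balanced_bootstrap` and its parameters are hypothetical names.

```python
import random
from collections import Counter

def balanced_bootstrap(X, y, rng):
    """Draw a class-balanced bootstrap sample: sample (with replacement)
    the same number of examples from every class."""
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n = min(len(items) for items in by_class.values())   # size of the rarest class
    sample_X, sample_y = [], []
    for label, items in by_class.items():
        for xi in rng.choices(items, k=n):               # resample for diversity
            sample_X.append(xi)
            sample_y.append(label)
    return sample_X, sample_y

# Imbalanced toy data: 8 majority-class vs 2 minority-class examples.
rng = random.Random(0)
X = list(range(10))
y = [0] * 8 + [1] * 2
sX, sy = balanced_bootstrap(X, y, rng)
print(Counter(sy))   # equal counts per class
```

    Each ensemble member would be trained on its own balanced sample, so no member is dominated by the majority class.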

    On Time Series Classification with Dictionary-Based Classifiers

    A family of algorithms for time series classification (TSC) involves running a sliding window across each series, discretising each window to form a word, forming a histogram of word counts over the dictionary, and then constructing a classifier on the histograms. A recent evaluation of two algorithms of this type, Bag of Patterns (BOP) and Bag of Symbolic Fourier Approximation Symbols (BOSS), found a significant difference in accuracy between these seemingly similar algorithms. We investigate this phenomenon by deconstructing the classifiers and measuring the relative importance of the four key components of BOP and BOSS. We find that whilst ensembling is a key component for both algorithms, the effect of the other components is mixed and more complex. We conclude that BOSS represents the state of the art for dictionary-based TSC. Both BOP and BOSS can be classed as bag-of-words approaches, which are particularly popular in computer vision for tasks such as image classification. We adapt three techniques used in computer vision for TSC: Scale Invariant Feature Transform; Spatial Pyramids; and Histogram Intersection. We find that using Spatial Pyramids in conjunction with BOSS (SP) produces a significantly more accurate classifier. SP is significantly more accurate than standard benchmarks and the original BOSS algorithm. It is not significantly worse than the best shapelet-based or deep learning approaches, and is only outperformed by an ensemble that includes BOSS as a constituent module.
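
    The sliding-window-to-histogram pipeline shared by BOP and BOSS can be sketched as follows. This is a simplified illustration that uses equal-width binning per window rather than the SAX or SFA discretisation the actual algorithms use; `discretise` and `bag_of_words` are hypothetical names.

```python
from collections import Counter

def discretise(window, alphabet="abcd"):
    """Map each value in the window to a symbol by equal-width binning
    over the window's own range, producing one word per window."""
    lo, hi = min(window), max(window)
    width = (hi - lo) / len(alphabet) or 1.0   # guard against a flat window
    word = ""
    for v in window:
        idx = min(int((v - lo) / width), len(alphabet) - 1)
        word += alphabet[idx]
    return word

def bag_of_words(series, w=4):
    """Slide a window of length w across the series, discretise each window
    to a word, and count word occurrences to form the histogram."""
    words = [discretise(series[i:i + w]) for i in range(len(series) - w + 1)]
    return Counter(words)

print(bag_of_words([0, 1, 2, 3, 0, 1, 2, 3], w=4))   # "abcd" occurs twice
```

    A classifier is then trained on these histograms; BOP and BOSS differ mainly in how the window is transformed before discretisation.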

    Ant system-based feature set partitioning algorithm for k-NN and LDA ensemble construction

    Combining several classifiers has been very useful in improving prediction accuracy, and in most situations multiple classifiers perform better than a single classifier. However, not all combining approaches succeed in producing multiple classifiers with good classification accuracy, because there is no standard way of constructing a diverse and accurate classifier ensemble. This paper proposes an ant system-based feature set partitioning algorithm for constructing k-nearest neighbour (k-NN) and linear discriminant analysis (LDA) ensembles. Experiments were performed on several University of California, Irvine (UCI) datasets to test the performance of the proposed algorithm. Experimental results showed that the proposed algorithm successfully constructed better classifier ensembles for k-NN and LDA.
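
    The feature set partitioning idea, one base classifier per disjoint feature subset, can be sketched as below. A random partition stands in for the paper's ant-system search, which instead optimises the assignment of features to subsets; `partition_features` is a hypothetical name.

```python
import random

def partition_features(n_features, n_subsets, rng):
    """Randomly partition feature indices into disjoint subsets.
    Each subset would feed one ensemble member (e.g. a k-NN or LDA model),
    so members see different views of the data and stay diverse."""
    idx = list(range(n_features))
    rng.shuffle(idx)
    subsets = [[] for _ in range(n_subsets)]
    for i, f in enumerate(idx):
        subsets[i % n_subsets].append(f)   # round-robin over shuffled indices
    return subsets

rng = random.Random(1)
print(partition_features(10, 3, rng))   # three disjoint subsets covering features 0..9
```

    The subsets are disjoint and jointly cover all features, which is what distinguishes partitioning from the overlapping subsets of, say, the random subspace method.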

    Designing multiple classifier combinations: a survey

    Classification accuracy can be improved through a multiple classifier approach. It has been proven that multiple classifier combinations can achieve better classification accuracy than a single classifier. There are two main problems in designing a multiple classifier combination: determining the classifier ensemble and constructing the combiner. This paper reviews approaches to constructing the classifier ensemble and the combiner. For each approach, methods are reviewed and their advantages and disadvantages highlighted. A random strategy and majority voting are the most commonly used to construct the ensemble and the combiner, respectively. The results presented in this review are expected to serve as a road map for designing multiple classifier combinations.
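
    The majority-voting combiner mentioned above can be sketched in a few lines (an illustrative example, not code from the survey; `majority_vote` is a hypothetical name):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier predictions by majority vote.

    predictions: list of label sequences, one per ensemble member.
    Returns the per-instance majority label."""
    combined = []
    for labels in zip(*predictions):               # one tuple per instance
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Three hypothetical base classifiers' outputs on five instances:
member_preds = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
]
print(majority_vote(member_preds))   # [0, 1, 1, 0, 1]
```

    Note that the combiner can only outperform its members when they make different mistakes, which is why ensemble construction emphasises diversity.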

    Banzhaf random forests: Cooperative game theory based random forests with consistency.

    Random forest algorithms have been widely used in many classification and regression applications. However, the theory of random forests lags far behind their applications. In this paper, we propose a novel random forest classification algorithm based on cooperative game theory. The Banzhaf power index is employed to evaluate the power of each feature by traversing possible feature coalitions. Hence, we call the proposed algorithm Banzhaf random forests (BRFs). Unlike the previously used information gain ratio, which only measures the power of each feature for classification and pays less attention to the intrinsic structure of the feature variables, the Banzhaf power index can measure the importance of each feature by computing the dependency among a group of features. More importantly, we have proved the consistency of BRFs, which narrows the gap between the theory and applications of random forests. Extensive experiments on several UCI benchmark datasets and three real-world applications show that BRFs perform significantly better than existing consistent random forests in classification accuracy, and better than or at least comparably with Breiman's random forests, support vector machines (SVMs) and k-nearest neighbour (k-NN) classifiers.
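
    For intuition, the Banzhaf power index can be computed exactly for a small cooperative game by enumerating coalitions. The weighted-voting game below is a toy stand-in; in the paper the characteristic function would instead measure the classification power of a feature coalition. `banzhaf_index` is a hypothetical name.

```python
from itertools import combinations

def banzhaf_index(players, value):
    """Exact normalised Banzhaf power: for each player, count the coalitions
    where adding that player flips the characteristic function's value
    (a 'swing'), then normalise swing counts over all players."""
    swings = {p: 0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(len(others) + 1):
            for coalition in combinations(others, r):
                s = set(coalition)
                if value(s | {p}) != value(s):   # p is pivotal for this coalition
                    swings[p] += 1
    total = sum(swings.values()) or 1
    return {p: swings[p] / total for p in players}

# Toy weighted-voting game: a coalition "wins" (value 1) if its weights reach 4.
weights = {"A": 3, "B": 2, "C": 1}
win = lambda s: 1 if sum(weights[p] for p in s) >= 4 else 0
print(banzhaf_index(list(weights), win))   # A holds most of the power
```

    The enumeration is exponential in the number of players, which is why practical feature-importance schemes approximate or restrict the coalitions considered.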

    A Comparative Analysis of Data Mining Techniques on Breast Cancer Diagnosis Data using WEKA Toolbox

    Breast cancer is considered the second most common cancer in women compared to all other cancers. It is fatal in less than half of all cases but is the main cause of mortality in women, accounting for 16% of all cancer mortalities worldwide. Early diagnosis of breast cancer increases the chance of recovery. Data mining techniques can be utilised in the early diagnosis of breast cancer. In this paper, an academic experimental breast cancer dataset is used to perform a data mining practical experiment using the Waikato Environment for Knowledge Analysis (WEKA) tool. The WEKA Java application represents a rich resource for conducting performance metrics during the execution of experiments. Pre-processing and feature extraction are used to optimise the data. The classification process used in this study is summarised through thirteen experiments. Additionally, ten experiments using various classification algorithms were conducted. The algorithms introduced were: Naïve Bayes, Logistic Regression, Lazy IBK (Instance-Based learning with parameter K), Lazy KStar, Lazy Locally Weighted Learner, Rules ZeroR, Decision Stump, Decision Trees J48, Random Forest and Random Trees. The process of producing a predictive model was automated with the use of classification accuracy. Further, several experiments on the classification of the Wisconsin Diagnostic Breast Cancer and Wisconsin Breast Cancer datasets were conducted to compare the success rates of the different methods. The results show that the Lazy IBK (k-NN) classifier achieved 98% accuracy, the best among the compared classifiers. The main advantages of the study are the compactness of using 13 different data mining models and 10 different performance measurements, and the plotting of classification error figures.
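
    A comparison like these WEKA experiments can be mimicked in plain Python with k-fold cross-validated accuracy; the sketch below evaluates a 1-nearest-neighbour predictor (the k=1 case of the k-NN family IBK implements) on a toy dataset. This is illustrative only; `knn1_predict` and `cv_accuracy` are hypothetical names, not WEKA APIs.

```python
def knn1_predict(train, labels, x):
    """1-NN: return the label of the closest training point
    (squared Euclidean distance)."""
    best = min(range(len(train)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    return labels[best]

def cv_accuracy(X, y, predict, k=5):
    """k-fold cross-validated accuracy for a fit-free predictor:
    hold out each fold in turn, predict it from the rest, count hits."""
    folds = [list(range(i, len(X), k)) for i in range(k)]
    correct = 0
    for fold in folds:
        train_idx = [i for i in range(len(X)) if i not in fold]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        for i in fold:
            correct += predict(tr_X, tr_y, X[i]) == y[i]
    return correct / len(X)

# Two well-separated clusters, so 1-NN classifies every held-out point correctly.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 1, 1, 1]
print(cv_accuracy(X, y, knn1_predict, k=3))   # 1.0
```

    Running the same loop over several predictors and reporting each accuracy reproduces, in miniature, the kind of side-by-side comparison the paper performs in WEKA.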