    Multilabel Prototype Generation for data reduction in K-Nearest Neighbour classification

    Prototype Generation (PG) methods are typically used to improve the efficiency of the k-Nearest Neighbour (kNN) classifier when tackling large corpora. Such approaches aim to generate a reduced version of the corpus without decreasing classification performance relative to the initial set. Despite their wide application in multiclass scenarios, very few works have proposed PG methods for the multilabel space. In this regard, this work presents a novel adaptation of four multiclass PG strategies to the multilabel case. These proposals are evaluated with three multilabel kNN-based classifiers, 12 corpora comprising a varied range of domains and corpus sizes, and different noise scenarios artificially induced in the data. The results show that the proposed adaptations significantly outperform, in both efficiency and classification performance, the only reference multilabel PG work in the literature as well as the case in which no PG method is applied, and also show statistically superior robustness in noisy scenarios. Moreover, these novel PG strategies allow prioritising either the efficiency or the efficacy criterion through their configuration, depending on the target scenario, hence covering a wide area of the solution space not previously filled by other works. This research was partially funded by the Spanish Ministerio de Ciencia e Innovación through the MultiScore (PID2020-118447RA-I00) and DOREMI (TED2021-132103A-I00) projects. The first author is supported by grant APOSTD/2020/256 from “Programa I+D+i de la Generalitat Valenciana”.

    Multilabel Prototype Generation for Data Reduction in k-Nearest Neighbour classification

    Prototype Generation (PG) methods are typically used to improve the efficiency of the k-Nearest Neighbour (kNN) classifier when tackling large corpora. Such approaches aim to generate a reduced version of the corpus without decreasing classification performance relative to the initial set. Despite their wide application in multiclass scenarios, very few works have proposed PG methods for the multilabel space. In this regard, this work presents a novel adaptation of four multiclass PG strategies to the multilabel case. These proposals are evaluated with three multilabel kNN-based classifiers, 12 corpora comprising a varied range of domains and corpus sizes, and different noise scenarios artificially induced in the data. The results show that the proposed adaptations significantly outperform, in both efficiency and classification performance, the only reference multilabel PG work in the literature as well as the case in which no PG method is applied, and also show statistically superior robustness in noisy scenarios. Moreover, these novel PG strategies allow prioritising either the efficiency or the efficacy criterion through their configuration, depending on the target scenario, hence covering a wide area of the solution space not previously filled by other works.

    Fuzzy Modeling of Client Preference in Data-Rich Marketing Environments

    Advances in computational methods have led, in the world of financial services, to huge databases of client and market information. In the past decade, various computational intelligence (CI) techniques have been applied to mine this data for knowledge and in-depth information about clients and markets. This paper discusses the application of fuzzy clustering to target selection from large databases for direct marketing (DM) purposes. Actual data from the campaigns of a large financial services provider are used as a test case. The results obtained with the fuzzy clustering approach are compared with those resulting from the current practice of using statistical tools for target selection. Keywords: fuzzy clustering; direct marketing; client segmentation; fuzzy systems

    SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates

    The lack of reliable methods for identifying descriptors - the sets of parameters capturing the underlying mechanisms of a materials property - is one of the key factors hindering efficient materials development. Here, we propose a systematic approach for discovering descriptors for materials properties, within the framework of compressed-sensing based dimensionality reduction. SISSO (sure independence screening and sparsifying operator) tackles immense and correlated feature spaces, and converges to the optimal solution from a combination of features relevant to the materials' property of interest. In addition, SISSO gives stable results even with small training sets. The methodology is benchmarked with the quantitative prediction of the ground-state enthalpies of octet binary materials (using ab initio data) and applied to the showcase example of predicting the metal/insulator classification of binaries (with experimental data). Accurate, predictive models are found in both cases. For the metal-insulator classification model, the predictive capability is tested beyond the training data: it rediscovers the available pressure-induced insulator->metal transitions and allows for the prediction of yet unknown transition candidates, ripe for experimental validation. As a step forward with respect to previous model-identification methods, SISSO can become an effective tool for automatic materials development. Comment: 11 pages, 5 figures, in press in Phys. Rev. Materials