Multilabel Prototype Generation for data reduction in K-Nearest Neighbour classification
Prototype Generation (PG) methods are typically considered for improving the efficiency of the k-Nearest Neighbour (kNN) classifier when tackling high-size corpora. Such approaches aim at generating a reduced version of the corpus without decreasing the classification performance when compared to the initial set. Despite their large application in multiclass scenarios, very few works have addressed the proposal of PG methods for the multilabel space. In this regard, this work presents the novel adaptation of four multiclass PG strategies to the multilabel case. These proposals are evaluated with three multilabel kNN-based classifiers, 12 corpora comprising a varied range of domains and corpus sizes, and different noise scenarios artificially induced in the data. The results obtained show that the proposed adaptations are capable of significantly improving, both in terms of efficiency and classification performance, the only reference multilabel PG work in the literature as well as the case in which no PG method is applied, also presenting statistically superior robustness in noisy scenarios. Moreover, these novel PG strategies allow prioritising either the efficiency or efficacy criteria through their configuration depending on the target scenario, hence covering a wide area in the solution space not previously filled by other works. This research was partially funded by the Spanish Ministerio de Ciencia e Innovación through the MultiScore (PID2020-118447RA-I00) and DOREMI (TED2021-132103A-I00) projects. The first author is supported by grant APOSTD/2020/256 from "Programa I+D+i de la Generalitat Valenciana".
Multilabel Prototype Generation for Data Reduction in k-Nearest Neighbour classification
Prototype Generation (PG) methods are typically considered for improving the
efficiency of the k-Nearest Neighbour (kNN) classifier when tackling
high-size corpora. Such approaches aim at generating a reduced version of the
corpus without decreasing the classification performance when compared to the
initial set. Despite their large application in multiclass scenarios, very few
works have addressed the proposal of PG methods for the multilabel space. In
this regard, this work presents the novel adaptation of four multiclass PG
strategies to the multilabel case. These proposals are evaluated with three
multilabel kNN-based classifiers, 12 corpora comprising a varied range of
domains and corpus sizes, and different noise scenarios artificially induced in
the data. The results obtained show that the proposed adaptations are capable
of significantly improving -- both in terms of efficiency and classification
performance -- the only reference multilabel PG work in the literature as well
as the case in which no PG method is applied, also presenting a statistically
superior robustness in noisy scenarios. Moreover, these novel PG strategies
allow prioritising either the efficiency or efficacy criteria through their
configuration depending on the target scenario, hence covering a wide area in
the solution space not previously filled by other works.
Fuzzy Modeling of Client Preference in Data-Rich Marketing Environments
Advances in computational methods have led, in the world of financial services, to huge databases of client and market information. In the past decade, various computational intelligence (CI) techniques have been applied in mining this data for obtaining knowledge and in-depth information about the clients and the markets. This paper discusses the application of fuzzy clustering in target selection from large databases for direct marketing (DM) purposes. Actual data from the campaigns of a large financial services provider are used as a test case. The results obtained with the fuzzy clustering approach are compared with those resulting from the current practice of using statistical tools for target selection. Keywords: fuzzy clustering; direct marketing; client segmentation; fuzzy systems
SISSO: a compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates
The lack of reliable methods for identifying descriptors - the sets of
parameters capturing the underlying mechanisms of a materials property - is one
of the key factors hindering efficient materials development. Here, we propose
a systematic approach for discovering descriptors for materials properties,
within the framework of compressed-sensing based dimensionality reduction.
SISSO (sure independence screening and sparsifying operator) tackles immense
and correlated feature spaces, and converges to the optimal solution from a
combination of features relevant to the materials' property of interest. In
addition, SISSO gives stable results even with small training sets. The
methodology is benchmarked with the quantitative prediction of the ground-state
enthalpies of octet binary materials (using ab initio data) and applied to the
showcase example of predicting the metal/insulator classification of binaries
(with experimental data). Accurate, predictive models are found in both cases.
For the metal-insulator classification model, the predictive capability is
tested beyond the training data: It rediscovers the available pressure-induced
insulator->metal transitions and it allows for the prediction of yet unknown
transition candidates, ripe for experimental validation. As a step forward with
respect to previous model-identification methods, SISSO can become an effective
tool for automatic materials development. Comment: 11 pages, 5 figures, in press in Phys. Rev. Materials