259 research outputs found

    A Genetic Tuning to Improve the Performance of Fuzzy Rule-Based Classification Systems with Interval-Valued Fuzzy Sets: Degree of Ignorance and Lateral Position

    Fuzzy Rule-Based Systems are appropriate tools to deal with classification problems due to their good properties. However, they can suffer from a lack of accuracy as a result of the uncertainty inherent in the definition of the membership functions and the limitation imposed by the homogeneous distribution of the linguistic labels. The aim of the paper is to improve the performance of Fuzzy Rule-Based Classification Systems by means of the Theory of Interval-Valued Fuzzy Sets and a post-processing genetic tuning step. In order to build the Interval-Valued Fuzzy Sets we define a new function called weak ignorance for modeling the uncertainty associated with the definition of the membership functions. Next, we adapt the fuzzy partitions to the problem in an optimal way through a cooperative evolutionary tuning in which we handle both the degree of ignorance and the lateral position (based on the 2-tuples fuzzy linguistic representation) of the linguistic labels. The experimental study is carried out over a large collection of data-sets and it is supported by a statistical analysis. Our results show empirically that the use of our methodology outperforms the initial Fuzzy Rule-Based Classification System. The application of our cooperative tuning enhances the results provided by the use of the isolated tuning approaches and also improves the behavior of the genetic tuning based on the 3-tuples fuzzy linguistic representation. Spanish Government TIN2008-06681-C06-01 TIN2010-1505

    A Sensitivity Analysis for Quality Measures of Quantitative Association Rules

    There exist several fitness function proposals based on a combination of weighted objectives to optimize the discovery of association rules. Nevertheless, the measures used to assess the quality of association rules can differ according to the values of such weights. Therefore, in such proposals the user's decision when specifying the weights or coefficients of the optimized objectives is very important. Thus, this work presents an analysis of the sensitivity of several quality measures when the weights included in the fitness function of the existing QARGA algorithm are modified. Finally, a comparative analysis of the results obtained according to the weights setup is provided. MICYT TIN2011-28956-C02-00; Junta de Andalucía P11-TIC-752
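
The weight sensitivity described in this abstract can be illustrated with a minimal sketch. It assumes a plain weighted sum of two common rule-quality measures (support and confidence). This is not the QARGA fitness function, whose exact form the abstract does not give; the rule names, measure values, and weights below are hypothetical.

```python
def weighted_fitness(measures, weights):
    """Weighted sum of quality measures; both dicts use the same keys."""
    return sum(weights[name] * value for name, value in measures.items())


def best_rule(rules, weights):
    """Name of the rule with the highest weighted fitness."""
    return max(rules, key=lambda name: weighted_fitness(rules[name], weights))


# Hypothetical candidate rules with hypothetical measure values.
rules = {
    "rule_A": {"support": 0.6, "confidence": 0.5},
    "rule_B": {"support": 0.2, "confidence": 0.9},
}

# Changing only the weights changes which rule the search would favor.
for w_support in (0.8, 0.2):
    weights = {"support": w_support, "confidence": 1.0 - w_support}
    print(w_support, best_rule(rules, weights))
```

With these illustrative numbers, a support-heavy weighting favors the frequent rule while a confidence-heavy weighting favors the reliable one, which is the kind of dependence a sensitivity analysis would quantify.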

    KEEL 3.0: an open source software for multi-stage analysis in data mining

    This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to perform data management, design of multiple kinds of experiments, statistical analyses, etc. This framework also contains KEEL-dataset, a data repository for multiple learning tasks featuring data partitions and algorithms' results over these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been incorporated to execute algorithms included in KEEL. These new features greatly improve the versatility of KEEL to deal with more modern data mining problems.

    An ant colony-based semi-supervised approach for learning classification rules

    Semi-supervised learning methods create models from a few labeled instances and a great number of unlabeled instances. They appear as a good option in scenarios where there is a lot of unlabeled data and the process of labeling instances is expensive, as is the case for most Web applications. This paper proposes a semi-supervised self-training algorithm called Ant-Labeler. Self-training algorithms take advantage of supervised learning algorithms to iteratively learn a model from the labeled instances and then use this model to classify unlabeled instances. The instances that receive labels with high confidence are moved from the unlabeled to the labeled set, and this process is repeated until a stopping criterion is met, such as labeling all unlabeled instances. Ant-Labeler uses an ACO algorithm as the supervised learning method in the self-training procedure to generate interpretable rule-based models, which are used as an ensemble to ensure accurate predictions. The pheromone matrix is reused across different executions of the ACO algorithm to avoid rebuilding the models from scratch every time the labeled set is updated. Results showed that the proposed algorithm obtains better predictive accuracy than three state-of-the-art algorithms in roughly half of the datasets on which it was tested, and the smaller the number of labeled instances, the better the Ant-Labeler performance.
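
The generic self-training loop described in this abstract can be sketched as follows. This is not Ant-Labeler: a nearest-centroid classifier stands in for the ACO rule learner, there is no pheromone reuse or ensemble, and the confidence score, the `threshold` value, and all function names are assumptions made for illustration.

```python
import math


def fit_centroids(X, y):
    """Stand-in supervised learner: one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, value in enumerate(xi):
            acc[j] += value
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_with_confidence(model, x):
    """Return (label, confidence); confidence is a softmax over negative distances."""
    weights = {c: math.exp(-math.dist(x, centroid)) for c, centroid in model.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=20):
    """Move confidently labeled instances into the labeled set until nothing changes."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break  # stopping criterion: every instance has been labeled
        model = fit_centroids(X_lab, y_lab)
        remaining, moved = [], False
        for x in pool:
            label, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                X_lab.append(x)
                y_lab.append(label)
                moved = True
            else:
                remaining.append(x)
        pool = remaining
        if not moved:
            break  # stopping criterion: no prediction was confident enough
    return fit_centroids(X_lab, y_lab), pool


model, unlabeled_left = self_train(
    [(0.0, 0.0), (10.0, 10.0)], ["a", "b"],
    [(0.0, 1.0), (9.0, 10.0), (1.0, 0.0), (5.0, 5.0)],
)
```

The point halfway between the two classes never reaches the confidence threshold, so it stays in the unlabeled pool, while the three clear-cut points are absorbed into the labeled set.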

    EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data

    Classification problems with an imbalanced class distribution have received an increased amount of attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It performs an evolutionary prototype reduction focused on providing diverse solutions to prevent the method from overfitting the training set. It also allows us to explicitly reduce the underrepresented class, which the most common preprocessing solutions handling class imbalance usually protect. As part of the experimental study, we show that the proposed prototype reduction method outperforms state-of-the-art preprocessing techniques. The preprocessing step yields multiple prototype sets that are later used in an ensemble, performing a weighted voting scheme with the nearest neighbor classifier. EPRENNID is experimentally shown to significantly outperform previous proposals.
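
The ensemble step mentioned at the end of this abstract, weighted voting over several prototype sets with a nearest neighbor classifier, can be sketched as below. How EPRENNID builds the prototype sets and assigns the weights is not stated in the abstract, so the sets, weights, and function names here are hypothetical.

```python
import math
from collections import defaultdict


def nearest_label(prototypes, x):
    """1-NN: label of the prototype closest to x (Euclidean distance)."""
    _, label = min(prototypes, key=lambda proto: math.dist(proto[0], x))
    return label


def weighted_vote(prototype_sets, weights, x):
    """Each prototype set casts one 1-NN vote, scaled by its weight."""
    scores = defaultdict(float)
    for prototypes, weight in zip(prototype_sets, weights):
        scores[nearest_label(prototypes, x)] += weight
    return max(scores, key=scores.get)


# Three hypothetical reduced prototype sets: (point, label) pairs.
prototype_sets = [
    [((0.0, 0.0), "neg"), ((4.0, 4.0), "pos")],
    [((0.0, 0.0), "neg"), ((1.0, 1.0), "pos")],
    [((0.0, 0.0), "neg"), ((5.0, 5.0), "pos")],
]
query = (1.5, 1.5)
print(weighted_vote(prototype_sets, [1.0, 1.0, 1.0], query))
print(weighted_vote(prototype_sets, [0.2, 0.7, 0.2], query))
```

With equal weights the two sets that vote "neg" win; giving the second set a larger weight flips the decision, which shows why the weighting scheme matters.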

    Instance selection of linear complexity for big data

    Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems with very large-to-massive data sets. In this paper, two new algorithms with linear complexity for instance selection purposes are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. While the complexity of conventional methods (usually quadratic, O(n^2), or log-linear, O(n log n)) means that they are unable to process large-sized data sets, the new proposal shows competitive results in terms of accuracy. Even more remarkably, it shortens execution time, as the proposal manages to reduce complexity and make it linear with respect to the data set size. The new proposal has been compared with some of the best known instance selection methods for testing and has also been evaluated on large data sets (up to a million instances). Supported by the Research Projects TIN 2011-24046 and TIN 2015-67534-P from the Spanish Ministry of Economy and Competitiveness.
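
The idea of using locality-sensitive hashing for single-pass instance selection can be sketched as follows. This is not either of the two algorithms from the paper, whose details the abstract does not give. It assumes a random-projection hash with bucket width `width` and keeps the first instance of each class seen in each bucket; all names and parameters are hypothetical.

```python
import math
import random


def make_hash(dim, n_proj=4, width=1.0, seed=0):
    """Build an LSH function from n_proj random projections with random offsets."""
    rng = random.Random(seed)
    projections = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(n_proj)
    ]

    def hash_point(x):
        # Nearby points tend to fall into the same bucket on every projection.
        return tuple(
            math.floor((sum(a * v for a, v in zip(vec, x)) + offset) / width)
            for vec, offset in projections
        )

    return hash_point


def select_instances(X, y, n_proj=4, width=1.0, seed=0):
    """One pass over the data: keep the first instance per (bucket, class) pair."""
    hash_point = make_hash(len(X[0]), n_proj, width, seed)
    seen, kept = set(), []
    for index, (xi, yi) in enumerate(zip(X, y)):
        key = (hash_point(xi), yi)
        if key not in seen:
            seen.add(key)
            kept.append(index)
    return kept


X = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (50.0, 50.0)]
y = ["a", "a", "b", "a"]
kept = select_instances(X, y)
```

Each instance is hashed once and looked up in a set, so the running time grows linearly with the number of instances for a fixed number of projections, in contrast to pairwise-distance methods.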

    Ensemble and fuzzy techniques applied to imbalanced traffic congestion datasets: a comparative study

    Class imbalance is among the most persistent complications that the traditional supervised learning task may face in real-world applications. Among the different kinds of classification problems that have been studied in the literature, the imbalanced ones, particularly those that represent real-world problems, have attracted the interest of many researchers in recent years. In order to address these problems, different approaches have been used or proposed in the literature, among them soft computing and ensemble techniques. In this work, ensembles and fuzzy techniques have been applied to real-world traffic datasets in order to study their performance in imbalanced real-world scenarios. The KEEL platform is used to carry out this study. The results show that different ensemble techniques obtain the best results in the proposed datasets. Document type: Part of book or chapter of boo

    Generalized additive and fuzzy models in environmental flow assessment: A comparison employing the West Balkan trout (Salmo farioides; Karaman, 1938)

    Human activities have altered flow regimes, resulting in increased pressures and threats on river biota. Physical habitat simulation has been established as a standard approach among the methods for Environmental Flow Assessment (EFA). Traditionally, in EFA, univariate habitat suitability curves have been used to evaluate the habitat suitability at the microhabitat scale, whereas Generalized Additive Models (GAMs) and fuzzy logic are considered the most common multivariate approaches to do so. The assessment of the habitat suitability for three size classes of the West Balkan trout (Salmo farioides; Karaman, 1938) inferred with these multivariate approaches was compared at three different levels. First, the modelled patterns of habitat selection were compared by developing partial dependence plots. Then, the habitat assessment was compared in a spatially explicit way by calculating the fuzzy kappa statistic. Finally, the habitat quantity and quality were compared broadly and at relevant flows under a hypothetical flow regulation, based on the Weighted Usable Area (WUA) vs. flow curves. The GAMs were slightly more accurate, and the WUA-flow curves demonstrated that they were more optimistic in the habitat assessment, with larger areas assessed with low to intermediate suitability (0.2 to 0.6). Nevertheless, both approaches coincided in the habitat assessment (the optimal areas were spatially coincident) and in the modelled patterns of habitat selection: large trout selected microhabitats with low flow velocity, large depth, coarse substrate and abundant cover. Medium-sized trout selected microhabitats with low flow velocity, middle-to-large depth, any kind of substrate but bedrock and some elements of cover. Finally, small trout selected microhabitats with low flow velocity, small depth, and light cover, only avoiding bedrock substrate. Furthermore, both approaches also rendered similar WUA-flow curves and coincided in the predicted increases and decreases of the WUA under the hypothetical flow regulation. Although GAMs performed slightly better on an equal footing, they do not automatically account for interactions between variables. Conversely, fuzzy models do so and can be easily modified by experts to include new insights or to cover a wider range of environmental conditions. Therefore, as a consequence of the agreement between both approaches, we would advocate for combinations of GAMs and fuzzy models in fish-based EFA. This study was supported by the ECOFLOW project funded by the Hellenic General Secretariat of Research and Technology in the framework of the NSRF 2007-2013. We are grateful for field assistance of Dimitris Kommatas, Orfeas Triantafillou and Martin Palt and to Alcibiades N. Economou for assistance in discussions on trout biology and ecology. Muñoz Mas, R.; Papadaki, C.; Martinez-Capel, F.; Zogaris, S.; Ntoanidis, L.; Dimitriou, E. (2016). Generalized additive and fuzzy models in environmental flow assessment: A comparison employing the West Balkan trout (Salmo farioides; Karaman, 1938). Ecological Engineering. 91:365-377. doi:10.1016/j.ecoleng.2016.03.009
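
The WUA vs. flow curves referred to in this abstract rest on the usual definition of Weighted Usable Area: the sum over habitat cells of cell area times habitat suitability. A minimal sketch follows. The cell areas, suitability values, and flows are hypothetical, and in the study the suitability of each cell would come from the fitted GAM or fuzzy model rather than being given directly.

```python
def weighted_usable_area(cells):
    """Sum of area * suitability over cells; suitability must lie in [0, 1]."""
    total = 0.0
    for area, suitability in cells:
        if not 0.0 <= suitability <= 1.0:
            raise ValueError("suitability must lie in [0, 1]")
        total += area * suitability
    return total


def wua_flow_curve(cells_by_flow):
    """Map each simulated flow to its WUA, ordered by increasing flow."""
    return {
        flow: weighted_usable_area(cells)
        for flow, cells in sorted(cells_by_flow.items())
    }


# Hypothetical (area in m^2, suitability) pairs for two simulated flows in m^3/s.
curve = wua_flow_curve({
    2.0: [(4.0, 0.5), (6.0, 0.0)],
    1.0: [(8.0, 1.0), (2.0, 0.5)],
})
print(curve)
```

Comparing two such curves, one per modelling approach, is what allows the predicted gains and losses of habitat under a flow regulation scenario to be set side by side.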