11,263 research outputs found

    An overview of recent distributed algorithms for learning fuzzy models in Big Data classification

    Nowadays, huge amounts of data are generated, often at very short time intervals and in various formats, by many heterogeneous sources such as social networks and media, mobile devices, internet transactions, networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular Vs features: Value, Veracity, Variety, Velocity and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models currently play a significant role, thanks to their ability to handle vague and imprecise data and to their inherent interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. We first present some design and implementation details of these learning algorithms. We then compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.

    Why Linguistic Fuzzy Rule Based Classification Systems perform well in Big Data Applications?

    The significance of addressing Big Data applications is beyond doubt. The current ability to extract interesting knowledge from large volumes of information provides great advantages to both corporations and academia. Therefore, researchers and practitioners must deal with the problem of scalability so that Machine Learning and Data Mining algorithms can address Big Data properly. To this end, the MapReduce programming framework is by far the most widely used mechanism for implementing fault-tolerant distributed applications. This framework relies on a divide-and-conquer design in which local models are learned separately in a first stage (Map tasks), whereas a second stage (Reduce) aggregates all sub-models into a single solution. In this paper, we analyze the behavior of Linguistic Fuzzy Rule Based Classification Systems when embedded into a MapReduce working procedure. By retrieving different information about the rules learned throughout the MapReduce process, we identify some of the properties of this paradigm that allow these classifiers to perform well on Big Data problems. In summary, we show that linguistic fuzzy classifiers are a robust approach when scalability is required. This work has been partially supported by the Spanish Ministry of Science and Technology under projects TIN2014-57251-P and TIN2015-68454-R.
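The two-stage procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three linguistic labels, the triangular memberships on [0, 1], the rule-weight formula and the "keep the heavier rule" merge policy are all assumptions chosen for illustration, and the names `fuzzify`, `map_learn` and `reduce_merge` are hypothetical.

```python
from collections import defaultdict
from functools import reduce

LABELS = ("low", "medium", "high")   # assumed linguistic labels
CENTERS = (0.0, 0.5, 1.0)            # assumed triangular partition of [0, 1]


def fuzzify(x):
    """Return (label, membership) of the best-matching triangular fuzzy set."""
    memberships = [max(0.0, 1.0 - abs(x - c) / 0.5) for c in CENTERS]
    best = max(range(len(LABELS)), key=lambda i: memberships[i])
    return LABELS[best], memberships[best]


def map_learn(partition):
    """Map stage: learn a local rule base {antecedent: (class, weight)}."""
    support = defaultdict(lambda: defaultdict(float))
    for features, label in partition:
        fuzzified = [fuzzify(x) for x in features]
        antecedent = tuple(name for name, _ in fuzzified)
        degree = 1.0
        for _, membership in fuzzified:
            degree *= membership          # product t-norm (assumption)
        support[antecedent][label] += degree
    rules = {}
    for antecedent, by_class in support.items():
        winner = max(by_class, key=by_class.get)
        rules[antecedent] = (winner, by_class[winner] / sum(by_class.values()))
    return rules


def reduce_merge(rules_a, rules_b):
    """Reduce stage: merge rule bases; on conflict keep the heavier rule."""
    merged = dict(rules_a)
    for antecedent, (cls, weight) in rules_b.items():
        if antecedent not in merged or weight > merged[antecedent][1]:
            merged[antecedent] = (cls, weight)
    return merged


# Two toy partitions of (features, class) pairs, standing in for data splits.
partitions = [
    [((0.1, 0.9), "A"), ((0.0, 1.0), "A")],
    [((0.9, 0.1), "B"), ((0.1, 0.9), "B"), ((0.0, 1.0), "A")],
]
merged = reduce(reduce_merge, map(map_learn, partitions))
```

In this toy run, each partition yields its own rule base, and the conflicting rule for the antecedent `("low", "high")` is resolved in favour of the partition where it has the higher weight. A real deployment would run the map and reduce functions on a framework such as Hadoop or Spark rather than on in-memory lists.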

    A Micro-Extended Belief Rule-Based System for Big Data Multi-Class Classification Problems


    Self-Organizing Fuzzy Inference Ensemble System for Big Streaming Data Classification

    An evolving intelligent system (EIS) is able to self-update its system structure and meta-parameters from streaming data. However, since the majority of EISs are implemented on a single-model architecture, their performance on large-scale, complex data streams is often limited. To address this deficiency, a novel self-organizing fuzzy inference ensemble framework is proposed in this paper. As the base learner of the proposed ensemble system, the self-organizing fuzzy inference system is capable of self-learning a highly transparent predictive model from streaming data on a chunk-by-chunk basis through a human-interpretable process. Importantly, the base learner can continuously self-adjust its decision boundaries based on the inter-class and intra-class distances between prototypes identified from successive data chunks, for higher classification precision. Thanks to its parallel distributed computing architecture, the proposed ensemble framework can achieve high classification precision while maintaining high computational efficiency on large-scale problems. Numerical examples based on popular benchmark big data problems demonstrate the superior performance of the proposed approach over state-of-the-art alternatives in terms of both classification accuracy and computational efficiency.

    Attributes regrouping in Fuzzy Rule Based Classification Systems: an intra-classes approach

    Fuzzy rule-based classification systems (FRBCS) are able to build linguistic, interpretable models: they automatically generate fuzzy if-then rules and use them to classify new observations. However, in these supervised learning systems, a high number of predictive attributes leads to an exponential increase in the number of generated rules. Moreover, the antecedent conditions of the resulting rules are very large, since they contain all the attributes that describe the examples. As a result, both the accuracy and the interpretability of these systems degrade. To address this problem, we propose to use ensemble methods for FRBCS, in which the decisions of different classifiers are combined to form the final classification model. We are particularly interested in ensemble methods that split the attributes into subgroups and treat each subgroup separately. We propose to regroup attributes by searching for correlations among the training set elements that belong to the same class; such an intra-class correlation search makes it possible to characterize each class separately. Several experiments were carried out on various data sets. The results show a reduction in the number of rules and of antecedents without degrading accuracy; on the contrary, classification rates are even improved.
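The intra-class regrouping idea in this abstract can be sketched as follows. The abstract does not say which correlation measure or grouping rule is used, so the Pearson coefficient, the 0.8 threshold and the union-find grouping below are assumptions, and the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def group_attributes(rows, threshold=0.8):
    """Group attribute indices linked by |correlation| >= threshold."""
    n_attrs = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_attrs)]
    parent = list(range(n_attrs))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_attrs):
        for j in range(i + 1, n_attrs):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for j in range(n_attrs):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


def intra_class_groups(X, y, threshold=0.8):
    """Compute attribute groups separately for the examples of each class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return {label: group_attributes(rows, threshold)
            for label, rows in by_class.items()}


# Toy data: attributes 0 and 1 move together in class A, 0 and 2 in class B.
X = [(1, 2, 5), (2, 4, 1), (3, 6, 4), (1, 9, 1), (2, 3, 2), (3, 5, 3)]
y = ["A", "A", "A", "B", "B", "B"]
groups = intra_class_groups(X, y)
```

Each resulting group would then feed one member of the ensemble, so that no single rule has to carry all attributes in its antecedent.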