
    On Distributed Fuzzy Decision Trees for Big Data

    Fuzzy decision trees (FDTs) have proven to be an effective solution in the framework of fuzzy classification. The approaches proposed so far for FDT learning, however, have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme, shaped according to the MapReduce programming model, for generating both binary and multiway FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are then used as input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets to evaluate the behavior of the scheme along three dimensions: 1) performance in terms of classification accuracy, model complexity, and execution time; 2) scalability when varying the number of computing units; and 3) ability to efficiently accommodate an increasing dataset size. We have demonstrated that the proposed scheme is suitable for managing big datasets even with modest commodity hardware. Finally, for comparative analysis, we have used the distributed decision tree learning algorithm implemented in the MLlib library and the Chi-FRBCS-BigData algorithm, a MapReduce distributed fuzzy rule-based classification system.
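
    The attribute-selection criterion mentioned in the abstract can be illustrated with a short sketch. This is a minimal, non-distributed rendering of fuzzy entropy and fuzzy information gain; the example data and function names are illustrative assumptions, not the paper's implementation (the MapReduce decomposition and the discretizer are omitted):

```python
import math

def fuzzy_entropy(memberships, labels, classes):
    """Fuzzy entropy of a node: memberships[i] is the membership
    degree of example i to the node, labels[i] its class label."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    h = 0.0
    for c in classes:
        # fuzzy cardinality of class c within the node
        m_c = sum(m for m, y in zip(memberships, labels) if y == c)
        if m_c > 0:
            p = m_c / total
            h -= p * math.log2(p)
    return h

def fuzzy_information_gain(node_memberships, child_memberships, labels, classes):
    """Gain of splitting a node into fuzzy children. child_memberships is a
    list of per-child membership vectors (node membership combined with each
    child fuzzy set, e.g. via a t-norm)."""
    total = sum(node_memberships)
    gain = fuzzy_entropy(node_memberships, labels, classes)
    for child in child_memberships:
        # children are weighted by their relative fuzzy cardinality
        w = sum(child) / total
        gain -= w * fuzzy_entropy(child, labels, classes)
    return gain
```

    At a node, the attribute whose fuzzy partition yields the highest gain is chosen for the split, mirroring the role played by crisp information gain in classical decision trees.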

    On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

    We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of the training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform (since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set); 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves. (Appeared in the 2018 IEEE International Congress on Big Data (BigData Congress).)
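
    The two-step partitioning described above lends itself to a compact sketch. The snippet below is an illustrative, non-distributed approximation that uses the empirical quantile function as the inverse CDF; the function names are assumptions, not the paper's implementation:

```python
import numpy as np

def pit_triangular_cores(values, n_sets):
    """Step 1 + step 2 combined: place n_sets equally spaced triangular
    fuzzy-set cores on the uniform [0, 1] scale, then map them back to the
    original attribute scale via the empirical quantile function."""
    # cores of a Ruspini partition of n_sets triangles on [0, 1]
    uniform_cores = np.linspace(0.0, 1.0, n_sets)
    # inverse CDF approximated by the empirical quantiles of the data
    return np.quantile(values, uniform_cores)

def triangular_membership(x, a, b, c):
    """Triangular membership function with feet a, c and core b (a <= b <= c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)
```

    Adjacent triangles share cores as feet, so at any point the memberships of the two overlapping sets sum to one: the strong (Ruspini) partition property is preserved after mapping back to the original space.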

    Evolutionary Fuzzy Systems for Explainable Artificial Intelligence: Why, When, What for, and Where to?

    Evolutionary fuzzy systems are one of the greatest advances within the area of computational intelligence. They consist of evolutionary algorithms applied to the design of fuzzy systems. Thanks to this hybridization, superb abilities are provided to fuzzy modeling in many different data science scenarios. This contribution is a position paper that develops a comprehensive analysis of the evolutionary fuzzy systems research field. To this end, the "4 W" questions are posed and addressed with the aim of understanding the current context of this topic and its significance. Specifically, it is pointed out why evolutionary fuzzy systems are important from an explainability point of view, when they began, what they are used for, and where the attention of researchers should be directed in the near future. Evolutionary fuzzy systems must play an important role in the emerging area of eXplainable Artificial Intelligence (XAI), which learns from data.

    Online Static Security Assessment of Power Systems Based on Lasso Algorithm

    As an important means of ensuring secure operation of a power system, contingency selection and ranking methods need to be more rapid and accurate. In this paper, a novel method based on the least absolute shrinkage and selection operator (Lasso) algorithm is proposed for online static security assessment (OSSA). The assessment is based on a security index, which is applied to select and screen contingencies. Firstly, the multi-step adaptive Lasso (MSA-Lasso) regression algorithm, which offers improved predictive performance, is introduced. Then, an OSSA module is proposed to evaluate and select contingencies under different load conditions. In addition, the Lasso algorithm is employed to predict the security index of each power system operating state, taking bus voltages and power flows into account, according to Newton-Raphson load flow (NRLF) analysis in post-contingency states. Finally, the numerical results of applying the proposed approach to the IEEE 14-bus, 118-bus, and 300-bus test systems demonstrate the accuracy and rapidity of OSSA. (Accepted by Applied Science.)
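
    As a rough illustration of the Lasso machinery underlying this kind of security-index regression, the sketch below implements plain coordinate descent for the standard Lasso objective on synthetic data. The feature matrix stands in for hypothetical bus-voltage and power-flow features; MSA-Lasso's multi-step adaptive weighting and the NRLF pipeline are not reproduced:

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_cd(X, y, alpha=0.1, n_iter=200):
    """Coordinate descent for min_w (1/2n)||y - Xw||^2 + alpha*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            # partial residual excluding feature j's current contribution
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n
            z = X[:, j] @ X[:, j] / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

# hypothetical example: the security index depends on only one feature,
# and the L1 penalty shrinks the irrelevant coefficients to zero
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
y = 3.0 * X[:, 0]
w = lasso_cd(X, y, alpha=0.1)
```

    The L1 penalty is what makes Lasso attractive for contingency screening: it selects the few operating-state features that actually drive the security index, keeping the online predictor small and fast.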

    An overview of recent distributed algorithms for learning fuzzy models in Big Data classification

    Nowadays, huge amounts of data are generated, often in very short time intervals and in various formats, by a number of heterogeneous sources such as social networks and media, mobile devices, internet transactions, and networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular "V" features: Value, Veracity, Variety, Velocity, and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models currently play a significant role, thanks to their capability of handling vague and imprecise data and their innate interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first show some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.

    Redesigning Post-Operative Processes Using Data Mining Classification Techniques

    Data mining classification models are developed and investigated in this paper. These models are adopted to develop and redesign several business processes based on post-operative data. Post-operative data were collected and analysed with the Waikato Environment for Knowledge Analysis (WEKA) to investigate the factors influencing patients' admission after surgery and to compare the developed DM classification models. The results reveal that each implemented DM technique identifies different attributes affecting patients' post-surgery admission status. The comparison suggests that neural networks outperform the other classification techniques. Further, the optimal number of beds required to accommodate post-operative patients is investigated. A simulation was conducted using queuing theory software to compute the expected number of beds required to achieve zero waiting time. The results indicate that the number of beds required to accommodate post-surgery patients waiting in the queue is one, since a bed becomes available as a patient is discharged.
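
    Queuing-theory bed sizing of the kind described above can be sketched with the standard M/M/c (Erlang C) model; the arrival and service rates below are hypothetical, and the paper's actual simulation software and parameters are not reproduced:

```python
import math

def erlang_c(arrival_rate, service_rate, c):
    """Probability that an arriving patient must wait in an M/M/c queue
    with c beds (Erlang C formula)."""
    a = arrival_rate / service_rate          # offered load in erlangs
    rho = a / c                              # utilisation per bed
    if rho >= 1:
        return 1.0                           # unstable queue: waiting is certain
    summation = sum(a**k / math.factorial(k) for k in range(c))
    top = a**c / (math.factorial(c) * (1 - rho))
    return top / (summation + top)

def beds_for_target(arrival_rate, service_rate, max_wait_prob=0.01):
    """Smallest bed count whose waiting probability is below the target,
    approximating the 'near-zero waiting time' sizing goal."""
    c = 1
    while erlang_c(arrival_rate, service_rate, c) > max_wait_prob:
        c += 1
    return c
```

    For example, with a hypothetical two post-operative arrivals per day and a one-day mean stay, the model sizes the ward so that almost no patient waits for a bed.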

    Predicting the Outcomes of Important Events based on Social Media and Social Network Analysis

    Twitter is a well-known social network website that lets users post their opinions about current affairs, share their social events, and interact with others. It has now become one of the largest sources of news, with over 200 million monthly active users. It is possible to predict the outcomes of events based on social networks using machine learning and big data analytics. The massive data available from social networks can be utilized to improve prediction efficacy and accuracy. Achieving high accuracy in predicting the outcomes of political events using Twitter data is a challenging problem. The focus of this thesis is to investigate novel approaches to predicting the outcomes of political events from social media and social networks. The first proposed method predicts election results based on Twitter data analysis: it extracts and analyses sentiment information from microblogs to predict the popularity of candidates. Experimental results have shown its advantages over the existing method for predicting the outcomes of political events. The second proposed method also predicts election results from Twitter data, but analyses sentiment information using term weighting and selection to predict the popularity of candidates. Scaling factors are used for different types of terms, which helps to select informative terms more effectively and achieves better prediction results than the previous method. The third method proposed in this thesis represents the social network using connectivity constructed from retweet data as well as social media content, leading to a new approach to predicting the outcome of political events. Two approaches, whole-network and sub-network, have been developed and compared. Experimental results show that the sub-network approach, which constructs sub-networks based on different topics, outperformed the whole-network approach.
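
    The term-weighting idea in the second method can be illustrated with a toy sketch; the lexicon, the scaling factors, and the choice of weighting hashtag terms more heavily are all illustrative assumptions, not the thesis's actual scheme:

```python
def weighted_sentiment(tokens, lexicon, scale_hashtag=1.5, scale_plain=1.0):
    """Score one tweet by summing lexicon polarities, applying a larger
    scaling factor to hashtag terms (hypothetical term-type weighting)."""
    score = 0.0
    for tok in tokens:
        is_tag = tok.startswith("#")
        word = tok.lstrip("#").lower()
        polarity = lexicon.get(word, 0.0)   # unknown terms contribute nothing
        score += polarity * (scale_hashtag if is_tag else scale_plain)
    return score

def predict_winner(tweets_by_candidate, lexicon):
    """Rank candidates by the mean weighted sentiment of their tweets."""
    means = {
        cand: sum(weighted_sentiment(t, lexicon) for t in tweets) / len(tweets)
        for cand, tweets in tweets_by_candidate.items()
    }
    return max(means, key=means.get)
```

    In this toy setting, the candidate whose tweets carry the highest average weighted sentiment is predicted as the more popular one.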