On Distributed Fuzzy Decision Trees for Big Data
Fuzzy decision trees (FDTs) have been shown to be an effective solution in the framework of fuzzy classification. The approaches to FDT learning proposed so far, however, have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme, shaped according to the MapReduce programming model, for generating both binary and multiway FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are then used as input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets to evaluate the behavior of the scheme along three dimensions: 1) performance in terms of classification accuracy, model complexity, and execution time; 2) scalability with varying numbers of computing units; and 3) ability to efficiently accommodate an increasing dataset size. We have shown that the proposed scheme is suitable for managing big datasets even with modest commodity hardware. Finally, for comparative analysis, we have used the distributed decision tree learning algorithm implemented in the MLlib library and Chi-FRBCS-BigData, a MapReduce distributed fuzzy rule-based classification system.
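The fuzzy information gain used at the decision nodes can be sketched as follows. This is an illustrative NumPy implementation of the standard fuzzy-entropy formulation, not the authors' Spark code, and the function names are our own:

```python
import numpy as np

def fuzzy_entropy(memberships, labels):
    """Fuzzy entropy of a node: memberships[i] is the degree to which
    example i belongs to the node, labels[i] is its class."""
    total = memberships.sum()
    if total == 0.0:
        return 0.0
    entropy = 0.0
    for c in np.unique(labels):
        p = memberships[labels == c].sum() / total   # fuzzy class frequency
        if p > 0.0:
            entropy -= p * np.log2(p)
    return entropy

def fuzzy_information_gain(parent, children, labels):
    """Gain of splitting a node into the fuzzy children of one attribute;
    children is a list of membership arrays, one per fuzzy set."""
    gain = fuzzy_entropy(parent, labels)
    for child in children:
        gain -= (child.sum() / parent.sum()) * fuzzy_entropy(child, labels)
    return gain

# Toy example: a split that separates the two classes perfectly
labels = np.array([0, 0, 1, 1])
parent = np.ones(4)
children = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]
gain = fuzzy_information_gain(parent, children, labels)
```

With crisp (0/1) memberships this reduces to the classical information gain; fuzzy memberships simply replace example counts with sums of membership degrees.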
On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems
We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of the training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform; since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite this transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData Congress). arXiv admin note: text overlap with arXiv:1902.0935
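The two-step process above can be sketched in NumPy. This is a minimal illustration under our own naming, with the empirical q-quantiles standing in for the unknown CDF; the second function checks the defining property of a Ruspini strong partition, namely that membership degrees at any point sum to one:

```python
import numpy as np

def pit_triangular_partition(x, n_fuzzy_sets=5, n_quantiles=100):
    """Step 1: approximate the probability integral transform with
    q-quantiles; Step 2: place equally spaced triangular cores on [0, 1].
    Returns the cores in the transformed and in the original space."""
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    quantiles = np.quantile(x, probs)            # empirical inverse CDF
    cores_u = np.linspace(0.0, 1.0, n_fuzzy_sets)
    # Recover each core in the original space via the quantile function
    cores_x = np.interp(cores_u, probs, quantiles)
    return cores_u, cores_x

def strong_partition_memberships(u, cores):
    """Degrees of a transformed value u in the triangular fuzzy sets of a
    Ruspini strong partition with the given cores; they always sum to 1."""
    mu = np.zeros(len(cores))
    if u <= cores[0]:
        mu[0] = 1.0
    elif u >= cores[-1]:
        mu[-1] = 1.0
    else:
        k = np.searchsorted(cores, u) - 1        # u lies in [cores[k], cores[k+1])
        w = (u - cores[k]) / (cores[k + 1] - cores[k])
        mu[k], mu[k + 1] = 1.0 - w, w
    return mu

# A heavily skewed attribute: equal-width partitions would waste fuzzy sets,
# whereas the PIT-based cores follow the data density.
x = np.exp(np.linspace(0.0, 5.0, 1000))
cores_u, cores_x = pit_triangular_partition(x, n_fuzzy_sets=5)
mu = strong_partition_memberships(0.3, cores_u)
```

Because the transform is monotone, the ordering of the cores is preserved when they are mapped back to the original attribute space.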
Evolutionary Fuzzy Systems for Explainable Artificial Intelligence: Why, When, What for, and Where to?
Evolutionary fuzzy systems are one of the greatest advances within the area of computational intelligence. They consist of evolutionary algorithms applied to the design of fuzzy systems. Thanks to this hybridization, superb abilities are provided to fuzzy modeling in many different data science scenarios. This contribution is a position paper developing a comprehensive analysis of the evolutionary fuzzy systems research field. To this end, the "4 W" questions are posed and addressed with the aim of understanding the current context of this topic and its significance. Specifically, it is pointed out why evolutionary fuzzy systems are important from an explainability point of view, when they began, what they are used for, and where the attention of researchers in this area should be directed in the near future. They must play an important role in the emerging area of eXplainable Artificial Intelligence (XAI), learning from data.
Online Static Security Assessment of Power Systems Based on Lasso Algorithm
As an important means of ensuring secure operation of a power system, contingency selection and ranking methods need to be rapid and accurate. A novel method based on the least absolute shrinkage and selection operator (Lasso) algorithm is proposed in this paper for online static security assessment (OSSA). The assessment is based on a security index, which is applied to select and screen contingencies. Firstly, the multi-step adaptive Lasso (MSA-Lasso) regression algorithm is introduced, owing to its advantageous predictive performance. Then, an OSSA module is proposed to evaluate and select contingencies under different load conditions. In addition, the Lasso algorithm is employed to predict the security index of each power system operating state, taking bus voltages and power flows into account, according to Newton-Raphson load flow (NRLF) analysis in post-contingency states. Finally, the numerical results of applying the proposed approach to the IEEE 14-bus, 118-bus, and 300-bus test systems demonstrate the accuracy and rapidity of OSSA.
Comment: Accepted by Applied Science
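The core of the approach, a sparse linear predictor of the security index, can be sketched with plain Lasso (not the multi-step adaptive variant) solved by coordinate descent. The feature matrix below is a random stand-in for the bus voltages and power flows an NRLF analysis would provide, not the paper's data:

```python
import numpy as np

def lasso_cd(X, y, alpha=0.05, n_sweeps=200):
    """Minimal coordinate-descent Lasso:
    minimizes (1/2n) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual with feature j removed from the current fit
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            # Soft-thresholding drives small coefficients exactly to zero
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w

# Random stand-in for post-contingency features (e.g. bus voltages and
# line power flows); the "security index" depends on only two of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1]
w = lasso_cd(X, y)
```

The L1 penalty zeroes out the coefficients of irrelevant features, which is what makes Lasso attractive for screening: only the variables that actually drive the security index survive.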
An overview of recent distributed algorithms for learning fuzzy models in Big Data classification
Nowadays, a huge amount of data is generated, often in very short time intervals and in various formats, by a number of heterogeneous sources such as social networks and media, mobile devices, internet transactions, and networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular "V" features: Value, Veracity, Variety, Velocity, and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models currently play a significant role, thanks to their capability of handling vague and imprecise data and their innate interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first present some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.
Redesigning Post-Operative Processes Using Data Mining Classification Techniques
Data mining classification models are developed and investigated in this paper. These models are adopted to develop and redesign several business processes based on post-operative data. Post-operative data were collected and analyzed using the Waikato Environment for Knowledge Analysis (WEKA) to investigate the factors influencing patients' admission after surgery and to compare the developed DM classification models. The results reveal that each implemented DM technique highlights different attributes affecting patients' post-surgery admission status. The comparison suggests that neural networks outperform the other classification techniques. Further, the optimal number of beds required to accommodate post-operative patients is investigated. A simulation was conducted using queuing theory software to compute the expected number of beds required to achieve zero waiting time. The results indicate that a queue length of one is sufficient to accommodate post-surgery patients, meaning that one bed becomes available with each patient discharge.
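The paper obtains its bed count by simulation; an analytical counterpart under M/M/c queuing assumptions can be sketched with the Erlang C formula. The arrival and discharge rates below are hypothetical, not the paper's data:

```python
import math

def erlang_c(c, a):
    """Probability that an arrival must wait in an M/M/c queue,
    with offered load a = arrival_rate / service_rate (requires a < c)."""
    s = sum(a ** k / math.factorial(k) for k in range(c))
    top = a ** c / math.factorial(c) * c / (c - a)
    return top / (s + top)

def beds_for_target(arrival_rate, discharge_rate, max_wait_prob=0.01):
    """Smallest number of beds keeping the probability that a
    post-operative patient waits below max_wait_prob."""
    a = arrival_rate / discharge_rate
    c = math.floor(a) + 1                 # minimum stable bed count
    while erlang_c(c, a) > max_wait_prob:
        c += 1
    return c

# Hypothetical ward: 2 admissions/hour, mean stay 2 hours per bed
beds = beds_for_target(2.0, 0.5)
```

Exactly zero waiting is unattainable in a stochastic model, so a small waiting-probability threshold stands in for the "zero waiting time" target of the simulation.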
Predicting the Outcomes of Important Events based on Social Media and Social Network Analysis
Twitter is a popular social networking website that lets users post their opinions about current affairs, share their social events, and interact with others. It has become one of the largest sources of news, with over 200 million monthly active users. It is possible to predict the outcomes of events based on social networks using machine learning and big data analytics, and the massive data available from social networks can be utilized to improve prediction efficacy and accuracy. Achieving high accuracy in predicting the outcomes of political events from Twitter data is a challenging problem. The focus of this thesis is to investigate novel approaches to predicting the outcomes of political events from social media and social networks. The first proposed method predicts election results based on Twitter data analysis: it extracts and analyses sentiment information from microblogs to predict the popularity of candidates. Experimental results have shown its advantages over an existing method for predicting the outcomes of political events. The second proposed method also predicts election results from Twitter data, but analyses sentiment information using term weighting and selection to predict the popularity of candidates. Scaling factors are used for different types of terms, which helps to select informative terms more effectively and achieves better prediction results than the previous method. The third method proposed in this thesis represents the social network using network connectivity constructed from retweet data as well as social media content, leading to a new approach to predicting the outcome of political events. Two approaches, whole-network and sub-network, have been developed and compared. Experimental results show that the sub-network approach, which constructs sub-networks based on different topics, outperformed the whole-network approach.
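Term weighting with type-dependent scaling factors can be sketched as follows. The scaling factors, sentiment lexicons, and scoring rule here are illustrative assumptions, not the thesis's actual values:

```python
from collections import defaultdict

# Illustrative scaling factors per term type and tiny sentiment lexicons
SCALE = {"hashtag": 2.0, "mention": 1.5, "plain": 1.0}
POSITIVE = {"win", "great", "support"}
NEGATIVE = {"lose", "bad", "against"}

def term_type(token):
    if token.startswith("#"):
        return "hashtag"
    if token.startswith("@"):
        return "mention"
    return "plain"

def score_tweets(tweets):
    """Weighted sentiment score per candidate from (candidate, text) pairs."""
    scores = defaultdict(float)
    for candidate, text in tweets:
        for token in text.lower().split():
            weight = SCALE[term_type(token)]
            word = token.lstrip("#@")
            if word in POSITIVE:
                scores[candidate] += weight
            elif word in NEGATIVE:
                scores[candidate] -= weight
    return dict(scores)

scores = score_tweets([("A", "great #win for A"), ("B", "B will lose")])
```

The intuition being illustrated is that a sentiment-bearing hashtag is a stronger, more deliberate signal than the same word in running text, so it contributes more to the candidate's popularity score.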
Anomaly detection for IoT networks using machine learning
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
The Internet of Things (IoT) is considered one of the trending technologies today. IoT affects various industries, including logistics tracking, healthcare, automotive, and smart cities. A rising number of cyber-attacks and breaches target networks equipped with IoT devices. This thesis aims to improve security in IoT networks by enhancing anomaly detection using machine learning.
This thesis identifies the challenges and gaps related to securing Internet of Things networks. The challenges are network size, the number of devices, the human factor, and the complexity of IoT networks. The gaps identified include the lack of research on signature-based intrusion detection systems used for anomaly detection, the lack of modelling of the input parameters required for anomaly detection in IoT networks, and the lack of comparisons of the performance of machine learning algorithms on standard and real IoT datasets.
This thesis creates a dataset to test the binary anomaly classification performance of Neural Network, Gaussian Naive Bayes, Support Vector Machine, and Decision Tree machine learning algorithms, and compares their results with the KDDCUP99 dataset. The results show that Support Vector Machine and Gaussian Naive Bayes perform worse than the other models on the created IoT dataset. This thesis also reduces the number of features required by machine learning algorithms for anomaly detection in IoT networks to only five features, which reduced execution time by an average of 58%.
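Binary anomaly classification on a reduced five-feature representation can be sketched with a minimal Gaussian Naive Bayes, one of the models compared above. The synthetic traffic features below are stand-ins, not the thesis's dataset:

```python
import numpy as np

class SimpleGaussianNB:
    """Minimal Gaussian Naive Bayes for binary anomaly classification."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_, self.var_, self.log_prior_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.mean_.append(Xc.mean(axis=0))
            self.var_.append(Xc.var(axis=0) + 1e-9)   # avoid division by zero
            self.log_prior_.append(np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        log_post = []
        for mean, var, log_prior in zip(self.mean_, self.var_, self.log_prior_):
            # Per-class log-likelihood under independent Gaussian features
            ll = -0.5 * (np.log(2.0 * np.pi * var) + (X - mean) ** 2 / var)
            log_post.append(log_prior + ll.sum(axis=1))
        return self.classes_[np.argmax(log_post, axis=0)]

# Synthetic stand-in for a reduced five-feature traffic representation:
# "normal" (class 0) and "attack" (class 1) records from separated Gaussians
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(3.0, 1.0, size=(100, 5))])
y = np.array([0] * 100 + [1] * 100)
pred = SimpleGaussianNB().fit(X, y).predict(X)
```

Training and prediction both scale linearly in the number of features, which is why shrinking the input to five features yields the large execution-time savings reported above.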
This thesis tests CNNwGFC, an enhanced Convolutional Neural Network model, for detecting and classifying anomalies in IoT networks. The model achieves a 15.34% increase in accuracy for IoT anomaly classification on UNSW-NB15 compared to the classic Convolutional Neural Network. The CNNwGFC multi-classification accuracy (96.24%) is 7.16 percentage points higher than the best result reported in the literature.