On Distributed Fuzzy Decision Trees for Big Data
Fuzzy decision trees (FDTs) have been shown to be an effective solution in the framework of fuzzy classification. The approaches to FDT learning proposed so far, however, have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme, shaped according to the MapReduce programming model, for generating both binary and multiway FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are then used as input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets to evaluate the behavior of the scheme along three dimensions: 1) performance in terms of classification accuracy, model complexity, and execution time; 2) scalability with varying numbers of computing units; and 3) ability to efficiently accommodate an increasing dataset size. We have shown that the proposed scheme is suitable for managing big datasets even with modest commodity hardware. Finally, for comparative analysis, we have used the distributed decision tree learning algorithm implemented in the MLlib library and Chi-FRBCS-BigData, a MapReduce distributed fuzzy rule-based classification system.
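The fuzzy information gain used at the decision nodes can be sketched as follows. This is an illustrative NumPy implementation of the standard fuzzy-entropy formulation, not the authors' Spark code, and the function names are our own:

```python
import numpy as np

def fuzzy_entropy(memberships, labels):
    """Fuzzy entropy of a node: memberships[i] is the degree to which
    example i belongs to the node, labels[i] is its class."""
    total = memberships.sum()
    if total == 0.0:
        return 0.0
    entropy = 0.0
    for c in np.unique(labels):
        p = memberships[labels == c].sum() / total   # fuzzy class frequency
        if p > 0.0:
            entropy -= p * np.log2(p)
    return entropy

def fuzzy_information_gain(parent, children, labels):
    """Gain of splitting a node into the fuzzy children of one attribute;
    children is a list of membership arrays, one per fuzzy set."""
    gain = fuzzy_entropy(parent, labels)
    for child in children:
        gain -= (child.sum() / parent.sum()) * fuzzy_entropy(child, labels)
    return gain

# Toy example: a split that separates the two classes perfectly
labels = np.array([0, 0, 1, 1])
parent = np.ones(4)
children = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]
gain = fuzzy_information_gain(parent, children, labels)
```

With crisp (0/1) memberships this reduces to the classical information gain; fuzzy memberships simply replace example counts with sums of membership degrees.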
On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems
We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of the training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform; since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite this transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
Comment: Appeared in 2018 IEEE International Congress on Big Data (BigData Congress). arXiv admin note: text overlap with arXiv:1902.0935
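The two-step process above can be sketched in NumPy. This is a minimal illustration under our own naming, with the empirical q-quantiles standing in for the unknown CDF; the second function checks the defining property of a Ruspini strong partition, namely that membership degrees at any point sum to one:

```python
import numpy as np

def pit_triangular_partition(x, n_fuzzy_sets=5, n_quantiles=100):
    """Step 1: approximate the probability integral transform with
    q-quantiles; Step 2: place equally spaced triangular cores on [0, 1].
    Returns the cores in the transformed and in the original space."""
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    quantiles = np.quantile(x, probs)            # empirical inverse CDF
    cores_u = np.linspace(0.0, 1.0, n_fuzzy_sets)
    # Recover each core in the original space via the quantile function
    cores_x = np.interp(cores_u, probs, quantiles)
    return cores_u, cores_x

def strong_partition_memberships(u, cores):
    """Degrees of a transformed value u in the triangular fuzzy sets of a
    Ruspini strong partition with the given cores; they always sum to 1."""
    mu = np.zeros(len(cores))
    if u <= cores[0]:
        mu[0] = 1.0
    elif u >= cores[-1]:
        mu[-1] = 1.0
    else:
        k = np.searchsorted(cores, u) - 1        # u lies in [cores[k], cores[k+1])
        w = (u - cores[k]) / (cores[k + 1] - cores[k])
        mu[k], mu[k + 1] = 1.0 - w, w
    return mu

# A heavily skewed attribute: equal-width partitions would waste fuzzy sets,
# whereas the PIT-based cores follow the data density.
x = np.exp(np.linspace(0.0, 5.0, 1000))
cores_u, cores_x = pit_triangular_partition(x, n_fuzzy_sets=5)
mu = strong_partition_memberships(0.3, cores_u)
```

Because the transform is monotone, the ordering of the cores is preserved when they are mapped back to the original attribute space.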
Evolutionary Fuzzy Systems for Explainable Artificial Intelligence: Why, When, What for, and Where to?
Evolutionary fuzzy systems are one of the greatest advances within the area of computational intelligence. They consist of evolutionary algorithms applied to the design of fuzzy systems. Thanks to this hybridization, superb abilities are provided to fuzzy modeling in many different data science scenarios. This contribution is a position paper developing a comprehensive analysis of the evolutionary fuzzy systems research field. To this end, the "4 W" questions are posed and addressed with the aim of understanding the current context of this topic and its significance. Specifically, it is pointed out why evolutionary fuzzy systems are important from an explainability point of view, when they began, what they are used for, and where the attention of researchers in this area should be directed in the near future. They must play an important role in the emerging area of eXplainable Artificial Intelligence (XAI), learning from data.
Online Static Security Assessment of Power Systems Based on Lasso Algorithm
As an important means of ensuring secure operation of a power system, contingency selection and ranking methods need to be rapid and accurate. A novel method based on the least absolute shrinkage and selection operator (Lasso) algorithm is proposed in this paper for online static security assessment (OSSA). The assessment is based on a security index, which is applied to select and screen contingencies. Firstly, the multi-step adaptive Lasso (MSA-Lasso) regression algorithm is introduced, owing to its advantageous predictive performance. Then, an OSSA module is proposed to evaluate and select contingencies under different load conditions. In addition, the Lasso algorithm is employed to predict the security index of each power system operating state, taking bus voltages and power flows into account, according to Newton-Raphson load flow (NRLF) analysis in post-contingency states. Finally, the numerical results of applying the proposed approach to the IEEE 14-bus, 118-bus, and 300-bus test systems demonstrate the accuracy and rapidity of OSSA.
Comment: Accepted by Applied Science
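The core of the approach, a sparse linear predictor of the security index, can be sketched with plain Lasso (not the multi-step adaptive variant) solved by coordinate descent. The feature matrix below is a random stand-in for the bus voltages and power flows an NRLF analysis would provide, not the paper's data:

```python
import numpy as np

def lasso_cd(X, y, alpha=0.05, n_sweeps=200):
    """Minimal coordinate-descent Lasso:
    minimizes (1/2n) * ||y - Xw||^2 + alpha * ||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # Partial residual with feature j removed from the current fit
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            # Soft-thresholding drives small coefficients exactly to zero
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
    return w

# Random stand-in for post-contingency features (e.g. bus voltages and
# line power flows); the "security index" depends on only two of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1]
w = lasso_cd(X, y)
```

The L1 penalty zeroes out the coefficients of irrelevant features, which is what makes Lasso attractive for screening: only the variables that actually drive the security index survive.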
An overview of recent distributed algorithms for learning fuzzy models in Big Data classification
Nowadays, a huge amount of data is generated, often in very short time intervals and in various formats, by a number of heterogeneous sources such as social networks and media, mobile devices, internet transactions, and networked devices and sensors. These data, identified as Big Data in the literature, are characterized by the popular "V" features: Value, Veracity, Variety, Velocity, and Volume. In particular, Value focuses on the useful knowledge that may be mined from data. Thus, in recent years, a number of data mining and machine learning algorithms have been proposed to extract knowledge from Big Data. These algorithms have generally been implemented using ad-hoc programming paradigms, such as MapReduce, on specific distributed computing frameworks, such as Apache Hadoop and Apache Spark. In the context of Big Data, fuzzy models currently play a significant role, thanks to their capability of handling vague and imprecise data and their innate interpretability. In this work, we give an overview of the most recent distributed learning algorithms for generating fuzzy classification models for Big Data. In particular, we first present some design and implementation details of these learning algorithms. Thereafter, we compare them in terms of accuracy and interpretability. Finally, we discuss their scalability.
Redesigning Post-Operative Processes Using Data Mining Classification Techniques
Data mining classification models are developed and investigated in this paper. These models are adopted to develop and redesign several business processes based on post-operative data. Post-operative data were collected and analyzed using the Waikato Environment for Knowledge Analysis (WEKA) to investigate the factors influencing patients' admission after surgery and to compare the developed DM classification models. The results reveal that each implemented DM technique highlights different attributes affecting patients' post-surgery admission status. The comparison suggests that neural networks outperform the other classification techniques. Further, the optimal number of beds required to accommodate post-operative patients is investigated. A simulation was conducted using queuing theory software to compute the expected number of beds required to achieve zero waiting time. The results indicate that a queue length of one is sufficient to accommodate post-surgery patients, meaning that one bed becomes available with each patient discharge.
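The paper obtains its bed count by simulation; an analytical counterpart under M/M/c queuing assumptions can be sketched with the Erlang C formula. The arrival and discharge rates below are hypothetical, not the paper's data:

```python
import math

def erlang_c(c, a):
    """Probability that an arrival must wait in an M/M/c queue,
    with offered load a = arrival_rate / service_rate (requires a < c)."""
    s = sum(a ** k / math.factorial(k) for k in range(c))
    top = a ** c / math.factorial(c) * c / (c - a)
    return top / (s + top)

def beds_for_target(arrival_rate, discharge_rate, max_wait_prob=0.01):
    """Smallest number of beds keeping the probability that a
    post-operative patient waits below max_wait_prob."""
    a = arrival_rate / discharge_rate
    c = math.floor(a) + 1                 # minimum stable bed count
    while erlang_c(c, a) > max_wait_prob:
        c += 1
    return c

# Hypothetical ward: 2 admissions/hour, mean stay 2 hours per bed
beds = beds_for_target(2.0, 0.5)
```

Exactly zero waiting is unattainable in a stochastic model, so a small waiting-probability threshold stands in for the "zero waiting time" target of the simulation.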
Predicting the Outcomes of Important Events based on Social Media and Social Network Analysis
Twitter is a popular social networking website that lets users post their opinions about current affairs, share their social events, and interact with others. It has become one of the largest sources of news, with over 200 million monthly active users. It is possible to predict the outcomes of events based on social networks using machine learning and big data analytics, and the massive data available from social networks can be utilized to improve prediction efficacy and accuracy. Achieving high accuracy in predicting the outcomes of political events from Twitter data is a challenging problem. The focus of this thesis is to investigate novel approaches to predicting the outcomes of political events from social media and social networks. The first proposed method predicts election results based on Twitter data analysis: it extracts and analyses sentiment information from microblogs to predict the popularity of candidates. Experimental results have shown its advantages over an existing method for predicting the outcomes of political events. The second proposed method also predicts election results from Twitter data, but analyses sentiment information using term weighting and selection to predict the popularity of candidates. Scaling factors are used for different types of terms, which helps to select informative terms more effectively and achieves better prediction results than the previous method. The third method proposed in this thesis represents the social network using network connectivity constructed from retweet data as well as social media content, leading to a new approach to predicting the outcome of political events. Two approaches, whole-network and sub-network, have been developed and compared. Experimental results show that the sub-network approach, which constructs sub-networks based on different topics, outperformed the whole-network approach.
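Term weighting with type-dependent scaling factors can be sketched as follows. The scaling factors, sentiment lexicons, and scoring rule here are illustrative assumptions, not the thesis's actual values:

```python
from collections import defaultdict

# Illustrative scaling factors per term type and tiny sentiment lexicons
SCALE = {"hashtag": 2.0, "mention": 1.5, "plain": 1.0}
POSITIVE = {"win", "great", "support"}
NEGATIVE = {"lose", "bad", "against"}

def term_type(token):
    if token.startswith("#"):
        return "hashtag"
    if token.startswith("@"):
        return "mention"
    return "plain"

def score_tweets(tweets):
    """Weighted sentiment score per candidate from (candidate, text) pairs."""
    scores = defaultdict(float)
    for candidate, text in tweets:
        for token in text.lower().split():
            weight = SCALE[term_type(token)]
            word = token.lstrip("#@")
            if word in POSITIVE:
                scores[candidate] += weight
            elif word in NEGATIVE:
                scores[candidate] -= weight
    return dict(scores)

scores = score_tweets([("A", "great #win for A"), ("B", "B will lose")])
```

The intuition being illustrated is that a sentiment-bearing hashtag is a stronger, more deliberate signal than the same word in running text, so it contributes more to the candidate's popularity score.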
Anomaly detection for IoT networks using machine learning
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
The Internet of Things (IoT) is considered one of the trending technologies today. IoT affects various industries, including logistics tracking, healthcare, automotive, and smart cities. A rising number of cyber-attacks and breaches target networks equipped with IoT devices. This thesis aims to improve security in IoT networks by enhancing anomaly detection using machine learning.
This thesis identifies the challenges and gaps related to securing Internet of Things networks. The challenges are network size, the number of devices, the human factor, and the complexity of IoT networks. The gaps identified include the lack of research on signature-based intrusion detection systems used for anomaly detection, the lack of modelling of the input parameters required for anomaly detection in IoT networks, and the lack of comparisons of the performance of machine learning algorithms on standard and real IoT datasets.
This thesis creates a dataset to test the binary anomaly classification performance of Neural Network, Gaussian Naive Bayes, Support Vector Machine, and Decision Tree machine learning algorithms, and compares their results with the KDDCUP99 dataset. The results show that Support Vector Machine and Gaussian Naive Bayes perform worse than the other models on the created IoT dataset. This thesis also reduces the number of features required by machine learning algorithms for anomaly detection in IoT networks to only five features, which reduced execution time by an average of 58%.
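Binary anomaly classification on a reduced five-feature representation can be sketched with a minimal Gaussian Naive Bayes, one of the models compared above. The synthetic traffic features below are stand-ins, not the thesis's dataset:

```python
import numpy as np

class SimpleGaussianNB:
    """Minimal Gaussian Naive Bayes for binary anomaly classification."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.mean_, self.var_, self.log_prior_ = [], [], []
        for c in self.classes_:
            Xc = X[y == c]
            self.mean_.append(Xc.mean(axis=0))
            self.var_.append(Xc.var(axis=0) + 1e-9)   # avoid division by zero
            self.log_prior_.append(np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        log_post = []
        for mean, var, log_prior in zip(self.mean_, self.var_, self.log_prior_):
            # Per-class log-likelihood under independent Gaussian features
            ll = -0.5 * (np.log(2.0 * np.pi * var) + (X - mean) ** 2 / var)
            log_post.append(log_prior + ll.sum(axis=1))
        return self.classes_[np.argmax(log_post, axis=0)]

# Synthetic stand-in for a reduced five-feature traffic representation:
# "normal" (class 0) and "attack" (class 1) records from separated Gaussians
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(3.0, 1.0, size=(100, 5))])
y = np.array([0] * 100 + [1] * 100)
pred = SimpleGaussianNB().fit(X, y).predict(X)
```

Training and prediction both scale linearly in the number of features, which is why shrinking the input to five features yields the large execution-time savings reported above.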
This thesis tests CNNwGFC, an enhanced Convolutional Neural Network model, for detecting and classifying anomalies in IoT networks. The model achieves a 15.34% increase in accuracy for IoT anomaly classification on UNSW-NB15 compared to the classic Convolutional Neural Network. The CNNwGFC multi-classification accuracy (96.24%) is 7.16 percentage points higher than the best result reported in the literature.