2,226 research outputs found

    Automatic Finding Trapezoidal Membership Functions in Mining Fuzzy Association Rules Based on Learning Automata

    Get PDF
    Association rule mining is an important data mining technique used for discovering relationships among all data items. Membership functions have a significant impact on the outcome of the mining association rules. An important challenge in fuzzy association rule mining is finding an appropriate membership functions, which is an optimization issue. In the most relevant studies of fuzzy association rule mining, only triangle membership functions are considered. This study, as the first attempt, used a team of continuous action-set learning automata (CALA) to find both the appropriate number and positions of trapezoidal membership functions (TMFs). The spreads and centers of the TMFs were taken into account as parameters for the research space and a new approach for the establishment of a CALA team to optimize these parameters was introduced. Additionally, to increase the convergence speed of the proposed approach and remove bad shapes of membership functions, a new heuristic approach has been proposed. Experiments on two real data sets showed that the proposed algorithm improves the efficiency of the extracted rules by finding optimized membership functions

    A Genetic Tuning to Improve the Performance of Fuzzy Rule-Based Classification Systems with Interval-Valued Fuzzy Sets: Degree of Ignorance and Lateral Position

    Get PDF
    Fuzzy Rule-Based Systems are appropriate tools to deal with classification problems due to their good properties. However, they can suffer a lack of system accuracy as a result of the uncertainty inherent in the definition of the membership functions and the limitation of the homogeneous distribution of the linguistic labels. The aim of the paper is to improve the performance of Fuzzy Rule-Based Classification Systems by means of the Theory of Interval-Valued Fuzzy Sets and a post-processing genetic tuning step. In order to build the Interval-Valued Fuzzy Sets we define a new function called weak ignorance for modeling the uncertainty associated with the definition of the membership functions. Next, we adapt the fuzzy partitions to the problem in an optimal way through a cooperative evolutionary tuning in which we handle both the degree of ignorance and the lateral position (based on the 2-tuples fuzzy linguistic representation) of the linguistic labels. The experimental study is carried out over a large collection of data-sets and it is supported by a statistical analysis. Our results show empirically that the use of our methodology outperforms the initial Fuzzy Rule-Based Classification System. The application of our cooperative tuning enhances the results provided by the use of the isolated tuning approaches and also improves the behavior of the genetic tuning based on the 3-tuples fuzzy linguistic representation.Spanish Government TIN2008-06681-C06-01 TIN2010-1505

    Data mining using intelligent systems : an optimized weighted fuzzy decision tree approach

    Get PDF
    Data mining can be said to have the aim to analyze the observational datasets to find relationships and to present the data in ways that are both understandable and useful. In this thesis, some existing intelligent systems techniques such as Self-Organizing Map, Fuzzy C-means and decision tree are used to analyze several datasets. The techniques are used to provide flexible information processing capability for handling real-life situations. This thesis is concerned with the design, implementation, testing and application of these techniques to those datasets. The thesis also introduces a hybrid intelligent systems technique: Optimized Weighted Fuzzy Decision Tree (OWFDT) with the aim of improving Fuzzy Decision Trees (FDT) and solving practical problems. This thesis first proposes an optimized weighted fuzzy decision tree, incorporating the introduction of Fuzzy C-Means to fuzzify the input instances but keeping the expected labels crisp. This leads to a different output layer activation function and weight connection in the neural network (NN) structure obtained by mapping the FDT to the NN. A momentum term was also introduced into the learning process to train the weight connections to avoid oscillation or divergence. A new reasoning mechanism has been also proposed to combine the constructed tree with those weights which had been optimized in the learning process. This thesis also makes a comparison between the OWFDT and two benchmark algorithms, Fuzzy ID3 and weighted FDT. SIx datasets ranging from material science to medical and civil engineering were introduced as case study applications. These datasets involve classification of composite material failure mechanism, classification of electrocorticography (ECoG)/Electroencephalogram (EEG) signals, eye bacteria prediction and wave overtopping prediction. Different intelligent systems techniques were used to cluster the patterns and predict the classes although OWFDT was used to design classifiers for all the datasets. In the material dataset, Self-Organizing Map and Fuzzy C-Means were used to cluster the acoustic event signals and classify those events to different failure mechanism, after the classification, OWFDT was introduced to design a classifier in an attempt to classify acoustic event signals. For the eye bacteria dataset, we use the bagging technique to improve the classification accuracy of Multilayer Perceptrons and Decision Trees. Bootstrap aggregating (bagging) to Decision Tree also helped to select those most important sensors (features) so that the dimension of the data could be reduced. Those features which were most important were used to grow the OWFDT and the curse of dimensionality problem could be solved using this approach. The last dataset, which is concerned with wave overtopping, was used to benchmark OWFDT with some other Intelligent Systems techniques, such as Adaptive Neuro-Fuzzy Inference System (ANFIS), Evolving Fuzzy Neural Network (EFuNN), Genetic Neural Mathematical Method (GNMM) and Fuzzy ARTMAP. Through analyzing these datasets using these Intelligent Systems Techniques, it has been shown that patterns and classes can be found or can be classified through combining those techniques together. OWFDT has also demonstrated its efficiency and effectiveness as compared with a conventional fuzzy Decision Tree and weighted fuzzy Decision Tree

    Mining quantitative association rules based on evolutionary computation and its application to atmospheric pollution

    Get PDF
    This research presents the mining of quantitative association rules based on evolutionary computation techniques. First, a real-coded genetic algorithm that extends the well-known binary-coded CHC algorithm has been projected to determine the intervals that define the rules without needing to discretize the attributes. The proposed algorithm is evaluated in synthetic datasets under different levels of noise in order to test its performance and the reported results are then compared to that of a multi-objective differential evolution algorithm, recently published. Furthermore, rules from real-world time series such as temperature, humidity, wind speed and direction of the wind, ozone, nitrogen monoxide and sulfur dioxide have been discovered with the objective of finding all existing relations between atmospheric pollution and climatological conditions.Ministerio de Ciencia y Tecnología TIN2007-68084-C-00Junta de Andalucía P07-TIC-0261

    A forecasting of indices and corresponding investment decision making application

    Get PDF
    Student Number : 9702018F - MSc(Eng) Dissertation - School of Electrical and Information Engineering - Faculty of Engineering and the Built EnvironmentDue to the volatile nature of the world economies, investing is crucial in ensuring an individual is prepared for future financial necessities. This research proposes an application, which employs computational intelligent methods that could assist investors in making financial decisions. This system consists of 2 components. The Forecasting Component (FC) is employed to predict the closing index price performance. Based on these predictions, the Stock Quantity Selection Component (SQSC) recommends the investor to purchase stocks, hold the current investment position or sell stocks in possession. The development of the FC module involved the creation of Multi-Layer Perceptron (MLP) as well as Radial Basis Function (RBF) neural network classifiers. TCategorizes that these networks classify are based on a profitable trading strategy that outperforms the long-term “Buy and hold” trading strategy. The Dow Jones Industrial Average, Johannesburg Stock Exchange (JSE) All Share, Nasdaq 100 and the Nikkei 225 Stock Average indices are considered. TIt has been determined that the MLP neural network architecture is particularly suited in the prediction of closing index price performance. Accuracies of 72%, 68%, 69% and 64% were obtained for the prediction of closing price performance of the Dow Jones Industrial Average, JSE All Share, Nasdaq 100 and Nikkei 225 Stock Average indices, respectively. TThree designs of the Stock Quantity Selection Component were implemented and compared in terms of their complexity as well as scalability. TComplexity is defined as the number of classifiers employed by the design. Scalability is defined as the ability of the design to accommodate the classification of additional investment recommendations. TDesigns that utilized 1, 4 and 16 classifiers, respectively, were developed. These designs were implemented using MLP neural networks, RBF neural networks, Fuzzy Inference Systems as well as Adaptive Neuro-Fuzzy Inference Systems. The design that employed 4 classifiers achieved low complexity and high scalability. As a result, this design is most appropriate for the application of concern. It has also been determined that the neural network architecture as well as the Fuzzy Inference System implementation of this design performed equally well

    Technical and Fundamental Features Analysis for Stock Market Prediction with Data Mining Methods

    Get PDF
    Predicting stock prices is an essential objective in the financial world. Forecasting stock returns and their risk represents one of the most critical concerns of market decision makers. This thesis investigates the stock price forecasting with three approaches from the data mining concept and shows how different elements in the stock price can help to enhance the accuracy of our prediction. For this reason, the first and second approaches capture many fundamental indicators from the stocks and implement them as explanatory variables to do stock price classification and forecasting. In the third approach, technical features from the candlestick representation of the share prices are extracted and used to enhance the accuracy of the forecasting. In each approach, different tools and techniques from data mining and machine learning are employed to justify why the forecasting is working. Furthermore, since the idea is to evaluate the potential of features in the stock trend forecasting, therefore we diversify our experiments using both technical and fundamental features. Therefore, in the first approach, a three-stage methodology is developed while in the first step, a comprehensive investigation of all possible features which can be effective on stocks risk and return are identified. Then, in the next stage, risk and return are predicted by applying data mining techniques for the given features. Finally, we develop a hybrid algorithm, based on some filters and function-based clustering; and re-predicted the risk and return of stocks. In the second approach, instead of using single classifiers, a fusion model is proposed based on the use of multiple diverse base classifiers that operate on a common input and a meta-classifier that learns from base classifiers’ outputs to obtain a more precise stock return and risk predictions. A set of diversity methods, including Bagging, Boosting, and AdaBoost, is applied to create diversity in classifier combinations. Moreover, the number and procedure for selecting base classifiers for fusion schemes are determined using a methodology based on dataset clustering and candidate classifiers’ accuracy. Finally, in the third approach, a novel forecasting model for stock markets based on the wrapper ANFIS (Adaptive Neural Fuzzy Inference System) – ICA (Imperialist Competitive Algorithm) and technical analysis of Japanese Candlestick is presented. Two approaches of Raw-based and Signal-based are devised to extract the model’s input variables and buy and sell signals are considered as output variables. To illustrate the methodologies, for the first and second approaches, Tehran Stock Exchange (TSE) data for the period from 2002 to 2012 are applied, while for the third approach, we used General Motors and Dow Jones indexes.Predicting stock prices is an essential objective in the financial world. Forecasting stock returns and their risk represents one of the most critical concerns of market decision makers. This thesis investigates the stock price forecasting with three approaches from the data mining concept and shows how different elements in the stock price can help to enhance the accuracy of our prediction. For this reason, the first and second approaches capture many fundamental indicators from the stocks and implement them as explanatory variables to do stock price classification and forecasting. In the third approach, technical features from the candlestick representation of the share prices are extracted and used to enhance the accuracy of the forecasting. In each approach, different tools and techniques from data mining and machine learning are employed to justify why the forecasting is working. Furthermore, since the idea is to evaluate the potential of features in the stock trend forecasting, therefore we diversify our experiments using both technical and fundamental features. Therefore, in the first approach, a three-stage methodology is developed while in the first step, a comprehensive investigation of all possible features which can be effective on stocks risk and return are identified. Then, in the next stage, risk and return are predicted by applying data mining techniques for the given features. Finally, we develop a hybrid algorithm, based on some filters and function-based clustering; and re-predicted the risk and return of stocks. In the second approach, instead of using single classifiers, a fusion model is proposed based on the use of multiple diverse base classifiers that operate on a common input and a meta-classifier that learns from base classifiers’ outputs to obtain a more precise stock return and risk predictions. A set of diversity methods, including Bagging, Boosting, and AdaBoost, is applied to create diversity in classifier combinations. Moreover, the number and procedure for selecting base classifiers for fusion schemes are determined using a methodology based on dataset clustering and candidate classifiers’ accuracy. Finally, in the third approach, a novel forecasting model for stock markets based on the wrapper ANFIS (Adaptive Neural Fuzzy Inference System) – ICA (Imperialist Competitive Algorithm) and technical analysis of Japanese Candlestick is presented. Two approaches of Raw-based and Signal-based are devised to extract the model’s input variables and buy and sell signals are considered as output variables. To illustrate the methodologies, for the first and second approaches, Tehran Stock Exchange (TSE) data for the period from 2002 to 2012 are applied, while for the third approach, we used General Motors and Dow Jones indexes.154 - Katedra financívyhově

    Fuzzy-Granular Based Data Mining for Effective Decision Support in Biomedical Applications

    Get PDF
    Due to complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans to understand the inherent mechanism of diseases. For biomedical classification problems, typically it is impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive to state-of-the-art classifiers in terms of prediction accuracy. More importantly, FARs can provide strong decision support on disease diagnoses due to their easy interpretability. This dissertation also proposes a fuzzy-granular method to select informative and discriminative genes from huge microarray gene expression data. With fuzzy granulation, information loss in the process of gene selection is decreased. As a result, more informative genes for cancer classification are selected and more accurate classifiers can be modeled. Empirical studies show that the proposed method is more accurate than traditional algorithms for cancer classification. And hence we expect that genes being selected can be more helpful for further biological studies

    Selecting the best measures to discover quantitative association rules

    Get PDF
    The majority of the existing techniques to mine association rules typically use the support and the confidence to evaluate the quality of the rules obtained. However, these two measures may not be sufficient to properly assess their quality due to some inherent drawbacks they present. A review of the literature reveals that there exist many measures to evaluate the quality of the rules, but that the simultaneous optimization of all measures is complex and might lead to poor results. In this work, a principal components analysis is applied to a set of measures that evaluate quantitative association rules' quality. From this analysis, a reduced subset of measures has been selected to be included in the fitness function in order to obtain better values for the whole set of quality measures, and not only for those included in the fitness function. This is a general-purpose methodology and can, therefore, be applied to the fitness function of any algorithm. To validate if better results are obtained when using the function fitness composed of the subset of measures proposed here, the existing QARGA algorithm has been applied to a wide variety of datasets. Finally, a comparative analysis of the results obtained by means of the application of QARGA with the original fitness function is provided, showing a remarkable improvement when the new one is used.Ministerio de Ciencia y Tecnología TIN2011-28956-C0

    Hybrid ACO and SVM algorithm for pattern classification

    Get PDF
    Ant Colony Optimization (ACO) is a metaheuristic algorithm that can be used to solve a variety of combinatorial optimization problems. A new direction for ACO is to optimize continuous and mixed (discrete and continuous) variables. Support Vector Machine (SVM) is a pattern classification approach originated from statistical approaches. However, SVM suffers two main problems which include feature subset selection and parameter tuning. Most approaches related to tuning SVM parameters discretize the continuous value of the parameters which will give a negative effect on the classification performance. This study presents four algorithms for tuning the SVM parameters and selecting feature subset which improved SVM classification accuracy with smaller size of feature subset. This is achieved by performing the SVM parameters’ tuning and feature subset selection processes simultaneously. Hybridization algorithms between ACO and SVM techniques were proposed. The first two algorithms, ACOR-SVM and IACOR-SVM, tune the SVM parameters while the second two algorithms, ACOMV-R-SVM and IACOMV-R-SVM, tune the SVM parameters and select the feature subset simultaneously. Ten benchmark datasets from University of California, Irvine, were used in the experiments to validate the performance of the proposed algorithms. Experimental results obtained from the proposed algorithms are better when compared with other approaches in terms of classification accuracy and size of the feature subset. The average classification accuracies for the ACOR-SVM, IACOR-SVM, ACOMV-R and IACOMV-R algorithms are 94.73%, 95.86%, 97.37% and 98.1% respectively. The average size of feature subset is eight for the ACOR-SVM and IACOR-SVM algorithms and four for the ACOMV-R and IACOMV-R algorithms. This study contributes to a new direction for ACO that can deal with continuous and mixed-variable ACO
    corecore