103 research outputs found

    A Compact Evolutionary Interval-Valued Fuzzy Rule-Based Classification System for the Modeling and Prediction of Real-World Financial Applications With Imbalanced Data

    The current financial crisis has stressed the need to obtain more accurate prediction models in order to decrease risk when investing money in economic opportunities. In addition, the transparency of the process followed to make decisions in financial applications is becoming an important issue. Furthermore, there is a need to handle real-world imbalanced financial datasets without using sampling techniques that might introduce noise into the data. In this paper, we present a compact evolutionary interval-valued fuzzy rule-based classification system, which is based on the interval-valued fuzzy rule-based classification system with tuning and rule selection (IVTURS-FARC-HD), for the modeling and prediction of real-world financial applications. The proposed system obtains good prediction accuracies using a small set of short fuzzy rules, implying a high degree of interpretability of the generated linguistic model. Furthermore, the proposed system deals with imbalanced financial datasets with no need for any preprocessing or sampling method, thus avoiding the accidental introduction of noise into the data used in the learning process. The system is also provided with a mechanism to handle examples that are not covered by any fuzzy rule in the generated rule base. To test the quality of our proposal, we present an experimental study including 11 real-world financial datasets. We show that the proposed system outperforms the original C4.5 decision tree, the type-1 and interval-valued fuzzy counterparts that use the synthetic minority oversampling technique (SMOTE) to preprocess data, and the original FURIA, which is a fuzzy approximative classifier. Furthermore, the proposed method enhances the results achieved by the cost-sensitive C4.5, and it gives competitive results when compared with FURIA using SMOTE, while our proposal avoids preprocessing techniques and provides interpretable models that allow obtaining more accurate results.

    A Compact Evolutionary Interval-Valued Fuzzy Rule-Based Classification System for the Modeling and Prediction of Real-World Financial Applications with Imbalanced Data

    The current financial crisis has stressed the need to obtain more accurate prediction models in order to decrease risk when investing money in economic opportunities. In addition, the transparency of the process followed to make decisions in financial applications is becoming an important issue. Furthermore, there is a need to handle real-world imbalanced financial datasets without using sampling techniques that might introduce noise into the data. In this paper, we present a compact evolutionary interval-valued fuzzy rule-based classification system, which is based on IVTURS-FARC-HD (Interval-Valued fuzzy rule-based classification system with TUning and Rule Selection) [22], for the modeling and prediction of real-world financial applications. The proposed system obtains good prediction accuracies using a small set of short fuzzy rules, implying a high degree of interpretability of the generated linguistic model. Furthermore, the proposed system deals with imbalanced financial datasets with no need for any preprocessing or sampling method, thus avoiding the accidental introduction of noise into the data used in the learning process. The system is also provided with a mechanism to handle examples that are not covered by any fuzzy rule in the generated rule base. To test the quality of our proposal, we present an experimental study including eleven real-world financial datasets. We show that the proposed system outperforms the original C4.5 decision tree, the type-1 and interval-valued fuzzy counterparts that use the SMOTE sampling technique to preprocess data, and the original FURIA, which is a fuzzy approximative classifier. Furthermore, the proposed method enhances the results achieved by the cost-sensitive C4.5, and it gives competitive results when compared with FURIA using SMOTE, while our proposal avoids preprocessing techniques and provides interpretable models that allow obtaining more accurate results.
    Funding: Spanish Government, TIN2011-28488, TIN2013-40765-

    Neutrosophic rule-based prediction system for toxicity effects assessment of biotransformed hepatic drugs

    Measuring toxicity is an important step in drug development. However, the experimental methods currently used to estimate drug toxicity are expensive and require high computational effort. Therefore, these methods are not suitable for large-scale evaluation of drug toxicity. As a consequence, there is a high demand for computational models that can predict drug toxicity risks. In this paper, we used a dataset that consists of 553 drugs that are biotransformed in the liver

    Three-way Imbalanced Learning based on Fuzzy Twin SVM

    Three-way decision (3WD) is a powerful tool in granular computing for dealing with uncertain data, commonly used in information systems, decision-making, and medical care. Three-way decision has been studied extensively in traditional rough set models. However, it has rarely been combined with the currently popular field of machine learning. In this paper, three-way decision is connected with the SVM, a standard binary classification model in machine learning, to solve imbalanced classification problems, an area where the SVM needs improvement. A new three-way fuzzy membership function and a new fuzzy twin support vector machine with three-way membership (TWFTSVM) are proposed. The new three-way fuzzy membership function is defined to increase the certainty of uncertain data in both input space and feature space, and it assigns higher fuzzy membership to minority samples than to majority samples. To evaluate the effectiveness of the proposed model, comparative experiments are designed for forty-seven different datasets with varying imbalance ratios. In addition, datasets with different imbalance ratios are derived from the same dataset to further assess the proposed model's performance. The results show that the proposed model significantly outperforms other traditional SVM-based methods.
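The idea of giving minority samples a higher fuzzy membership than majority samples can be illustrated with a minimal sketch. This is not the three-way membership function proposed in the paper, which the abstract does not specify. It assumes a simpler, commonly used heuristic: membership decays with distance from the class centre, and majority-class memberships are scaled down by the minority-to-majority size ratio. The function names `class_centre` and `fuzzy_memberships` and the `eps` parameter are hypothetical.

```python
from math import dist


def class_centre(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def fuzzy_memberships(X, y, minority_label, eps=1e-6):
    """Assumed heuristic, not the paper's function.

    Each sample gets a base membership 1 - d / (r + eps), where d is its
    distance to its own class centre and r is the largest such distance in
    that class. Non-minority memberships are then multiplied by
    n_minority / n_class, so minority samples weigh more during training.
    """
    labels = set(y)
    groups = {c: [x for x, t in zip(X, y) if t == c] for c in labels}
    centres = {c: class_centre(pts) for c, pts in groups.items()}
    radius = {c: max(dist(p, centres[c]) for p in groups[c]) for c in labels}
    n_min = len(groups[minority_label])

    memberships = []
    for x, t in zip(X, y):
        base = 1.0 - dist(x, centres[t]) / (radius[t] + eps)
        scale = 1.0 if t == minority_label else n_min / len(groups[t])
        memberships.append(base * scale)
    return memberships


# Six majority samples (label 0) and three minority samples (label 1).
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
weights = fuzzy_memberships(X, y, minority_label=1)
```

In a fuzzy SVM such memberships typically act as per-sample weights on the slack penalties, so that errors on minority samples cost more than errors on majority samples.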

    Enhancing Big Data Feature Selection Using a Hybrid Correlation-Based Feature Selection

    This study proposes an alternative data extraction method that combines three well-known methods for handling large and problematic datasets: correlation-based feature selection (CFS), best first search (BFS), and the dominance-based rough set approach (DRSA). The study aims to enhance the classifier's performance in decision analysis by eliminating uncorrelated and inconsistent data values. The proposed method, named CFS-DRSA, comprises several phases executed in sequence, with the main phases incorporating two crucial feature extraction tasks. First, a data reduction phase implements a CFS method with a BFS algorithm. Second, a data selection phase applies DRSA to generate the optimized dataset. In this way, the study aims to reduce the computational time complexity and increase the classification accuracy. Several datasets with various characteristics and volumes were used in the experimental process to evaluate the proposed method's credibility. The method's performance was validated using standard evaluation measures and benchmarked against other established methods such as deep learning (DL). Overall, the proposed method helped the classifier return a significant result, with an accuracy rate of 82.1% for the neural network (NN) classifier, compared with 66.5% for the support vector machine (SVM) and 49.96% for DL. The one-way analysis of variance (ANOVA) statistical result indicates that the proposed method is an alternative extraction tool for those with difficulties acquiring expensive big data analysis tools and those who are new to the data analysis field.
    Funding: Ministry of Higher Education under the Fundamental Research Grant Scheme (FRGS/1/2018/ICT04/UTM/01/1); Universiti Teknologi Malaysia (UTM) under Research University Grant Vot-20H04; Malaysia Research University Network (MRUN) Vot 4L876; SPEV project, University of Hradec Kralove, Faculty of Informatics and Management, Czech Republic (ID: 2102–2021), “Smart Solutions in Ubiquitous Computing Environments”
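For readers unfamiliar with CFS, the subset score that a best first search would maximize can be sketched as follows. The sketch uses the standard CFS merit heuristic, merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. Whether CFS-DRSA uses exactly this form is an assumption, since the abstract does not state it, and the function name `cfs_merit` is hypothetical.

```python
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Standard CFS merit of a feature subset (assumed form).

    feature_class_corr: absolute correlation of each selected feature with
        the class label, one value per feature.
    feature_feature_corr: absolute correlation for every pair of selected
        features (empty when the subset has a single feature).
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corr) / k
    if feature_feature_corr:
        r_ff = sum(feature_feature_corr) / len(feature_feature_corr)
    else:
        r_ff = 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Two equally predictive features: fully redundant vs. uncorrelated.
redundant = cfs_merit([0.5, 0.5], [1.0])
independent = cfs_merit([0.5, 0.5], [0.0])
single = cfs_merit([0.5], [])
```

Adding a fully redundant feature leaves the merit unchanged relative to a single feature, while adding an uncorrelated feature of equal predictive value raises it. This is the sense in which the method favours features that are correlated with the class but not with each other.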

    IIVFDT: Ignorance Functions based Interval-Valued Fuzzy Decision Tree with Genetic Tuning

    The choice of membership functions plays an essential role in the success of fuzzy systems. This is a complex problem due to the possible lack of knowledge when assigning punctual values as membership degrees. To address this difficulty, we propose a methodology called Ignorance functions based Interval-Valued Fuzzy Decision Tree with genetic tuning, IIVFDT for short, which improves the performance of fuzzy decision trees by taking into account the ignorance degree. This ignorance degree is the result of a weak ignorance function applied to the punctual value set as membership degree. Our IIVFDT proposal is composed of four steps: (1) the base fuzzy decision tree is generated using the fuzzy ID3 algorithm; (2) the linguistic labels are modeled with Interval-Valued Fuzzy Sets, for which a new parametrized construction method of Interval-Valued Fuzzy Sets is defined, whose length represents the ignorance degree; (3) the fuzzy reasoning method is extended to work with this representation of the linguistic terms; (4) an evolutionary tuning step is applied to compute the optimal ignorance degree for each Interval-Valued Fuzzy Set. The experimental study shows that the IIVFDT method outperforms the initial fuzzy ID3 both with and without Interval-Valued Fuzzy Sets. The suitability of the proposed methodology is shown with respect to both several state-of-the-art fuzzy decision trees and C4.5. Furthermore, we analyze the quality of our approach versus two methods that learn the fuzzy decision tree using genetic algorithms. Finally, we show that superior performance can be achieved by means of the positive synergy obtained when applying the well-known genetic tuning of the lateral position after the application of the IIVFDT method.
    Funding: Spanish Government, TIN2011-28488, TIN2010-1505
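Step (2) can be made concrete with a small sketch of how a punctual membership degree might be widened into an interval whose length reflects ignorance. The abstract does not give the paper's construction, so everything here is an assumption: the weak ignorance function is taken to be g(mu) = 2 * min(mu, 1 - mu), which is 0 at full certainty and 1 at mu = 0.5, and the interval is centred on mu with length delta * g(mu), where `delta` stands in for the tunable parameter of step (4). The names `weak_ignorance` and `interval_membership` are hypothetical.

```python
def weak_ignorance(mu):
    """Assumed weak ignorance function: symmetric, 0 at mu in {0, 1}, 1 at 0.5."""
    return 2.0 * min(mu, 1.0 - mu)


def interval_membership(mu, delta):
    """Widen a punctual membership degree into an interval (illustrative only).

    The interval is centred on mu and has length delta * weak_ignorance(mu).
    For delta in [0, 1] the half-length never exceeds min(mu, 1 - mu), so
    both bounds stay inside [0, 1] without clamping.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    half = 0.5 * delta * weak_ignorance(mu)
    return mu - half, mu + half


lower, upper = interval_membership(0.5, 0.4)   # most uncertain degree
crisp = interval_membership(1.0, 0.4)          # fully certain degree
```

Under these assumptions a fully certain degree keeps a zero-length interval, while degrees near 0.5 receive the widest one. An evolutionary tuner would then search over one `delta` per fuzzy set.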

    Ensemble classification of incomplete data – a non-imputation approach with an application in ovarian tumour diagnosis support

    Faculty of Mathematics and Computer Science. In this doctoral dissertation I focus on the problem of classification of incomplete data. The motivation for the research comes from medicine, where missing data are commonly encountered. The most popular method of dealing with missing data is imputation; that is, filling in missing values on the basis of statistical relationships among features. In my research I choose a different strategy for dealing with this issue. Previously developed classifiers can be transformed into a form that returns an interval of possible predictions. In the next step, with the use of aggregation operators and thresholding methods, one can make a final classification. I show how to make such transformations of classifiers and how to use aggregation strategies for interval data classification. These methods improve the quality of classification of incomplete data in the problem of ovarian tumour diagnosis support. Additional analysis carried out on external datasets from the University of California, Irvine (UCI) Machine Learning Repository shows that the aforementioned methods are complementary to imputation.
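The pipeline described above (interval-valued predictions, then aggregation, then thresholding) can be sketched in a few lines. The dissertation's actual classifiers and operators are not given in the abstract, so this example makes its own assumptions: each base classifier is a logistic model, a missing feature is known only to lie in a given range, intervals are aggregated by averaging their bounds, and the final decision thresholds the interval midpoint. All function names are hypothetical.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def interval_predict(weights, bias, x, ranges):
    """Return (lowest, highest) score a logistic model can produce.

    x holds feature values, with None marking a missing value; ranges holds
    the (lo, hi) range assumed for each feature. Because the model is
    monotone in every feature, the extremes occur at the range endpoints.
    """
    z_lo = z_hi = bias
    for w, value, (lo, hi) in zip(weights, x, ranges):
        if value is None:
            z_lo += min(w * lo, w * hi)
            z_hi += max(w * lo, w * hi)
        else:
            z_lo += w * value
            z_hi += w * value
    return sigmoid(z_lo), sigmoid(z_hi)


def aggregate(intervals):
    """Aggregate intervals by the arithmetic mean of their bounds."""
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n,
            sum(hi for _, hi in intervals) / n)


def classify(interval, threshold=0.5):
    """Threshold the interval midpoint to obtain a crisp class label."""
    lo, hi = interval
    return 1 if (lo + hi) / 2.0 >= threshold else 0


# Two toy classifiers scoring a case whose second feature is missing.
ranges = [(0.0, 1.0), (0.0, 1.0)]
case = [0.8, None]
intervals = [
    interval_predict([2.0, 1.0], -1.0, case, ranges),
    interval_predict([1.0, -3.0], 0.5, case, ranges),
]
label = classify(aggregate(intervals))
```

When no value is missing, the interval collapses to a single point and the scheme reduces to ordinary ensemble averaging. Other aggregation operators or thresholding rules could be substituted for the mean and midpoint used here.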