2,459 research outputs found

    Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced data-sets

    In many real application areas, the data used are highly skewed, and the number of instances of some classes is much higher than that of the other classes. Solving a classification task with such an imbalanced data-set is difficult because training is biased towards the majority classes. The aim of this paper is to improve the performance of fuzzy rule based classification systems on imbalanced domains by increasing the granularity of the fuzzy partitions in the boundary areas between the classes, in order to obtain better separability. We propose the use of a hierarchical fuzzy rule based classification system, which refines a simple linguistic fuzzy model by extending the structure of the knowledge base in a hierarchical way and applies a genetic rule selection process to obtain a compact and accurate model. The good performance of this approach is shown through an extensive experimental study carried out over a large collection of imbalanced data-sets. Spanish Ministry of Education and Science (MEC) under Projects TIN-2005-08386-C05-01 and TIN-2005-08386-C05-0

    Towards Smart Data Technologies for Big Data Analytics

    Currently, the publicly available datasets for Big Data Analytics are of varying quality, and obtaining the expected behavior from Machine Learning algorithms is crucial. Furthermore, since working with a huge amount of data is usually time-consuming, high-quality data is required. Smart Data refers to the process of transforming Big Data into clean and reliable data. This can be accomplished by converting the data, reducing unnecessary volume, or applying preprocessing techniques that improve quality while still yielding trustworthy results. We present the properties that affect the quality of data. We also discuss the available proposals for analyzing the quality of huge amounts of data and for coping with low-quality datasets in a scalable way. Furthermore, the need for a methodology towards Smart Data is highlighted. Instituto de Investigación en Informática

    Dynamic affinity-based classification of multi-class imbalanced data with one-versus-one decomposition : a fuzzy rough set approach

    Class imbalance occurs when data elements are unevenly distributed among classes, which poses a challenge for classifiers. The core focus of the research community has been on binary-class imbalance, although there is a recent trend toward the general case of multi-class imbalanced data. The IFROWANN method, a classifier based on fuzzy rough set theory, stands out for its performance in two-class imbalanced problems. In this paper, we consider its extension to multi-class data by combining it with one-versus-one decomposition. The latter transforms a multi-class problem into two-class sub-problems. Binary classifiers are applied to these sub-problems, after which their outcomes are aggregated into one prediction. We enhance the integration of IFROWANN in the decomposition scheme in two steps. First, we propose an adaptive weight setting for the binary classifier, addressing the varying characteristics of the sub-problems. We call this modified classifier IFROWANN-WIR. Second, we develop a new dynamic aggregation method called WV–FROST that combines the predictions of the binary classifiers with the global class affinity before making a final decision. In a meticulous experimental study, we show that our complete proposal outperforms the state-of-the-art on a wide range of multi-class imbalanced datasets.
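
    The one-versus-one decomposition described in this abstract can be illustrated with a minimal sketch. It does not implement IFROWANN-WIR or WV–FROST: the binary learner is a plain nearest-centroid rule and the aggregation is simple majority voting, both swapped in as assumptions for illustration. The function names (`fit_ovo`, `predict_ovo`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit_ovo(X, y):
    """Train one binary model per pair of classes (assumed: nearest centroid)."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        # Each sub-problem only sees the samples of its two classes.
        rows_a = [x for x, label in zip(X, y) if label == a]
        rows_b = [x for x, label in zip(X, y) if label == b]
        models[(a, b)] = (centroid(rows_a), centroid(rows_b))
    return models


def predict_ovo(models, x):
    """Each pairwise model casts one vote; the most-voted class is returned."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    return votes.most_common(1)[0][0]


# Toy three-class data set (hypothetical values).
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.1, 5.2]]
y = ["a", "a", "b", "b", "c", "c"]
models = fit_ovo(X, y)
print(len(models), predict_ovo(models, [4.8, 5.1]))
```

    With three classes the decomposition yields three pairwise sub-problems. A dynamic aggregation scheme such as the one the abstract proposes would replace the plain vote count in `predict_ovo`.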

    An Analysis of the Rule Weights and Fuzzy Reasoning Methods for Linguistic Rule Based Classification Systems Applied to Problems with Highly Imbalanced Data Sets

    In this contribution we analyze the rule weights and Fuzzy Reasoning Methods of Fuzzy Rule Based Classification Systems in the framework of imbalanced data-sets with a high degree of imbalance. We study the behaviour of Fuzzy Rule Based Classification Systems, searching for the best configuration of rule weight and Fuzzy Reasoning Method, and we also study how they cooperate with several instance pre-processing methods. To do so, we use a simple rule base obtained with the Chi et al. method, which extends the well-known Wang and Mendel method to classification problems. The results show that an instance preprocessing step is necessary and reveal clear differences between the choices of rule weight and Fuzzy Reasoning Method. Finally, we show empirically that Fuzzy Rule Based Classification Systems outperform the 1-NN and C4.5 classifiers in the framework of highly imbalanced data-sets. Spanish Projects TIN-2005-08386-C05-01 & TIC-2005-08386-C05-0

    Improving the performance of fuzzy rule-based classification systems with interval-valued fuzzy sets and genetic amplitude tuning

    Among the computational intelligence techniques employed to solve classification problems, Fuzzy Rule-Based Classification Systems (FRBCSs) are a popular tool because their interpretable models are based on linguistic variables, which are easier for experts and end-users to understand. The aim of this paper is to enhance the performance of FRBCSs by extending the Knowledge Base with Interval-Valued Fuzzy Sets (IVFSs). We consider a post-processing genetic tuning step that adjusts the amplitude of the upper bound of the IVFS to contextualize the fuzzy partitions and to obtain a more accurate solution to the problem. We analyze the quality of this approach using two basic and well-known fuzzy rule learning algorithms: Chi et al.'s method and the fuzzy hybrid genetics-based machine learning algorithm. We show the improvement achieved by this model through an extensive empirical study with a large collection of data-sets. This work has been supported by the Spanish Ministry of Science and Technology under projects TIN2008-06681-C06-01 and TIN2007-65981

    Why Linguistic Fuzzy Rule Based Classification Systems perform well in Big Data Applications?

    The significance of addressing Big Data applications is beyond all doubt. The current ability to extract interesting knowledge from large volumes of information provides great advantages to both corporations and academia. Therefore, researchers and practitioners must deal with the problem of scalability so that Machine Learning and Data Mining algorithms can address Big Data properly. To this end, the MapReduce programming framework is by far the most widely used mechanism for implementing fault-tolerant distributed applications. This framework implies the design of a divide-and-conquer mechanism in which local models are learned separately in one stage (Map tasks), whereas a second stage (Reduce) is devoted to aggregating all sub-models into a single solution. In this paper, we focus on the analysis of the behavior of Linguistic Fuzzy Rule Based Classification Systems when embedded into a MapReduce working procedure. By retrieving different information regarding the rules learned throughout the MapReduce process, we will be able to identify some of the capabilities of this particular paradigm that allow it to perform well when addressing Big Data problems. In summary, we will show that linguistic fuzzy classifiers are a robust approach when scalability is required. This work has been partially supported by the Spanish Ministry of Science and Technology under projects TIN2014-57251-P and TIN2015-68454-R
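
    The two-stage Map/Reduce procedure described in this abstract can be sketched on a single machine as follows. This is only an illustration under stated assumptions, not the paper's algorithm: each feature in [0, 1] is mapped to the nearest of three evenly spaced linguistic labels (the label with the highest membership under evenly spaced triangular partitions), a rule is the tuple of labels plus class counts, and the Reduce stage merges rule bases by summing counts. The names `map_task`, `reduce_task` and `final_rules` are hypothetical.

```python
from collections import Counter, defaultdict
from functools import reduce

LABELS = ("low", "medium", "high")


def label(value):
    """Linguistic label whose center (0, 0.5 or 1) is nearest to the value."""
    return LABELS[max(0, min(2, round(value * 2)))]


def map_task(chunk):
    """Map stage: learn a local rule base from one data partition.

    The rule base maps an antecedent (tuple of labels) to class counts.
    """
    rules = defaultdict(Counter)
    for features, cls in chunk:
        rules[tuple(label(v) for v in features)][cls] += 1
    return rules


def reduce_task(left, right):
    """Reduce stage: merge two rule bases, summing counts of shared rules."""
    merged = defaultdict(Counter, {k: Counter(v) for k, v in left.items()})
    for antecedent, counts in right.items():
        merged[antecedent].update(counts)
    return merged


def final_rules(rule_base):
    """Assign each antecedent the class it co-occurred with most often."""
    return {ant: c.most_common(1)[0][0] for ant, c in rule_base.items()}


# Two partitions of a toy data set (hypothetical values).
chunks = [
    [([0.1, 0.9], "pos"), ([0.9, 0.1], "neg")],
    [([0.0, 1.0], "pos"), ([0.5, 0.5], "neg"), ([0.1, 0.8], "neg")],
]
rule_base = final_rules(reduce(reduce_task, map(map_task, chunks)))
for antecedent, cls in sorted(rule_base.items()):
    print(antecedent, "->", cls)
```

    In a real deployment the `map` and `reduce` calls would be executed by the distributed framework over data partitions; here they are Python built-ins that only mimic the data flow.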

    Leveraging Users’ Trust and Reputation in Social Networks

    In online communities, where huge numbers of users interact under anonymous identities, electronic word of mouth has been observed to be a very powerful tool of influence. So far, this technology is well known in online marketplaces such as Amazon and eBay, and in travel platforms such as Tripadvisor and Booking. However, this trust-based approach can be leveraged in other scenarios, from e-democracy to trust-based recommendations in e-health and e-learning systems. The purpose of this contribution is to analyse the main existing trust and reputation mechanisms and to point out new research challenges that need to be addressed in order to fully exploit these systems in real-world online communities. The authors would like to acknowledge the financial support from the EU project H2020-MSCA-IF-2016-DeciTrustNET-746398 and FEDER funds provided in the Spanish project TIN2016-75850-P

    An analysis of local and global solutions to address Big Data imbalanced classification: a case study with SMOTE preprocessing

    Handling the huge amount of data that is continuously generated is an important challenge in the Machine Learning field. The need to adapt traditional techniques or create new ones is evident. To do so, distributed technologies have to be used to deal with the significant scalability constraints imposed by the Big Data context. In many Big Data classification applications, some classes are highly underrepresented, leading to what is known as the imbalanced classification problem. In this scenario, learning algorithms are often biased towards the majority classes, treating minority ones as outliers or noise. Consequently, preprocessing techniques to balance the class distribution were developed. This can be achieved by removing majority instances (undersampling) or by creating minority examples (oversampling). Among the oversampling methods, one of the most widespread is the SMOTE algorithm, which creates artificial examples according to the neighborhood of each minority class instance. In this work, our objective is to analyze the behavior of SMOTE in Big Data as a function of some key aspects such as the oversampling degree, the neighborhood value and, especially, the type of distributed design (local vs. global). Instituto de Investigación en Informática
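
    The SMOTE idea mentioned in this abstract, creating artificial minority examples from the neighborhood of each minority instance, can be sketched as follows. This is a simplified, sequential, single-machine version written for illustration; it does not reproduce the local or global distributed designs the paper compares. The parameter names `per_sample` (standing in for the oversampling degree) and `k` (the neighborhood value) are assumptions, as is the toy data.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def smote(minority, per_sample=1, k=2, seed=0):
    """Generate `per_sample` synthetic points for every minority instance.

    Each synthetic point lies on the segment between the instance and one
    of its k nearest minority neighbours. Needs at least two instances.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, base in enumerate(minority):
        others = minority[:i] + minority[i + 1:]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        for _ in range(per_sample):
            mate = rng.choice(neighbours)
            gap = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Toy minority class with four instances (hypothetical values).
minority = [[1.0, 1.0], [1.5, 1.2], [2.0, 1.0], [1.2, 1.8]]
new_points = smote(minority, per_sample=2, k=2)
print(len(new_points))
```

    Because every synthetic point is an interpolation between two existing minority instances, the generated examples stay inside the region already covered by the minority class. Larger values of `k` allow interpolation towards more distant neighbours.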