8,915 research outputs found

    Dealing with imbalanced and weakly labelled data in machine learning using fuzzy and rough set methods


    Fuzzy Rough Sets for Self-Labelling: an Exploratory Analysis

    Semi-supervised learning incorporates aspects of both supervised and unsupervised learning. In semi-supervised classification, only some data instances have associated class labels, while others are unlabelled. One particular group of semi-supervised classification approaches is known as self-labelling techniques, which attempt to assign class labels to the unlabelled data instances. This is achieved by using class predictions based on the information in the labelled part of the data. In this paper, the applicability and suitability of fuzzy rough set theory for the task of self-labelling is investigated. A preparatory experimental study is presented that evaluates how accurately different fuzzy rough set models can predict the classes of unlabelled data instances for semi-supervised classification. The predictions are made either by considering only the labelled data instances or by involving the unlabelled data instances as well. A stability analysis of the predictions provides further insight into the characteristics of the different fuzzy rough models. Our study shows that the ordered weighted average based fuzzy rough model performs best in terms of both accuracy and stability. Our conclusions offer a foundation and rationale for the construction of a fuzzy rough self-labelling technique. They also provide an understanding of the applicability of fuzzy rough sets for the task of semi-supervised classification in general.
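To make the idea of predicting a class from an ordered weighted average (OWA) based fuzzy rough model more concrete, the following is a minimal sketch and not the procedure evaluated in the paper. Everything specific in it is an assumption made for illustration: the mean per-feature similarity relation, the Łukasiewicz implicator and minimum t-norm (which, for crisp classes, reduce to the expressions in the code), the linearly decreasing OWA weights, and the function names `similarity`, `owa`, `class_scores` and `predict`.

```python
def similarity(a, b, ranges):
    """Assumed fuzzy relation: mean of 1 - |a_i - b_i| / range_i over features."""
    return sum(1.0 - abs(x - y) / r for x, y, r in zip(a, b, ranges)) / len(a)


def owa(values, soft_min):
    """OWA aggregation with linearly decreasing weights (an assumed weight scheme).

    With soft_min=True the largest weight goes to the smallest value, so the
    result is a softened minimum; otherwise it is a softened maximum.
    """
    p = len(values)
    if p == 0:
        # Conventions for an empty set: infimum is 1, supremum is 0.
        return 1.0 if soft_min else 0.0
    ordered = sorted(values, reverse=not soft_min)
    total = p * (p + 1) / 2
    return sum((p - i) / total * v for i, v in enumerate(ordered))


def class_scores(x, labelled, ranges):
    """Score each class by the mean of its OWA lower and upper approximation at x."""
    scores = {}
    for c in {lab for _, lab in labelled}:
        sims_in = [similarity(x, y, ranges) for y, lab in labelled if lab == c]
        sims_out = [similarity(x, y, ranges) for y, lab in labelled if lab != c]
        # Lower approximation: x should be dissimilar to instances outside c.
        lower = owa([1.0 - s for s in sims_out], soft_min=True)
        # Upper approximation: x should be similar to instances inside c.
        upper = owa(sims_in, soft_min=False)
        scores[c] = (lower + upper) / 2
    return scores


def predict(x, labelled, ranges):
    """Return the class with the highest combined approximation score."""
    scores = class_scores(x, labelled, ranges)
    return max(scores, key=scores.get)


# Toy data, invented for this sketch: two features, each with range 1.0.
labelled = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
            ((1.0, 1.0), "b"), ((0.9, 0.8), "b")]
ranges = (1.0, 1.0)
print(predict((0.2, 0.1), labelled, ranges))
```

Replacing the strict minimum and maximum with OWA aggregation lets every labelled instance contribute a little, which is one plausible reason such a model would be less sensitive to individual noisy instances.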

    Fuzzy-Rough Set based Semi-Supervised Learning

    Much work has been carried out in the area of fuzzy-rough sets for supervised learning. However, very little has been accomplished for unsupervised or semi-supervised tasks. For many real-world applications, it is often expensive, time-consuming and difficult to obtain labels for all data objects. This often results in large quantities of data of which only very few objects are labelled. This paper proposes a novel fuzzy-rough based semi-supervised self-learning or self-training approach for the assignment of labels to unlabelled data. Unlike other semi-supervised approaches, the proposed technique requires no subjective thresholding or domain information. An experimental evaluation is performed on artificial data, and the approach is also applied to a real-world mammographic risk assessment problem with encouraging results. Index Terms: rough sets, fuzzy sets, mammographic analysis, semi-supervised learning.
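The general shape of a self-training loop that needs no confidence threshold can be sketched as follows. This is an illustration only: a plain nearest-neighbour rule stands in for the fuzzy-rough model, which the abstract does not specify, and the names `nearest_labelled` and `self_train` as well as the toy data are hypothetical.

```python
import math


def nearest_labelled(x, labelled):
    """Return (distance, label) of the labelled instance closest to x."""
    return min((math.dist(x, y), lab) for y, lab in labelled)


def self_train(labelled, unlabelled):
    """Label the whole pool, one instance per round, most confident first.

    Confidence here is simply closeness to an already labelled instance
    (a stand-in for a fuzzy-rough membership, assumed for this sketch).
    No threshold is used: each round moves the single closest pool
    instance into the labelled set, until the pool is empty.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    while pool:
        best_idx, (_, label) = min(
            ((i, nearest_labelled(x, labelled)) for i, x in enumerate(pool)),
            key=lambda item: item[1][0],
        )
        labelled.append((pool.pop(best_idx), label))
    return labelled


# Toy one-dimensional data, invented for this sketch.
seed = [((0.0,), "a"), ((10.0,), "b")]
pool = [(1.0,), (2.0,), (3.0,), (9.0,), (7.0,)]
result = dict(self_train(seed, pool))
print(result)
```

Because newly labelled instances are reused in later rounds, labels spread outwards from the seed instances; the order in which instances are labelled therefore matters, and early mistakes can propagate.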

    Application of Computational Intelligence Techniques to Process Industry Problems

    In the last two decades there has been considerable progress in the field of computational intelligence research. The fruits of this research effort are powerful techniques for pattern recognition, data mining, data modelling, etc. These techniques achieve high performance on traditional data sets such as those in the UCI machine learning database. Unfortunately, data sources of this kind usually contain clean data, free of the problems common in real-life industrial data, such as outliers, missing values and feature co-linearity. The presence of faulty data samples can have very harmful effects on the models: if such samples are presented during training, they can either cause sub-optimal performance of the trained model or, in the worst case, destroy the knowledge the model has learnt so far. For these reasons, the application of present modelling techniques to industrial problems has developed into a research field of its own. Based on a discussion of the properties and issues of the data and of the state-of-the-art modelling techniques in the process industry, this paper presents a novel unified approach to the development of predictive models in the process industry.

    An Efficient Classification Model using Fuzzy Rough Set Theory and Random Weight Neural Network

    In the area of fuzzy rough set theory (FRST), researchers have shown much interest in handling high-dimensional data. Rough set theory (RST) is an important tool for pre-processing data and helps to obtain a better predictive model, but in RST the process of discretization may lose useful information. Fuzzy rough set theory, by contrast, works well with real-valued data. In this paper, an efficient FRST-based technique is presented for pre-processing large-scale data sets to increase the efficacy of the predictive model. A fuzzy rough set-based feature selection (FRSFS) technique is combined with a random weight neural network (RWNN) classifier to obtain better generalization ability. Results on different datasets show that the proposed technique performs well and provides better speed and accuracy than combining FRSFS with other machine learning classifiers (i.e., KNN, Naive Bayes, SVM, decision tree and backpropagation neural network).

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data collectively to support a better decision-making process. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.