    Coalitions’ Weights in a Dispersed System with Pawlak Conflict Model

    The article addresses issues related to decision making by an ensemble of classifiers. Classifiers are built based on local tables; the set of local tables is called dispersed knowledge. The paper discusses a novel application of Pawlak's conflict analysis model to examine the relations between classifiers and to create coalitions of classifiers. Each coalition has access to some aggregated knowledge on the basis of which joint decisions are made. Various types of coalitions are formed: strong coalitions consisting of a large number of significant classifiers, and weak coalitions consisting of insignificant classifiers. The new contribution of the paper is a systematic investigation of the weights of coalitions that influence the final decision. Four different methods of calculating the strength of the coalitions have been applied; each of these methods considers a different aspect of the structure of the coalitions. Generally, it has been experimentally confirmed that, for a method that correctly identifies the relations between base classifiers, the use of coalition weights improves the quality of classification. More specifically, it has been statistically confirmed that the best results are generated by the weighting method based on the size of the coalitions and the method based on the unambiguity of the decisions.
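
    Below is a minimal sketch of what coalition-weighted fusion can look like. The two weighting functions (one based on coalition size, one based on the unambiguity of the coalition's aggregated vote) are illustrative assumptions rather than the paper's exact formulas, and all names are hypothetical.

        import numpy as np

        def coalition_size_weight(coalition):
            # Weight proportional to the number of classifiers in the coalition.
            return float(len(coalition))

        def unambiguity_weight(coalition):
            # Weight based on how unambiguous the coalition's aggregated vote is:
            # the margin between its best and second-best decision class.
            sorted_support = np.sort(np.mean(coalition, axis=0))
            return float(sorted_support[-1] - sorted_support[-2])

        def weighted_coalition_vote(coalitions, weight_fn):
            # Combine the coalitions' aggregated prediction vectors using coalition
            # weights and return the index of the chosen decision class.
            total = sum(weight_fn(c) * np.mean(c, axis=0) for c in coalitions)
            return int(np.argmax(total))

        # Example: each row is one classifier's support vector over three decision classes.
        strong = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]])
        weak = np.array([[0.2, 0.5, 0.3]])
        print(weighted_coalition_vote([strong, weak], coalition_size_weight))
        print(weighted_coalition_vote([strong, weak], unambiguity_weight))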

    Are coalitions needed when classifiers make decisions?

    Cooperation and coalition formation are usually the preferred behavior when a conflict situation occurs in real life. The question arises: should this approach also be used when an ensemble of classifiers makes decisions? In this paper, different approaches to classification based on dispersed knowledge are analysed and compared. The first group of approaches does not generate coalitions. Each local classifier generates a classification vector based on its local table, and then one of the most popular fusion methods is used (the sum method or the maximum method). In addition, the approach in which the final classification is made by the strongest classifier is analysed. The second group of approaches uses a coalition-creating method. The final classification is generated based on the coalitions' predictions using the two fusion methods mentioned above. In addition, the approach in which the final classification is made by the strongest coalition is analysed. For both groups of approaches, with and without coalitions, methods based on the maximum correlation and methods based on covering rules are considered. The main conclusion of the article is as follows: when classifiers generate fair and rational classification vectors, it is better to use a coalition-based approach and a fusion method that collectively takes into account all vectors generated by the classifiers.
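
    As a point of reference, here is a minimal sketch of the two fusion rules mentioned above (the sum method and the maximum method) and of the strongest-classifier baseline, applied to a stack of classification vectors; the function names, and reading "strongest" as the classifier with the highest single support, are assumptions made for illustration.

        import numpy as np

        def sum_rule(vectors):
            # Sum fusion: add all classification vectors and pick the best class.
            return int(np.argmax(np.sum(vectors, axis=0)))

        def maximum_rule(vectors):
            # Maximum fusion: for each class take the highest support any classifier gives it.
            return int(np.argmax(np.max(vectors, axis=0)))

        def strongest_classifier(vectors):
            # Decision of the single classifier with the highest individual support.
            strongest = int(np.argmax(np.max(vectors, axis=1)))
            return int(np.argmax(vectors[strongest]))

        # Rows: classifiers; columns: support for three decision classes.
        vectors = np.array([[0.50, 0.40, 0.10],
                            [0.30, 0.60, 0.10],
                            [0.45, 0.45, 0.10]])
        print(sum_rule(vectors), maximum_rule(vectors), strongest_classifier(vectors))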

    Study on the Twoing Criterion with Pre-pruning and Bagging Method for Dispersed Data

    In the paper, issues related to classification based on dispersed data are considered. The idea of dispersed data is to effectively make use of data collected independently, from different information systems/sources/units, on a single topic in the development of a classification model that can classify a new object irrespective of the information system/source/unit the object comes from. As in federated learning approaches, here too the data is protected and not shared between the owners. Local models are built using the bagging method and decision trees with the Twoing criterion. Only prediction vectors generated based on the local models are sent to the central server. Final aggregation is done using majority voting. The main purpose of the paper is to study the quality of classification obtained with the proposed approach. Another goal is to investigate the impact of the tree pre-pruning process on the quality of classification. Moreover, a comparison of the results obtained for the Twoing criterion and for the Gini index during tree construction is presented. The experiments were performed on seventeen dispersed data sets, two of which reflect the natural dispersion that occurs in reality: dispersed medical data collected by different hospitals and dispersed medical data collected in different countries. The contribution of this paper is an assessment of the effectiveness of using the Twoing criterion as a splitting criterion together with the bagging method in the development of a classification model for data stored in independent dispersed sources.
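
    For reference, a small sketch of the Twoing splitting criterion as defined in CART, evaluated for one candidate split of a node into left and right parts; this reproduces the textbook formula and is not taken from the paper's implementation.

        import numpy as np

        def twoing(left_labels, right_labels, classes):
            # Twoing criterion: (p_L * p_R / 4) * (sum_k |p(k|L) - p(k|R)|)^2,
            # where p_L and p_R are the fractions of objects sent left and right.
            left = np.asarray(left_labels)
            right = np.asarray(right_labels)
            if len(left) == 0 or len(right) == 0:
                return 0.0
            n = len(left) + len(right)
            p_l, p_r = len(left) / n, len(right) / n
            diff = sum(abs(np.mean(left == c) - np.mean(right == c)) for c in classes)
            return (p_l * p_r / 4.0) * diff ** 2

        # Example: a candidate split of a node holding objects of three decision classes;
        # larger values indicate better splits.
        print(twoing([0, 0, 1], [1, 2, 2, 2], classes=[0, 1, 2]))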

    Influence of Noise and Data Characteristics on Classification Quality of Dispersed Data Using Neural Networks on the Fusion of Predictions

    In this paper, issues of classification based on dispersed data are considered. For this purpose, an approach is used in which prediction vectors are generated locally using the k-nearest neighbors classifier, and then, on the central server, the final fusion of the prediction vectors is performed using a neural network. The main aim of the study is to check the influence of various data characteristics (the number of conditional attributes, the number of objects, the number of decision classes) as well as the degree of dispersion and the noise intensity on the quality of classification of the considered approach. For this purpose, 270 data sets that differ with respect to these factors were generated. Experiments were carried out on these data sets and statistical tests were performed. It was found that each of the examined factors has a statistically significant impact on the quality of classification; however, the number of conditional attributes, the degree of dispersion, and the noise intensity have the greatest impact. Multidimensionality in dispersed data affects the results positively, but the analyzed method is resistant only to a certain degree of noise intensity and dispersion.
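
    A minimal sketch of the local stage of this pipeline, assuming scikit-learn's KNeighborsClassifier and assuming that every local table contains examples of every decision class (so that the class ordering of predict_proba matches across tables); the central fusion network itself is sketched after the later entry on neural network fusion. The data below is synthetic and only makes the sketch runnable.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        def local_prediction_vectors(local_tables, x_new, k=5):
            # Each local table trains its own k-NN classifier and returns a
            # probability vector over the decision classes for the new object.
            vectors = []
            for X, y in local_tables:
                knn = KNeighborsClassifier(n_neighbors=min(k, len(y)))
                knn.fit(X, y)
                vectors.append(knn.predict_proba(np.asarray(x_new).reshape(1, -1))[0])
            return np.array(vectors)  # shape: (number of local tables, number of classes)

        # Synthetic example: five local tables with four attributes and three decision classes.
        rng = np.random.default_rng(0)
        tables = [(rng.normal(size=(30, 4)), rng.integers(0, 3, size=30)) for _ in range(5)]
        print(local_prediction_vectors(tables, x_new=np.zeros(4)).shape)  # (5, 3)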

    Stop Criterion in Building Decision Trees with Bagging Method for Dispersed Data

    This article discusses issues related to decision making based on applying decision trees and the bagging method to dispersed knowledge. In dispersed knowledge, local decision tables hold fragments of the data independently. In this study, sub-tables are additionally generated with the bagging method for each local table, and based on these the decision trees are built. These decision trees classify the test object, and a probability vector over the decision classes is defined for each local table. For each vector, the decision classes with the maximum value of the coordinates are selected, and the final joint decision over all local tables is made by majority voting. The quality of decision making has been observed to increase when the bagging method, as an ensemble method, is combined with decision trees on independent dispersed data. An important criterion in building a decision tree is knowing when to stop growing the tree (stop splitting), that is, at what minimum number of objects in a working node tree building should stop to ensure the best decision results. The contribution of the paper is an analysis of the influence of the stop criterion (expressed as the number of objects in a node) for decision trees used in conjunction with the bagging method on independent data sources. It can be concluded that, for dispersed data sets, the stop-split criterion does not influence the classification quality much. The statistical significance of the difference in the mean classification error values was confirmed only for a very high stop criterion (0.1 × the number of objects in the training set) and for a very low stop criterion (equal to two). There is no statistically significant difference in the classification quality obtained for the stop criterion values 4, 6, 8 and 10. An interesting observation is that for some dispersed data sets, in the case of a smaller number of local tables and a larger number of bootstrap samples, better classification quality is obtained for a small number of objects in the stop criterion (mostly two objects). Only a significant increase in the minimum number of objects at which tree growth is stopped affects the quality of classification; however, the gain in reduced tree complexity obtained when using larger values of the stop criterion is significant.
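
    A minimal sketch of this scheme using scikit-learn decision trees, where the stop criterion maps onto the min_samples_split parameter (the minimum number of objects a node must hold to be split further); scikit-learn's trees use the Gini index rather than the paper's exact tree construction, and the integer class labels and synthetic data are assumptions of the example.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def dispersed_bagging_vote(local_tables, x_new, n_bootstrap=10, stop_criterion=2, seed=0):
            rng = np.random.default_rng(seed)
            n_classes = max(int(np.max(y)) for _, y in local_tables) + 1
            local_decisions = []
            for X, y in local_tables:
                X, y = np.asarray(X), np.asarray(y)
                support = np.zeros(n_classes)
                for _ in range(n_bootstrap):
                    idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample
                    tree = DecisionTreeClassifier(min_samples_split=stop_criterion)
                    tree.fit(X[idx], y[idx])
                    proba = tree.predict_proba(np.asarray(x_new).reshape(1, -1))[0]
                    support[tree.classes_.astype(int)] += proba  # align class indices
                local_decisions.append(int(np.argmax(support)))  # the local table's decision
            # Final joint decision: majority voting over the local tables' decisions.
            return int(np.bincount(local_decisions, minlength=n_classes).argmax())

        # Synthetic example: three local tables, each holding its own fragment of the data.
        rng = np.random.default_rng(1)
        tables = [(rng.normal(size=(40, 5)), rng.integers(0, 3, size=40)) for _ in range(3)]
        print(dispersed_bagging_vote(tables, x_new=np.zeros(5), stop_criterion=6))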

    Neural Network Used for the Fusion of Predictions Obtained by the K-Nearest Neighbors Algorithm Based on Independent Data Sources

    The article concerns the problem of classification based on independent data sets, i.e. local decision tables. The aim of the paper is to propose a classification model for dispersed data using a modified k-nearest neighbors algorithm and a neural network. A neural network, more specifically a multilayer perceptron, is used to combine the prediction results obtained based on the local tables. The prediction results are stored at the measurement level and generated using a modified k-nearest neighbors algorithm. The task of the neural network is to combine these results and provide a common prediction. In the article, various neural network structures (different numbers of neurons in the hidden layer) are studied, and the results are compared with the results generated by other fusion methods, such as majority voting, the Borda count method, the sum rule, a method based on decision templates and a method based on the theory of evidence. Based on the obtained results, it was found that the neural network always generates unambiguous decisions, which is a great advantage, as most of the other fusion methods generate ties. Moreover, if only unambiguous results are considered, the use of a neural network gives much better results than the other fusion methods. If ambiguity is allowed, some fusion methods are slightly better, but this is a consequence of the fact that they may generate several decisions for the test object.
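
    A minimal sketch of the fusion stage, assuming scikit-learn's MLPClassifier and assuming that the local prediction vectors for each object are simply concatenated into one input vector; the synthetic data, the single hidden layer of 20 neurons and the concatenation scheme are illustrative assumptions, not the paper's exact configuration.

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)
        n_objects, n_tables, n_classes = 200, 3, 4

        # For each training object: the concatenated prediction vectors produced by the
        # local classifiers (synthetic placeholders here) and the object's true class.
        fusion_inputs = rng.random((n_objects, n_tables * n_classes))
        true_classes = rng.integers(0, n_classes, size=n_objects)

        # A multilayer perceptron with one hidden layer learns to map the concatenated
        # local predictions to a single, unambiguous final decision.
        fusion_net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
        fusion_net.fit(fusion_inputs, true_classes)

        # At classification time, the concatenated local vectors for a new object are
        # passed through the trained network.
        new_object = rng.random((1, n_tables * n_classes))
        print(fusion_net.predict(new_object)[0])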

    Pawlak's Conflict Model: Directions of Development

    Comparative International Research in the Area of Educational Platforms and MOOCs: An Opinion of IT Students Using Data Mining Analysis

    In the paper, a study was conducted using data mining methods such as the decision tree, the Naïve Bayes classifier and the generalized linear model to detect patterns in data obtained from a questionnaire on educational platforms. The questionnaire was conducted among students from three countries: Poland, Kazakhstan and Ukraine. Questions on the frequency of use of educational platforms, the platforms most popular among students, the most popular topics of courses on educational platforms, and the use of Git systems and the Stack Overflow platform were analysed, and some interesting relations were identified in the paper.

    C/C++ Specific

    Erasmus+ FITPED: Work-Based Learning in Future IT Professionals Education, Project 2018-1-SK01-KA203-04638