    Explainable Machine Learning for Categorical and Mixed Data with Lossless Visualization

    Building accurate and interpretable Machine Learning (ML) models for heterogeneous/mixed data is a long-standing challenge for algorithms designed for numeric data. This work focuses on three goals: numeric coding schemes for non-numeric attributes that let ML algorithms produce accurate and explainable models, methods for lossless visualization of n-D non-numeric categorical data with visual rule discovery in these visualizations, and accurate and explainable ML models for categorical data. This study proposes a classification of mixed data types and analyzes their important role in Machine Learning. It presents a toolkit for enforcing interpretability of all internal operations of ML algorithms on mixed data, together with visual exploration of mixed data. A new Sequential Rule Generation (SRG) algorithm for explainable rule generation with categorical data is proposed and successfully evaluated in multiple computational experiments. This work is one step toward full-scope ML algorithms for mixed data supported by lossless visualization of n-D data in General Line Coordinates beyond Parallel Coordinates.
    Comment: 46 pages, 32 figures, 29 tables. arXiv admin note: substantial text overlap with arXiv:2206.0647

    Multi-stage mixed rule learning approach for advancing performance of rule-based classification

    Rule learning is a special type of machine learning approach, and its key advantage is the generation of interpretable models, which provide a transparent view of how an input is mapped to an output. Traditional rule learning algorithms are typically based on Boolean logic for inducing rule antecedents, which is very effective for training models on data sets that involve discrete attributes only. When continuous attributes are present in a data set, traditional rule learning approaches need to employ crisp intervals. In reality, however, problems usually show shades of grey, which motivated the development of fuzzy rule learning approaches that employ fuzzy intervals for handling continuous attributes. When a data set contains a large portion of discrete attributes, or no continuous attributes at all, fuzzy approaches cannot learn rules effectively, leading to a drop in performance. In this paper, a multi-stage approach to mixed rule learning is proposed, which strategically combines traditional and fuzzy approaches to handle various types of attributes effectively. We compare our proposed approach with existing rule learning algorithms. Our experimental results show that the proposed approach leads to significant advances in performance compared with the existing algorithms.
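The contrast between crisp and fuzzy intervals mentioned in this abstract can be illustrated with a minimal sketch. The function names and the triangular shape of the fuzzy interval are assumptions chosen for illustration; the abstract does not say which membership functions the paper uses.

```python
def crisp_membership(x, low, high):
    """Crisp interval: full membership inside [low, high], none outside."""
    return 1.0 if low <= x <= high else 0.0


def triangular_membership(x, left, peak, right):
    """Fuzzy interval (assumed triangular): the degree of membership rises
    linearly from left to peak and falls linearly from peak to right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# A value just outside a crisp interval is rejected outright, while the
# fuzzy interval still assigns it a partial degree of membership.
print(crisp_membership(24.9, 25, 35))
print(triangular_membership(24.9, 20, 30, 40))
```

With a crisp antecedent a rule either fires or does not; with a fuzzy antecedent it fires to a degree, which is the "shades of grey" behaviour the abstract refers to.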

    Criteria and Analysis for Human-Centered Browser Fingerprinting Countermeasures

    Browser fingerprinting is a surveillance technique that uses browser and device attributes to track visitors across the web. Defeating fingerprinting requires blocking attribute information or spoofing attributes, which can result in loss of functionality. To address the challenge of escaping surveillance while retaining functionality, we identify six design criteria for an ideal spoofing system. We present three fingerprint generation algorithms as well as a baseline algorithm that simply samples a dataset of fingerprints. For each algorithm, we identify trade-offs among the criteria: distinguishability from a non-spoofed fingerprint, uniqueness, size of the anonymity set, efficient generation, loss of web functionality, and whether or not the algorithm protects the confidentiality of the underlying dataset. We report on a series of experiments illustrating that the use of our partially-dependent algorithm for spoofing fingerprints will avoid detection by Machine Learning approaches to surveillance.

    An incremental approach to genetic algorithms based classification

    Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research explores incremental learning with statistical algorithms or neural networks rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multi-agent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an “integration” operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed.

    Incremental multiple objective genetic algorithms

    This paper presents a new genetic algorithm approach to multi-objective optimization problems: Incremental Multiple Objective Genetic Algorithms (IMOGA). Unlike conventional MOGA methods, it takes each objective into consideration incrementally. The whole evolution is divided into as many phases as there are objectives, and one more objective is considered in each phase. Each phase is composed of two stages: first, an independent population is evolved to optimize one specific objective; second, the better-performing individuals from the evolved single-objective population and the multi-objective population evolved in the last phase are joined together by the operation of integration. The resulting population then becomes an initial multi-objective population, to which a multi-objective evolution based on the incremented objective set is applied. The experimental results show that, in most problems, the performance of IMOGA is better than that of three other MOGAs: NSGA-II, SPEA and PAES. IMOGA can find more solutions during the same time span, and the quality of solutions is better.
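The phase structure described in this abstract can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the paper's implementation: individuals are single real numbers, all objectives are minimized, and the Gaussian mutation, truncation selection by domination count, and half-and-half integration are placeholders invented here. All function names are hypothetical.

```python
import random


def dominates(a, b, objectives):
    """True if a is no worse than b on every objective and strictly better
    on at least one (minimization is assumed)."""
    fa = [f(a) for f in objectives]
    fb = [f(b) for f in objectives]
    return (all(x <= y for x, y in zip(fa, fb))
            and any(x < y for x, y in zip(fa, fb)))


def domination_count(ind, pop, objectives):
    """Number of individuals in pop that dominate ind (0 = non-dominated)."""
    return sum(dominates(other, ind, objectives) for other in pop)


def evolve(pop, objectives, generations, rng):
    """Assumed toy evolution: Gaussian mutation, then keep the least-dominated
    individuals. The returned population is sorted best-first."""
    size = len(pop)
    for _ in range(generations):
        children = [x + rng.gauss(0.0, 0.1) for x in pop]
        merged = pop + children
        merged.sort(key=lambda ind: domination_count(ind, merged, objectives))
        pop = merged[:size]
    return pop


def integrate(single_pop, multi_pop, size):
    """Assumed integration: join the better-performing individuals of both
    populations (both are sorted best-first by evolve)."""
    half = size // 2
    return single_pop[:half] + multi_pop[:size - half]


def imoga_sketch(objectives, size=20, generations=30, seed=0):
    rng = random.Random(seed)
    multi_pop = []
    # One phase per objective; each phase adds one more objective.
    for phase in range(1, len(objectives) + 1):
        new_objective = objectives[phase - 1]
        # Stage 1: an independent population optimizes only the new objective.
        single_pop = [rng.uniform(-5.0, 5.0) for _ in range(size)]
        single_pop = evolve(single_pop, [new_objective], generations, rng)
        # Stage 2: integrate with the multi-objective population from the
        # last phase, then evolve on the incremented objective set.
        start = integrate(single_pop, multi_pop, size) if multi_pop else single_pop
        multi_pop = evolve(start, objectives[:phase], generations, rng)
    return multi_pop


if __name__ == "__main__":
    objs = [lambda x: x ** 2, lambda x: (x - 2.0) ** 2]
    print(sorted(imoga_sketch(objs)))
```

The two example objectives in the usage block are also assumptions; they were chosen only because their trade-off region (values between 0 and 2) is easy to reason about.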

    Collaborative decision making by ensemble rule based classification systems
