117,802 research outputs found
Explainable Machine Learning for Categorical and Mixed Data with Lossless Visualization
Building accurate and interpretable Machine Learning (ML) models for
heterogeneous/mixed data is a long-standing challenge for algorithms designed
for numeric data. This work focuses on developing numeric coding schemes for
non-numeric attributes for ML algorithms to support accurate and explainable ML
models, methods for lossless visualization of n-D non-numeric categorical data
with visual rule discovery in these visualizations, and accurate and
explainable ML models for categorical data. This study proposes a
classification of mixed data types and analyzes their important role in Machine
Learning. It presents a toolkit for enforcing interpretability of all internal
operations of ML algorithms on mixed data with a visual data exploration on
mixed data. A new Sequential Rule Generation (SRG) algorithm for explainable
rule generation with categorical data is proposed and successfully evaluated in
multiple computational experiments. This work is one of the steps to the full
scope ML algorithms for mixed data supported by lossless visualization of n-D
data in General Line Coordinates beyond Parallel Coordinates.Comment: 46 pages, 32 figures, 29 tables. arXiv admin note: substantial text
overlap with arXiv:2206.0647
Multi-stage mixed rule learning approach for advancing performance of rule-based classification
Rule learning is a special type of machine learning approaches, and its key advantage is the generation of interpretable models, which provides a transparent process of showing how an input is mapped to an output. Traditional rule learning algorithms are typically based on Boolean logic for inducing rule antecedents, which are very effective for training models on data sets that involve discrete attributes only. When continuous attributes are present in a data set, traditional rule learning approaches need to employ crisp intervals. However, in reality, problems usually show shades of grey, which motivated the development of fuzzy rule learning approaches by employing fuzzy intervals for handling continuous attributes. While a data set contains a large portion of discrete attributes or even no continuous attributes, fuzzy approaches cannot be used to learn rules effectively, leading to a drop in the performance. In this paper, a multi-stage approach of mixed rule learning is proposed, which involves strategic combination of both traditional and fuzzy approaches to handle effectively various types of attributes. We compare our proposed approach with existing algorithms of rule learning. Our experimental results show that our proposed approach leads to significant advances in the performance compared with the existing algorithms
Criteria and Analysis for Human-Centered Browser Fingerprinting Countermeasures
Browser fingerprinting is a surveillance technique that uses browser and device attributes to track visitors across the web. Defeating fingerprinting requires blocking attribute information or spoofing attributes, which can result in loss of functionality. To address the challenge of escaping surveillance while obtaining functionality, we identify six design criteria for an ideal spoofing system. We present three fingerprint generation algorithms as well as a baseline algorithm that simply samples a dataset of fingerprints. For each algorithm, we identify trade-offs among the criteria: distinguishability from a non-spoofed fingerprint, uniqueness, size of the anonymity set, efficient generation, loss of web functionality, and whether or not the algorithm protects the confidentiality of the underlying dataset. We report on a series of experiments illustrating that the use of our partially-dependent algorithm for spoofing fingerprints will avoid detection by Machine Learning approaches to surveillance
An incremental approach to genetic algorithms based classification
Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multi-agent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an “integration” operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed
Incremental multiple objective genetic algorithms
This paper presents a new genetic algorithm approach to multi-objective optimization problemsIncremental Multiple Objective Genetic Algorithms (IMOGA). Different from conventional MOGA methods, it takes each objective into consideration incrementally. The whole evolution is divided into as many phases as the number of objectives, and one more objective is considered in each phase. Each phase is composed of two stages: first, an independent population is evolved to optimize one specific objective; second, the better-performing individuals from the evolved single-objective population and the multi-objective population evolved in the last phase are joined together by the operation of integration. The resulting population then becomes an initial multi-objective population, to which a multi-objective evolution based on the incremented objective set is applied. The experiment results show that, in most problems, the performance of IMOGA is better than that of three other MOGAs, NSGA-II, SPEA and PAES. IMOGA can find more solutions during the same time span, and the quality of solutions is better
- …