Interactive Visual Self-service Data Classification Approach to Democratize Machine Learning
Machine learning algorithms often produce models that both end users and developers regard as complex black boxes. Such algorithms fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without compromising accuracy. Such a technique is especially helpful when dealing with sensitive and crucial data, such as cancer data in the medical domain, where the cost of errors is high. With the help of the proposed interactive and lossless multidimensional visualization, end users can identify patterns in the data and make explainable decisions based on them. Such options are not available in black-box machine learning methodologies. The interpretable IVLC algorithm is supported by the Interactive Shifted Paired Coordinates Software System (SPCVis), a lossless multidimensional data visualization system with interactive features. The interactive approach gives end users the flexibility to perform data classification as a self-service, without having to rely on a machine learning expert. Interactive pattern discovery becomes challenging when dealing with large datasets with hundreds of dimensions/features. To overcome this problem, an automated classification approach that combines a new Coordinate Order Optimizer (COO) algorithm with a Genetic Algorithm (GA) is proposed. The COO algorithm automatically generates the coordinate pair sequences that best represent the data separation, and the GA helps optimize the proposed IVLC algorithm by automatically generating the areas for data classification. The feasibility of the approach is shown by experiments on benchmark datasets covering both the interactive and automated processes used for data classification.
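The abstract does not specify what form the classification areas take. As a minimal sketch only, assuming each area is an axis-aligned rectangle drawn in the plane of one coordinate pair and that a rule is the conjunction of its areas, a logical rule of this kind could be evaluated as follows. The names `Rect` and `classify` are hypothetical and are not taken from IVLC or SPCVis.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned area in the plane of one coordinate pair."""
    pair: Tuple[int, int]  # indices of the two attributes that form the pair
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, point: Sequence[float]) -> bool:
        # Project the n-D point onto this pair and test the rectangle bounds.
        x, y = point[self.pair[0]], point[self.pair[1]]
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def classify(point: Sequence[float], rule: List[Rect],
             inside_label: str, outside_label: str) -> str:
    """Assumed rule form: the point gets inside_label only if every area contains it."""
    if all(area.contains(point) for area in rule):
        return inside_label
    return outside_label


# Illustrative rule over a 4-D point: one area per coordinate pair.
rule = [Rect((0, 1), 0.0, 0.5, 0.0, 0.5), Rect((2, 3), 0.2, 1.0, 0.0, 1.0)]
label = classify([0.1, 0.4, 0.5, 0.9], rule, "class A", "class B")
```

A rule written this way stays readable in domain terms, since each area is a pair of attribute ranges. Under this assumption, an automated search such as the GA mentioned above would only need to propose the rectangle bounds.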
Interactive Decision Tree Creation and Enhancement with Complete Visualization for Explainable Modeling
Visualization of Machine Learning (ML) models is a key part of the ML process
for increasing their interpretability and prediction accuracy. Decision Trees
(DTs) are essential in ML because they are used to understand many black-box
ML models, including Deep Learning models. This research proposes two new
methods for creating and enhancing DTs as understandable models with complete
visualization.
These methods use two versions of General Line Coordinates (GLC): Bended
Coordinates (BC) and Shifted Paired Coordinates (SPC). The Bended Coordinates
are a set of line coordinates in which each coordinate is bent at the threshold
point of the respective DT node. In SPC, each n-D point is visualized in a set
of shifted pairs of 2-D Cartesian coordinates as a directed graph. These new
methods expand and complement the capabilities of existing methods to visualize
DT models more completely. These capabilities allow us to observe and analyze:
(1) relations between attributes, (2) individual cases relative to the DT
structure, (3) data flow in the DT, (4) sensitivity of each split threshold in
the DT nodes, and (5) density of cases in parts of the n-D space. These
features are critical for DT models' performance evaluation and improvement by
domain experts and end users as they help to prevent overgeneralization and
overfitting of the models. The advantages of this methodology are illustrated
in case studies on benchmark real-world datasets. The paper also
demonstrates how to generalize these methods to decision tree visualizations in
different General Line Coordinates.
Comment: 36 pages, 45 figures, 5 tables
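The SPC description in this abstract (each n-D point drawn as a directed graph over shifted pairs of 2-D Cartesian coordinates) can be illustrated with a short sketch. It assumes an even number of attributes, consecutive pairing (x1, x2), (x3, x4), and so on, and a caller-supplied shift for each pair's origin; the function names are hypothetical, and the layout used in the paper may differ.

```python
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]


def spc_nodes(point: Sequence[float],
              shifts: Optional[Sequence[Point2D]] = None) -> List[Point2D]:
    """Map an n-D point to one 2-D node per coordinate pair, in a shared plane."""
    if len(point) % 2 != 0:
        raise ValueError("this sketch assumes an even number of attributes")
    pairs = [(point[i], point[i + 1]) for i in range(0, len(point), 2)]
    if shifts is None:
        shifts = [(0.0, 0.0)] * len(pairs)  # assumption: unshifted by default
    if len(shifts) != len(pairs):
        raise ValueError("one shift is needed per coordinate pair")
    # Each pair has its own Cartesian system whose origin is moved by the shift.
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(pairs, shifts)]


def spc_edges(nodes: Sequence[Point2D]) -> List[Tuple[Point2D, Point2D]]:
    """Directed edges joining consecutive nodes: the graph that depicts the point."""
    return list(zip(nodes, nodes[1:]))


def spc_inverse(nodes: Sequence[Point2D],
                shifts: Sequence[Point2D]) -> List[float]:
    """Recover the original n-D point from its nodes and the known shifts."""
    values: List[float] = []
    for (x, y), (dx, dy) in zip(nodes, shifts):
        values.extend([x - dx, y - dy])
    return values


shifts = [(0, 0), (10, 0), (20, 0)]
nodes = spc_nodes([1, 2, 3, 4, 5, 6], shifts)
edges = spc_edges(nodes)
restored = spc_inverse(nodes, shifts)
```

Because every attribute value can be read back from the nodes once the shifts are known, a mapping of this kind loses no information about the n-D point.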