    Visual Knowledge Discovery with General Line Coordinates

    Understanding black-box Machine Learning methods on multidimensional data is a key challenge in Machine Learning. While many powerful Machine Learning methods already exist, these methods are often unexplainable or perform poorly on complex data. This paper proposes visual knowledge discovery approaches based on several forms of lossless General Line Coordinates. These expand the previously introduced General Line Coordinates Linear and Dynamic Scaffolding Coordinates to produce, explain, and visualize non-linear classifiers with explanation rules. To ensure that these non-linear models and rules are accurate, new interactive visual knowledge discovery algorithms for finding worst-case validation splits were also developed. These expansions are General Line Coordinates non-linear, interactive rules linear, hyperblock rules linear, and worst-case linear. Experiments across multiple benchmark datasets show that this visual knowledge discovery method can compete with other visual and computational Machine Learning algorithms while improving both interpretability and accuracy in linear and non-linear classification. The major benefits of these expansions are the ability to build accurate and highly interpretable models and rules from hyperblocks, the ability to analyze interpretability weaknesses in a model, and the ability to incorporate expert knowledge through interactive, human-guided visual knowledge discovery.
    Comment: 44 pages, 26 figures, 3 table
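    The hyperblock rules mentioned above can be illustrated with a minimal sketch. It assumes the usual reading of a hyperblock as an axis-aligned n-D box (one closed interval per attribute) and builds the smallest such box around the training samples of one class. The class `Hyperblock` and its methods are hypothetical names, and this is not the paper's hyperblock-discovery algorithm.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Hyperblock:
    """Hypothetical axis-aligned n-D box: one closed interval [lows[i], highs[i]] per attribute."""
    lows: list
    highs: list
    label: str

    @classmethod
    def envelope(cls, points: Sequence[Sequence[float]], label: str) -> "Hyperblock":
        # Smallest box that contains every given point (per-attribute min and max).
        dims = range(len(points[0]))
        lows = [min(p[i] for p in points) for i in dims]
        highs = [max(p[i] for p in points) for i in dims]
        return cls(lows, highs, label)

    def contains(self, x: Sequence[float]) -> bool:
        # Rule form: IF lows[i] <= x[i] <= highs[i] for every i THEN class = label.
        return all(lo <= v <= hi for lo, v, hi in zip(self.lows, x, self.highs))

    def purity(self, points: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
        # Fraction of covered points that actually carry the block's label.
        covered = [lab for p, lab in zip(points, labels) if self.contains(p)]
        if not covered:
            return 0.0
        return sum(lab == self.label for lab in covered) / len(covered)


# Usage on toy 2-D data (made-up values).
pts = [[0.1, 0.2], [0.3, 0.4], [0.8, 0.9], [0.25, 0.3]]
labs = ["A", "A", "B", "B"]
hb = Hyperblock.envelope([p for p, lab in zip(pts, labs) if lab == "A"], "A")
print(hb.lows, hb.highs, hb.purity(pts, labs))
```

    Here the class-B point [0.25, 0.3] falls inside the class-A envelope, so the block's purity is 2/3; a rule-building procedure would typically split or shrink such an impure block before accepting it as a rule.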

    Visual Knowledge Discovery and Machine Learning for Investment Strategy

    Knowledge discovery is an important aspect of human cognition. The advantage of the visual approach lies in the opportunity to substitute easier perceptual tasks for some complex cognitive tasks. However, for cognitive tasks such as financial investment decision making, this opportunity faces a challenge: financial data are abstract, multidimensional, and multivariate, i.e., outside of traditional visual perception in a 2D or 3D world. This paper presents an approach to finding an investment strategy based on pattern discovery in the multidimensional space of specifically prepared time series. Visualization based on lossless Collocated Paired Coordinates (CPC) plays an important role in this approach for building criteria in the multidimensional space for finding an efficient investment strategy. Criteria generated with the CPC approach allow the space to be reduced/compressed using simple directed graphs whose beginnings and ends are located at different time points. The dedicated subspaces constructed for the time series include characteristics such as Bollinger Bands, the difference between moving averages, changes in volume, etc. Extensive simulation studies have been performed in a learning/testing context. Effective relations were found for the one-hour EURUSD pair for recent and historical data. The method has also been explored for the one-day EURUSD time series in 2D and 3D visualization spaces. The main positive result is finding an effective split of a normalized 3D space into 4x4x4 cubes in the visualization space that leads to a profitable investment decision (long position, short position, or no position). The strategy is ready for implementation in algorithmic trading mode.
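    The final decision rule, a split of the normalized 3D space into 4x4x4 cubes, amounts to a lookup from cell to action. A minimal sketch, assuming each coordinate has already been normalized to [0, 1]; which three features span the space and which cells are profitable are not given here, so the decision table below is hypothetical.

```python
import math
from typing import Sequence, Tuple

CELLS_PER_AXIS = 4  # the 4x4x4 split described in the abstract


def cube_index(point: Sequence[float], n: int = CELLS_PER_AXIS) -> Tuple[int, int, int]:
    """Return the (i, j, k) cell of a point in the normalized cube [0, 1]^3."""
    if len(point) != 3:
        raise ValueError("expected a 3-D point")
    idx = []
    for v in point:
        if not 0.0 <= v <= 1.0:
            raise ValueError(f"coordinate {v} is outside [0, 1]")
        # floor(v * n), clamped so that v == 1.0 lands in the last cell
        idx.append(min(math.floor(v * n), n - 1))
    return (idx[0], idx[1], idx[2])


# Hypothetical decision table. In the described approach the profitable cells are
# found from training data; these two entries are invented for illustration.
DECISIONS = {(3, 3, 0): "long", (0, 0, 3): "short"}


def decide(point: Sequence[float]) -> str:
    # Cells without an entry mean "take no position".
    return DECISIONS.get(cube_index(point), "none")


print(cube_index((0.0, 0.5, 0.999)), decide((0.9, 0.8, 0.1)))
```

    Clamping the index keeps points exactly on the upper boundary (value 1.0) inside the grid instead of producing a fifth cell.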

    An unsupervised approach to Geographical Knowledge Discovery using street level and street network images

    Recent research has shown the increasing use of machine learning methods in geography and urban analytics, primarily to extract features and patterns from spatial and temporal data using a supervised approach. Research integrating geographical processes into machine learning models and using unsupervised approaches on geographical data for knowledge discovery has been sparse. This research contributes to the latter, showing how latent variables learned by unsupervised learning methods on urban images can be used for geographic knowledge discovery. In particular, we propose a simple approach called Convolutional-PCA (ConvPCA), which is applied to both street-level and street network images to find a set of uncorrelated and ordered visual latent components. The approach allows meaningful explanations, using a combination of geographical and generative visualisations to explore the latent space and to show how the learned representation can be used to predict urban characteristics such as street quality and street network attributes. The research also finds that the visual components from the ConvPCA model achieve accuracy similar to that of less interpretable dimension reduction techniques.
    Comment: SigSpatial 2019 GeoA
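    The abstract does not spell out ConvPCA's internals. As a hedged illustration of the PCA half only, the sketch below assumes each image has already been turned into a feature vector (for example by flattening it or by a convolutional network, not shown) and extracts uncorrelated, variance-ordered components with plain power iteration and deflation. In practice a library SVD would be the usual choice; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pca(rows: Sequence[Sequence[float]], k: int, iters: int = 500,
        seed: int = 0) -> Tuple[List[List[float]], List[float], List[float]]:
    """Top-k principal components of `rows` (samples x features).

    Power iteration on the sample covariance matrix with deflation, so components
    are mutually orthogonal (uncorrelated scores) and ordered by decreasing variance.
    Returns (components, variances, mean).
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance C = X^T X / (n - 1)
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    comps, variances = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        s = sum(t * t for t in v) ** 0.5
        v = [t / s for t in v]  # start from a unit vector
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = sum(t * t for t in w) ** 0.5
            if norm == 0.0:  # no variance left in this subspace
                break
            v = [t / norm for t in w]
        # Rayleigh quotient v^T C v = variance along v
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        comps.append(v)
        variances.append(lam)
        # Deflation: remove this direction so the next pass finds the next component.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return comps, variances, mean


def project(row: Sequence[float], comps, mean) -> List[float]:
    """Latent scores of one sample: its centered features projected on each component."""
    centered = [r - m for r, m in zip(row, mean)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in comps]


# Toy 2-D feature vectors (made up); in ConvPCA these would come from image features.
feats = [[t + e, t - e] for t, e in zip([-2, -1, 0, 1, 2], [0.1, -0.1, 0.0, -0.1, 0.1])]
comps, variances, mean = pca(feats, k=2)
print(variances, project(feats[0], comps, mean))
```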

    User-centered visual analysis using a hybrid reasoning architecture for intensive care units

    One problem with Intensive Care Unit information systems is that, in some cases, the data display can become very dense. To ensure overview and readability as data volumes grow, special features are required (e.g., data prioritization, clustering, and selection mechanisms) together with analytical methods (e.g., temporal data abstraction, principal component analysis, and event detection). This paper addresses the problem of better integrating the visual and analytical methods applied in medical monitoring systems. We present a knowledge- and machine learning-based approach that supports the knowledge discovery process with appropriate analytical and visual methods. Its potential benefit lies in the development of user interfaces for intelligent monitors that can assist with the detection and explanation of new, potentially threatening medical events. The proposed hybrid reasoning architecture provides an interactive graphical user interface for adjusting the parameters of the analytical methods to the users' tasks at hand. The action sequences that users perform on the graphical user interface are consolidated in a dynamic knowledge base with a specific hybrid reasoning scheme that integrates symbolic and connectionist approaches. These sequences of acquired expert knowledge can make it easier for knowledge to emerge in similar situations and can positively affect the monitoring of critical situations. The graphical user interface, which incorporates user-centered visual analysis, is used to support a natural and effective representation of clinical information for patient care.

    Interpretable Machine Learning for Self-service High-risk Decision Making

    This research contributes to interpretable machine learning via visual knowledge discovery in General Line Coordinates (GLC). The concepts of hyperblocks as interpretable dataset units and GLC are combined to create a visual self-service machine learning model. Two variants of GLC, known as Dynamic Scaffold Coordinates (DSC), are proposed. DSC1 and DSC2 can losslessly map multiple dataset attributes onto a single two-dimensional (X, Y) Cartesian plane using a dynamic scaffolding graph construction algorithm. Hyperblock analysis is used to determine visually appealing attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree rules and that a series of DSC1 or DSC2 plots can losslessly visualize n-D data in accordance with a decision tree model. For large decision trees with many branches, such as for MNIST handwritten digits, where hyperblock discovery was hampered, dimensionality reduction techniques such as principal component analysis, singular value decomposition, and t-distributed stochastic neighbor embedding were used to create new attributes of interest for visual class separation. A major benefit of DSC1 and DSC2 is their highly interpretable nature: they allow domain experts to control or establish new machine learning models through visual pattern discovery. A software package called the Dynamic Scaffold Coordinates Visualization System (DSCViz) was created to showcase DSC1 and DSC2. DSCViz expands the end user's capabilities with functions such as real-time drag and zoom, scaling techniques, sample clipping, attribute reordering, and the ability to hide classes or change their colors. DSC2 was used to estimate and visualize worst-case validation splits in the Wisconsin Breast Cancer, Iris, and Seeds datasets. DSC2 was also applied to MNIST handwritten digits to determine its feasibility for large datasets. In general, estimating worst-case validation splits is important for every high-risk application.
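    One way to see how hyperblocks relate to decision tree rules: every root-to-leaf path is a conjunction of threshold tests, which intersects into one interval per attribute, i.e. a hyperblock. A minimal sketch, assuming binary splits of the form x[i] <= t (left) and x[i] > t (right) as in CART-style trees; the path format and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical path format: (attribute_index, test, threshold), with test "<=" or ">".
Condition = Tuple[int, str, float]


def path_to_hyperblock(path: Sequence[Condition], n_attrs: int) -> Tuple[List[float], List[float]]:
    """Intersect the tests on one root-to-leaf path into one interval (lows[i], highs[i]]
    per attribute. Attributes the path never tests stay unbounded."""
    lows = [-math.inf] * n_attrs
    highs = [math.inf] * n_attrs
    for attr, test, t in path:
        if test == "<=":
            highs[attr] = min(highs[attr], t)
        elif test == ">":
            lows[attr] = max(lows[attr], t)
        else:
            raise ValueError(f"unsupported test {test!r}")
    return lows, highs


def in_hyperblock(x: Sequence[float], lows: Sequence[float], highs: Sequence[float]) -> bool:
    # Lower bounds come from "x > t" tests, so they are open; upper bounds are closed.
    return all(lo < v <= hi for lo, v, hi in zip(lows, x, highs))


# A leaf reached by testing attribute 2 twice and attribute 0 once (thresholds made up).
leaf_path = [(2, "<=", 2.45), (0, ">", 5.0), (2, "<=", 1.8)]
lows, highs = path_to_hyperblock(leaf_path, n_attrs=4)
print(lows, highs)
print(in_hyperblock([6.0, 3.0, 1.5, 0.2], lows, highs))
```

    Repeated tests on the same attribute simply tighten its interval, which is why a whole path collapses to a single box.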

    Transformation of an uncertain video search pipeline to a sketch-based visual analytics loop

    Traditional sketch-based image or video search systems rely on machine learning as their core technology. However, in many applications machine learning alone is impractical: videos may not be sufficiently semantically annotated, suitable training data may be lacking, and the user's search requirements may change frequently across tasks. In this work, we develop a visual analytics system that overcomes these shortcomings of the traditional approach. We use a sketch-based interface to let users specify search requirements flexibly without depending on semantic annotation. We employ active machine learning to train different analytical models for different types of search requirements. We use visualization to facilitate knowledge discovery at the different stages of visual analytics. This includes visualizing the parameter space of the trained model, visualizing the search space to support interactive browsing, visualizing candidate search results to support rapid interaction for active learning while minimizing the need to watch videos, and visualizing aggregated information about the search results. We demonstrate the system by searching for spatiotemporal attributes in sports video to identify key instances of team and player performance. © 1995-2012 IEEE
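    The abstract does not name the active learning strategy. Uncertainty sampling is one common choice and is swapped in here only as an illustration: each round, the user is shown the candidates the current model is least sure about, which is one way to keep the number of videos that must be watched small. The function name and scores are hypothetical.

```python
from typing import Iterable, List, Sequence


def most_uncertain(probs: Sequence[float], k: int,
                   already_labeled: Iterable[int] = ()) -> List[int]:
    """Indices of the k not-yet-labeled candidates whose predicted probability of
    matching the sketch query is closest to 0.5 (least certain for a binary model)."""
    labeled = set(already_labeled)
    pool = [i for i in range(len(probs)) if i not in labeled]
    # sort() is stable, so ties keep their original candidate order
    pool.sort(key=lambda i: abs(probs[i] - 0.5))
    return pool[:k]


# Model scores for five candidate video segments (made-up numbers).
scores = [0.95, 0.52, 0.10, 0.47, 0.70]
print(most_uncertain(scores, k=2))                       # [1, 3]
print(most_uncertain(scores, k=2, already_labeled={1}))  # [3, 4]
```

    After the user labels the shown candidates, the model would be retrained and the scores recomputed before the next query round.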

    Decreasing Occlusion and Increasing Explanation in Interactive Visual Knowledge Discovery

    Lack of explanation and occlusion are major problems for interactive visual knowledge discovery, machine learning, and data mining in multidimensional data. This thesis proposes a hybrid method that combines visual and analytical means to deal with these problems. This method, denoted FSP, visualizes n-D data in 2-D in a set of Shifted Paired Coordinates (SPC). SPC for n-D data consists of n/2 pairs of Cartesian coordinates that are shifted relative to each other to avoid overlap. Each n-D point is represented as a directed graph in SPC. It is shown that the FSP method simplifies pattern discovery in n-D data, providing explainable rules in visual form and significantly decreasing the cognitive load of analyzing n-D data. Computational experiments on real data have shown its efficiency on both training and validation data.
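    The SPC mapping described above can be sketched as follows, assuming an even number of normalized attributes and that each pair's shift is a fixed (dx, dy) offset chosen by the user. How FSP chooses the shifts, and the rule-discovery part of FSP itself, are not shown; function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def spc_graph(x: Sequence[float], shifts: Sequence[Point2D]):
    """Map an n-D point to Shifted Paired Coordinates.

    Attributes are paired as (x1, x2), (x3, x4), ...; pair i is drawn in its own
    Cartesian system whose origin is moved by shifts[i]. Returns the 2-D nodes and
    the directed edges node i -> node i+1 that make up the point's graph.
    """
    if len(x) % 2:
        raise ValueError("this sketch needs an even number of attributes")
    m = len(x) // 2
    if len(shifts) != m:
        raise ValueError("need one (dx, dy) shift per attribute pair")
    nodes = [(x[2 * i] + shifts[i][0], x[2 * i + 1] + shifts[i][1]) for i in range(m)]
    edges = [(i, i + 1) for i in range(m - 1)]
    return nodes, edges


def spc_restore(nodes: Sequence[Point2D], shifts: Sequence[Point2D]) -> List[float]:
    """Undo the shifts to recover the original n-D point (the mapping is lossless)."""
    out: List[float] = []
    for (px, py), (dx, dy) in zip(nodes, shifts):
        out += [px - dx, py - dy]
    return out


x = [0.25, 0.5, 0.75, 1.0, 0.0, 0.5]           # a 6-D point, attributes in [0, 1]
shifts = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]  # hypothetical: 2 units apart horizontally
nodes, edges = spc_graph(x, shifts)
print(nodes, edges)
print(spc_restore(nodes, shifts) == x)
```

    Because the shifts are known, subtracting them recovers every attribute exactly, which is what makes the representation lossless.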

    Deep Learning of 2-D Images Representing n-D Data in General Line Coordinates

    While knowledge discovery and n-D data visualization procedures are often efficient, loss of information, occlusion, and clutter remain challenges. General Line Coordinates (GLC) is a relatively new technique for dealing with such artifacts. GLC-Linear, one of the methods in GLC, transforms n-D numerical data losslessly into visual representations as polylines. The method proposed in this paper uses these 2-D visual representations as input to Convolutional Neural Network (CNN) classifiers. The obtained classification accuracies are close to those obtained by other machine learning algorithms. The main benefit of the method is that the lossless visualization of n-D data can be used to interpret and explain the discovered relationships, in addition to classical classification with statistical learning strategies.
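    The GLC-L transform and the step of turning it into an image can be sketched as below. This assumes the construction usually described for GLC-L: coefficients of a linear function scaled into [-1, 1] and one segment per attribute at angle arccos(k_i), chained end to end. The naive sampling rasterizer is a stand-in for whatever rendering the paper used, the CNN itself is not shown, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def glc_l_polyline(x: Sequence[float], k: Sequence[float]) -> List[Point2D]:
    """Chain one segment per attribute, end to end, starting at the origin.

    Segment i has length x[i] and angle theta_i = acos(k[i]) to the horizontal axis,
    so its horizontal extent is k[i] * x[i]; the last vertex's x-coordinate therefore
    equals the linear score sum(k[i] * x[i]).
    """
    pts: List[Point2D] = [(0.0, 0.0)]
    for xi, ki in zip(x, k):
        if not -1.0 <= ki <= 1.0:
            raise ValueError("coefficients are assumed to be scaled into [-1, 1]")
        theta = math.acos(ki)
        px, py = pts[-1]
        pts.append((px + xi * math.cos(theta), py + xi * math.sin(theta)))
    return pts


def rasterize(pts: Sequence[Point2D], size: int, xmin: float, xmax: float,
              ymin: float, ymax: float, steps: int = 50) -> List[List[int]]:
    """Draw the polyline into a size x size binary image by sampling each segment.

    Using the same bounds for every sample keeps images comparable as CNN inputs.
    """
    img = [[0] * size for _ in range(size)]

    def to_pixel(px: float, py: float) -> Tuple[int, int]:
        col = math.floor((px - xmin) / (xmax - xmin) * (size - 1) + 0.5)
        row = math.floor((ymax - py) / (ymax - ymin) * (size - 1) + 0.5)  # row 0 is the top
        return row, col

    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        for s in range(steps + 1):
            t = s / steps
            r, c = to_pixel(x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            if 0 <= r < size and 0 <= c < size:
                img[r][c] = 1
    return img


x, k = [0.5, 1.0, 0.25], [1.0, 0.5, -0.8]   # made-up point and coefficients
pts = glc_l_polyline(x, k)
img = rasterize(pts, size=16, xmin=-1.0, xmax=2.0, ymin=0.0, ymax=2.0)
print(pts[-1][0], sum(ki * xi for ki, xi in zip(k, x)))  # equal up to rounding
```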

    Full Interpretable Machine Learning Method with In-line Coordinates

    This thesis explores a new approach to machine learning classification in 2-dimensional space (2-D ML) with In-line Coordinates. It is a full machine learning approach that does not require dealing with n-dimensional data in n-dimensional space. The In-line Coordinates method allows n-D patterns to be discovered in 2-D space without loss of n-D information, using a graph representation of n-D data in 2-D. Specifically, this thesis shows that this can be done with several modifications of In-line Based Coordinates, including static and dynamic ones, which are defined here. Several classification and regression algorithms based on these In-line Coordinates were explored. Two successful case studies on benchmark datasets (Wisconsin Breast Cancer and Page Block Classification) demonstrated the feasibility of the approach. This approach helps to further consolidate a whole new area of full 2-D machine learning with its own methodology. The In-line Coordinates method has the advantage of actively including end users in the discovery of models and their justification. Another advantage is that it provides interpretable ML models. Keywords: interpretable machine learning, classification, regression, visual knowledge discovery
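    The In-line Coordinates layout can be sketched as follows, assuming its basic form: the n axes are laid end to end on one horizontal line and consecutive values are joined by arcs above it. The static and dynamic modifications defined in the thesis are not reproduced, and the axis length, gap, arc height, and function names are arbitrary choices introduced for illustration.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def inline_positions(x: Sequence[float], axis_len: float = 1.0, gap: float = 0.0) -> List[float]:
    """Place axis i on [i * (axis_len + gap), i * (axis_len + gap) + axis_len] of one
    horizontal line and mark value x[i] (assumed normalized to [0, 1]) on it."""
    step = axis_len + gap
    return [i * step + v * axis_len for i, v in enumerate(x)]


def inline_arcs(x: Sequence[float], axis_len: float = 1.0, gap: float = 0.0,
                height: float = 0.5) -> List[Tuple[Point2D, Point2D, Point2D]]:
    """Join consecutive marks with arcs above the line, each given as the
    (start, control, end) points of a quadratic Bezier curve."""
    pos = inline_positions(x, axis_len, gap)
    return [((a, 0.0), ((a + b) / 2, height), (b, 0.0)) for a, b in zip(pos, pos[1:])]


def inline_restore(pos: Sequence[float], axis_len: float = 1.0, gap: float = 0.0) -> List[float]:
    """Recover x from the marks: the layout is lossless because each mark's axis is known."""
    step = axis_len + gap
    return [(p - i * step) / axis_len for i, p in enumerate(pos)]


x = [0.25, 0.5, 1.0, 0.0]
pos = inline_positions(x, gap=0.5)
print(pos)                                  # [0.25, 2.0, 4.0, 4.5]
print(inline_restore(pos, gap=0.5) == x)    # True
print(inline_arcs(x, gap=0.5)[0])
```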

    Constructing Interactive Visual Classification, Clustering and Dimension Reduction Models for n-D Data

    The exploration of multidimensional datasets of all possible sizes and dimensions is a long-standing challenge in knowledge discovery, machine learning, and visualization. While multiple efficient visualization methods for n-D data analysis exist, loss of information, occlusion, and clutter remain challenges. This paper proposes and explores a new interactive method for visual discovery of n-D relations for supervised learning. The method includes automatic, interactive, and combined algorithms for discovering linear relations, for dimension reduction, and for generalization to non-linear relations. This method is a special category of reversible General Line Coordinates (GLC). It produces graphs in 2-D that represent n-D points losslessly, i.e., the n-D data can be restored from the graphs. The projections of the graphs are used for classification. The method is illustrated by solving machine learning classification and dimension reduction tasks from image processing, computer-aided medical diagnostics, and finance. Experiments on several datasets show that this visual interactive method can compete in accuracy with analytical machine learning algorithms.
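    In GLC-L-style constructions, the projection of a graph's endpoint onto the horizontal axis equals a linear score, so classification reduces to comparing that score with a threshold. The sketch below assumes this reading. The coefficient-magnitude cutoff used for dimension reduction is a simple heuristic introduced here (it presumes attributes on comparable scales), not necessarily the paper's algorithm, and all data are made up.

```python
from typing import List, Optional, Sequence


def projection(x: Sequence[float], k: Sequence[float],
               kept: Optional[Sequence[int]] = None) -> float:
    """Linear score sum(k[i] * x[i]) over the kept attributes (all by default)."""
    idx = range(len(k)) if kept is None else kept
    return sum(k[i] * x[i] for i in idx)


def reduce_dims(k: Sequence[float], cutoff: float) -> List[int]:
    """Keep only attributes whose |k[i]| reaches the cutoff."""
    return [i for i, ki in enumerate(k) if abs(ki) >= cutoff]


def accuracy(X, y, k, threshold: float, kept=None) -> float:
    """Share of samples where 'projection > threshold' agrees with label 1."""
    hits = sum((projection(x, k, kept) > threshold) == bool(label) for x, label in zip(X, y))
    return hits / len(X)


# Made-up 3-D data, coefficients, and threshold.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.8, 0.2, 0.0], [0.1, 0.9, 1.0]]
y = [1, 0, 1, 0]
k, threshold = [0.9, -0.6, 0.05], 0.2
kept = reduce_dims(k, cutoff=0.1)
print(kept, accuracy(X, y, k, threshold), accuracy(X, y, k, threshold, kept))
```

    On this toy data, dropping the third attribute leaves accuracy unchanged, which is the kind of outcome an interactive user would check before accepting a reduced model.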