2,873 research outputs found

    The State-of-the-Art of Set Visualization

    Get PDF
    Sets comprise a generic data model that has been used in a variety of data analysis problems. Such problems involve analysing and visualizing set relations between multiple sets defined over the same collection of elements. However, visualizing sets is a non-trivial problem due to the large number of possible relations between them. We provide a systematic overview of state-of-the-art techniques for visualizing different kinds of set relations. We classify these techniques into six main categories according to the visual representations they use and the tasks they support. We compare the categories to provide guidance for choosing an appropriate technique for a given problem. Finally, we identify challenges in this area that need further research and propose possible directions to address these challenges. Further resources on set visualization are available at http://www.setviz.net

    Conditional t-SNE: Complementary t-SNE embeddings through factoring out prior information

    Get PDF
    Dimensionality reduction and manifold learning methods such as t-Distributed Stochastic Neighbor Embedding (t-SNE) are routinely used to map high-dimensional data into a 2-dimensional space to visualize and explore the data. However, two dimensions are typically insufficient to capture all structure in the data, the salient structure is often already known, and it is not obvious how to extract the remaining information in a similarly effective manner. To fill this gap, we introduce \emph{conditional t-SNE} (ct-SNE), a generalization of t-SNE that discounts prior information from the embedding in the form of labels. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining a single, integrated, and elegant method. ct-SNE has one extra parameter over t-SNE; we investigate its effects and show how to efficiently optimize the objective. Factoring out prior knowledge allows complementary structure to be captured in the embedding, providing new insights. Qualitative and quantitative empirical results on synthetic and (large) real data show ct-SNE is effective and achieves its goal

    Explainable Neural Networks based Anomaly Detection for Cyber-Physical Systems

    Get PDF
    Cyber-Physical Systems (CPSs) are the core of modern critical infrastructure (e.g. power-grids) and securing them is of paramount importance. Anomaly detection in data is crucial for CPS security. While Artificial Neural Networks (ANNs) are strong candidates for the task, they are seldom deployed in safety-critical domains due to the perception that ANNs are black-boxes. Therefore, to leverage ANNs in CPSs, cracking open the black box through explanation is essential. The main objective of this dissertation is developing explainable ANN-based Anomaly Detection Systems for Cyber-Physical Systems (CP-ADS). The main objective was broken down into three sub-objectives: 1) Identifying key-requirements that an explainable CP-ADS should satisfy, 2) Developing supervised ANN-based explainable CP-ADSs, 3) Developing unsupervised ANN-based explainable CP-ADSs. In achieving those objectives, this dissertation provides the following contributions: 1) a set of key-requirements that an explainable CP-ADS should satisfy, 2) a methodology for deriving summaries of the knowledge of a trained supervised CP-ADS, 3) a methodology for validating derived summaries, 4) an unsupervised neural network methodology for learning cyber-physical (CP) behavior, 5) a methodology for visually and linguistically explaining the learned CP behavior. All the methods were implemented on real-world and benchmark datasets. The set of key-requirements presented in the first contribution was used to evaluate the performance of the presented methods. The successes and limitations of the presented methods were identified. Furthermore, steps that can be taken to overcome the limitations were proposed. Therefore, this dissertation takes several necessary steps toward developing explainable ANN-based CP-ADS and serves as a framework that can be expanded to develop trustworthy ANN-based CP-ADSs

    FDive: Learning Relevance Models using Pattern-based Similarity Measures

    Full text link
    The detection of interesting patterns in large high-dimensional datasets is difficult because of their dimensionality and pattern complexity. Therefore, analysts require automated support for the extraction of relevant patterns. In this paper, we present FDive, a visual active learning system that helps to create visually explorable relevance models, assisted by learning a pattern-based similarity. We use a small set of user-provided labels to rank similarity measures, consisting of feature descriptor and distance function combinations, by their ability to distinguish relevant from irrelevant data. Based on the best-ranked similarity measure, the system calculates an interactive Self-Organizing Map-based relevance model, which classifies data according to the cluster affiliation. It also automatically prompts further relevance feedback to improve its accuracy. Uncertain areas, especially near the decision boundaries, are highlighted and can be refined by the user. We evaluate our approach by comparison to state-of-the-art feature selection techniques and demonstrate the usefulness of our approach by a case study classifying electron microscopy images of brain cells. The results show that FDive enhances both the quality and understanding of relevance models and can thus lead to new insights for brain research.Comment: 12 pages, 7 figures, 2 tables, LaTeX; corrected typo; added DO

    Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach

    Full text link
    This paper proposes an organized generalization of Newman and Girvan's modularity measure for graph clustering. Optimized via a deterministic annealing scheme, this measure produces topologically ordered graph clusterings that lead to faithful and readable graph representations based on clustering induced graphs. Topographic graph clustering provides an alternative to more classical solutions in which a standard graph clustering method is applied to build a simpler graph that is then represented with a graph layout algorithm. A comparative study on four real world graphs ranging from 34 to 1 133 vertices shows the interest of the proposed approach with respect to classical solutions and to self-organizing maps for graphs

    How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations

    Full text link
    Bidirectional Encoder Representations from Transformers (BERT) reach state-of-the-art results in a variety of Natural Language Processing tasks. However, understanding of their internal functioning is still insufficient and unsatisfactory. In order to better understand BERT and other Transformer-based models, we present a layer-wise analysis of BERT's hidden states. Unlike previous research, which mainly focuses on explaining Transformer models by their attention weights, we argue that hidden states contain equally valuable information. Specifically, our analysis focuses on models fine-tuned on the task of Question Answering (QA) as an example of a complex downstream task. We inspect how QA models transform token vectors in order to find the correct answer. To this end, we apply a set of general and QA-specific probing tasks that reveal the information stored in each representation layer. Our qualitative analysis of hidden state visualizations provides additional insights into BERT's reasoning process. Our results show that the transformations within BERT go through phases that are related to traditional pipeline tasks. The system can therefore implicitly incorporate task-specific information into its token representations. Furthermore, our analysis reveals that fine-tuning has little impact on the models' semantic abilities and that prediction errors can be recognized in the vector representations of even early layers.Comment: Accepted at CIKM 201
    corecore