The State-of-the-Art of Set Visualization
Sets comprise a generic data model that has been used in a variety of data analysis problems. Such problems involve analysing and visualizing set relations between multiple sets defined over the same collection of elements. However, visualizing sets is a non-trivial problem due to the large number of possible relations between them. We provide a systematic overview of state-of-the-art techniques for visualizing different kinds of set relations. We classify these techniques into six main categories according to the visual representations they use and the tasks they support. We compare the categories to provide guidance for choosing an appropriate technique for a given problem. Finally, we identify challenges in this area that need further research and propose possible directions to address these challenges. Further resources on set visualization are available at http://www.setviz.net
Conditional t-SNE: Complementary t-SNE embeddings through factoring out prior information
Dimensionality reduction and manifold learning methods such as t-Distributed
Stochastic Neighbor Embedding (t-SNE) are routinely used to map
high-dimensional data into a 2-dimensional space to visualize and explore the
data. However, two dimensions are typically insufficient to capture all
structure in the data, the salient structure is often already known, and it is
not obvious how to extract the remaining information in a similarly effective
manner. To fill this gap, we introduce conditional t-SNE (ct-SNE), a
generalization of t-SNE that discounts prior information from the embedding in
the form of labels. To achieve this, we propose a conditioned version of the
t-SNE objective, obtaining a single, integrated, and elegant method. ct-SNE has
one extra parameter over t-SNE; we investigate its effects and show how to
efficiently optimize the objective. Factoring out prior knowledge allows
complementary structure to be captured in the embedding, providing new
insights. Qualitative and quantitative empirical results on synthetic and
(large) real data show ct-SNE is effective and achieves its goal.
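For orientation, the standard t-SNE objective that ct-SNE generalizes can be written as below. This is the usual textbook form, not the conditioned objective itself, which the abstract does not spell out. The symbols are assumptions introduced here: p_ij are the pairwise similarities in the high-dimensional space, q_ij the similarities in the embedding, and y_i the low-dimensional points.

```latex
\[
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}},
\qquad
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}
\]
```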
Explainable Neural Networks based Anomaly Detection for Cyber-Physical Systems
Cyber-Physical Systems (CPSs) are the core of modern critical infrastructure (e.g., power grids), and securing them is of paramount importance. Anomaly detection in data is crucial for CPS security. While Artificial Neural Networks (ANNs) are strong candidates for the task, they are seldom deployed in safety-critical domains due to the perception that ANNs are black boxes. Therefore, to leverage ANNs in CPSs, cracking open the black box through explanation is essential.
The main objective of this dissertation is to develop explainable ANN-based Anomaly Detection Systems for Cyber-Physical Systems (CP-ADSs). The main objective was broken down into three sub-objectives: 1) identifying key requirements that an explainable CP-ADS should satisfy, 2) developing supervised ANN-based explainable CP-ADSs, 3) developing unsupervised ANN-based explainable CP-ADSs.
In achieving those objectives, this dissertation provides the following contributions: 1) a set of key requirements that an explainable CP-ADS should satisfy, 2) a methodology for deriving summaries of the knowledge of a trained supervised CP-ADS, 3) a methodology for validating derived summaries, 4) an unsupervised neural network methodology for learning cyber-physical (CP) behavior, 5) a methodology for visually and linguistically explaining the learned CP behavior.
All the methods were implemented on real-world and benchmark datasets. The set of key requirements presented in the first contribution was used to evaluate the performance of the presented methods. The successes and limitations of the presented methods were identified. Furthermore, steps that can be taken to overcome the limitations were proposed. Therefore, this dissertation takes several necessary steps toward developing explainable ANN-based CP-ADSs and serves as a framework that can be expanded to develop trustworthy ANN-based CP-ADSs.
FDive: Learning Relevance Models using Pattern-based Similarity Measures
The detection of interesting patterns in large high-dimensional datasets is
difficult because of their dimensionality and pattern complexity. Therefore,
analysts require automated support for the extraction of relevant patterns. In
this paper, we present FDive, a visual active learning system that helps to
create visually explorable relevance models, assisted by learning a
pattern-based similarity. We use a small set of user-provided labels to rank
similarity measures, consisting of feature descriptor and distance function
combinations, by their ability to distinguish relevant from irrelevant data.
Based on the best-ranked similarity measure, the system calculates an
interactive Self-Organizing Map-based relevance model, which classifies data
according to the cluster affiliation. It also automatically prompts further
relevance feedback to improve its accuracy. Uncertain areas, especially near
the decision boundaries, are highlighted and can be refined by the user. We
evaluate our approach by comparison to state-of-the-art feature selection
techniques and demonstrate the usefulness of our approach by a case study
classifying electron microscopy images of brain cells. The results show that
FDive enhances both the quality and understanding of relevance models and can
thus lead to new insights for brain research.
Comment: 12 pages, 7 figures, 2 tables, LaTeX; corrected typo; added DO
Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach
This paper proposes an organized generalization of Newman and Girvan's
modularity measure for graph clustering. Optimized via a deterministic
annealing scheme, this measure produces topologically ordered graph clusterings
that lead to faithful and readable graph representations based on clustering
induced graphs. Topographic graph clustering provides an alternative to more
classical solutions in which a standard graph clustering method is applied to
build a simpler graph that is then represented with a graph layout algorithm. A
comparative study on four real-world graphs ranging from 34 to 1,133 vertices
shows the benefits of the proposed approach relative to classical solutions
and to self-organizing maps for graphs.
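As background for the abstract above, here is a minimal sketch of the classical Newman-Girvan modularity that the paper generalizes. It does not implement the organized variant or the deterministic annealing optimization, neither of which the abstract specifies. The function name `modularity` and its arguments are hypothetical, and the graph is assumed to be undirected and unweighted, with no self-loops or duplicate edges.

```python
from collections import defaultdict


def modularity(edges, community):
    """Classical Newman-Girvan modularity Q (assumed baseline, not the paper's measure).

    edges: iterable of (u, v) pairs of an undirected, unweighted graph
    community: dict mapping each node to its cluster label
    """
    edges = list(edges)
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)  # number of edges with both ends in the same cluster
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1

    # Total degree of the nodes in each cluster
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[community[node]] += d

    # Q = sum over clusters of (fraction of intra-cluster edges)
    #     minus (expected fraction under the degree-preserving null model)
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, split))
```

Putting every node in a single cluster gives Q = 0 under this definition, so a positive value for the two-triangle split indicates more intra-cluster edges than the null model expects.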
How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations
Bidirectional Encoder Representations from Transformers (BERT) reach
state-of-the-art results in a variety of Natural Language Processing tasks.
However, understanding of their internal functioning is still insufficient and
unsatisfactory. In order to better understand BERT and other Transformer-based
models, we present a layer-wise analysis of BERT's hidden states. Unlike
previous research, which mainly focuses on explaining Transformer models by
their attention weights, we argue that hidden states contain equally valuable
information. Specifically, our analysis focuses on models fine-tuned on the
task of Question Answering (QA) as an example of a complex downstream task. We
inspect how QA models transform token vectors in order to find the correct
answer. To this end, we apply a set of general and QA-specific probing tasks
that reveal the information stored in each representation layer. Our
qualitative analysis of hidden state visualizations provides additional
insights into BERT's reasoning process. Our results show that the
transformations within BERT go through phases that are related to traditional
pipeline tasks. The system can therefore implicitly incorporate task-specific
information into its token representations. Furthermore, our analysis reveals
that fine-tuning has little impact on the models' semantic abilities and that
prediction errors can be recognized in the vector representations of even early
layers.
Comment: Accepted at CIKM 201