Visual Analytics for Understanding Draco's Knowledge Base
Draco has been developed as an automated visualization recommendation system
formalizing design knowledge as logical constraints in ASP (Answer-Set
Programming). With an increasing set of constraints and incorporated design
knowledge, even visualization experts lose overview in Draco and struggle to
retrace the automated recommendation decisions made by the system. Our paper
proposes a Visual Analytics (VA) approach to visualize and analyze Draco's
constraints. Our VA approach is intended to enable visualization experts to
accomplish identified tasks regarding the knowledge base and support them in
better understanding Draco. We extend the existing data extraction strategy of
Draco with a data processing architecture capable of extracting features of
interest from the knowledge base. A revised version of the ASP grammar provides
the basis for this data processing strategy. The resulting incorporated and
shared features of the constraints are then visualized using a hypergraph
structure inside the radially arranged constraints of the elaborated
visualization. The hierarchical categories of the constraints are indicated by
arcs surrounding the constraints. Our approach is intended to enable
visualization experts to interactively explore the design rules' violations
based on highlighting respective constraints or recommendations. A qualitative
and quantitative evaluation of the prototype confirms the prototype's
effectiveness and value in acquiring insights into Draco's recommendation
process and design constraints. Comment: To be presented at VIS 202
Visual Exploration System for Analyzing Trends in Annual Recruitment Using Time-varying Graphs
Annual recruitment data of new graduates are manually analyzed by human
resources (HR) specialists in industry, which signals the need to evaluate
the recruitment strategy of HR specialists. Every year, different applicants
send in job applications to companies. The relationships between applicants'
attributes (e.g., English skill or academic credential) can be used to analyze
the changes in recruitment trends across multiple years' data. However, most
attributes are unnormalized and thus require thorough preprocessing. Such
unnormalized data hinder the effective comparison of the relationship between
applicants in the early stage of data analysis. Thus, a visual exploration
system is highly needed to gain insight from the overview of the relationship
between applicants across multiple years. In this study, we propose the
Polarizing Attributes for Network Analysis of Correlation on Entities
Association (Panacea) visualization system. The proposed system integrates a
time-varying graph model and dynamic graph visualization for heterogeneous
tabular data. Using this system, human resource specialists can interactively
inspect the relationships between two attributes of prospective employees
across multiple years. Further, we demonstrate the usability of Panacea with
representative examples for finding hidden trends in real-world datasets and
then describe HR specialists' feedback obtained throughout Panacea's
development. The proposed Panacea system enables HR specialists to visually
explore the annual recruitment of new graduates.
Navigating Diverse Datasets in the Face of Uncertainty
When exploring large volumes of data, one of the challenging aspects is their diversity
of origin. Multiple files that have not yet been ingested into a database system may
contain information of interest to a researcher, who must curate, understand and sieve
their content before being able to extract knowledge.
Performance is one of the greatest difficulties in exploring these datasets. On the
one hand, examining non-indexed, unprocessed files can be inefficient. On the other
hand, any processing done before the data are understood introduces latency and potentially
unnecessary work if the chosen schema poorly matches the data. We have surveyed the
state of the art and, fortunately, there exist multiple proposed solutions for handling
data in situ efficiently.
Another major difficulty is matching files from multiple origins since their schema
and layout may not be compatible or properly documented. Most surveyed solutions
overlook this problem, especially for numeric, uncertain data, as is typical in fields
like astronomy.
The main objective of our research is to assist data scientists during the exploration
of unprocessed, numerical, raw data distributed across multiple files based solely on
its intrinsic distribution.
In this thesis, we first introduce the concept of Equally-Distributed Dependencies (EDD),
which provides the foundations to match this kind of dataset. We propose PresQ,
a novel algorithm that finds quasi-cliques on hypergraphs based on their expected
statistical properties. The probabilistic approach of PresQ can be successfully
exploited to mine EDD between diverse datasets when the underlying populations can
be assumed to be the same.
Finally, we propose a two-sample statistical test based on Self-Organizing Maps
(SOM). This method can outperform, in terms of power, other classifier-based
two-sample tests, being in some cases comparable to kernel-based methods, with the
advantage of being interpretable.
Both PresQ and the SOM-based statistical test can provide insights that drive
serendipitous discoveries.
Visualization of Metabolic Networks
The metabolism constitutes the universe of biochemical reactions taking place in
a cell of an organism. These processes include the synthesis, transformation, and
degradation of molecules for an organism to grow, to reproduce and to interact
with its environment. A good way to capture the complexity of these processes
is to represent them as a metabolic network, in which sets of molecules are transformed
into products by a chemical reaction, and the products are processed
further. The underlying graph model allows a structural analysis of this network
using established graph-theoretical algorithms on the one hand, and a visual representation
by applying layout algorithms combined with information visualization
techniques on the other.
In this thesis we will take a look at three different aspects of graph visualization
within the context of biochemical systems: the representation and interactive
exploration of static networks, the visual analysis of dynamic networks, and the
comparison of two network graphs. We will demonstrate how established infovis
techniques can be combined with new algorithms and applied to specific problems
in the area of metabolic network visualization.
We reconstruct the metabolic network covering the complete set of chemical reactions
present in a generalized eucaryotic cell from real-world data available from
a popular metabolic pathway database and present a suitable data structure. As
the constructed network is very large, it is not feasible to display it as a whole.
Instead, we introduce a technique to analyse this static network in a top-down
approach starting with an overview and displaying detailed reaction networks on
demand. This exploration method is also applied to compare metabolic networks
in different species and from different resources. As for the analysis of dynamic
networks, we present a framework to capture changes in the connectivity as well
as changes in the attributes associated with the network’s elements.
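
The graph model described in this abstract, where a reaction turns a set of molecules into a set of products that are then processed further, can be illustrated with a small directed-hypergraph sketch. This is only an assumption-laden illustration, not the data structure from the thesis: the names `Reaction` and `producible` and the metabolite labels A, B, C, D, X are hypothetical, and stoichiometry, reversibility, and compartments are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """One hyperedge: a set of substrates transformed into a set of products."""
    name: str
    substrates: frozenset
    products: frozenset


def producible(reactions, seed_metabolites):
    """Return all metabolites reachable from the seeds.

    A reaction fires once all of its substrates are available; its
    products are then added and may enable further reactions.
    """
    available = set(seed_metabolites)
    changed = True
    while changed:
        changed = False
        for reaction in reactions:
            ready = reaction.substrates <= available
            adds_something = not reaction.products <= available
            if ready and adds_something:
                available |= reaction.products
                changed = True
    return available


# Hypothetical toy network: A -> B, B + X -> C, C -> D.
toy_network = [
    Reaction("r1", frozenset({"A"}), frozenset({"B"})),
    Reaction("r2", frozenset({"B", "X"}), frozenset({"C"})),
    Reaction("r3", frozenset({"C"}), frozenset({"D"})),
]

print(sorted(producible(toy_network, {"A"})))
print(sorted(producible(toy_network, {"A", "X"})))
```

Treating each reaction as a hyperedge rather than as a set of pairwise edges keeps the "all substrates required" semantics explicit, which a plain graph would lose.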
3D IC optimal layout design. A parallel and distributed topological approach
The task of 3D IC layout design involves the assembly of millions of
components, taking into account many different requirements and constraints such
as topological, wiring, or manufacturability ones. It is an NP-hard problem that
requires new non-deterministic and heuristic algorithms. Considering the time
complexity, the commonly applied Fiduccia-Mattheyses partitioning algorithm is
superior to any other local search method. Nevertheless, it often fails to
reach a quasi-optimal solution in 3D spaces. The presented approach uses an
original 3D layout graph partitioning heuristic implemented using the
extremal optimization method. The goal is to minimize the total wire length in
the chip. To reduce the running time, a parallel and distributed
Java implementation is applied. Inside one Java Virtual Machine separate
optimization algorithms are executed by independent threads. The work may also
be shared among different machines by means of the Java Remote Method
Invocation system. Comment: 26 pages, 9 figures
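
To give a feel for extremal optimization applied to graph partitioning, the following is a minimal single-threaded sketch of the generic tau-EO scheme on a balanced two-way partition. It is not the authors' 3D layout heuristic or their parallel Java implementation: the number of cut edges stands in for wire length, and the function names, the `tau` value, and the toy graph are all assumptions made for illustration.

```python
import random


def cut_size(edges, side):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if side[u] != side[v])


def extremal_bipartition(n, edges, steps=2000, tau=1.5, seed=0):
    """Balanced bipartition of vertices 0..n-1 (n >= 2) via tau-EO."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    # Random balanced start: n // 2 vertices go to side 1.
    order = list(range(n))
    rng.shuffle(order)
    side = [0] * n
    for v in order[: n // 2]:
        side[v] = 1

    best_side, best_cut = side[:], cut_size(edges, side)
    # Power-law weights over ranks: the worst-ranked vertex is most likely.
    weights = [(k + 1) ** -tau for k in range(n)]

    def fitness(v):
        # Fraction of v's edges that stay inside its own partition.
        if not adj[v]:
            return 1.0
        return sum(side[w] == side[v] for w in adj[v]) / len(adj[v])

    for _ in range(steps):
        ranked = sorted(range(n), key=fitness)  # worst fitness first
        v = rng.choices(ranked, weights=weights, k=1)[0]
        # Swap with a random vertex from the other side to keep the balance.
        w = rng.choice([x for x in range(n) if side[x] != side[v]])
        side[v], side[w] = side[w], side[v]
        current = cut_size(edges, side)
        if current < best_cut:
            best_side, best_cut = side[:], current
    return best_side, best_cut


# Toy graph: two 4-cliques joined by the single edge (3, 4).
clique_a = [(a, b) for a in range(4) for b in range(a + 1, 4)]
clique_b = [(a, b) for a in range(4, 8) for b in range(a + 1, 8)]
toy_edges = clique_a + clique_b + [(3, 4)]
print(extremal_bipartition(8, toy_edges))
```

Unlike a greedy local search, every move is accepted regardless of whether it improves the cut; only the best configuration seen so far is remembered, which is what lets the method escape local minima.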
The State-of-the-Art of Set Visualization
Sets comprise a generic data model that has been used in a variety of data analysis problems. Such problems involve analysing and visualizing set relations between multiple sets defined over the same collection of elements. However, visualizing sets is a non-trivial problem due to the large number of possible relations between them. We provide a systematic overview of state-of-the-art techniques for visualizing different kinds of set relations. We classify these techniques into six main categories according to the visual representations they use and the tasks they support. We compare the categories to provide guidance for choosing an appropriate technique for a given problem. Finally, we identify challenges in this area that need further research and propose possible directions to address these challenges. Further resources on set visualization are available at http://www.setviz.net
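
As a rough illustration of why the number of set relations grows quickly, the sketch below groups elements by the exact combination of sets they belong to; with n sets there can be up to 2^n - 1 such non-empty regions. The function name `exclusive_intersections` and the example sets are hypothetical and are not taken from the survey.

```python
from collections import defaultdict


def exclusive_intersections(named_sets):
    """Map each combination of set names to the elements found in exactly those sets."""
    membership = defaultdict(set)
    for name, members in named_sets.items():
        for element in members:
            membership[element].add(name)

    regions = defaultdict(set)
    for element, names in membership.items():
        regions[frozenset(names)].add(element)
    return dict(regions)


# Hypothetical example with three overlapping sets.
example = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 5}}
for names, elements in exclusive_intersections(example).items():
    print(sorted(names), sorted(elements))
```

Regions of this kind are what an Euler diagram draws as distinct zones and what matrix-based techniques list as rows or columns.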