On Semantic Word Cloud Representation
We study the problem of computing semantic-preserving word clouds in which
semantically related words are close to each other. While several heuristic
approaches have been described in the literature, we formalize the underlying
geometric algorithm problem: Word Rectangle Adjacency Contact (WRAC). In this
model each word is associated with a rectangle of fixed dimensions, and the
goal is to represent semantically related words by ensuring that the two
corresponding rectangles touch. We design and analyze efficient polynomial-time
algorithms for some variants of the WRAC problem, show that several general
variants are NP-hard, and describe a number of approximation algorithms.
Finally, we experimentally demonstrate that our theoretically-sound algorithms
outperform the early heuristics.
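The contact condition at the core of the WRAC model can be illustrated with a minimal sketch. It assumes axis-aligned rectangles and takes "touch" to mean sharing a boundary segment of positive length without overlapping interiors, so that corner-only contact is not counted; that convention, and the names `Rect`, `overlap_len`, and `in_contact`, are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float  # left edge
    y: float  # bottom edge
    w: float  # width (fixed by the word)
    h: float  # height (fixed by the word)


def overlap_len(a_lo, a_hi, b_lo, b_hi):
    """Signed overlap of the intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return min(a_hi, b_hi) - max(a_lo, b_lo)


def in_contact(a: Rect, b: Rect) -> bool:
    """True if a and b share a boundary segment of positive length
    and their interiors do not overlap (assumed contact convention)."""
    dx = overlap_len(a.x, a.x + a.w, b.x, b.x + b.w)
    dy = overlap_len(a.y, a.y + a.h, b.y, b.y + b.h)
    # Side contact: extents meet exactly on one axis and overlap on the other.
    # Exact comparison is used, so integer coordinates are assumed here.
    return (dx == 0 and dy > 0) or (dy == 0 and dx > 0)
```

With floating-point coordinates, the exact comparisons would need to be replaced by a tolerance.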
Space Partitioning Schemes and Algorithms for Generating Regular and Spiral Treemaps
Treemaps have been widely applied to the visualization of hierarchical data.
A treemap takes a weighted tree and visualizes its leaves in a nested planar
geometric shape, with sub-regions partitioned such that each sub-region has an
area proportional to the weight of its associated leaf nodes. Efficiently
generating visually appealing treemaps that also satisfy other quality criteria
is an interesting problem that has been tackled from many directions. We
present an optimization model and five new algorithms for this problem,
including two divide and conquer approaches and three spiral treemap
algorithms. Our optimization model is able to generate superior treemaps that
could serve as a benchmark for comparing the quality of more computationally
efficient algorithms. Our divide and conquer and spiral algorithms either
improve the performance of their existing counterparts with respect to aspect
ratio and stability or perform competitively. Our spiral algorithms also expand
their applicability to a wider range of input scenarios. Four of these
algorithms are also computationally efficient, with quasilinear running times,
and the last algorithm achieves a cubic running time. A full version of this
paper with all appendices, data, and source code is available at
\anonymizeOSF{\OSFSupplementText}
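The area-proportionality property described in this abstract can be illustrated with a minimal sketch. It uses the classic slice-and-dice scheme, which is not one of the five algorithms the paper presents. The tree encoding (a leaf is a positive number, an inner node is a list of children) and the function names `weight` and `treemap` are assumptions made for this example.

```python
def weight(node):
    """Total weight of a node: a leaf is a number, an inner node a list."""
    if isinstance(node, (int, float)):
        return node
    return sum(weight(child) for child in node)


def treemap(node, x, y, w, h, vertical=True):
    """Return one (x, y, width, height) rectangle per leaf, in tree order.

    Each inner node splits its rectangle into strips whose areas are
    proportional to the children's weights; the split direction
    alternates with depth (slice-and-dice). Weights are assumed positive.
    """
    if isinstance(node, (int, float)):
        return [(x, y, w, h)]
    total = weight(node)
    rects, offset = [], 0.0
    for child in node:
        frac = weight(child) / total
        if vertical:
            rects += treemap(child, x + offset, y, w * frac, h, False)
            offset += w * frac
        else:
            rects += treemap(child, x, y + offset, w, h * frac, True)
            offset += h * frac
    return rects
```

For example, `treemap([2, [1, 1]], 0, 0, 8, 4)` yields three rectangles with areas 16, 8, and 8, matching the leaf weights 2, 1, and 1. Slice-and-dice tends to produce poor aspect ratios, which is one of the quality criteria the abstract mentions.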
Visualizing multidimensional data similarities: Improvements and applications
Multidimensional data is increasingly prominent and important in many application domains. Such data typically consists of a large set of elements, each of which is described by several measurements (dimensions). When designing techniques and tools to process this data, a key step is to gain insight into its structure and patterns, which can be described by the notion of similarity between elements. Among these techniques, multidimensional projections and similarity trees can effectively capture similarity patterns and handle a large number of data elements and dimensions. However, understanding and interpreting these patterns in terms of the original data dimensions is still hard. This thesis addresses the development of visual explanatory techniques for the easy interpretation of similarity patterns present in multidimensional projections and similarity trees, through several contributions. First, we propose methods that make the computation of similarity trees efficient for large datasets, and that also enhance their visual representation to allow the exploration of more data on a limited screen. Second, we propose methods for the visual explanation of multidimensional projections in terms of groups of similar elements. These groups are automatically annotated to describe which dimensions are most important in defining their notion of group similarity. We then show how these explanatory mechanisms can be adapted to handle both static and time-dependent data. Our proposed techniques are designed to be easy to use, work nearly automatically, and are demonstrated on a variety of large real-world datasets obtained from image collections, text archives, scientific measurements, and software engineering.
Visual Analytics to Support Evidence-Based Decision Making
The aim of this thesis is the design of visual analytics solutions to support evidence-based decision making. Due to the ever-growing complexity of the world, strategic decision making has become an increasingly challenging task. At the business level, decisions are no longer driven solely by economic factors. Environmental and social aspects are also taken into account in modern business decisions. At the political level, sustainable decision making is additionally influenced by public opinion, since politicians aim to retain their power. Decision makers face the challenge of taking all these factors into consideration and, at the same time, of increasing their efficiency in order to react immediately to abrupt changes in their environment. In the era of digitization, large amounts of data are stored digitally. The knowledge hidden in these datasets can be used to address the aforementioned challenges in decision making. However, handling large datasets, extracting knowledge from them, and incorporating this knowledge into the decision making process pose significant challenges. Additional complexity is added by the varying expertise of the stakeholders involved in the decision making process. Strategic decisions today are not made solely by individuals. Instead, a consortium of advisers, domain experts, analysts, etc. supports decision makers in their final choice. The number of stakeholders involved bears the risk of hampering communication efficiency and effectiveness due to knowledge gaps arising from different levels of expertise. Information systems research has reacted to these challenges by promoting research in computational decision support systems. However, recent research shows that most of the challenges remain unsolved. During the last decades, visual analytics has evolved as a research field for extracting knowledge from large datasets.
Therefore, combining human perception capabilities and computers’ processing power offers great analysis potential, also for decision making. However, despite obvious overlaps between decision making and visual analytics, theoretical foundations for applying visual analytics to decision making have been missing.
In this thesis, we promote the augmentation of decision support systems with visual analytics. Our concept comprises a methodology for the design of visual analytics systems that target decision making support. To this end, we first introduce a general decision making domain characterization, comprising the analysis of potential users, relevant data categories, and decision making tasks to be supported with visual analytics technologies. Second, we introduce a specialized design process for the development of visual analytics decision support systems. Third, we present two models of how visual analytics facilitates the bridging of knowledge gaps between stakeholders involved in the decision making process: one for decision making at the business level and one for political decision making. To prove the applicability of our concepts, we apply our design methodology in several design studies targeting concrete decision making support scenarios. The presented design studies cover the full range of data, user, and task categories characterized as relevant for decision making. Within these design studies, we first tailor our general decision making domain characterization to the specific domain problem at hand. We show that our concept supports a consistent characterization of user types, data categories, and decision making tasks for specific scenarios. Second, each design study follows the design process presented in our concept. Third, the design studies demonstrate how to bridge knowledge gaps between stakeholders. The resulting visual analytics systems allow the incorporation of knowledge extracted from data into the decision making process and support the collaboration of stakeholders with varying levels of expertise.
Explanatory visualization of multidimensional projections
Insight into large data collections (nowadays known as ‘big data’) can be gained by depicting them visually and then exploring these visualizations interactively. However, both the number of data points or measurements and the number of dimensions describing each measurement can be very large, as in a table with many rows and columns. Visualizing such so-called high-dimensional datasets is very challenging. One way to do this is to create a low-dimensional (two- or three-dimensional) depiction, in which one then searches for interesting data patterns instead of searching for them in the original high-dimensional data. Techniques that support this scenario, the so-called projections, have several advantages: they are visually scalable, they work robustly with noisy data, and they are fast. Yet the use of projections is severely limited by the fact that they are hard to interpret. We approach this problem by developing several techniques that ease interpretation, such as displaying projection errors and explaining projections by means of the original high dimensions. Our techniques are easy to learn, fast to compute, and easy to add to any data exploration scenario that uses any projection. We demonstrate our solutions with several applications and with data from measurements, scientific simulations, software engineering, and networks.
Exploratory search in time-oriented primary data
In a variety of research fields, primary data that describes scientific phenomena in an original condition is obtained.
Time-oriented primary data, in particular, is an indispensable data type, derived from complex measurements depending
on time. Today, time-oriented primary data is collected at rates that exceed the domain experts’ abilities to seek
valuable information undiscovered in the data. It is widely accepted that the magnitudes of uninvestigated data will
disclose tremendous knowledge in data-driven research, provided that domain experts are able to gain insight into the
data. Domain experts involved in data-driven research urgently require analytical capabilities. In scientific practice,
predominant activities are the generation and validation of hypotheses. In analytical terms, these activities are often
expressed in confirmatory and exploratory data analysis. Ideally, analytical support would combine the strengths of
both types of activities.
Exploratory search (ES) is a concept that seamlessly includes information-seeking behaviors ranging from search
to exploration. ES supports domain experts in both gaining an understanding of huge and potentially unknown data
collections and the drill-down to relevant subsets, e.g., to validate hypotheses. As such, ES combines predominant tasks
of domain experts applied to data-driven research. For the design of useful and usable ES systems (ESS), data scientists
have to incorporate different sources of knowledge and technology. Of particular importance is the state-of-the-art
in interactive data visualization and data analysis. Research on these topics is at the heart of Information Visualization
(IV) and Visual Analytics (VA). Approaches in IV and VA provide meaningful visualization and interaction designs,
allowing domain experts to perform the information-seeking process in an effective and efficient way. Today, best-practice
ESS almost exclusively exist for textual data content, e.g., put into practice in digital libraries to facilitate the
reuse of digital documents. For time-oriented primary data, ES largely remains at a theoretical stage.
Motivation and Problem Statement. This thesis is motivated by two main assumptions. First, we expect that
ES will have a tremendous impact on data-driven research for many research fields. In this thesis, we focus on
time-oriented primary data, as a complex and important data type for data-driven research. Second, we assume that
research conducted in IV and VA will particularly facilitate ES. For time-oriented primary data, however, novel
concepts and techniques are required that enhance the design and the application of ESS. In particular, we observe a
lack of methodological research in ESS for time-oriented primary data. In addition, the size, the complexity, and the
quality of time-oriented primary data hamper content-based access, as well as the design of visual interfaces
for gaining an overview of the data content. Furthermore, the question arises how ESS can incorporate techniques
for seeking relations between data content and metadata to foster data-driven research. Overarching challenges for
data scientists are to create usable and useful designs, urgently requiring the involvement of the targeted user group
and support techniques for choosing meaningful algorithmic models and model parameters. Throughout this thesis,
we will resolve these challenges from conceptual, technical, and systemic perspectives. In turn, domain experts can
benefit from novel ESS as a powerful analytical support to conduct data-driven research.
Concepts for Exploratory Search Systems (Chapter 3). We postulate concepts for ES in time-oriented primary
data. Based on a survey of analysis tasks supported in IV and VA research, we present a comprehensive selection of
tasks and techniques relevant for search and exploration activities. The assembly guides data scientists in the choice of
meaningful techniques presented in IV and VA. Furthermore, we present a reference workflow for the design and
the application of ESS for time-oriented primary data. The workflow divides the data processing and transformation
process into four steps, and thus divides the complexity of the design space into manageable parts. In addition, the
reference workflow describes how users can be involved in the design. The reference workflow is the framework for
the technical contributions of this thesis.
Visual-Interactive Preprocessing of Time-Oriented Primary Data (Chapter 4). We present a visual-interactive
system that enables users to construct workflows for preprocessing time-oriented primary data. In this way, we
introduce a means of providing content-based access. Based on a rich set of preprocessing routines, users can create
individual solutions for data cleansing, normalization, segmentation, and other preprocessing tasks. In addition, the
system supports the definition of time series descriptors and time series distance measures. Guidance concepts support
users in assessing the workflow generalizability, which is important for large data sets. The execution of the workflows
transforms time-oriented primary data into feature vectors, which can subsequently be used for downstream search
and exploration techniques. We demonstrate the applicability of the system in usage scenarios and case studies.
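The transformation from a raw time series to a feature vector can be sketched as a small chain of preprocessing routines. The concrete choices below are assumptions made for illustration and are not taken from the thesis: z-normalization as the normalization step, equal-width segmentation, the segment mean as the time series descriptor, and Euclidean distance as the distance measure. All function names are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]  # constant series carries no shape
    return [(v - mu) / sigma for v in series]


def segment_means(series, n_segments):
    """Split into n_segments near-equal parts and describe each by its mean."""
    n = len(series)
    if n < n_segments:
        raise ValueError("series is shorter than the number of segments")
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [mean(series[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]


def to_feature_vector(series, n_segments=4):
    """Workflow: normalization, then segmentation, then one descriptor per segment."""
    return segment_means(z_normalize(series), n_segments)


def euclidean(a, b):
    """Distance between two feature vectors of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
```

Under these assumptions, two series that differ only in offset and scale map to (nearly) the same feature vector, which is what makes the resulting vectors usable for content-based search.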
Content-Based Overviews (Chapter 5). We introduce novel guidelines and techniques for the design of content-based
overviews. The three key factors are the creation of meaningful data aggregates, the visual mapping of these
aggregates into the visual space, and the view transformation providing layouts of these aggregates in the display
space. For each of these steps, we characterize important visualization and interaction design parameters allowing the
involvement of users. We introduce guidelines supporting data scientists in choosing meaningful solutions. In addition,
we present novel visual-interactive quality assessment techniques enhancing the choice of algorithmic model and
model parameters. Finally, we present visual interfaces enabling users to formulate visual queries of the time-oriented
data content. In this way, we provide means of combining content-based exploration with content-based search.
Relation Seeking Between Data Content and Metadata (Chapter 6). We present novel visual interfaces enabling
domain experts to seek relations between data content and metadata. These interfaces can be integrated into ESS
to bridge analytical gaps between the data content and attached metadata. In three different approaches, we focus
on different types of relations and define algorithmic support to guide users towards most interesting relations.
Furthermore, each of the three approaches comprises individual visualization and interaction designs, enabling users
to explore both the data and the relations in an efficient and effective way. We demonstrate the applicability of our
interfaces with usage scenarios, each conducted together with domain experts. The results confirm that our techniques
are beneficial for seeking relations between data content and metadata, particularly for data-centered research.
Case Studies - Exploratory Search Systems (Chapter 7). In two case studies, we put our concepts and techniques
into practice. We present two ESS constructed in design studies with real users, real ES tasks, and real time-oriented
primary data collections. The web-based VisInfo ESS is a digital library system facilitating the visual access to
time-oriented primary data content. A content-based overview enables users to explore large collections of time series
measurements and serves as a baseline for content-based queries by example. In addition, VisInfo provides a visual
interface for querying time-oriented data content by sketch. A result visualization combines different views of the data
content and metadata with faceted search functionality. The MotionExplorer ESS supports domain experts in human
motion analysis. Two content-based overviews enhance the exploration of large collections of human motion capture
data from two perspectives. MotionExplorer provides a search interface, allowing domain experts to query human
motion sequences by example. Retrieval results are depicted in a visual-interactive view enabling the exploration of
variations of human motions. Field study evaluations performed for both ESS confirm the applicability of the systems
in the environment of the involved user groups. The systems yield a significant improvement of both the effectiveness
and the efficiency in the day-to-day work of the domain experts. As such, both ESS demonstrate how large collections
of time-oriented primary data can be reused to enhance data-centered research.
In essence, our contributions cover the entire time series analysis process starting from accessing raw time-oriented
primary data, processing and transforming time series data, to visual-interactive analysis of time series. We present
visual search interfaces providing content-based access to time-oriented primary data. In a series of novel exploration-support
techniques, we facilitate both gaining an overview of large and complex time-oriented primary data collections
and seeking relations between data content and metadata. Throughout this thesis, we introduce VA as a means of
designing effective and efficient visual-interactive systems. Our VA techniques empower data scientists to choose
appropriate models and model parameters, as well as to involve users in the design. With both principles, we support
the design of usable and useful interfaces which can be included into ESS. In this way, our contributions bridge the gap
between search systems requiring exploration support and exploratory data analysis systems requiring visual querying
capability. In the ESS presented in two case studies, we prove that our techniques and systems support data-driven
research in an efficient and effective way.