6,834 research outputs found
Block Crossings in Storyline Visualizations
Storyline visualizations help visualize encounters of the characters in a
story over time. Each character is represented by an x-monotone curve that goes
from left to right. A meeting is represented by having the characters that
participate in the meeting run close together for some time. In order to keep
the visual complexity low, rather than just minimizing pairwise crossings of
curves, we propose to count block crossings, that is, pairs of intersecting
bundles of lines.
Our main results are as follows. We show that minimizing the number of block
crossings is NP-hard, and we develop, for meetings of bounded size, a
constant-factor approximation. We also present two fixed-parameter algorithms
and, for meetings of size 2, a greedy heuristic that we evaluate
experimentally.Comment: Appears in the Proceedings of the 24th International Symposium on
Graph Drawing and Network Visualization (GD 2016
Improved Bounds for 3SUM, -SUM, and Linear Degeneracy
Given a set of real numbers, the 3SUM problem is to decide whether there
are three of them that sum to zero. Until a recent breakthrough by Gr{\o}nlund
and Pettie [FOCS'14], a simple -time deterministic algorithm for
this problem was conjectured to be optimal. Over the years many algorithmic
problems have been shown to be reducible from the 3SUM problem or its variants,
including the more generalized forms of the problem, such as -SUM and
-variate linear degeneracy testing (-LDT). The conjectured hardness of
these problems have become extremely popular for basing conditional lower
bounds for numerous algorithmic problems in P.
In this paper, we show that the randomized -linear decision tree
complexity of 3SUM is , and that the randomized -linear
decision tree complexity of -SUM and -LDT is , for any odd
. These bounds improve (albeit randomized) the corresponding
and decision tree bounds
obtained by Gr{\o}nlund and Pettie. Our technique includes a specialized
randomized variant of fractional cascading data structure. Additionally, we
give another deterministic algorithm for 3SUM that runs in time. The latter bound matches a recent independent bound by Freund
[Algorithmica 2017], but our algorithm is somewhat simpler, due to a better use
of word-RAM model
Topic Similarity Networks: Visual Analytics for Large Document Sets
We investigate ways in which to improve the interpretability of LDA topic
models by better analyzing and visualizing their outputs. We focus on examining
what we refer to as topic similarity networks: graphs in which nodes represent
latent topics in text collections and links represent similarity among topics.
We describe efficient and effective approaches to both building and labeling
such networks. Visualizations of topic models based on these networks are shown
to be a powerful means of exploring, characterizing, and summarizing large
collections of unstructured text documents. They help to "tease out"
non-obvious connections among different sets of documents and provide insights
into how topics form larger themes. We demonstrate the efficacy and
practicality of these approaches through two case studies: 1) NSF grants for
basic research spanning a 14 year period and 2) the entire English portion of
Wikipedia.Comment: 9 pages; 2014 IEEE International Conference on Big Data (IEEE BigData
2014
Natural data structure extracted from neighborhood-similarity graphs
'Big' high-dimensional data are commonly analyzed in low-dimensions, after
performing a dimensionality-reduction step that inherently distorts the data
structure. For the same purpose, clustering methods are also often used. These
methods also introduce a bias, either by starting from the assumption of a
particular geometric form of the clusters, or by using iterative schemes to
enhance cluster contours, with uncontrollable consequences. The goal of data
analysis should, however, be to encode and detect structural data features at
all scales and densities simultaneously, without assuming a parametric form of
data point distances, or modifying them. We propose a novel approach that
directly encodes data point neighborhood similarities as a sparse graph. Our
non-iterative framework permits a transparent interpretation of data, without
altering the original data dimension and metric. Several natural and synthetic
data applications demonstrate the efficacy of our novel approach
Using Visualization to Support Data Mining of Large Existing Databases
In this paper. we present ideas how visualization technology can be used to improve the difficult process of querying very large databases. With our VisDB system, we try to provide visual support not only for the query specification process. but also for evaluating query results and. thereafter, refining the query accordingly. The main idea of our system is to represent as many data items as possible by the pixels of the display device. By arranging and coloring the pixels according to the relevance for the query, the user gets a visual impression of the resulting data set and of its relevance for the query. Using an interactive query interface, the user may change the query dynamically and receives immediate feedback by the visual representation of the resulting data set. By using multiple windows for different parts of the query, the user gets visual feedback for each part of the query and, therefore, may easier understand the overall result. To support complex queries, we introduce the notion of approximate joins which allow the user to find data items that only approximately fulfill join conditions. We also present ideas how our technique may be extended to support the interoperation of heterogeneous databases. Finally, we discuss the performance problems that are caused by interfacing to existing database systems and present ideas to solve these problems by using data structures supporting a multidimensional search of the database
Run Generation Revisited: What Goes Up May or May Not Come Down
In this paper, we revisit the classic problem of run generation. Run
generation is the first phase of external-memory sorting, where the objective
is to scan through the data, reorder elements using a small buffer of size M ,
and output runs (contiguously sorted chunks of elements) that are as long as
possible.
We develop algorithms for minimizing the total number of runs (or
equivalently, maximizing the average run length) when the runs are allowed to
be sorted or reverse sorted. We study the problem in the online setting, both
with and without resource augmentation, and in the offline setting.
(1) We analyze alternating-up-down replacement selection (runs alternate
between sorted and reverse sorted), which was studied by Knuth as far back as
1963. We show that this simple policy is asymptotically optimal. Specifically,
we show that alternating-up-down replacement selection is 2-competitive and no
deterministic online algorithm can perform better.
(2) We give online algorithms having smaller competitive ratios with resource
augmentation. Specifically, we exhibit a deterministic algorithm that, when
given a buffer of size 4M , is able to match or beat any optimal algorithm
having a buffer of size M . Furthermore, we present a randomized online
algorithm which is 7/4-competitive when given a buffer twice that of the
optimal.
(3) We demonstrate that performance can also be improved with a small amount
of foresight. We give an algorithm, which is 3/2-competitive, with
foreknowledge of the next 3M elements of the input stream. For the extreme case
where all future elements are known, we design a PTAS for computing the optimal
strategy a run generation algorithm must follow.
(4) Finally, we present algorithms tailored for nearly sorted inputs which
are guaranteed to have optimal solutions with sufficiently long runs
Quasiconvex Programming
We define quasiconvex programming, a form of generalized linear programming
in which one seeks the point minimizing the pointwise maximum of a collection
of quasiconvex functions. We survey algorithms for solving quasiconvex programs
either numerically or via generalizations of the dual simplex method from
linear programming, and describe varied applications of this geometric
optimization technique in meshing, scientific computation, information
visualization, automated algorithm analysis, and robust statistics.Comment: 33 pages, 14 figure
Data-Oblivious Graph Algorithms in Outsourced External Memory
Motivated by privacy preservation for outsourced data, data-oblivious
external memory is a computational framework where a client performs
computations on data stored at a semi-trusted server in a way that does not
reveal her data to the server. This approach facilitates collaboration and
reliability over traditional frameworks, and it provides privacy protection,
even though the server has full access to the data and he can monitor how it is
accessed by the client. The challenge is that even if data is encrypted, the
server can learn information based on the client data access pattern; hence,
access patterns must also be obfuscated. We investigate privacy-preserving
algorithms for outsourced external memory that are based on the use of
data-oblivious algorithms, that is, algorithms where each possible sequence of
data accesses is independent of the data values. We give new efficient
data-oblivious algorithms in the outsourced external memory model for a number
of fundamental graph problems. Our results include new data-oblivious
external-memory methods for constructing minimum spanning trees, performing
various traversals on rooted trees, answering least common ancestor queries on
trees, computing biconnected components, and forming open ear decompositions.
None of our algorithms make use of constant-time random oracles.Comment: 20 page
- …