External-Memory Graph Algorithms
We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include:
Proximate-neighboring. We present a simple method for deriving external-memory lower bounds via reductions from a problem we call the "proximate neighbors" problem. We use this technique to derive non-trivial lower bounds for such problems as list ranking, expression tree evaluation, and connected components.
PRAM simulation. We give methods for efficiently simulating PRAM computations in external memory, even for some cases in which the PRAM algorithm is not work-optimal. We apply this to derive a number of optimal (and simple) external-memory graph algorithms.
Time-forward processing. We present a general technique for evaluating circuits (or "circuit-like" computations) in external memory. We also use this in a deterministic list ranking algorithm.
Deterministic 3-coloring of a cycle. We give several optimal methods for 3-coloring a cycle, which can be used as a subroutine for finding large independent sets for list ranking. Our ideas go beyond a straightforward PRAM simulation and may be of independent interest.
External depth-first search. We discuss a method for performing depth-first search and solving related problems efficiently in external memory. Our technique can be used in conjunction with ideas due to Ullman and Yannakakis to solve graph problems involving closed semi-ring computations even when their assumption that vertices fit in main memory does not hold.
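The time-forward processing idea can be illustrated with a small in-memory sketch. It is not the paper's algorithm: it assumes the circuit's vertices are already numbered in topological order, and Python's `heapq` stands in for the I/O-efficient external priority queue an external-memory implementation would typically rely on. Each evaluated vertex "sends its value forward in time" by inserting a message keyed by the recipient's index, so when the scan reaches a vertex, all of its inputs sit at the front of the queue. The function name `time_forward_evaluate` and the `"+"`/`"*"` gate encoding are hypothetical.

```python
import heapq
import math


def time_forward_evaluate(ops, edges):
    """Evaluate a circuit whose vertices 0..n-1 are in topological order.

    ops[i] is an int (an input vertex with that value) or "+" / "*" (a gate).
    edges holds (u, v) pairs with u < v, meaning "u's value feeds v".
    """
    n = len(ops)
    out = [[] for _ in range(n)]
    for u, v in edges:
        if not u < v:
            raise ValueError("vertices must be numbered in topological order")
        out[u].append(v)
    pq = []  # messages (destination, value) still "in flight"
    values = [None] * n
    for i in range(n):
        # Every pending message has destination >= i, so vertex i's inputs
        # are exactly the entries currently at the front of the queue.
        inputs = []
        while pq and pq[0][0] == i:
            inputs.append(heapq.heappop(pq)[1])
        op = ops[i]
        if op == "+":
            val = sum(inputs)
        elif op == "*":
            val = math.prod(inputs)
        else:
            val = op
        values[i] = val
        for w in out[i]:
            heapq.heappush(pq, (w, val))  # forward the value to a later vertex
    return values


# ((2 + 3) * 4): vertices 0, 1, 3 are inputs, 2 is "+", 4 is "*".
print(time_forward_evaluate([2, 3, "+", 4, "*"], [(0, 2), (1, 2), (2, 4), (3, 4)]))
# [2, 3, 5, 4, 20]
```

Only sequential scans plus priority-queue operations are used, which is what makes the pattern attractive when neither the circuit nor its values fit in main memory.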
Our techniques apply to a number of problems, including list ranking, which we discuss in detail, finding Euler tours, expression-tree evaluation, centroid decomposition of a tree, least-common ancestors, minimum spanning tree verification, connected and biconnected components, minimum spanning forest, ear decomposition, topological sorting, reachability, graph drawing, and visibility representation.
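To illustrate the PRAM-simulation pattern on list ranking, the sketch below runs classic pointer jumping, a non-work-optimal PRAM list-ranking algorithm swapped in here for illustration; it is not the paper's optimal method. Each synchronous round needs "my successor's value" for every node, and the hypothetical helper `gather` obtains it using only sorts and in-order passes, the access pattern that lets one round cost roughly one sort's worth of I/Os in external memory. With O(log n) rounds the total is about O(sort(n) log n) I/Os, which is why faster approaches, such as the independent-set technique mentioned above, matter.

```python
def gather(values, addresses, default):
    """Return [values[a] for a in addresses] (default where a == -1) using
    sorting and sequential passes only, as an external-memory PRAM
    simulation would."""
    # Sort requests by target address: reading values[a] in increasing a
    # is then a single sequential pass over `values`.
    requests = sorted((a, i) for i, a in enumerate(addresses) if a != -1)
    # Route the answers back by sorting on the requester index.
    answers = sorted((i, values[a]) for a, i in requests)
    out = [default] * len(addresses)
    for i, v in answers:
        out[i] = v
    return out


def list_rank(succ):
    """Distance from each node to the tail; succ[i] is i's successor or -1."""
    nxt = list(succ)
    rank = [0 if s == -1 else 1 for s in succ]
    while any(s != -1 for s in nxt):
        # One synchronous PRAM round: read from the old arrays, then write.
        succ_rank = gather(rank, nxt, 0)
        succ_next = gather(nxt, nxt, -1)
        rank = [r + sr for r, sr in zip(rank, succ_rank)]
        nxt = succ_next
    return rank


# The list 2 -> 0 -> 3 -> 1 (node 1 is the tail).
print(list_rank([3, -1, 0, 1]))  # [2, 0, 3, 1]
```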
How Many Topics? Stability Analysis for Topic Models
Topic modeling refers to the task of discovering the underlying thematic
structure in a text corpus, where the output is commonly presented as a report
of the top terms appearing in each topic. Despite the diversity of topic
modeling algorithms that have been proposed, a common challenge in successfully
applying these techniques is the selection of an appropriate number of topics
for a given corpus. Choosing too few topics will produce results that are
overly broad, while choosing too many will result in the "over-clustering" of a
corpus into many small, highly similar topics. In this paper, we propose a
term-centric stability analysis strategy to address this issue, the idea being
that a model with an appropriate number of topics will be more robust to
perturbations in the data. Evaluations using a topic modeling approach based on
matrix factorization, performed on a range of corpora, show that this
strategy can successfully guide the model selection process.
Comment: Improve readability of plots. Add minor clarification.
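One way to make a term-centric stability score concrete is sketched below, under stated assumptions: two topics are compared through their ranked top-term lists with an average Jaccard score over prefixes of depth 1..t, and two models are compared by matching topics one-to-one to maximize the mean score. Both the measure and the brute-force matching (over all k! permutations, so only usable for small k; the Hungarian algorithm would be the usual choice at scale) are assumptions for illustration, and the paper's exact definitions may differ. A stability score for a candidate number of topics k could then be the mean `agreement` between a reference model and models fitted to random subsamples of the corpus.

```python
from itertools import permutations


def average_jaccard(r1, r2):
    """Mean Jaccard similarity of the top-d prefixes of two ranked term lists, d = 1..t."""
    t = min(len(r1), len(r2))
    total = 0.0
    for d in range(1, t + 1):
        a, b = set(r1[:d]), set(r2[:d])
        total += len(a & b) / len(a | b)
    return total / t


def agreement(model_a, model_b):
    """Best mean average-Jaccard score over one-to-one topic matchings.

    Each model is a list of k ranked top-term lists (same k for both).
    Brute force over all k! matchings, so this is only practical for small k.
    """
    k = len(model_a)
    return max(
        sum(average_jaccard(model_a[i], model_b[p[i]]) for i in range(k)) / k
        for p in permutations(range(k))
    )


ref = [["game", "team", "season"], ["vote", "party", "election"]]
sample = [["party", "vote", "poll"], ["team", "game", "coach"]]
print(agreement(ref, sample))  # 0.5
```

Rank-sensitive prefixes mean that two topics sharing the same terms in a different order score below 1, which rewards models whose top terms are stable in both content and order.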
Improving search order for reachability testing in timed automata
Standard algorithms for reachability analysis of timed automata are sensitive
to the order in which the transitions of the automata are taken. To tackle this
problem, we propose a ranking system and a waiting strategy. This paper
discusses the reason why the search order matters and shows how a ranking
system and a waiting strategy can be integrated into the standard reachability
algorithm to alleviate and prevent the problem respectively. Experiments show
that the combination of the two approaches gives an optimal search order on
standard benchmarks, with the exception of one example. This suggests that the
combination should be used instead of the standard BFS algorithm for
reachability analysis of timed automata.
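Why the search order matters can be reproduced with a generic passed/waiting reachability loop that discards any symbolic state covered by one already explored. In this sketch, states are toy pairs `(location, (lo, hi))` where a single-clock interval stands in for a zone, and the hypothetical `rank` (prefer wider intervals) is only a stand-in; the paper's actual ranking system and waiting strategy are not reproduced. In the toy graph, BFS first explores a small interval at location B and its successor, then must redo that work when a larger interval for B arrives.

```python
import heapq
from collections import deque
from itertools import count


def interval_covered(s, t):
    """True if toy state t covers s: same location and s's interval inside t's."""
    (loc_s, (lo_s, hi_s)), (loc_t, (lo_t, hi_t)) = s, t
    return loc_s == loc_t and lo_t <= lo_s and hi_s <= hi_t


def reachable(init, successors, covered_by, is_goal, rank=None):
    """Passed/waiting-list reachability; returns (goal_found, states_explored).

    With rank=None the waiting list is FIFO (plain BFS); otherwise states with
    a higher rank are popped first (ties broken in insertion order).
    """
    tie = count()
    if rank is None:
        waiting = deque([init])
        pop, push = waiting.popleft, waiting.append
    else:
        waiting = [(-rank(init), next(tie), init)]
        pop = lambda: heapq.heappop(waiting)[2]
        push = lambda s: heapq.heappush(waiting, (-rank(s), next(tie), s))
    passed, explored = [], 0
    while waiting:
        s = pop()
        if is_goal(s):
            return True, explored
        if any(covered_by(s, p) for p in passed):
            continue  # an already-explored state covers s
        # Drop passed states that s now covers, then record s.
        passed = [p for p in passed if not covered_by(p, s)] + [s]
        explored += 1
        for t in successors(s):
            push(t)
    return False, explored


# Toy symbolic state graph: A reaches B with a small interval directly,
# and with a larger interval via C.
A, B1, C, B10 = ("A", (0, 10)), ("B", (0, 1)), ("C", (0, 10)), ("B", (0, 10))
D1, D10 = ("D", (0, 1)), ("D", (0, 10))
graph = {A: [B1, C], B1: [D1], C: [B10], B10: [D10], D1: [], D10: []}
never = lambda s: False              # explore the whole state space
width = lambda s: s[1][1] - s[1][0]  # hypothetical rank: prefer wider intervals
print(reachable(A, graph.get, interval_covered, never))         # (False, 6)
print(reachable(A, graph.get, interval_covered, never, width))  # (False, 4)
```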
Stochastic Query Covering for Fast Approximate Document Retrieval
We design algorithms that, given a collection of documents and a distribution over user queries, return a
small subset of the document collection in such a way that we can efficiently provide high-quality answers
to user queries using only the selected subset. This approach has applications when space is a constraint
or when the query-processing time increases significantly with the size of the collection. We study our
algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction
of the entire collection, they can provide answers to most user queries, achieving a performance close to the
optimal. To complement our theoretical findings, we experimentally show the versatility of our approach
by considering two important cases in the context of Web search. In the first case, we favor the retrieval of
documents that are relevant to the query, whereas in the second case we aim for document diversification.
Both the theoretical and the experimental analysis provide strong evidence of the potential value of query
covering in diverse application scenarios.
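As a rough illustration of the selection problem (not the paper's algorithms or their stochastic analysis), the sketch below uses standard greedy weighted maximum coverage. It assumes, hypothetically, that a query is answered well if the chosen subset contains at least one document from a known relevant set, and repeatedly adds the document covering the most not-yet-covered query probability until the budget is used up. The names `greedy_query_cover`, `query_probs`, `relevant`, and `budget` are illustrative.

```python
def greedy_query_cover(query_probs, relevant, budget):
    """Greedily pick up to `budget` documents to maximize covered query mass.

    query_probs: dict query -> probability of that query
    relevant:    dict query -> set of documents that answer it well
    Returns (chosen_documents, covered_probability_mass).
    """
    chosen, covered = set(), set()
    docs = set().union(*relevant.values()) if relevant else set()
    for _ in range(budget):
        best_doc, best_gain = None, 0.0
        for d in sorted(docs - chosen):  # sorted for deterministic tie-breaking
            gain = sum(p for q, p in query_probs.items()
                       if q not in covered and d in relevant.get(q, ()))
            if gain > best_gain:
                best_doc, best_gain = d, gain
        if best_doc is None:  # no remaining document adds coverage
            break
        chosen.add(best_doc)
        covered |= {q for q in query_probs if best_doc in relevant.get(q, ())}
    return chosen, sum(query_probs[q] for q in covered)


probs = {"q1": 0.6, "q2": 0.25, "q3": 0.15}
rel = {"q1": {"d1"}, "q2": {"d2", "d3"}, "q3": {"d3"}}
print(greedy_query_cover(probs, rel, budget=2))
```

Here d1 is picked first (mass 0.6), then d3 (which covers both q2 and q3), so two of the three documents cover all of the query mass. For monotone coverage objectives like this one, the greedy rule is known to achieve at least a (1 - 1/e) fraction of the optimum.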
Solving the undirected feedback vertex set problem by local search
An undirected graph consists of a set of vertices and a set of undirected
edges between vertices. Such a graph may contain a large number of cycles; a
feedback vertex set (FVS) is a set of vertices that intersects each of these
cycles. Constructing an FVS whose cardinality approaches the global minimum
value is an NP-complete optimization problem, so it may be extremely
difficult for some large graph instances. In this paper we develop a simulated annealing local search
algorithm for the undirected FVS problem. By defining an order for the vertices
outside the FVS, we replace the global cycle constraints by a set of local
vertex constraints on this order. Under these local constraints the cardinality
of the focal FVS is then gradually reduced by the simulated annealing dynamical
process. We test this heuristic algorithm on large instances of Erdős-Rényi
random graphs and regular random graphs, and find that this algorithm is
comparable in performance to the belief propagation-guided decimation
algorithm.
Comment: 6 pages
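The local-constraint idea can be sketched as follows, under an assumption made here for illustration: the constraint is "every vertex outside the FVS has at most one neighbor outside the FVS that precedes it in the order". That constraint is enough for acyclicity, because in any cycle the vertex that comes latest in the order would have two earlier neighbors. The move set (insert an FVS vertex either before all of its ordered neighbors or just after the first one, evicting later neighbors that would then have two earlier neighbors), the cooling schedule, and the name `anneal_fvs` are hypothetical choices, not necessarily the paper's.

```python
import math
import random


def earlier_count(adj, index, u):
    """Number of u's neighbors outside the FVS that precede u in the order."""
    return sum(1 for w in adj[u] if w in index and index[w] < index[u])


def anneal_fvs(adj, steps=20000, t_start=1.0, t_end=0.05, seed=0):
    """Simulated-annealing sketch for undirected FVS; returns the best FVS seen.

    adj maps each vertex to the set of its neighbors (simple graph assumed).
    Invariant: every vertex in `order` has at most one earlier neighbor in
    `order`, so the vertices outside the FVS always induce a forest.
    """
    rng = random.Random(seed)
    order = []          # vertices outside the FVS, earliest first
    fvs = set(adj)      # start with every vertex in the FVS (trivially valid)
    best = set(fvs)
    for step in range(steps):
        if not fvs:
            break
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        v = rng.choice(sorted(fvs))
        index = {u: i for i, u in enumerate(order)}
        nbrs = sorted((w for w in adj[v] if w in index), key=index.get)
        if not nbrs:
            slot = rng.randrange(len(order) + 1)  # v is unconstrained
        elif rng.random() < 0.5:
            slot = index[nbrs[0]]                 # before all neighbors: 0 earlier
        else:
            slot = index[nbrs[0]] + 1             # right after the first: 1 earlier
        # Later neighbors that already had an earlier neighbor would now have
        # two, so they must leave the order and join the FVS.
        conflicts = {w for w in nbrs
                     if index[w] >= slot and earlier_count(adj, index, w) >= 1}
        delta = len(conflicts) - 1                # resulting change in |FVS|
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            order = [u for u in order[:slot] + [v] + order[slot:]
                     if u not in conflicts]
            fvs.discard(v)
            fvs |= conflicts
            if len(fvs) < len(best):
                best = set(fvs)
    return best


cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
print(anneal_fvs(cycle5))
```

Because evicted vertices only lose order-neighbors for everyone else, each accepted move keeps the invariant, so every set the loop records is a valid feedback vertex set; annealing only decides how aggressively the set is shrunk.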