80,729 research outputs found

    External-Memory Graph Algorithms

    Get PDF
    We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of specific problems. Our results include: Proximate-neighboring. We present a simple method for deriving external-memory lower bounds via reductions from a problem we call the “proximate neighbors” problem. We use this technique to derive non-trivial lower bounds for such problems as list ranking, expression tree evaluation, and connected components. PRAM simulation. We give methods for efficiently simulating PRAM computations in external memory, even for some cases in which the PRAM algorithm is not work-optimal. We apply this to derive a number of optimal (and simple) external-memory graph algorithms. Time-forward processing. We present a general technique for evaluating circuits (or “circuit-like” computations) in external memory. We also usethis in a deterministic list ranking algorithm. Deterministic 3-coloring of a cycle. We give several optimal methods for 3-coloring a cycle, which can be used as a subroutine for finding large independent sets for list ranking. Our ideas go beyond a straightforward PRAM simulation, and may be of independent interest. External depth-first search. We discuss a method for performing depth first search and solving related problems efficiently in external memory. Our technique can be used in conjunction with ideas due to Ullman and Yannakakis in order to solve graph problems involving closed semi-ring computations even when their assumption that vertices fit in main memory does not hold. Our techniques apply to a number of problems, including list ranking, which we discuss in detail, finding Euler tours, expression-tree evaluation, centroid decomposition of a tree, least-common ancestors, minimum spanning tree verification, connected and biconnected components, minimum spanning forest, ear decomposition, topological sorting, reachability, graph drawing, and visibility representation

    How Many Topics? Stability Analysis for Topic Models

    Full text link
    Topic modeling refers to the task of discovering the underlying thematic structure in a text corpus, where the output is commonly presented as a report of the top terms appearing in each topic. Despite the diversity of topic modeling algorithms that have been proposed, a common challenge in successfully applying these techniques is the selection of an appropriate number of topics for a given corpus. Choosing too few topics will produce results that are overly broad, while choosing too many will result in the "over-clustering" of a corpus into many small, highly-similar topics. In this paper, we propose a term-centric stability analysis strategy to address this issue, the idea being that a model with an appropriate number of topics will be more robust to perturbations in the data. Using a topic modeling approach based on matrix factorization, evaluations performed on a range of corpora show that this strategy can successfully guide the model selection process.Comment: Improve readability of plots. Add minor clarification

    Improving search order for reachability testing in timed automata

    Get PDF
    Standard algorithms for reachability analysis of timed automata are sensitive to the order in which the transitions of the automata are taken. To tackle this problem, we propose a ranking system and a waiting strategy. This paper discusses the reason why the search order matters and shows how a ranking system and a waiting strategy can be integrated into the standard reachability algorithm to alleviate and prevent the problem respectively. Experiments show that the combination of the two approaches gives optimal search order on standard benchmarks except for one example. This suggests that it should be used instead of the standard BFS algorithm for reachability analysis of timed automata

    Stochastic Query Covering for Fast Approximate Document Retrieval

    Get PDF
    We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios

    Solving the undirected feedback vertex set problem by local search

    Full text link
    An undirected graph consists of a set of vertices and a set of undirected edges between vertices. Such a graph may contain an abundant number of cycles, then a feedback vertex set (FVS) is a set of vertices intersecting with each of these cycles. Constructing a FVS of cardinality approaching the global minimum value is a optimization problem in the nondeterministic polynomial-complete complexity class, therefore it might be extremely difficult for some large graph instances. In this paper we develop a simulated annealing local search algorithm for the undirected FVS problem. By defining an order for the vertices outside the FVS, we replace the global cycle constraints by a set of local vertex constraints on this order. Under these local constraints the cardinality of the focal FVS is then gradually reduced by the simulated annealing dynamical process. We test this heuristic algorithm on large instances of Er\"odos-Renyi random graph and regular random graph, and find that this algorithm is comparable in performance to the belief propagation-guided decimation algorithm.Comment: 6 page
    • 

    corecore