Improved algorithms for topic distillation in a hyperlinked environment
This paper addresses the problem of topic distillation on the World Wide Web, namely, given a typical user query, finding quality documents related to the query topic. Connectivity analysis has been shown to be useful in identifying high-quality pages within a topic-specific graph of hyperlinked documents. The essence of our approach is to augment a previous connectivity-analysis-based algorithm with content analysis. We identify three problems with the existing approach and devise algorithms to tackle them. We report the results of a user evaluation showing that precision at 10 documents improves by at least 45% over pure connectivity analysis.
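The abstract does not spell out the connectivity-analysis algorithm it starts from. Assuming a hub/authority iteration in the style of Kleinberg's HITS, a minimal sketch of such a pure connectivity baseline (with none of the content-analysis extensions described above) could look as follows; the function `hits` and its edge-list input are illustrative, not taken from the paper.

```python
import math


def hits(edges, num_nodes, iterations=50):
    """Hub/authority scores by power iteration.

    Assumption: a HITS-style connectivity baseline, not the paper's
    content-augmented algorithm.
    edges: list of (src, dst) hyperlinks between nodes 0..num_nodes-1.
    Returns (hub, authority) lists, each normalized to unit L2 norm.
    """
    hub = [1.0] * num_nodes
    auth = [1.0] * num_nodes
    for _ in range(iterations):
        # A page's authority score sums the hub scores of pages linking to it.
        new_auth = [0.0] * num_nodes
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # A page's hub score sums the authority scores of pages it links to.
        new_hub = [0.0] * num_nodes
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # Normalize so the scores stay bounded across iterations.
        a_norm = math.sqrt(sum(x * x for x in new_auth)) or 1.0
        h_norm = math.sqrt(sum(x * x for x in new_hub)) or 1.0
        auth = [x / a_norm for x in new_auth]
        hub = [x / h_norm for x in new_hub]
    return hub, auth


# Pages 0, 1 and 2 all link to page 3; page 0 also links to page 4.
hub, auth = hits([(0, 3), (1, 3), (2, 3), (0, 4)], num_nodes=5)
print(max(range(5), key=lambda i: auth[i]))  # → 3
```

Page 3 collects the most links from good hubs, so it receives the highest authority score, while page 0, which links to both authorities, becomes the strongest hub.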
Hyperlink analysis for the Web
Hyperlink analysis algorithms significantly improve the relevance of search results on the Web, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform, mostly to avoid manipulation of search results by Web-positioning companies. The article discusses how hyperlink analysis can be applied to ranking algorithms, and surveys other ways Web search engines can use this analysis.
Fully dynamic cycle-equivalence in graphs
Two edges $e_1$ and $e_2$ of an undirected graph are cycle-equivalent iff all cycles that contain $e_1$ also contain $e_2$, i.e., iff $e_1$ and $e_2$ form a cut-edge pair. The cycle-equivalence classes of the control-flow graph are used in optimizing compilers to speed up existing control-flow and data-flow algorithms. While the cycle-equivalence classes can be computed in linear time, we present the first fully dynamic algorithm for maintaining the cycle-equivalence relation. In an $n$-node graph, our data structure executes an edge insertion or deletion in $O(\sqrt{n \log n})$ time and answers the query whether two given edges are cycle-equivalent in $O(\log^2 n)$ time. We also present an algorithm for plane graphs with $O(\log n)$ update and query time, and one for planar graphs with $O(\log n)$ insertion time and $O(\log^2 n)$ query and deletion time. Additionally, we show a lower bound of $\Omega(\log n / \log\log n)$ for the amortized time per operation for the dynamic cycle-equivalence problem in the cell probe model.
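For a connected graph without bridges, the cut-edge-pair characterization gives a simple static test: two distinct edges are cycle-equivalent exactly when deleting both disconnects the graph. The sketch below is a naive baseline costing $O(n + m)$ per query, shown only to make the definition concrete; it is not the dynamic data structure of the paper, and the function names and edge-list format are illustrative.

```python
from collections import defaultdict, deque


def is_connected_without(n, edges, skip):
    """BFS connectivity test on vertices 0..n-1, ignoring edge indices in `skip`."""
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        if idx not in skip:
            adj[u].append(v)
            adj[v].append(u)
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n


def cycle_equivalent(n, edges, i, j):
    """Naive check, assuming the graph is connected and bridgeless.

    Under that assumption, edges i and j are cycle-equivalent iff removing
    both disconnects the graph, i.e. they form a cut-edge pair.
    """
    if i == j:
        return True  # every edge is trivially equivalent to itself
    return not is_connected_without(n, edges, {i, j})


# A 4-cycle 0-1-2-3-0 plus the chord 0-2.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
print(cycle_equivalent(4, edges, 0, 1))  # True: every cycle through 0-1 uses 1-2
print(cycle_equivalent(4, edges, 0, 2))  # False: cycle 0-1-2-0 avoids 2-3
```

Recomputing from scratch like this after every update is exactly the cost that a fully dynamic structure is designed to avoid.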
Improved data structures for fully dynamic biconnectivity
We present fully dynamic algorithms for maintaining the biconnected components in general and plane graphs. A fully dynamic algorithm maintains a graph during a sequence of insertions and deletions of edges or isolated vertices. Let $m$ be the number of edges and $n$ be the number of vertices in a graph. The time per operation of the best deterministic algorithms is $O(\sqrt{n})$ in general graphs and $O(\log n)$ in plane graphs for fully dynamic connectivity, and $O(\min\{m^{2/3}, n\})$ in general graphs and $O(\sqrt{n})$ in plane graphs for fully dynamic biconnectivity. We improve the latter running times to $O(\sqrt{m \log n})$ in general graphs and $O(\log^2 n)$ in plane graphs. Our algorithm for general graphs can also find the biconnected components of all vertices in time $O(n)$.
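As a reference point for what the dynamic algorithms maintain, the biconnected components of a static graph can be computed in linear time with the classical depth-first-search low-point technique of Hopcroft and Tarjan; rerunning it after every update is the naive $O(n + m)$-per-operation baseline. This is a textbook static algorithm, not the paper's data structure, and the function name and input format are illustrative.

```python
import sys


def biconnected_components(n, edges):
    """Partition the edges of an undirected graph into biconnected components.

    Textbook static DFS with discovery times and low-points (not a dynamic
    algorithm). Each component is returned as a set of indices into `edges`.
    """
    sys.setrecursionlimit(max(sys.getrecursionlimit(), 4 * n + 100))
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))

    disc = [-1] * n  # DFS discovery time, -1 while unvisited
    low = [0] * n    # earliest discovery time reachable from the subtree via one back edge
    stack = []       # edge indices of components still being built
    components = []
    timer = [0]

    def dfs(u, parent_edge):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for w, idx in adj[u]:
            if idx == parent_edge:
                continue
            if disc[w] == -1:
                stack.append(idx)
                dfs(w, idx)
                low[u] = min(low[u], low[w])
                # Nothing in w's subtree reaches above u: pop one component.
                if low[w] >= disc[u]:
                    comp = set()
                    while True:
                        e = stack.pop()
                        comp.add(e)
                        if e == idx:
                            break
                    components.append(comp)
            elif disc[w] < disc[u]:
                # Back edge to an ancestor of u.
                stack.append(idx)
                low[u] = min(low[u], disc[w])

    for v in range(n):
        if disc[v] == -1:
            dfs(v, -1)
    return components


# Two triangles sharing vertex 2, which is an articulation point.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
print(sorted(sorted(c) for c in biconnected_components(5, edges)))  # [[0, 1, 2], [3, 4, 5]]
```

Tracking edges by index rather than by endpoint pair keeps the parent check correct even when the graph contains parallel edges.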
Finding near-duplicate web pages: A large-scale evaluation of algorithms
Broder et al.'s [3] shingling algorithm and Charikar's [4] random-projection-based approach are considered "state-of-the-art" algorithms for finding near-duplicate web pages. Both algorithms were either developed at or used by popular web search engines. We compare the two algorithms on a very large scale, namely on a set of 1.6B distinct web pages. The results show that neither of the algorithms works well for finding near-duplicate pairs on the same site, while both achieve high precision for near-duplicate pairs on different sites. Since Charikar's algorithm finds more near-duplicate pairs on different sites, it achieves a better precision overall, namely 0.50 versus 0.38 for Broder et al.'s algorithm. We present a combined algorithm which achieves precision 0.79 with 79% of the recall of the other algorithms. Copyright 2006 ACM
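Both techniques reduce a page to a small fingerprint that can be compared cheaply. The sketch below assumes pages are already tokenized into words and uses 4-word shingles, 64 min-hash functions derived from MD5, and a 64-bit simhash; these parameters, the hash construction, and all function names are illustrative choices, not the configuration evaluated in the paper.

```python
import hashlib


def _hash64(text, seed=0):
    """Deterministic 64-bit hash of a string (MD5-based, illustrative only)."""
    digest = hashlib.md5(f"{seed}:{text}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shingles(words, k=4):
    """Set of contiguous k-word shingles of a tokenized page."""
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(words, num_hashes=64, k=4):
    """Shingling with min-wise hashing: keep the minimum shingle hash per seed."""
    sh = shingles(words, k)
    return [min(_hash64(s, seed) for s in sh) for seed in range(num_hashes)]


def minhash_similarity(sig_a, sig_b):
    """Fraction of agreeing minima, estimating the Jaccard similarity of shingle sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def simhash(words, bits=64):
    """Random-projection fingerprint: each word's hash bits act as a +/-1
    vector; the fingerprint keeps the sign of the per-bit sums."""
    totals = [0] * bits
    for w in words:
        h = _hash64(w)
        for b in range(bits):
            totals[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if totals[b] > 0)


def hamming(x, y):
    """Number of differing bits between two fingerprints."""
    return bin(x ^ y).count("1")


page_a = "the quick brown fox jumps over the lazy dog today".split()
page_b = "the quick brown fox jumps over the lazy cat today".split()
print(minhash_similarity(minhash_signature(page_a), minhash_signature(page_b)))
print(hamming(simhash(page_a), simhash(page_b)))
```

Near-duplicates show up as a high min-hash agreement in the first case and as a small Hamming distance between fingerprints in the second; the thresholds that turn these scores into a yes/no decision are left open here.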
Combinatorial algorithms for web search engines: three success stories
How much can smart combinatorial algorithms improve web search engines? To address this question, we will describe three algorithms that have had a positive impact on web search engines: the PageRank algorithm, algorithms for finding near-duplicate web pages, and algorithms for index server load balancing.
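Of the three, PageRank is the easiest to illustrate compactly. The sketch below is a textbook power iteration with damping factor 0.85, in which pages without out-links spread their rank uniformly; the damping value, the dangling-page rule, and the function name are common conventions assumed here, not details taken from this abstract.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank (textbook variant; damping is an assumed default).

    out_links: dict mapping every page to the list of pages it links to;
    every link target must itself be a key. Returns scores summing to 1.
    """
    pages = list(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not out_links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its damped rank evenly among its link targets.
        for p in pages:
            targets = out_links[p]
            for q in targets:
                new_rank[q] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(web)
print(max(scores, key=scores.get))  # → c
```

Page `c` is linked from every other page, so it ends up with the largest score, while `d`, which nobody links to, keeps only the uniform teleportation share.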
Tighter Bounds for Local Differentially Private Core Decomposition and Densest Subgraph
Computing the core decomposition of a graph is a fundamental problem that has recently been studied in the differentially private setting, motivated by practical applications in data mining. In particular, Dhulipala et al. [FOCS 2022] gave the first mechanism for approximate core decomposition in the challenging and practically relevant setting of local differential privacy. One of the main open problems left by their work is whether the accuracy, i.e., the approximation ratio and additive error, of their mechanism can be improved. We show the first lower bounds on the additive error of approximate and exact core decomposition mechanisms in the centralized and local models of differential privacy, respectively. We also give mechanisms for exact and approximate core decomposition in the local model, with almost matching additive error bounds. Our mechanisms are based on a black-box application of continual counting. They also yield improved mechanisms for the approximate densest subgraph problem in the local model.
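The quantity such mechanisms approximate is the non-private core number of each vertex: the largest $k$ such that the vertex lies in a subgraph in which every vertex has degree at least $k$. The sketch below computes it exactly by repeatedly peeling a vertex of minimum remaining degree. It involves no privacy mechanism and no continual counting and is meant only to pin down the definition; the function name and adjacency format are illustrative.

```python
import heapq


def core_numbers(adj):
    """Exact (non-private) core number of every vertex by min-degree peeling.

    adj: dict mapping each vertex (ints or other mutually comparable keys) to
    the set of its neighbours in a simple undirected graph. A vertex's core
    number is the largest minimum degree seen up to the moment it is removed.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in degree.items()]
    heapq.heapify(heap)
    removed = set()
    core = {}
    k = 0
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != degree[v]:
            continue  # stale entry left behind by an earlier degree decrease
        k = max(k, d)
        core[v] = k
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
                heapq.heappush(heap, (degree[w], w))
    return core


# A triangle 0-1-2 with a pendant vertex 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(sorted(core_numbers(adj).items()))  # [(0, 2), (1, 2), (2, 2), (3, 1)]
```

Pushing a fresh heap entry on every degree decrease and skipping stale ones keeps the code short at the cost of $O(m \log n)$ time, rather than the linear time achievable with bucket queues.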
Fine-Grained Complexity Lower Bounds for Families of Dynamic Graphs
A dynamic graph algorithm is a data structure that answers queries about a property of the current graph while supporting graph modifications such as edge insertions and deletions. Prior work has shown strong conditional lower bounds for general dynamic graphs, yet graph families that arise in practice often exhibit structural properties that the existing lower bound constructions do not possess. We study three specific graph families that are ubiquitous, namely constant-degree graphs, power-law graphs, and expander graphs, and give the first conditional lower bounds for them. Our results show that even when restricting our attention to one of these graph classes, any algorithm for fundamental graph problems such as exact or approximate distance computation, or maximum matching, cannot simultaneously achieve sub-polynomial update time and query time. For example, we show that the same lower bounds as for general graphs hold for maximum matching and (s,t)-distance in constant-degree graphs, power-law graphs, or expanders. Namely, in an $m$-edge graph, there exists no dynamic algorithm with both $O(m^{1/2-\varepsilon})$ update time and $O(m^{1-\varepsilon})$ query time, for any small $\varepsilon > 0$. Note that for (s,t)-distance the trivial dynamic algorithm achieves an almost matching upper bound of constant update time and $O(m)$ query time. We prove similar bounds for the other graph families and for other fundamental problems such as densest subgraph detection and perfect matching.
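The trivial upper bound for (s,t)-distance is easy to make concrete. Assuming an unweighted, undirected graph, one can keep the edges in adjacency sets so that each insertion or deletion takes expected constant time, and answer each query with a fresh breadth-first search from $s$, which only touches the component of $s$ and therefore runs in $O(m)$ time. The class and method names below are hypothetical.

```python
from collections import defaultdict, deque


class TrivialDynamicDistance:
    """Hypothetical trivial dynamic (s,t)-distance structure for an unweighted,
    undirected graph: expected O(1) updates, BFS-based O(m) queries."""

    def __init__(self):
        self.adj = defaultdict(set)

    def insert(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def distance(self, s, t):
        """Number of edges on a shortest s-t path, or None if t is unreachable."""
        if s == t:
            return 0
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in self.adj.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    if w == t:
                        return dist[w]
                    queue.append(w)
        return None


g = TrivialDynamicDistance()
for u, v in [(0, 1), (1, 2), (2, 3), (0, 3)]:
    g.insert(u, v)
print(g.distance(0, 2))  # 2
g.delete(0, 3)
print(g.distance(0, 3))  # 3
```

All the work is deferred to query time; the lower bounds above say that, on these graph families too, this trade-off cannot be improved to polynomially faster updates and queries at the same time.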
- …