Asynchronous iterative computations with Web information retrieval structures: The PageRank case
There are several ideas being used today for Web information retrieval, and
specifically in Web search engines. The PageRank algorithm is one of those that
introduce a content-neutral ranking function over Web pages. This ranking is
applied to the set of pages returned by the Google search engine in response to
a search query. PageRank is based in part on two simple common-sense
concepts: (i) a page is important if many important pages link to it, and
(ii) a page containing many links has reduced impact on the importance of the
pages it links to. In this paper we focus on asynchronous iterative schemes to
compute PageRank over large sets of Web pages. The elimination of the
synchronizing phases is expected to be advantageous on heterogeneous platforms.
The motivation for a possible move to such large scale distributed platforms
lies in the size of matrices representing Web structure. In orders of
magnitude: pages with nonzero elements and bytes
just to store a small percentage of the Web (the part already crawled); distributed
memory machines are necessary for such computations. The present research is
part of our general objective of exploring the potential of asynchronous
computational models as an underlying framework for very large scale
computations over the Grid. The area of "internet algorithmics" appears to
offer many occasions for computations of unprecedented dimensionality that would
be good candidates for this framework.
Comment: 8 pages, to appear in the ParCo2005 Conference Proceedings
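The two principles above, and the contrast between synchronous and asynchronous iteration, can be sketched in Python. This is a minimal illustration, not the authors' implementation.

- `pagerank_sync` is a standard power iteration. It uses a damping factor `alpha` (0.85 is an assumed, commonly used value) and spreads the rank of dangling pages uniformly.
- `pagerank_async` emulates an asynchronous scheme on a single machine. It updates components one at a time, in a random order and in place, with no barrier between sweeps.

All function names and the four-page example graph are hypothetical.

```python
import random


def pagerank_sync(links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Synchronous (Jacobi-style) power iteration.

    Every sweep reads only the previous iterate `x`, so a distributed run
    needs a global exchange of values (a barrier) before the next sweep.
    """
    n = len(links)
    out_deg = [len(targets) for targets in links]
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(x[i] for i in range(n) if out_deg[i] == 0)
        new = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for i, targets in enumerate(links):
            for j in targets:
                # (ii) a page's importance is split evenly over its out-links
                new[j] += alpha * x[i] / out_deg[i]
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            break
    return x


def pagerank_async(links, alpha=0.85, tol=1e-10, max_sweeps=1000, seed=0):
    """Sequential emulation of an asynchronous (chaotic) iteration.

    Components are updated one at a time, in a random order, in place, each
    using whatever values are currently stored; nothing waits for a barrier.
    """
    n = len(links)
    out_deg = [len(targets) for targets in links]
    in_links = [[] for _ in range(n)]
    for i, targets in enumerate(links):
        for j in targets:
            in_links[j].append(i)
    rng = random.Random(seed)
    x = [1.0 / n] * n
    order = list(range(n))
    for _ in range(max_sweeps):
        rng.shuffle(order)
        change = 0.0
        for j in order:
            dangling = sum(x[i] for i in range(n) if out_deg[i] == 0)
            # (i) page j is important if important pages link to it
            new_j = (1.0 - alpha) / n + alpha * (
                sum(x[i] / out_deg[i] for i in in_links[j]) + dangling / n
            )
            change += abs(new_j - x[j])
            x[j] = new_j
        if change < tol:
            break
    return x


links = [[1, 2], [2, 3], [0], []]  # hypothetical graph; page 3 is dangling
print(pagerank_sync(links))
print(pagerank_async(links))
```

Both functions should converge to the same vector, because the in-place updates relax the same linear fixed-point equation. A genuinely distributed version would also let processors read stale values received from other processors, which this sequential emulation does not model.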
Privately Connecting Mobility to Infectious Diseases via Applied Cryptography
Human mobility is undisputedly one of the critical factors in infectious
disease dynamics. Until a few years ago, researchers had to rely on static data
to model human mobility, which was then combined with a transmission model of a
particular disease resulting in an epidemiological model. Recent works have
consistently been showing that substituting the static mobility data with
mobile phone data leads to significantly more accurate models. While prior
studies have relied exclusively on aggregated data covering all of a mobile
network operator's subscribers, it may be preferable to consider aggregated mobility data
of infected individuals only. Clearly, naively linking mobile phone data with
infected individuals would be a massive intrusion on privacy. This research aims to
develop a solution that reports the aggregated mobile phone location data of
infected individuals while still maintaining compliance with privacy
expectations. To achieve privacy, we use homomorphic encryption, zero-knowledge
proof techniques, and differential privacy. Our protocol's open-source
implementation can process eight million subscribers in one and a half hours.
Additionally, we provide a legal analysis of our solution with regard to the
EU General Data Protection Regulation.
Comment: Added differential privacy experiments and new benchmark
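Homomorphic encryption and differential privacy can be combined as in the toy sketch below. It is not the protocol from the paper and omits the zero-knowledge proofs entirely. It rests on these assumptions:

- additively homomorphic Paillier encryption, with generator g = n + 1 and tiny, insecure primes;
- a hypothetical one-hot vector per infected subscriber, marking the single region where that subscriber was observed;
- the Laplace mechanism with sensitivity 1 and an assumed privacy parameter `epsilon`.

All function names are illustrative.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier keys from two primes (far too small to be secure)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    mu = pow(phi, -1, n)  # modular inverse; needs Python 3.8+
    return n, (phi, mu)


def encrypt(n, m, rng):
    """Enc(m) = (1 + n)^m * r^n mod n^2, using generator g = n + 1."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Dec(c) = L(c^phi mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    phi, mu = priv
    u = pow(c, phi, n * n)
    return ((u - 1) // n * mu) % n


def aggregate_counts(one_hot_rows, n, priv, epsilon, rng):
    """Sum one-hot region vectors under encryption, then add Laplace noise.

    Multiplying Paillier ciphertexts adds the underlying plaintexts, so the
    aggregator never sees an individual row. Sensitivity is 1 because each
    (hypothetical) subscriber contributes to exactly one region.
    """
    n2 = n * n
    totals = [encrypt(n, 0, rng) for _ in one_hot_rows[0]]
    for row in one_hot_rows:
        for k, bit in enumerate(row):
            totals[k] = (totals[k] * encrypt(n, bit, rng)) % n2
    exact = [decrypt(n, priv, c) for c in totals]
    # Laplace(0, 1/epsilon) sampled as the difference of two Exp(epsilon) draws.
    noisy = [v + rng.expovariate(epsilon) - rng.expovariate(epsilon) for v in exact]
    return exact, noisy


rng = random.Random(42)  # toy only; real key material would use `secrets`
n, priv = keygen(7919, 104729)
rows = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
exact, noisy = aggregate_counts(rows, n, priv, epsilon=1.0, rng=rng)
print(exact)  # -> [2, 1, 1]
print(noisy)
```

In this sketch, decryption of the exact totals is shown only to check the arithmetic. A privacy-preserving deployment would release only the noisy counts.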
Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search
Retrieval pipelines commonly rely on a term-based search to obtain candidate
records, which are subsequently re-ranked. Some candidates are missed by this
approach, e.g., due to a vocabulary mismatch. We address this issue by
replacing the term-based search with a generic k-NN retrieval algorithm, where
a similarity function can take into account subtle term associations. While an
exact brute-force k-NN search using this similarity function is slow, we
demonstrate that an approximate algorithm can be nearly two orders of magnitude
faster at the expense of only a small loss in accuracy. A retrieval pipeline
using an approximate k-NN search can be more effective and efficient than the
term-based pipeline. This opens up new possibilities for designing effective
retrieval pipelines. Our software (including data-generating code) and
derivative data based on the Stack Overflow collection are available online
- …
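A brute-force version of k-NN retrieval with an association-aware similarity can be sketched in a few lines. The similarity below is a hypothetical stand-in, not the one used in the paper. Each query term contributes its strongest association with any document term, read from a toy table `assoc`, so a query about a "car" can still reach a document that only says "automobile". The documents, table, and function names are all made up for illustration.

```python
import functools
import heapq


def assoc_similarity(query, doc, assoc):
    """Sum, over query terms, of the best association with any document term.

    Identical terms score 1.0; `assoc[(q, d)]` (a hypothetical table, e.g.
    one learned from question/answer pairs) lets a query term partially
    match a different document term.
    """
    doc_terms = set(doc)
    score = 0.0
    for q in query:
        best = 0.0
        for d in doc_terms:
            best = max(best, 1.0 if q == d else assoc.get((q, d), 0.0))
        score += best
    return score


def knn_brute_force(query, docs, k, sim):
    """Exact k-NN: score every document and keep the k most similar."""
    return heapq.nlargest(k, range(len(docs)), key=lambda i: sim(query, docs[i]))


docs = [
    ["buy", "used", "automobile"],
    ["python", "list", "sort"],
    ["car", "rental", "prices"],
]
assoc = {("car", "automobile"): 0.8, ("cheap", "prices"): 0.3}
query = ["cheap", "car"]

sim = functools.partial(assoc_similarity, assoc=assoc)
print(knn_brute_force(query, docs, 2, sim))  # -> [2, 0]
```

With an empty association table, document 0 scores zero for this query, which is the vocabulary-mismatch miss described above. An approximate k-NN index would replace the linear scan in `knn_brute_force`; the specific approximate algorithm from the paper is not reproduced here.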