Search Engine Similarity Analysis: A Combined Content and Rankings Approach
How different are search engines? The search engine wars are a favorite topic
of online analysts, as two of the biggest companies in the world, Google and
Microsoft, battle for prevalence in the web search space. Differences in search
engine popularity can be explained by their effectiveness, or by other factors
such as familiarity with the most popular engine, peer imitation, or
force of habit. In this work we present a thorough analysis of the affinity of
the two major search engines, Google and Bing, along with DuckDuckGo, which
goes to great lengths to emphasize its privacy-friendly credentials. To do so,
we collected search results using a comprehensive set of 300 unique queries for
two time periods in 2016 and 2019, and developed a new similarity metric that
leverages both the content and the ranking of search responses. We evaluated
the characteristics of the metric against other metrics and approaches that
have been proposed in the literature, and used it to (1) investigate the
similarities of search engine results, (2) trace the evolution of their affinity
over time, (3) identify which aspects of the results influence similarity, and
(4) examine how the metric behaves across different kinds of search services. We found that Google
stands apart, but Bing and DuckDuckGo are largely indistinguishable from each
other.
Comment: A shorter version of this paper was accepted at the 21st International
Conference on Web Information Systems Engineering (WISE 2020). The final
authenticated version is available online at
https://doi.org/10.1007/978-3-030-62008-0_
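The combined content-and-ranking idea lends itself to a quick sketch. The following Python is a hypothetical toy metric of my own construction, not the metric defined in the paper: it averages a Jaccard content-overlap term with a rank-agreement term that discounts shared results appearing at very different positions.

```python
def rank_weighted_similarity(results_a, results_b):
    """Toy similarity for two ranked result lists (lists of URLs).

    Blends content overlap (Jaccard) with a rank-agreement term that
    discounts shared URLs whose positions disagree across the lists.
    Illustrative only -- not the metric proposed in the paper.
    """
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0
    jaccard = len(set_a & set_b) / len(set_a | set_b)

    shared = set_a & set_b
    if shared:
        pos_a = {url: i for i, url in enumerate(results_a)}
        pos_b = {url: i for i, url in enumerate(results_b)}
        # Each shared URL contributes 1 when its ranks agree exactly,
        # less as its positions drift apart.
        agreement = sum(1.0 / (1 + abs(pos_a[u] - pos_b[u]))
                        for u in shared) / len(shared)
    else:
        agreement = 0.0

    return 0.5 * jaccard + 0.5 * agreement


same = ["a.com", "b.com", "c.com"]
print(rank_weighted_similarity(same, same))                # 1.0
print(rank_weighted_similarity(same, ["x.com", "y.com"]))  # 0.0
```

Identical lists score 1.0, disjoint lists 0.0, and lists with the same content in a different order land strictly in between, which is the qualitative behavior a combined content-and-ranking metric needs.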
Inter-arrival times of message propagation on directed networks
One of the challenges in fighting cybercrime is to understand the dynamics of
message propagation on botnets, networks of infected computers used to send
viruses, unsolicited commercial emails (spam), or denial-of-service attacks. We
map this problem to the propagation of multiple random walkers on directed
networks and we evaluate the inter-arrival time distribution between successive
walkers arriving at a target. We show that the temporal organization of this
process, which models information propagation on unstructured peer-to-peer
networks, has the same features as spam arriving at a single user. We study the
behavior of the message inter-arrival time distribution on three different
network topologies using two different rules for sending messages. In all
networks the propagation is not a pure Poisson process. It shows universal
features on Poissonian networks and a more complex behavior on scale free
networks. These results open the possibility of indirectly learning about the
process of sending messages on networks with unknown topologies by studying
inter-arrival times at any node of the network.
Comment: 9 pages, 12 figures
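The walker model is straightforward to simulate. The sketch below is a minimal illustration under assumed parameters (a tiny hand-built directed graph, one fixed start node, independent walkers released at time zero); the paper studies richer message-sending rules and larger topologies.

```python
import random

def walker_interarrival_times(adj, target, n_walkers=200, start=0,
                              max_steps=10_000, seed=1):
    """Release independent random walkers on a directed graph (adjacency
    dict `adj`) and record each walker's first-hitting time at `target`.
    Returns the sorted arrival times and the gaps between successive
    arrivals (the inter-arrival times)."""
    rng = random.Random(seed)
    arrivals = []
    for _ in range(n_walkers):
        node, t = start, 0
        while node != target and t < max_steps:
            node = rng.choice(adj[node])  # follow a random out-edge
            t += 1
        if node == target:
            arrivals.append(t)
    arrivals.sort()
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return arrivals, gaps

# Small directed ring with a shortcut; node 3 plays the target's role.
adj = {0: [1], 1: [2, 3], 2: [3], 3: [0]}
arrivals, gaps = walker_interarrival_times(adj, target=3)
print(len(arrivals), min(arrivals))
```

Here the gap statistics reflect the spread of first-hitting times of simultaneously released walkers; staggering the release times would be the natural next step toward the message-sending rules studied in the paper.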
Lower Bounds on the Time/Memory Tradeoff of Function Inversion
We study time/memory tradeoffs of function inversion: an algorithm, i.e., an inverter, equipped with an s-bit advice string on a randomly chosen function f: [n] -> [n] and using q oracle queries to f, tries to invert a randomly chosen output y of f, i.e., to find x in f^{-1}(y). Much progress has been made regarding adaptive function inversion - the inverter is allowed to make adaptive oracle queries. Hellman [IEEE Transactions on Information Theory '80] presented an adaptive inverter that inverts with high probability a random f. Fiat and Naor [SICOMP '00] proved that for any s, q with s^3 q = n^3 (ignoring low-order terms), an s-advice, q-query variant of Hellman's algorithm inverts a constant fraction of the image points of any function. Yao [STOC '90] proved a lower bound of sq >= n for this problem. Closing the gap between the above lower and upper bounds is a long-standing open question.
Very little is known for the non-adaptive variant of the question - the inverter chooses its queries in advance. The only known upper bounds, i.e., inverters, are the trivial ones (with s + q = n), and the only lower bound is the above bound of Yao. In a recent work, Corrigan-Gibbs and Kogan [TCC '19] partially justified the difficulty of finding lower bounds on non-adaptive inverters, showing that a lower bound on the time/memory tradeoff of non-adaptive inverters implies a lower bound on low-depth Boolean circuits - bounds that, for a strong enough choice of parameters, are notoriously hard to prove.
We make progress on the above intriguing question, both for the adaptive and the non-adaptive case, proving the following lower bounds on restricted families of inverters:
- Linear-advice (adaptive inverter): If the advice string is a linear function of f (e.g., A · f, for some matrix A, viewing f as a vector in [n]^n), then s + q = Ω(n). The bound generalizes to the case where the advice string of f1 + f2, i.e., the coordinate-wise addition of the truth tables of f1 and f2, can be computed from the descriptions of f1 and f2 by a low communication protocol.
- Affine non-adaptive decoders: If the non-adaptive inverter has an affine decoder - it outputs a linear function, determined by the advice string and the element to invert, of the query answers - then s = Ω(n) (regardless of q).
- Affine non-adaptive decision trees: If the non-adaptive inversion algorithm is a d-depth affine decision tree - it outputs the evaluation of a decision tree whose nodes compute a linear function of the answers to the queries - and q < cn for some universal constant c > 0, then s = Ω(n / (d log n)).
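For intuition, the trivial inverters - store part of the inverse table as advice and brute-force the remaining points with oracle queries, so that advice and queries together touch every input - can be sketched as a toy in Python. This illustrates only that baseline, not Hellman's scheme or the bounds above; all names and parameters are illustrative.

```python
import random

def make_advice(f, advice_points):
    """Preprocessing: a partial inverse table of f on chosen inputs.
    Its size plays the role of the advice bound s."""
    return {f[x]: x for x in advice_points}

def invert(f, advice, y, query_points):
    """Online phase: consult the advice table first, then spend oracle
    queries f[x] on `query_points` (their number plays the role of q).
    Returns some preimage of y, or None on failure."""
    if y in advice:
        return advice[y]
    for x in query_points:   # one oracle query per iteration
        if f[x] == y:
            return x
    return None

rng = random.Random(0)
n = 64
f = [rng.randrange(n) for _ in range(n)]   # random function [n] -> [n]

advice = make_advice(f, range(0, n, 2))    # advice covers even inputs
queries = range(1, n, 2)                   # queries cover odd inputs
y = f[17]
x = invert(f, advice, y, queries)
assert x is not None and f[x] == y
```

Because advice and queries together cover all n inputs, every image point is inverted - the trivial s + q = n tradeoff. The results cited above ask how far below this sum cleverer preprocessing can go, and prove it cannot go far for the restricted inverter families listed.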
The Communication Complexity of Threshold Private Set Intersection
Threshold private set intersection enables Alice and Bob, who hold sets S_A and S_B of size n, to compute the intersection S_A ∩ S_B if the sets do not differ by more than some threshold parameter t.
In this work, we investigate the communication complexity of this problem and we establish the first upper and lower bounds.
We show that any protocol has to have a communication complexity of Ω(t).
We show that an almost matching upper bound of Õ(t) can be obtained via fully homomorphic encryption.
We present a computationally more efficient protocol based on weaker assumptions, namely additively homomorphic encryption, with a communication complexity of Õ(t²).
We show how our protocols can be extended to the multiparty setting.
For applications like biometric authentication, where a given fingerprint has to have a large intersection with a fingerprint from a database, our protocols may result in significant communication savings.
Prior to this work, all previous protocols had a communication complexity of Ω(n).
Our protocols are the first ones with communication complexities that depend mainly on the threshold parameter t and only logarithmically on the set size n.
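As an illustration of the functionality alone (not of any cryptographic protocol), here is a Python sketch of what threshold PSI computes; reading "do not differ by more than t" as a symmetric difference of at most 2t is my assumption about the exact threshold condition.

```python
def threshold_psi(set_a, set_b, t):
    """Ideal threshold-PSI functionality: output the intersection only
    when the sets differ in at most t elements on each side (symmetric
    difference at most 2t for equal-size sets); otherwise output nothing.
    Real protocols compute this cryptographically, without either party
    learning the other's set."""
    if len(set_a ^ set_b) > 2 * t:
        return None
    return set_a & set_b

a = {1, 2, 3, 4, 5}
b = {1, 2, 3, 4, 6}
print(sorted(threshold_psi(a, b, t=1)))  # [1, 2, 3, 4]
print(threshold_psi(a, b, t=0))          # None
```

The all-or-nothing threshold is what enables the communication savings: a protocol only needs to detect whether the difference is small and, if so, transmit information about the (small) difference, rather than about the whole sets.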