Search Engine Similarity Analysis: A Combined Content and Rankings Approach
How different are search engines? The search engine wars are a favorite topic
of online analysts, as two of the biggest companies in the world, Google and
Microsoft, battle for prevalence in the web search space. Differences in search
engine popularity can be explained by their effectiveness, or by other factors
such as familiarity with the most popular engine, peer imitation, or
force of habit. In this work we present a thorough analysis of the affinity of
the two major search engines, Google and Bing, along with DuckDuckGo, which
goes to great lengths to emphasize its privacy-friendly credentials. To do so,
we collected search results using a comprehensive set of 300 unique queries for
two time periods in 2016 and 2019, and developed a new similarity metric that
leverages both the content and the ranking of search responses. We evaluated
the characteristics of the metric against other metrics and approaches that
have been proposed in the literature, and used it to (1) investigate the
similarities of search engine results, (2) trace the evolution of their affinity
over time, (3) identify which aspects of the results influence similarity, and
(4) examine how the metric behaves across different kinds of search services. We found that Google
stands apart, but Bing and DuckDuckGo are largely indistinguishable from each
other.
Comment: A shorter version of this paper was accepted at the 21st International
Conference on Web Information Systems Engineering (WISE 2020). The final
authenticated version is available online at
https://doi.org/10.1007/978-3-030-62008-0_
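The combined content-and-ranking idea lends itself to a quick sketch. The following Python is a hypothetical toy metric of my own construction, not the metric defined in the paper: it averages a Jaccard content-overlap term with a rank-agreement term that discounts shared results appearing at very different positions.

```python
def rank_weighted_similarity(results_a, results_b):
    """Toy similarity for two ranked result lists (lists of URLs).

    Blends content overlap (Jaccard) with a rank-agreement term that
    discounts shared URLs whose positions disagree across the lists.
    Illustrative only -- not the metric proposed in the paper.
    """
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0
    jaccard = len(set_a & set_b) / len(set_a | set_b)

    shared = set_a & set_b
    if shared:
        pos_a = {url: i for i, url in enumerate(results_a)}
        pos_b = {url: i for i, url in enumerate(results_b)}
        # Each shared URL contributes 1 when its ranks agree exactly,
        # less as its positions drift apart.
        agreement = sum(1.0 / (1 + abs(pos_a[u] - pos_b[u]))
                        for u in shared) / len(shared)
    else:
        agreement = 0.0

    return 0.5 * jaccard + 0.5 * agreement


same = ["a.com", "b.com", "c.com"]
print(rank_weighted_similarity(same, same))                # 1.0
print(rank_weighted_similarity(same, ["x.com", "y.com"]))  # 0.0
```

Identical lists score 1.0, disjoint lists 0.0, and lists with the same content in a different order land strictly in between, which is the qualitative behavior a combined content-and-ranking metric needs.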
Inter-arrival times of message propagation on directed networks
One of the challenges in fighting cybercrime is to understand the dynamics of
message propagation on botnets, networks of infected computers used to send
viruses, unsolicited commercial emails (spam), or denial-of-service attacks. We
map this problem to the propagation of multiple random walkers on directed
networks and we evaluate the inter-arrival time distribution between successive
walkers arriving at a target. We show that the temporal organization of this
process, which models information propagation on unstructured peer-to-peer
networks, has the same features as spam arriving at a single user. We study the
behavior of the message inter-arrival time distribution on three different
network topologies using two different rules for sending messages. In all
networks the propagation is not a pure Poisson process. It shows universal
features on Poissonian networks and a more complex behavior on scale free
networks. These results open the possibility of indirectly learning about the
process of sending messages on networks with unknown topologies by studying
inter-arrival times at any node of the network.
Comment: 9 pages, 12 figures
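The walker model is straightforward to simulate. The sketch below is a minimal illustration under assumed parameters (a tiny hand-built directed graph, one fixed start node, independent walkers released at time zero); the paper studies richer message-sending rules and larger topologies.

```python
import random

def walker_interarrival_times(adj, target, n_walkers=200, start=0,
                              max_steps=10_000, seed=1):
    """Release independent random walkers on a directed graph (adjacency
    dict `adj`) and record each walker's first-hitting time at `target`.
    Returns the sorted arrival times and the gaps between successive
    arrivals (the inter-arrival times)."""
    rng = random.Random(seed)
    arrivals = []
    for _ in range(n_walkers):
        node, t = start, 0
        while node != target and t < max_steps:
            node = rng.choice(adj[node])  # follow a random out-edge
            t += 1
        if node == target:
            arrivals.append(t)
    arrivals.sort()
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    return arrivals, gaps

# Small directed ring with a shortcut; node 3 plays the target's role.
adj = {0: [1], 1: [2, 3], 2: [3], 3: [0]}
arrivals, gaps = walker_interarrival_times(adj, target=3)
print(len(arrivals), min(arrivals))
```

Here the gap statistics reflect the spread of first-hitting times of simultaneously released walkers; staggering the release times would be the natural next step toward the message-sending rules studied in the paper.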
Lower Bounds on the Time/Memory Tradeoff of Function Inversion
We study time/memory tradeoffs of function inversion: an algorithm, i.e., an inverter, equipped with an s-bit advice string on a randomly chosen function f: [n] -> [n] and using q oracle queries to f, tries to invert a randomly chosen output y of f, i.e., to find x in f^{-1}(y). Much progress has been made regarding adaptive function inversion - the inverter is allowed to make adaptive oracle queries. Hellman [IEEE Transactions on Information Theory '80] presented an adaptive inverter that inverts with high probability a random f. Fiat and Naor [SICOMP '00] proved that for any s, q with s^3 q = n^3 (ignoring low-order terms), an s-advice, q-query variant of Hellman's algorithm inverts a constant fraction of the image points of any function. Yao [STOC '90] proved a lower bound of sq >= n for this problem. Closing the gap between the above lower and upper bounds is a long-standing open question.
Very little is known for the non-adaptive variant of the question - the inverter chooses its queries in advance. The only known upper bounds, i.e., inverters, are the trivial ones (with s + q = n), and the only lower bound is the above bound of Yao. In a recent work, Corrigan-Gibbs and Kogan [TCC '19] partially justified the difficulty of finding lower bounds on non-adaptive inverters, showing that a lower bound on the time/memory tradeoff of non-adaptive inverters implies a lower bound on low-depth Boolean circuits - bounds that, for a strong enough choice of parameters, are notoriously hard to prove.
We make progress on the above intriguing question, both for the adaptive and the non-adaptive case, proving the following lower bounds on restricted families of inverters:
- Linear-advice (adaptive inverter): If the advice string is a linear function of f (e.g., A · f, for some matrix A, viewing f as a vector in [n]^n), then s + q = Ω(n). The bound generalizes to the case where the advice string of f1 + f2, i.e., the coordinate-wise addition of the truth tables of f1 and f2, can be computed from the descriptions of f1 and f2 by a low communication protocol.
- Affine non-adaptive decoders: If the non-adaptive inverter has an affine decoder - it outputs a linear function, determined by the advice string and the element to invert, of the query answers - then s = Ω(n) (regardless of q).
- Affine non-adaptive decision trees: If the non-adaptive inversion algorithm is a d-depth affine decision tree - it outputs the evaluation of a decision tree whose nodes compute a linear function of the answers to the queries - and q < cn for some universal constant c > 0, then s = Ω(n / (d log n)).
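For intuition, the trivial inverters - store part of the inverse table as advice and brute-force the remaining points with oracle queries, so that advice and queries together touch every input - can be sketched as a toy in Python. This illustrates only that baseline, not Hellman's scheme or the bounds above; all names and parameters are illustrative.

```python
import random

def make_advice(f, advice_points):
    """Preprocessing: a partial inverse table of f on chosen inputs.
    Its size plays the role of the advice bound s."""
    return {f[x]: x for x in advice_points}

def invert(f, advice, y, query_points):
    """Online phase: consult the advice table first, then spend oracle
    queries f[x] on `query_points` (their number plays the role of q).
    Returns some preimage of y, or None on failure."""
    if y in advice:
        return advice[y]
    for x in query_points:   # one oracle query per iteration
        if f[x] == y:
            return x
    return None

rng = random.Random(0)
n = 64
f = [rng.randrange(n) for _ in range(n)]   # random function [n] -> [n]

advice = make_advice(f, range(0, n, 2))    # advice covers even inputs
queries = range(1, n, 2)                   # queries cover odd inputs
y = f[17]
x = invert(f, advice, y, queries)
assert x is not None and f[x] == y
```

Because advice and queries together cover all n inputs, every image point is inverted - the trivial s + q = n tradeoff. The results cited above ask how far below this sum cleverer preprocessing can go, and prove it cannot go far for the restricted inverter families listed.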
The Communication Complexity of Threshold Private Set Intersection
Threshold private set intersection enables Alice and Bob, who hold sets S_A and S_B of size n, to compute the intersection S_A ∩ S_B if the sets do not differ by more than some threshold parameter t.
In this work, we investigate the communication complexity of this problem and we establish the first upper and lower bounds.
We show that any protocol has to have a communication complexity of Ω(t).
We show that an almost matching upper bound of Õ(t) can be obtained via fully homomorphic encryption.
We present a computationally more efficient protocol based on weaker assumptions, namely additively homomorphic encryption, with a communication complexity of Õ(t²).
We show how our protocols can be extended to the multiparty setting.
For applications like biometric authentication, where a given fingerprint has to have a large intersection with a fingerprint from a database, our protocols may result in significant communication savings.
Prior to this work, all previous protocols had a communication complexity of Ω(n).
Our protocols are the first ones with communication complexities that depend mainly on the threshold parameter t and only logarithmically on the set size n.
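As an illustration of the functionality alone (not of any cryptographic protocol), here is a Python sketch of what threshold PSI computes; reading "do not differ by more than t" as a symmetric difference of at most 2t is my assumption about the exact threshold condition.

```python
def threshold_psi(set_a, set_b, t):
    """Ideal threshold-PSI functionality: output the intersection only
    when the sets differ in at most t elements on each side (symmetric
    difference at most 2t for equal-size sets); otherwise output nothing.
    Real protocols compute this cryptographically, without either party
    learning the other's set."""
    if len(set_a ^ set_b) > 2 * t:
        return None
    return set_a & set_b

a = {1, 2, 3, 4, 5}
b = {1, 2, 3, 4, 6}
print(sorted(threshold_psi(a, b, t=1)))  # [1, 2, 3, 4]
print(threshold_psi(a, b, t=0))          # None
```

The all-or-nothing threshold is what enables the communication savings: a protocol only needs to detect whether the difference is small and, if so, transmit information about the (small) difference, rather than about the whole sets.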