36 research outputs found

    Search Engine Similarity Analysis: A Combined Content and Rankings Approach

    How different are search engines? The search engine wars are a favorite topic of on-line analysts, as two of the biggest companies in the world, Google and Microsoft, battle for dominance of the web search space. Differences in search engine popularity can be explained by their effectiveness or by other factors, such as familiarity with the most popular first engine, peer imitation, or force of habit. In this work we present a thorough analysis of the affinity of the two major search engines, Google and Bing, along with DuckDuckGo, which goes to great lengths to emphasize its privacy-friendly credentials. To do so, we collected search results using a comprehensive set of 300 unique queries for two time periods in 2016 and 2019, and developed a new similarity metric that leverages both the content and the ranking of search responses. We evaluated the characteristics of the metric against other metrics and approaches that have been proposed in the literature, and used it to investigate (1) the similarities of search engine results, (2) the evolution of their affinity over time, (3) what aspects of the results influence similarity, and (4) how the metric differs over different kinds of search services. We found that Google stands apart, but Bing and DuckDuckGo are largely indistinguishable from each other.
    Comment: A shorter version of this paper was accepted in the 21st International Conference on Web Information Systems Engineering (WISE 2020). The final authenticated version is available online at https://doi.org/10.1007/978-3-030-62008-0_

    Inter-arrival times of message propagation on directed networks

    One of the challenges in fighting cybercrime is to understand the dynamics of message propagation on botnets, networks of infected computers used to send viruses, unsolicited commercial emails (SPAM) or denial of service attacks. We map this problem to the propagation of multiple random walkers on directed networks and we evaluate the inter-arrival time distribution between successive walkers arriving at a target. We show that the temporal organization of this process, which models information propagation on unstructured peer-to-peer networks, has the same features as SPAM arriving at a single user. We study the behavior of the message inter-arrival time distribution on three different network topologies using two different rules for sending messages. In all networks the propagation is not a pure Poisson process. It shows universal features on Poissonian networks and a more complex behavior on scale-free networks. The results open the possibility of learning indirectly about the process of sending messages on networks with unknown topologies, by studying inter-arrival times at any node of the network.
    Comment: 9 pages, 12 figures
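    The random-walker mapping described in this abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' setup: the graph model (a directed Erdős–Rényi graph), the restart rule for walkers that reach a node with no outgoing edges, and the names `random_digraph` and `inter_arrival_times` are all hypothetical choices made for illustration.

```python
import random

def random_digraph(n, p, rng):
    """Directed Erdos-Renyi graph: each ordered pair (u, v), u != v, is an edge with probability p."""
    return {u: [v for v in range(n) if v != u and rng.random() < p] for u in range(n)}

def inter_arrival_times(graph, target, walkers, steps, rng):
    """Move several independent walkers and return the gaps between successive arrivals at target."""
    nodes = list(graph)
    positions = [rng.choice(nodes) for _ in range(walkers)]
    arrivals = []
    for t in range(1, steps + 1):
        for i, u in enumerate(positions):
            out = graph[u]
            # Assumption: a walker at a node with no out-edges restarts at a uniformly random node.
            v = rng.choice(out) if out else rng.choice(nodes)
            positions[i] = v
            if v == target:
                arrivals.append(t)  # record the time step of this arrival
    # Differences between consecutive arrival times (0 when two walkers arrive in the same step).
    return [b - a for a, b in zip(arrivals, arrivals[1:])]

rng = random.Random(0)  # fixed seed so the run is reproducible
g = random_digraph(200, 0.05, rng)
gaps = inter_arrival_times(g, target=0, walkers=20, steps=5000, rng=rng)
if gaps:
    print(len(gaps), sum(gaps) / len(gaps))
```

    A histogram of `gaps` could then be compared against an exponential (Poisson-process) fit; the sending rules studied in the paper are not modeled here.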

    Lower Bounds on the Time/Memory Tradeoff of Function Inversion

    We study time/memory tradeoffs of function inversion: an algorithm, i.e., an inverter, equipped with an $s$-bit advice on a randomly chosen function $f\colon [n] \mapsto [n]$ and using $q$ oracle queries to $f$, tries to invert a randomly chosen output $y$ of $f$, i.e., to find $x \in f^{-1}(y)$. Much progress was done regarding adaptive function inversion - the inverter is allowed to make adaptive oracle queries. Hellman [IEEE transactions on Information Theory '80] presented an adaptive inverter that inverts with high probability a random $f$. Fiat and Naor [SICOMP '00] proved that for any $s,q$ with $s^3 q = n^3$ (ignoring low-order terms), an $s$-advice, $q$-query variant of Hellman's algorithm inverts a constant fraction of the image points of any function. Yao [STOC '90] proved a lower bound of $sq \ge n$ for this problem. Closing the gap between the above lower and upper bounds is a long-standing open question. Very little is known for the non-adaptive variant of the question - the inverter chooses its queries in advance. The only known upper bounds, i.e., inverters, are the trivial ones (with $s+q = n$), and the only lower bound is the above bound of Yao. In a recent work, Corrigan-Gibbs and Kogan [TCC '19] partially justified the difficulty of finding lower bounds on non-adaptive inverters, showing that a lower bound on the time/memory tradeoff of non-adaptive inverters implies a lower bound on low-depth Boolean circuits, bounds that, for a strong enough choice of parameters, are notoriously hard to prove. We make progress on the above intriguing question, both for the adaptive and the non-adaptive case, proving the following lower bounds on restricted families of inverters:
    - Linear-advice (adaptive inverter): If the advice string is a linear function of $f$ (e.g., $A \times f$, for some matrix $A$, viewing $f$ as a vector in $[n]^n$), then $s+q \in \Omega(n)$. The bound generalizes to the case where the advice string of $f_1 + f_2$, i.e., the coordinate-wise addition of the truth tables of $f_1$ and $f_2$, can be computed from the description of $f_1$ and $f_2$ by a low communication protocol.
    - Affine non-adaptive decoders: If the non-adaptive inverter has an affine decoder - it outputs a linear function, determined by the advice string and the element to invert, of the query answers - then $s \in \Omega(n)$ (regardless of $q$).
    - Affine non-adaptive decision trees: If the non-adaptive inversion algorithm is a $d$-depth affine decision tree - it outputs the evaluation of a decision tree whose nodes compute a linear function of the answers to the queries - and $q < cn$ for some universal $c > 0$, then $s \in \Omega(n/d \log n)$.
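    The "trivial" non-adaptive inverters with $s+q = n$ mentioned in this abstract can be sketched as follows. This is an illustrative sketch only: advice is counted here in stored table entries rather than bits (an assumption made for simplicity), and the names `preprocess` and `invert` are hypothetical.

```python
import random

def preprocess(f, s):
    """Advice: for the first s inputs, remember one preimage per output value (at most s entries)."""
    return {f[x]: x for x in range(s)}

def invert(y, advice, oracle, n, s):
    """Return (x, queries) with oracle(x) == y, or (None, queries) if y has no preimage."""
    if y in advice:
        return advice[y], 0
    queries = 0
    # Non-adaptive: the query set {s, ..., n-1} is fixed in advance and does not
    # depend on earlier answers; stopping early only saves work.
    for x in range(s, n):
        queries += 1
        if oracle(x) == y:
            return x, queries
    return None, queries

n, s = 64, 16
rng = random.Random(1)
f = [rng.randrange(n) for _ in range(n)]  # random function [n] -> [n] as a table
advice = preprocess(f, s)
x, used = invert(f[40], advice, lambda i: f[i], n, s)
print(f[x] == f[40], used <= n - s)
```

    With `s` stored entries and at most `n - s` queries, every image point is inverted, which is the tradeoff the lower bounds above are measured against.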

    The Communication Complexity of Threshold Private Set Intersection

    Threshold private set intersection enables Alice and Bob who hold sets $A$ and $B$ of size $n$ to compute the intersection $A \cap B$ if the sets do not differ by more than some threshold parameter $t$. In this work, we investigate the communication complexity of this problem and we establish the first upper and lower bounds. We show that any protocol has to have a communication complexity of $\Omega(t)$. We show that an almost matching upper bound of $\tilde{\mathcal{O}}(t)$ can be obtained via fully homomorphic encryption. We present a computationally more efficient protocol based on weaker assumptions, namely additively homomorphic encryption, with a communication complexity of $\tilde{\mathcal{O}}(t^2)$. For applications like biometric authentication, where a given fingerprint has to have a large intersection with a fingerprint from a database, our protocols may result in significant communication savings. We, furthermore, show how to extend all of our protocols to the multiparty setting. Prior to this work, all previous protocols had a communication complexity of $\Omega(n)$. Our protocols are the first ones with communication complexities that mainly depend on the threshold parameter $t$ and only logarithmically on the set size $n$.
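    The functionality described in this abstract (what the two parties learn, not how they compute it privately) can be written down as a plaintext reference. This sketch assumes that "do not differ by more than $t$" means $|A \setminus B| \le t$ for equal-size sets; the function name `threshold_psi` is hypothetical, and nothing here models the homomorphic-encryption protocols or their communication cost.

```python
def threshold_psi(a, b, t):
    """Plaintext reference: return A & B if the sets differ in at most t elements, else None."""
    a, b = set(a), set(b)
    if len(a) != len(b):
        raise ValueError("this sketch assumes both sets have the same size n")
    # Assumption: for equal-size sets, |A \ B| == |B \ A|, so one difference suffices.
    if len(a - b) <= t:
        return a & b
    return None  # too different: nothing about the intersection is revealed

print(threshold_psi({1, 2, 3, 4}, {2, 3, 4, 5}, t=1))
print(threshold_psi({1, 2, 3, 4}, {2, 3, 4, 5}, t=0))
```

    A secure protocol would have to produce the same output without either party revealing its set in the clear.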

    Dremel
