Delphic Costs and Benefits in Web Search: A utilitarian and historical analysis
We present a new framework to conceptualize and operationalize the total user
experience of search, by studying the entirety of a search journey from a
utilitarian point of view.
Web search engines are widely perceived as "free". But search requires time
and effort: in reality there are many intermingled non-monetary costs (e.g.
time costs, cognitive costs, interactivity costs) and the benefits may be
marred by various impairments, such as misunderstanding and misinformation.
This characterization of costs and benefits appears to be inherent to the human
search for information within the pursuit of some larger task: most of the
costs and impairments can be identified in interactions with any web search
engine, interactions with public libraries, and even in interactions with
ancient oracles. To emphasize this innate connection, we call these costs and
benefits Delphic, in contrast to explicitly financial costs and benefits.
Our main thesis is that the users' satisfaction with a search engine mostly
depends on their experience of Delphic costs and benefits, in other words, on
their utility. This consumer utility is correlated with classic measures of
search engine quality, such as ranking, precision, recall, etc., but is not
completely determined by them. To argue our thesis, we catalog the Delphic
costs and benefits and show how the development of search engines over the last
quarter century, from classic Information Retrieval roots to the integration of
Large Language Models, was driven to a great extent by the quest to decrease
Delphic costs and increase Delphic benefits.
We hope that the Delphic costs framework will engender new ideas and new
research for evaluating and improving the web experience for everyone.
Comment: 10 pages.
On-line load balancing
The setup for our problem consists of n servers that must complete a set of tasks. Each task can be handled only by a subset of the servers, requires a different level of service, and once assigned cannot be reassigned. We make the natural assumption that the level of service is known at arrival time, but that the duration of service is not. The on-line load balancing problem is to assign each task to an appropriate server in such a way that the maximum load on the servers is minimized. In this paper we derive matching upper and lower bounds for the competitive ratio of the on-line greedy algorithm for this problem, namely ((3n)^{2/3}/2)(1 + o(1)), and derive a lower bound, Ω(n^{1/2}), for any other deterministic or randomized on-line algorithm.
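The greedy rule analyzed above is simple enough to sketch directly: each arriving task goes to the currently least-loaded server among those eligible for it, and the assignment is never revisited. The sketch below is illustrative; the function name `greedy_assign` and the toy instance are assumptions, not taken from the paper.

```python
def greedy_assign(tasks, n_servers):
    """On-line greedy load balancing: send each task to the least-loaded
    server among the servers eligible to run it.

    tasks: sequence of (eligible_servers, weight) pairs, revealed one at
    a time; weight is the known level of service.
    Returns the final per-server loads.
    """
    load = [0.0] * n_servers
    for eligible, weight in tasks:
        # Pick the currently least-loaded eligible server
        # (ties broken by server index).
        target = min(eligible, key=lambda s: load[s])
        load[target] += weight  # assignment is permanent: no reassignment
    return load

# Hypothetical toy instance: 3 servers, tasks with restricted eligibility.
tasks = [([0, 1], 2.0), ([1, 2], 1.0), ([0], 3.0), ([0, 1, 2], 1.0)]
print(greedy_assign(tasks, 3))  # → [5.0, 1.0, 1.0]
```

The third task may only run on server 0, which is why greedy can end up with an unbalanced maximum load; the paper quantifies exactly how bad this can get in the worst case.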
Nobody cares if you liked Star Wars: KNN graph construction on the cheap
K-Nearest-Neighbors (KNN) graphs play a key role in a large range of applications. A KNN graph typically connects entities characterized by a set of features so that each entity becomes linked to its k most similar counterparts according to some similarity function. As datasets grow, KNN graphs are unfortunately becoming increasingly costly to construct, and the general approach, which consists in reducing the number of comparisons between entities, seems to have reached its full potential. In this paper we propose to overcome this limit with a simple yet powerful strategy that samples the set of features of each entity and only keeps the least popular features. We show that this strategy outperforms other more straightforward policies on a range of four representative datasets: for instance, keeping the 25 least popular items reduces computational time by up to 63%, while producing a KNN graph close to the ideal one.
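The sampling strategy described above is easy to sketch: count each feature's global popularity, keep only the least popular features per entity, then build the KNN graph on the shrunken profiles. This is a minimal illustration, not the paper's implementation; the names `sample_least_popular` and `keep`, and the brute-force graph loop, are assumptions.

```python
from collections import Counter

def sample_least_popular(profiles, keep):
    """Keep, for each entity, only its `keep` least globally popular
    features (widely shared items like Star Wars carry little signal)."""
    popularity = Counter(f for feats in profiles.values() for f in feats)
    return {e: set(sorted(feats, key=lambda f: (popularity[f], f))[:keep])
            for e, feats in profiles.items()}

def knn_graph(profiles, k):
    """Brute-force KNN graph under Jaccard similarity; the paper's gain
    comes from cheaper comparisons after sampling, not from this loop."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    graph = {}
    for e, feats in profiles.items():
        sims = [(jaccard(feats, other), o)
                for o, other in profiles.items() if o != e]
        graph[e] = [o for _, o in sorted(sims, reverse=True)[:k]]
    return graph

# Hypothetical profiles: everyone liked Star Wars, so it is dropped first.
profiles = {
    "alice": {"star_wars", "eraserhead", "stalker"},
    "bob":   {"star_wars", "eraserhead", "primer"},
    "carol": {"star_wars", "titanic", "avatar"},
}
sampled = sample_least_popular(profiles, keep=2)
print(knn_graph(sampled, k=1)["alice"])  # → ['bob']
```

After sampling, alice and bob still share the rare feature "eraserhead", while the universally popular "star_wars" no longer inflates everyone's similarity to everyone else.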
Fair Near Neighbor Search: Independent Range Sampling in High Dimensions. PODS
Similarity search is a fundamental algorithmic primitive, widely used in many
computer science disciplines. There are several variants of the similarity
search problem, and one of the most relevant is the r-near neighbor (r-NN)
problem: given a radius r and a set of points S, construct a data
structure that, for any given query point q, returns a point p within
distance at most r from q. In this paper, we study the r-NN problem in
the light of fairness. We consider fairness in the sense of equal opportunity:
all points that are within distance r from the query should have the same
probability to be returned. In the low-dimensional case, this problem was first
studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the
theoretically strongest approach to similarity search in high dimensions, does
not provide such a fairness guarantee. To address this, we propose efficient
data structures for r-NN where all points in S that are near q have the
same probability to be selected and returned by the query. Specifically, we
first propose a black-box approach that, given any LSH scheme, constructs a
data structure for uniformly sampling points in the neighborhood of a query.
Then, we develop a data structure for fair similarity search under inner
product that requires nearly-linear space and exploits locality sensitive
filters. The paper concludes with an experimental evaluation that highlights
(un)fairness in a recommendation setting on real-world datasets and discusses
the inherent unfairness introduced by solving other variants of the problem.
Comment: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on
Principles of Database Systems (PODS), pages 191-204, June 2020.
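The unfairness of plain LSH comes from the fact that a point colliding with the query in many hash tables is proportionally more likely to be reported. One standard way to restore uniformity, in the spirit of the black-box approach above, is rejection sampling: draw a candidate from a random table, then accept it with probability inversely proportional to the number of tables in which it collides with the query. The sketch below is illustrative only; the name `fair_sample_near` and the explicit degree computation are assumptions, not the paper's exact data structure.

```python
import random

def fair_sample_near(q, tables, hash_fns, r, dist, rng):
    """Rejection-sampling sketch of fair r-NN over L LSH tables.

    A near point p that collides with q in deg(p) tables is deg(p) times
    likelier to be drawn, so accepting with probability 1/deg(p) makes
    every near point equally likely conditioned on acceptance.
    """
    L = len(tables)
    while True:  # caution: loops forever if no near point collides with q
        i = rng.randrange(L)
        bucket = tables[i].get(hash_fns[i](q), [])
        if not bucket:
            continue
        p = bucket[rng.randrange(len(bucket))]
        if dist(p, q) > r:
            continue  # a far point that happened to collide: reject it
        # deg(p): number of tables in which p collides with q.
        deg = sum(hash_fns[j](p) == hash_fns[j](q) for j in range(L))
        if rng.random() < 1.0 / deg:
            return p

# Hypothetical 1-D demo with a single bucketing hash function.
h = lambda x: int(x // 1.0)
points = [0.0, 0.1, 0.2, 5.0]
table = {}
for p in points:
    table.setdefault(h(p), []).append(p)
rng = random.Random(0)
print(fair_sample_near(0.05, [table], [h], 0.2,
                       lambda a, b: abs(a - b), rng))
```

With a single table, every point in the query's bucket collides exactly once, so the sampler reduces to a uniform draw over the bucket's near points; the degree correction only matters once several tables are combined.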
Proceedings of the 27th ACM International Conference on Information and Knowledge Management, CIKM 2018, Torino, Italy, October 22-26, 2018
Distance-Sensitive Hashing
Locality-sensitive hashing (LSH) is an important tool for managing
high-dimensional noisy or uncertain data, for example in connection with data
cleaning (similarity join) and noise-robust search (similarity search).
However, for a number of problems the LSH framework is not known to yield good
solutions, and instead ad hoc solutions have been designed for particular
similarity and distance measures. For example, this is true for
output-sensitive similarity search/join, and for indexes supporting annulus
queries that aim to report a point close to a certain given distance from the
query point.
In this paper we initiate the study of distance-sensitive hashing (DSH), a
generalization of LSH that seeks a family of hash functions such that the
probability of two points having the same hash value is a given function of the
distance between them. More precisely, given a distance space and a
"collision probability function" (CPF) f, we seek a distribution over pairs
of functions (h, g) such that for every pair of points x, y the collision
probability is Pr[h(x) = g(y)] = f(dist(x, y)). Locality-sensitive
hashing is the study of how fast a CPF can decrease as the distance grows. For
many spaces, f can be made exponentially decreasing even if we restrict
attention to the symmetric case where h = g. We show that the asymmetry
achieved by having a pair of functions makes it possible to achieve CPFs that
are, for example, increasing or unimodal, and show how this leads to principled
solutions to problems not addressed by the LSH framework. This includes a novel
application to privacy-preserving distance estimation. We believe that the DSH
framework will find further applications in high-dimensional data management.
Comment: Accepted at PODS'18. Abstract shortened due to character limit.
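A tiny example makes the power of asymmetry concrete. On the Hamming cube {0,1}^d, take h to read a random bit of x and g to read the *negation* of the same bit of y: the pair collides exactly when that bit differs, giving the increasing CPF f(t) = t/d, which no symmetric choice h = g can achieve. This is a minimal sketch in the spirit of the DSH framework, not code from the paper; the function names are assumptions.

```python
import random

def make_dsh_pair(d, rng):
    """Draw one (h, g) pair from a toy distance-sensitive family on
    {0,1}^d: h reads a random bit of x, g reads the negated same bit
    of y, so they collide exactly when that bit differs."""
    i = rng.randrange(d)
    return (lambda x: x[i]), (lambda y: 1 - y[i])

def dsh_collision_prob(x, y):
    """Exact collision probability of the family above, obtained by
    averaging over all d index choices: Pr[h(x) = g(y)] = hamming(x, y) / d,
    an *increasing* CPF -- the opposite of LSH."""
    d = len(x)
    return sum(xi == 1 - yi for xi, yi in zip(x, y)) / d

x = [0, 0, 0, 0]
print(dsh_collision_prob(x, [0, 0, 0, 0]))  # → 0.0  (identical points never collide)
print(dsh_collision_prob(x, [1, 1, 0, 0]))  # → 0.5
print(dsh_collision_prob(x, [1, 1, 1, 1]))  # → 1.0  (antipodal points always collide)
```

Such an increasing CPF is exactly what an annulus-style query wants: points far from the query collide most often, so the hash family concentrates candidates away from the query rather than near it.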