On the capacity of information processing systems
We propose and analyze a family of information processing systems, where a
finite set of experts or servers are employed to extract information about a
stream of incoming jobs. Each job is associated with a hidden label drawn from
some prior distribution. An inspection by an expert produces a noisy outcome
that depends both on the job's hidden label and the type of the expert, and
occupies the expert for a finite time duration. A decision maker's task is to
dynamically assign inspections so that the resulting outcomes can be used to
accurately recover the labels of all jobs, while keeping the system stable.
Among our chief motivations are applications in crowd-sourcing, diagnostics,
and experiment designs, where one wishes to efficiently learn the nature of a
large number of items, using a finite pool of computational resources or human
agents.
We focus on the capacity of such an information processing system. Given a
level of accuracy guarantee, we ask how many experts are needed in order to
stabilize the system, and through what inspection architecture. Our main result
provides an adaptive inspection policy that is asymptotically optimal in the
following sense: the ratio between the required number of experts under our
policy and the theoretical optimum converges to one, as the probability of
error in label recovery tends to zero.
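As a concrete illustration of the inspection primitive described above, here is a minimal sketch (ours, not the paper's policy; the binary label space, the confusion matrix, and the function name are illustrative assumptions) of how repeated noisy expert outcomes update the posterior over a job's hidden label:

```python
import numpy as np

def posterior_after_inspections(prior, confusion, outcomes):
    """Posterior over a job's hidden label after noisy inspections.

    prior:     (L,) prior distribution over labels
    confusion: (L, O) confusion[l, o] = P(outcome o | label l) for one expert type
    outcomes:  list of observed outcome indices
    """
    log_post = np.log(prior)
    for o in outcomes:
        log_post += np.log(confusion[:, o])   # Bayes update in log space
    log_post -= log_post.max()                # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Example: binary labels, a 90%-accurate expert, three consistent outcomes.
prior = np.array([0.5, 0.5])
confusion = np.array([[0.9, 0.1],
                      [0.1, 0.9]])
print(posterior_after_inspections(prior, confusion, [0, 0, 0]))
```

With three agreeing outcomes the posterior concentrates sharply on one label; a decision maker choosing which job to inspect next would trade off this accuracy gain against keeping expert queues stable.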
Gossiping with Multiple Messages
This paper investigates the dissemination of multiple pieces of information
in large networks where users contact each other in a random uncoordinated
manner, and users upload one piece per unit time. The underlying motivation is
the design and analysis of piece selection protocols for peer-to-peer networks
which disseminate files by dividing them into pieces. We first investigate
one-sided protocols, where piece selection is based on the states of either the
transmitter or the receiver. We show that any such protocol relying only on
pushes, or alternatively only on pulls, is inefficient in disseminating all
pieces to all users. We propose a hybrid one-sided piece selection protocol --
INTERLEAVE -- and show that by using both pushes and pulls it disseminates $k$
pieces from a single source to $n$ users in $O(k + \log n)$ time, while obeying
the constraint that each user can upload at most one piece in one unit of time,
with high probability for large $n$. An optimal, unrealistic centralized
protocol would take $k + \log_2 n$ time in this setting. Moreover, efficient
dissemination is also possible if the source implements forward erasure coding,
and users push the latest-released coded pieces (but do not pull). We also
investigate two-sided protocols where piece selection is based on the states of
both the transmitter and the receiver. We show that it is possible to
disseminate $n$ pieces to $n$ users in $n + O(\log n)$ time, starting from an
initial state where each user has a unique piece.
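The following toy simulation (our sketch; it implements a generic random useful-piece push/pull exchange, not the INTERLEAVE selection rule, and it does not enforce the one-upload-per-round constraint) gives a feel for the dissemination process the abstract analyzes:

```python
import random

def gossip(n_users, n_pieces, rounds, seed=0):
    """Toy push/pull gossip: each round, every user contacts a random peer
    and transfers one piece the other side is missing."""
    rng = random.Random(seed)
    have = [set() for _ in range(n_users)]
    have[0] = set(range(n_pieces))          # user 0 is the source
    for t in range(rounds):
        for u in range(n_users):
            v = rng.randrange(n_users)
            if v == u:
                continue
            push = have[u] - have[v]        # pieces u could push to v
            pull = have[v] - have[u]        # pieces u could pull from v
            if push and (not pull or rng.random() < 0.5):
                have[v].add(rng.choice(sorted(push)))
            elif pull:
                have[u].add(rng.choice(sorted(pull)))
        if all(len(h) == n_pieces for h in have):
            return t + 1                    # rounds until full dissemination
    return None

print(gossip(n_users=64, n_pieces=8, rounds=200))
```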
An Accelerated Decentralized Stochastic Proximal Algorithm for Finite Sums
Modern large-scale finite-sum optimization relies on two key aspects:
distribution and stochastic updates. For smooth and strongly convex problems,
existing decentralized algorithms are slower than modern accelerated
variance-reduced stochastic algorithms when run on a single machine, and are
therefore not efficient. Centralized algorithms are fast, but their scaling is
limited by global aggregation steps that result in communication bottlenecks.
In this work, we propose an efficient \textbf{A}ccelerated
\textbf{D}ecentralized stochastic algorithm for \textbf{F}inite \textbf{S}ums
named ADFS, which uses local stochastic proximal updates and randomized
pairwise communications between nodes. On $n$ machines, ADFS learns from $nm$
samples in the same time it takes optimal algorithms to learn from $m$ samples
on one machine. This scaling holds until a critical network size is reached,
which depends on communication delays, on the number of samples $m$, and on the
network topology. We provide a theoretical analysis based on a novel augmented
graph approach combined with a precise evaluation of synchronization times and
an extension of the accelerated proximal coordinate gradient algorithm to
arbitrary sampling. We illustrate the improvement of ADFS over state-of-the-art
decentralized approaches with experiments.
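To make the communication pattern concrete, here is a minimal sketch (ours, not ADFS itself; plain gossip averaging and local SGD steps stand in for the paper's accelerated proximal updates, and all names are illustrative) of alternating local stochastic updates with randomized pairwise communications:

```python
import numpy as np

def decentralized_pairwise(local_grads, x_dim, edges, steps,
                           lr=0.1, p_comm=0.5, seed=0):
    """At each tick, either one node takes a local stochastic gradient step,
    or one randomly chosen edge averages its two iterates."""
    rng = np.random.default_rng(seed)
    n = len(local_grads)
    x = np.zeros((n, x_dim))
    for _ in range(steps):
        if rng.random() < p_comm:
            u, v = edges[rng.integers(len(edges))]   # pairwise gossip step
            avg = 0.5 * (x[u] + x[v])
            x[u] = avg
            x[v] = avg
        else:
            i = rng.integers(n)                      # local update step
            x[i] -= lr * local_grads[i](x[i])
    return x.mean(axis=0)

# Example: node i minimizes 0.5 * ||x - t_i||^2 on a ring of 4 nodes;
# the consensus minimizer is the mean of the targets (2.5 here).
targets = [np.array([1.0]), np.array([2.0]), np.array([3.0]), np.array([4.0])]
grads = [(lambda t: (lambda x: x - t))(t) for t in targets]
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(decentralized_pairwise(grads, 1, ring, steps=5000))
```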
Adaptive Matching for Expert Systems with Uncertain Task Types
A matching in a two-sided market often incurs an externality: a matched
resource may become unavailable to the other side of the market, at least for a
while. This is especially an issue in online platforms involving human experts,
as expert resources are often scarce. The efficient utilization of experts
in these platforms is made challenging by the fact that the information
available about the parties involved is usually limited.
To address this challenge, we develop a model of a task-expert matching
system where a task is matched to an expert using not only the prior
information about the task but also the feedback obtained from the past
matches. In our model the tasks arrive online while the experts are fixed and
constrained by a finite service capacity. For this model, we characterize the
maximum task resolution throughput a platform can achieve. We show that the
natural greedy approach, where each expert is assigned the task most suitable
to her skill, is suboptimal, as it does not internalize the above externality. We
develop a throughput optimal backpressure algorithm which does so by accounting
for the `congestion' among different task types. Finally, we validate our model
and confirm our theoretical findings with data-driven simulations via logs of
Math.StackExchange, a Stack Exchange forum dedicated to mathematics.
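A minimal sketch of the assignment rule's flavor (ours; a generic MaxWeight/backpressure score with hypothetical queue and rate inputs, rather than the paper's exact algorithm):

```python
def maxweight_assign(queues, rates):
    """When an expert frees up, assign her the task type maximizing
    (queue length) x (her success rate for that type), rather than
    greedily matching her best skill.

    queues: dict task_type -> current backlog size
    rates:  dict task_type -> this expert's resolution rate for the type
    """
    return max(queues, key=lambda t: queues[t] * rates[t])

# Example: greedy would pick 'algebra' (highest rate), but the congested
# 'geometry' queue wins under the backpressure score.
queues = {'algebra': 2, 'geometry': 30}
rates = {'algebra': 0.9, 'geometry': 0.4}
print(maxweight_assign(queues, rates))   # -> 'geometry'
```

The queue-length weighting is what accounts for the congestion externality described above: a scarce skill is rationed toward the task types that are actually backing up.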
Group Synchronization on Grids
Group synchronization requires estimating unknown elements $(\theta_v)_{v \in V}$
of a compact group $\mathfrak{G}$ associated to the
vertices of a graph $\mathcal{G} = (V, E)$, using noisy observations of the group
differences associated to the edges. This model is relevant to a variety of
applications ranging from structure from motion in computer vision to graph
localization and positioning, to certain families of community detection
problems.
We focus on the case in which the graph $\mathcal{G}$ is the $d$-dimensional grid.
Since the unknowns $(\theta_v)_{v \in V}$ are only determined up to a global
action of the group, we consider the following weak recovery question. Can we
determine the group difference $\theta_u^{-1}\theta_v$ between far apart
vertices $u, v$ better than by random guessing? We prove that weak recovery is
possible (provided the noise is small enough) for $d \ge 3$ and, for certain
finite groups, for $d \ge 2$. Vice versa, for some continuous groups, we prove
that weak recovery is impossible for $d = 2$. Finally, for strong enough noise,
weak recovery is always impossible.
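For intuition, here is a toy experiment (ours, for the simplest group $\mathbb{Z}_2 = \{+1, -1\}$; a naive path-multiplication estimator, not the paper's method) showing why recovering far-apart group differences on a grid is delicate: the single-path estimate decays to chance level as distance grows.

```python
import numpy as np

def z2_grid_sync(L=40, p_flip=0.05, seed=0):
    """Observe noisy edge differences theta_u * theta_v on an L x L grid
    (each flipped with probability p_flip) and estimate each vertex by
    multiplying observations along a tree from the corner. Errors
    accumulate with path length, so the far-corner estimate is no better
    than a coin flip."""
    rng = np.random.default_rng(seed)
    theta = rng.choice([-1, 1], size=(L, L))     # hidden labels
    def obs(a, b):                               # noisy edge observation
        flip = -1 if rng.random() < p_flip else 1
        return theta[a] * theta[b] * flip
    est = np.zeros((L, L), dtype=int)
    est[0, 0] = theta[0, 0]                      # fix the global gauge
    for i in range(L):
        for j in range(L):
            if i == 0 and j == 0:
                continue
            if i > 0:
                est[i, j] = est[i - 1, j] * obs((i - 1, j), (i, j))
            else:
                est[i, j] = est[i, j - 1] * obs((i, j - 1), (i, j))
    # Does the estimated far-apart difference agree with the truth?
    return est[0, 0] * est[L - 1, L - 1] == theta[0, 0] * theta[L - 1, L - 1]

print(np.mean([z2_grid_sync(seed=s) for s in range(100)]))  # near 0.5
```

The printed agreement rate hovers near chance, since each edge flip on the length-$2(L-1)$ path toggles the estimate; beating random guessing requires aggregating many paths, which is exactly the weak recovery question studied above.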
Exponential random graphs as models of overlay networks
In this paper, we give an analytic solution for graphs with n nodes and E
edges for which the probability of obtaining a given graph G is specified in
terms of the degree sequence of G. We describe how this model naturally appears
in the context of load balancing in communication networks, namely Peer-to-Peer
overlays. We then analyse the degree distribution of such graphs and show that
the degrees are concentrated around their mean value. Finally, we derive
asymptotic results on the number of edges crossing a graph cut and use these
results to compute the graph expansion and conductance, and to
analyse the graph resilience to random failures.
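A minimal Metropolis sketch (ours; the degree-sequence weight $\exp(\beta \sum_i d_i)$ and all parameter names are illustrative assumptions, and the paper proceeds analytically rather than by sampling) of drawing from a degree-weighted exponential random graph:

```python
import random
from math import exp

def ergm_degree_mcmc(n, beta, steps, seed=0):
    """Sample a graph with P(G) proportional to exp(beta * sum_i d_i):
    toggle a uniformly random vertex pair and accept with the Metropolis
    ratio. Since sum_i d_i = 2E, toggling one edge changes it by +/- 2."""
    rng = random.Random(seed)
    edges = set()
    for _ in range(steps):
        u, v = rng.sample(range(n), 2)
        e = (min(u, v), max(u, v))
        delta = -2 if e in edges else 2
        if rng.random() < min(1.0, exp(beta * delta)):
            edges.symmetric_difference_update({e})   # toggle the edge
    return edges

g = ergm_degree_mcmc(n=20, beta=-0.5, steps=20000)
print(len(g), "edges")   # negative beta penalizes edges: a sparse graph
```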
Faithfulness in Internet Algorithms
Proving or disproving faithfulness (a property describing robustness to rational manipulation in action as well as information revelation) is an appealing goal when reasoning about distributed systems containing rational participants. Recent work formalizes the notion of faithfulness and its foundational properties, and presents a general proof technique in the course of proving the ex post Nash faithfulness of a theoretical routing problem [11].
In this paper, we use a less formal approach and take some first steps in faithfulness analysis for existing algorithms running on the Internet. To this end, we consider the expected faithfulness of BitTorrent, a popular file download system, and show how manual backtracing (similar to the ideas behind program slicing [22]) can be used to find rational manipulation problems. Although this primitive technique has serious drawbacks, it can be useful in disproving faithfulness.
Building provably faithful Internet protocols and their corresponding specifications can be quite difficult depending on the system knowledge assumptions and problem complexity. We present some of the open problems that are associated with these challenges.