113 research outputs found
What Makes a Good Plan? An Efficient Planning Approach to Control Diffusion Processes in Networks
In this paper, we analyze the quality of a large class of simple dynamic
resource allocation (DRA) strategies which we name priority planning. Their aim
is to control an undesired diffusion process by distributing resources to the
contagious nodes of the network according to a predefined priority order. In
our analysis, we reduce the DRA problem to the linear arrangement of the nodes
of the network. Under this perspective, we shed light on the role of a
fundamental characteristic of this arrangement, the maximum cutwidth, for
assessing the quality of any priority planning strategy. Our theoretical
analysis validates the role of the maximum cutwidth by deriving bounds for the
extinction time of the diffusion process. Finally, using the results of our
analysis, we propose a novel and efficient DRA strategy, called Maximum
Cutwidth Minimization, that outperforms other competing strategies in our
simulations.
Comment: 18 pages, 3 figures
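The maximum cutwidth that the analysis above relies on can be sketched directly: fix a linear arrangement (ordering) of the nodes, count the edges crossing each prefix cut, and take the maximum over all cuts. The function name and the toy graph below are illustrative assumptions, not the paper's code.

```python
def max_cutwidth(edges, order):
    """Maximum number of edges crossing any prefix cut of the arrangement."""
    # Position of each node in the linear arrangement.
    pos = {node: i for i, node in enumerate(order)}
    best = 0
    # Cut k separates order[:k+1] from order[k+1:].
    for k in range(len(order) - 1):
        crossing = sum(1 for u, v in edges
                       if min(pos[u], pos[v]) <= k < max(pos[u], pos[v]))
        best = max(best, crossing)
    return best

# Path graph 0-1-2-3: the natural order has maximum cutwidth 1,
# while an interleaved order is worse.
path = [(0, 1), (1, 2), (2, 3)]
print(max_cutwidth(path, [0, 1, 2, 3]))  # 1
print(max_cutwidth(path, [0, 2, 1, 3]))  # 3
```

Under this view, a good priority order is precisely one whose maximum cutwidth is small, which is what the proposed Maximum Cutwidth Minimization strategy targets.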
Multivariate Hawkes Processes for Large-scale Inference
In this paper, we present a framework for fitting multivariate Hawkes
processes for large-scale problems both in the number of events in the observed
history and the number of event types (i.e. dimensions). The proposed
Low-Rank Hawkes Process (LRHP) framework introduces a low-rank approximation of
the kernel matrix that makes it possible to perform the nonparametric learning
of the triggering kernels in a number of operations governed by the rank r of
the approximation, with r much smaller than the number of dimensions. This
comes as a major improvement over the existing state-of-the-art inference
algorithms, whose cost grows with the full dimension.
Furthermore, the low-rank approximation allows LRHP to learn representative
patterns of interaction between event types, which may be valuable for the
analysis of such complex processes in real-world datasets. The efficiency and
scalability of our approach are illustrated with numerical experiments on
simulated as well as real datasets.
Comment: 16 pages, 5 figures
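The low-rank mechanism the abstract appeals to can be illustrated on a generic kernel matrix: a truncated SVD keeps only the top-r factors, so storing and applying the matrix costs O(dr) rather than O(d^2). This is a minimal sketch of the generic idea, with illustrative sizes and a synthetic matrix; it is not the LRHP estimator itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 50, 3
# Build a d x d kernel matrix that is exactly rank r.
U = rng.standard_normal((d, r))
V = rng.standard_normal((r, d))
K = U @ V

# Truncated SVD: keep only the top-r singular triplets, so the matrix
# is represented by O(d * r) numbers instead of O(d^2).
u, s, vt = np.linalg.svd(K)
K_r = u[:, :r] * s[:r] @ vt[:r]

print(np.allclose(K, K_r))  # True: an exactly rank-r matrix is recovered
```

In the Hawkes setting, the rows and columns index event types, so the retained factors double as interpretable patterns of interaction between types, as the abstract notes.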
Breaking the Log Barrier: a Novel Universal Restart Strategy for Faster Las Vegas Algorithms
Let A be a Las Vegas algorithm, i.e. an algorithm whose running
time is a random variable T drawn according to a certain probability
distribution D. In 1993, Luby, Sinclair and Zuckerman [LSZ93] proved that a
simple universal restart strategy can, for any probability distribution D,
provide an algorithm executing A whose expected running time is
O(ℓ* log ℓ*), where ℓ* is the minimum expected running time achievable with
full prior knowledge of the probability distribution D, and Q(p) denotes the
p-quantile of D. Moreover, the authors showed that the logarithmic term
could not be removed for universal restart strategies and was, in a certain
sense, optimal. In this work, we show that, quite surprisingly, the logarithmic
term can be replaced by a smaller quantity, thus reducing the expected running
time in practical settings of interest. More precisely, we propose a novel
restart strategy that executes the algorithm and whose expected running time
replaces the logarithmic overhead by a smaller, distribution-dependent
quantity. This quantity is, up to a constant multiplicative factor, better
than: 1) the bound of the universal restart strategy of [LSZ93], 2) any fixed
quantile of the running-time distribution, 3) the expected running time of the
original algorithm, and 4) any quantity of the form f^{-1}(E[f(T)]), where T
denotes the running time, for a large class of concave functions f.
The latter extends the recent restart strategy of [Zam22], which achieves such
a bound for a particular concave function, and these results can be thought of
as algorithmic reverse Jensen's inequalities. Finally, we show that the
behavior of the concave function at infinity controls the existence of reverse
Jensen's inequalities, by providing a necessary and a sufficient condition for
these inequalities to hold.
Comment: 13 pages, 0 figures
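The universal strategy of [LSZ93] runs the algorithm with successive time budgets taken from the Luby sequence 1, 1, 2, 1, 1, 2, 4, ..., restarting whenever a budget is exhausted. A minimal sketch of that schedule follows; the recursive implementation is illustrative, not taken from the paper.

```python
def luby(i):
    """i-th term (1-indexed) of the Luby restart sequence."""
    k = 1
    # Find the smallest k with i <= 2^k - 1.
    while (1 << k) - 1 < i:
        k += 1
    if i == (1 << k) - 1:
        # End of a block: budget 2^(k-1).
        return 1 << (k - 1)
    # Otherwise the sequence repeats its own prefix.
    return luby(i - (1 << (k - 1)) + 1)

print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```

Each budget appears with geometrically decreasing frequency, which is what yields the O(ℓ* log ℓ*) guarantee; the strategies discussed in this abstract refine the schedule to shave the logarithmic factor.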
Optimal Algorithms for Non-Smooth Distributed Optimization in Networks
In this work, we consider the distributed optimization of non-smooth convex
functions using a network of computing units. We investigate this problem under
two regularity assumptions: (1) the Lipschitz continuity of the global
objective function, and (2) the Lipschitz continuity of local individual
functions. Under the local regularity assumption, we provide the first optimal
first-order decentralized algorithm called multi-step primal-dual (MSPD) and
its corresponding optimal convergence rate. A notable aspect of this result is
that, for non-smooth functions, while the dominant term of the error is in
O(1/√t), the structure of the communication network only impacts a
second-order term in O(1/t), where t is time. In other words, the error due
to limits in communication resources decreases at a fast rate even in the case
of non-strongly-convex objective functions. Under the global regularity
assumption, we provide a simple yet efficient algorithm called distributed
randomized smoothing (DRS) based on a local smoothing of the objective
function, and show that DRS is within a d^{1/4} multiplicative factor of the
optimal convergence rate, where d is the underlying dimension.
Comment: 17 pages
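The local smoothing behind DRS can be sketched in one dimension: a non-smooth function is replaced by its Gaussian-smoothed version, whose gradient can be estimated from function evaluations alone. Everything below (the test function |x|, the smoothing width, the sample size) is an illustrative assumption, not the paper's distributed algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_grad(f, x, gamma=0.1, n_samples=100_000):
    """Monte-Carlo estimate of d/dx E_u[f(x + gamma * u)], u ~ N(0, 1)."""
    u = rng.standard_normal(n_samples)
    # Gaussian integration by parts: E[(f(x + gamma*u) - f(x)) * u] / gamma
    # equals the gradient of the smoothed function; subtracting f(x)
    # leaves the mean unchanged but reduces the variance.
    return np.mean((f(x + gamma * u) - f(x)) * u) / gamma

g = smoothed_grad(np.abs, 1.0)
print(f"estimated gradient at x=1: {g:.3f}")  # close to 1, the slope of |x| there
```

The smoothed objective is differentiable even where the original is not, which is what lets a smooth-optimization method be applied to a non-smooth global objective at the cost of the d^{1/4} factor quoted above.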
Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles
In this paper, we provide a novel framework for the analysis of
generalization error of first-order optimization algorithms for statistical
learning when the gradient can only be accessed through partial observations
given by an oracle. Our analysis relies on the regularity of the gradient
w.r.t. the data samples, and allows us to derive near-matching upper and lower
bounds for the generalization error of multiple learning problems, including
supervised learning, transfer learning, robust learning, distributed learning
and communication efficient learning using gradient quantization. These results
hold for smooth and strongly-convex optimization problems, as well as smooth
non-convex optimization problems satisfying a Polyak-Lojasiewicz assumption. In
particular, our upper and lower bounds depend on a novel quantity that extends
the notion of conditional standard deviation, and is a measure of the extent to
which the gradient can be approximated by having access to the oracle. As a
consequence, our analysis provides a precise meaning to the intuition that
optimization of the statistical learning objective is as hard as the estimation
of its gradient. Finally, we show that, in the case of standard supervised
learning, mini-batch gradient descent with increasing batch sizes and a warm
start can reach a generalization error that is optimal up to a multiplicative
factor, thus motivating the use of this optimization scheme in practical
applications.
Comment: 18 pages, 0 figures
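The increasing-batch-size scheme mentioned at the end can be sketched on a least-squares problem: run mini-batch gradient descent while geometrically growing the batch, so that later, lower-variance steps refine the early progress. The problem instance, learning rate, and doubling schedule are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)    # cold start; a warm start would begin closer to the optimum
batch, lr = 8, 0.1
used = 0
while used + batch <= n:
    idx = rng.integers(0, n, size=batch)
    # Mini-batch gradient of the least-squares objective.
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch
    w -= lr * grad
    used += batch
    batch *= 2     # geometrically increasing batch sizes

print(np.linalg.norm(w - w_true))  # well below the initial distance ||w_true||
```

Doubling the batch halves the gradient noise at each step while keeping the total number of gradient evaluations linear in n, which matches the intuition that learning is only as hard as estimating the gradient.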