Shampoo: Preconditioned Stochastic Tensor Optimization
Preconditioned gradient methods are among the most general and powerful tools
in optimization. However, preconditioning requires storing and manipulating
prohibitively large matrices. We describe and analyze a new structure-aware
preconditioning algorithm, called Shampoo, for stochastic optimization over
tensor spaces. Shampoo maintains a set of preconditioning matrices, each of
which operates on a single dimension, contracting over the remaining
dimensions. We establish convergence guarantees in the stochastic convex
setting, the proof of which builds upon matrix trace inequalities. Our
experiments with state-of-the-art deep learning models show that Shampoo is
capable of converging considerably faster than commonly used optimizers.
Although it involves a more complex update rule, Shampoo's runtime per step is
comparable to that of simple gradient methods such as SGD, AdaGrad, and Adam.
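To make the per-dimension preconditioning described in the abstract concrete, here is a minimal sketch of a Shampoo-style update for the order-2 (matrix) case, where one preconditioner acts on the rows and one on the columns of the gradient. The helper names `inv_root` and `shampoo_step`, the learning rate, the initial `eps` scaling, and the toy quadratic objective are all illustrative assumptions, and the eigendecomposition-based matrix root is just one simple way to compute the inverse fourth root.

```python
import numpy as np

def inv_root(mat, p):
    """Return mat^(-1/p) for a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** (-1.0 / p)) @ eigvecs.T

def shampoo_step(W, G, L, R, lr=0.1):
    """One matrix-case step: accumulate row/column statistics, then precondition."""
    L = L + G @ G.T          # left statistics, shape (rows, rows)
    R = R + G.T @ G          # right statistics, shape (cols, cols)
    W = W - lr * inv_root(L, 4) @ G @ inv_root(R, 4)
    return W, L, R

# Toy problem (an assumption for illustration): minimize 0.5 * ||W - target||^2.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 3))
W = np.zeros((4, 3))
eps = 1e-4                   # hypothetical initial scaling of the preconditioners
L, R = eps * np.eye(4), eps * np.eye(3)

initial_loss = 0.5 * np.sum((W - target) ** 2)
for _ in range(200):
    G = W - target           # gradient of the toy objective
    W, L, R = shampoo_step(W, G, L, R)
final_loss = 0.5 * np.sum((W - target) ** 2)
```

Storing a 4x4 and a 3x3 matrix instead of a single 12x12 full-matrix preconditioner is the memory saving the abstract refers to; the gap widens quickly as the dimensions grow.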
A Novel Piecewise Linear Recursive Convolution Approach for Dispersive Media Using the Finite-Difference Time-Domain Method
Peer reviewed. Publisher PD
String Synchronizing Sets: Sublinear-Time BWT Construction and Optimal LCE Data Structure
Burrows-Wheeler transform (BWT) is an invertible text transformation that,
given a text $T$ of length $n$, permutes its symbols according to the
lexicographic order of suffixes of $T$. BWT is one of the most heavily studied
algorithms in data compression with numerous applications in indexing, sequence
analysis, and bioinformatics. Its construction is a bottleneck in many
scenarios, and settling the complexity of this task is one of the most
important unsolved problems in sequence analysis that has remained open for 25
years. Given a binary string of length $n$, occupying $O(n/\log n)$ machine
words, the BWT construction algorithm due to Hon et al. (SIAM J. Comput., 2009)
runs in $O(n)$ time and $O(n/\log n)$ space. Recent advancements (Belazzougui,
STOC 2014, and Munro et al., SODA 2017) focus on removing the alphabet-size
dependency in the time complexity, but they still require $\Omega(n)$ time.
In this paper, we propose the first algorithm that breaks the $O(n)$-time
barrier for BWT construction. Given a binary string of length $n$, our
procedure builds the Burrows-Wheeler transform in $O(n/\sqrt{\log n})$ time and
$O(n/\log n)$ space. We complement this result with a conditional lower bound
proving that any further progress in the time complexity of BWT construction
would yield faster algorithms for the very well studied problem of counting
inversions: it would improve the state-of-the-art $O(m\sqrt{\log m})$-time
solution by Chan and Pătrașcu (SODA 2010). Our algorithm is based on a
novel concept of string synchronizing sets, which is of independent interest.
As one of the applications, we show that this technique lets us design a data
structure of the optimal size $O(n/\log n)$ that answers Longest Common
Extension queries (LCE queries) in $O(1)$ time and, furthermore, can be
deterministically constructed in the optimal $O(n/\log n)$ time.
Comment: Full version of a paper accepted to STOC 201
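For readers unfamiliar with the transform itself, the following sketch shows what the BWT computes: it sorts the suffixes of the text and reads off the preceding symbols. This is the naive textbook construction, not the sublinear-time algorithm of the paper, and the function names and the `$` sentinel are illustrative assumptions.

```python
def bwt(text, sentinel="$"):
    """Naive BWT: sort all suffixes, then take the symbol preceding each one."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel                      # sentinel sorts before all letters
    suffix_order = sorted(range(len(s)), key=lambda i: s[i:])
    return "".join(s[i - 1] for i in suffix_order)

def inverse_bwt(last, sentinel="$"):
    """Invert the BWT using a stable sort of the last column."""
    # order[k] is the row whose last symbol is the k-th symbol of the first column.
    order = sorted(range(len(last)), key=lambda i: last[i])
    row = last.index(sentinel)               # row holding the original text
    out = []
    for _ in range(len(last) - 1):
        row = order[row]
        out.append(last[row])
    return "".join(out)

transformed = bwt("banana")
restored = inverse_bwt(transformed)
```

Sorting the suffixes by direct comparison takes superlinear time in the text length, which is why the linear and sublinear constructions discussed in the abstract matter for large inputs.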
Accelerated filtering on graphs using Lanczos method
Signal-processing on graphs has developed into a very active field of
research during the last decade. In particular, the number of applications
using frames constructed from graphs, like wavelets on graphs, has
substantially increased. To attain scalability for large graphs, fast
graph-signal filtering techniques are needed. In this contribution, we propose
an accelerated algorithm based on the Lanczos method that adapts to the
Laplacian spectrum without explicitly computing it. The result is an accurate,
robust, scalable and efficient algorithm. Compared to existing methods based on
Chebyshev polynomials, our solution achieves higher accuracy without increasing
the overall complexity significantly. Furthermore, it is particularly well
suited for graphs with large spectral gaps.
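The idea of filtering a graph signal without computing the Laplacian spectrum can be sketched as follows: run a few Lanczos iterations to build a small tridiagonal matrix, apply the filter to that matrix, and map the result back. This is a generic Lanczos approximation of f(L)x under stated assumptions, not necessarily the authors' exact algorithm; the function name `lanczos_filter`, the path-graph example, the heat-kernel filter, and the use of full reorthogonalization are illustrative choices.

```python
import numpy as np

def lanczos_filter(L, x, f, m):
    """Approximate f(L) @ x for a symmetric L using m Lanczos iterations."""
    n = x.shape[0]
    m = min(m, n)
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    norm_x = np.linalg.norm(x)               # x is assumed to be nonzero
    V[:, 0] = x / norm_x
    k = m
    for j in range(m):
        w = L @ V[:, j]
        alpha[j] = V[:, j] @ w
        # Full reorthogonalization against all basis vectors built so far.
        w = w - V[:, : j + 1] @ (V[:, : j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if j + 1 == m:
            break
        if beta[j] < 1e-12:                   # Krylov subspace is exhausted
            k = j + 1
            break
        V[:, j + 1] = w / beta[j]
    T = np.diag(alpha[:k]) + np.diag(beta[: k - 1], 1) + np.diag(beta[: k - 1], -1)
    theta, S = np.linalg.eigh(T)              # small k-by-k eigenproblem only
    return norm_x * (V[:, :k] @ (S @ (f(theta) * S[0, :])))

def heat(lam):
    """Heat-kernel filter, chosen here purely as an example."""
    return np.exp(-0.5 * lam)

# Example graph (an assumption): a path on 20 vertices.
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
x = np.random.default_rng(0).standard_normal(n)

lam, U = np.linalg.eigh(L)                    # exact reference, for comparison only
exact = U @ (heat(lam) * (U.T @ x))
approx = lanczos_filter(L, x, heat, m=10)
error = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

Only matrix-vector products with `L` and one small eigendecomposition are needed, so the cost scales with the number of edges rather than with a full spectral decomposition.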
Hardness of Exact Distance Queries in Sparse Graphs Through Hub Labeling
A distance labeling scheme is an assignment of bit-labels to the vertices of
an undirected, unweighted graph such that the distance between any pair of
vertices can be decoded solely from their labels. An important class of
distance labeling schemes is that of hub labelings, where a node $v \in V$
stores its distance to the so-called hubs $S_v \subseteq V$, chosen so that for
any $u, v \in V$ there is $w \in S_u \cap S_v$ belonging to some shortest $uv$
path. Notice that for most existing graph classes, the best distance labelling
constructions existing use at some point a hub labeling scheme at least as a
key building block. Our interest lies in hub labelings of sparse graphs, i.e.,
those with $|E(G)| = O(n)$, for which we show a lower bound of
$\frac{n}{2^{O(\sqrt{\log n})}}$ for the average size of the hubsets.
Additionally, we show a hub-labeling construction for sparse graphs of average
size $O(\frac{n}{RS(n)^{c}})$ for some $0 < c < 1$, where $RS(n)$ is the
so-called Ruzsa-Szemerédi function, linked to the structure of induced
matchings in dense graphs. This implies that further improving the lower bound
on hub labeling size to $\frac{n}{2^{(\log n)^{o(1)}}}$ would require a
breakthrough in the study of lower bounds on $RS(n)$, which have resisted
substantial improvement in the last 70 years. For general distance labeling of
sparse graphs, we show a lower bound of $\frac{1}{2^{O(\sqrt{\log n})}} SumIndex(n)$,
where $SumIndex(n)$ is the communication complexity of the
Sum-Index problem over $Z_n$. Our results suggest that the best achievable
hub-label size and distance-label size in sparse graphs may be
$\Theta(\frac{n}{2^{(\log n)^{c}}})$ for some $0 < c < 1$.
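The decoding step of a hub labeling can be illustrated with a small sketch: each vertex stores distances to its hubs, and a query takes the minimum over common hubs of the two stored distances. The star graph, the hand-picked hub sets, and the function names below are illustrative assumptions; the sketch shows the query mechanism only and says nothing about the bounds in the abstract.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted single-source shortest-path distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def build_labels(adj, hubs):
    """The label of v maps each hub w in hubs[v] to the distance d(v, w)."""
    labels = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        labels[v] = {w: dist[w] for w in hubs[v]}
    return labels

def query(label_u, label_v):
    """Decode d(u, v) from the two labels alone, via the best common hub."""
    common = label_u.keys() & label_v.keys()
    return min((label_u[w] + label_v[w] for w in common), default=None)

# Example (an assumption): a star with center 0; every vertex uses itself and 0 as hubs.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
hubs = {v: {v, 0} for v in adj}
labels = build_labels(adj, hubs)
average_hub_set_size = sum(len(label) for label in labels.values()) / len(labels)
```

The labeling is correct exactly when every pair of vertices shares a hub on one of its shortest paths, which holds here because every shortest path between two leaves passes through the center.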
A network-based rating system and its resistance to bribery
We study a rating system in which a set of individuals (e.g., the customers of a restaurant) evaluate a given service (e.g., the restaurant), with their aggregated opinion determining the probability of all individuals to use the service and thus its generated revenue. We explicitly model the influence relation by a social network, with individuals being influenced by the evaluation of their trusted peers. On top of that we allow a malicious service provider (e.g., the restaurant owner) to bribe some individuals, i.e., to invest a part of his or her expected income to modify their opinion, therefore influencing his or her final gain. We analyse the effect of bribing strategies under various constraints, and we show under what conditions the system is bribery-proof, i.e., no bribing strategy yields a strictly positive expected gain to the service provider.
On the Sample Complexity of Subspace Learning
A large number of algorithms in machine learning, from principal component
analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral
embedding and support estimation methods, rely on estimating a linear subspace
from samples. In this paper we introduce a general formulation of this problem
and derive novel learning error estimates. Our results rely on natural
assumptions on the spectral properties of the covariance operator associated to
the data distribution, and hold for a wide class of metrics between
subspaces. As special cases, we discuss sharp error estimates for the
reconstruction properties of PCA and spectral support estimation. Key to our
analysis is an operator theoretic approach that has broad applicability to
spectral learning methods.
Comment: Extended version of conference pape
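The problem of estimating a linear subspace from samples can be illustrated with a small experiment: draw samples from a distribution with a known covariance, take the top eigenvectors of the empirical covariance as in PCA, and measure how far the estimated subspace is from the true one. The Gaussian data model, the chosen spectrum, the operator-norm distance between projections, and the function names are illustrative assumptions rather than the paper's setting or metrics.

```python
import numpy as np

def top_k_projection(cov, k):
    """Orthogonal projection onto the span of the top-k eigenvectors of cov."""
    _, eigvecs = np.linalg.eigh(cov)          # eigenvalues come back in ascending order
    top = eigvecs[:, -k:]
    return top @ top.T

def subspace_error(n_samples, spectrum, k, rng):
    """Operator-norm distance between the estimated and true top-k subspaces."""
    d = len(spectrum)
    # Zero-mean Gaussian samples whose covariance is diag(spectrum).
    X = rng.standard_normal((n_samples, d)) * np.sqrt(spectrum)
    empirical_cov = X.T @ X / n_samples
    P_hat = top_k_projection(empirical_cov, k)
    P_true = top_k_projection(np.diag(spectrum), k)
    return np.linalg.norm(P_hat - P_true, ord=2)

rng = np.random.default_rng(0)
spectrum = np.array([5.0, 4.0, 0.2, 0.1, 0.1])   # clear gap after the second eigenvalue
error_small_n = subspace_error(20, spectrum, k=2, rng=rng)
error_large_n = subspace_error(20000, spectrum, k=2, rng=rng)
```

How quickly the error shrinks with the sample size depends on the decay of the spectrum and the gap after the k-th eigenvalue, which is the kind of spectral assumption the abstract mentions.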