Dual Averaging Method for Online Graph-structured Sparsity
Online learning algorithms update models using one sample per iteration, which
makes them efficient for processing large-scale datasets and useful for
detecting malicious events on the fly for social benefit, such as disease
outbreaks and traffic congestion. However, existing algorithms for
graph-structured models focus on the offline setting and the least squares
loss and cannot handle the online setting, while methods designed for the
online setting cannot be directly applied to complex (usually non-convex)
graph-structured sparsity models. To address
these limitations, we propose GraphDA, a new algorithm for graph-structured
sparsity-constrained problems in the online setting. The key step in GraphDA
is to project both the averaged gradient (in the dual space) and the primal
variables (in the primal space) onto lower-dimensional subspaces, thus
capturing the graph-structured sparsity effectively. Furthermore, the
objective functions are assumed here to be general convex functions, so that
different losses can be handled in online learning settings. To the best of
our knowledge, GraphDA is the first online learning algorithm for
graph-structure-constrained optimization problems. To validate our method,
we conduct extensive experiments on both benchmark graph and real-world graph
datasets. Our experimental results show that, compared to other baseline
methods, GraphDA not only improves classification performance but also
captures graph-structured features more effectively, and hence offers
stronger interpretability.
Comment: 11 pages, 14 figures
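The projection idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' GraphDA implementation: plain top-s magnitude selection is assumed here as a stand-in for the graph-structured projection, the logistic loss and the step rule with regularizer weight gamma * sqrt(t) are assumptions, and the names `top_s_support`, `sigmoid_neg`, and `sparse_dual_averaging` are hypothetical.

```python
import math


def top_s_support(vec, s):
    """Indices of the s largest-magnitude entries of vec.

    Assumption: plain top-s selection stands in for a graph-structured
    projection, which would also require the support to respect a graph.
    """
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return set(order[:s])


def sigmoid_neg(margin):
    """Numerically stable value of 1 / (1 + exp(margin))."""
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def sparse_dual_averaging(samples, dim, s, gamma=1.0):
    """Online dual averaging with a sparsity projection on the averaged gradient.

    samples: iterable of (x, y) with x a list of floats and y in {-1, +1}.
    Uses the logistic loss (an assumption; any convex loss would fit here).
    """
    w = [0.0] * dim
    grad_sum = [0.0] * dim
    for t, (x, y) in enumerate(samples, start=1):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # Gradient of log(1 + exp(-margin)) with respect to w is coeff * x.
        coeff = -y * sigmoid_neg(margin)
        for i in range(dim):
            grad_sum[i] += coeff * x[i]
        avg = [g / t for g in grad_sum]
        # Dual-space projection: keep only the selected coordinates.
        support = top_s_support(avg, s)
        # Dual averaging step with regularizer (gamma * sqrt(t) / t) * ||w||^2 / 2.
        # The result already lies in the selected subspace, so the primal
        # projection is trivial in this simplified setting.
        w = [-(math.sqrt(t) / gamma) * avg[i] if i in support else 0.0
             for i in range(dim)]
    return w
```

With a single sample `([1.0, 0.0, 0.0], 1)` and `s=1`, the sketch returns a weight vector whose only nonzero entry is the first coordinate.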
HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks
The unsupervised detection of anomalies in time series data has important
applications in user behavioral modeling, fraud detection, and cybersecurity.
Anomaly detection has, in fact, been extensively studied in categorical
sequences. However, we often have access to time series data that represent
paths through networks. Examples include transaction sequences in financial
networks, click streams of users in networks of cross-referenced documents, or
travel itineraries in transportation networks. To reliably detect anomalies, we
must account for the fact that such data contain a large number of independent
observations of paths constrained by a graph topology. Moreover, the
heterogeneity of real systems rules out frequency-based anomaly detection
techniques, which do not account for highly skewed edge and degree statistics.
To address this problem, we introduce HYPA, a novel framework for the
unsupervised detection of anomalies in large corpora of variable-length
temporal paths in a graph. HYPA provides an efficient analytical method to
detect paths with anomalous frequencies that result from nodes being traversed
in unexpected chronological order.
Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM
Data Mining (SDM 2020)
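To make the notion of a path with anomalous frequency concrete, the following sketch compares the observed count of each length-two path with the count expected if the next step depended only on the current node. This is a deliberately simplified illustration and not HYPA's analytical method; the first-order null model and the name `second_order_ratios` are assumptions made for this example.

```python
from collections import Counter


def second_order_ratios(paths):
    """Ratio of observed to expected counts for each length-two path (a, b, c).

    Assumed null model: the step b -> c is independent of the predecessor a,
    so the expected count is count(a, b, *) * P(c | b).
    """
    edge = Counter()    # observed transitions (b, c)
    out = Counter()     # number of transitions leaving b
    triple = Counter()  # observed length-two paths (a, b, c)
    for p in paths:
        for b, c in zip(p, p[1:]):
            edge[(b, c)] += 1
            out[b] += 1
        for a, b, c in zip(p, p[1:], p[2:]):
            triple[(a, b, c)] += 1

    prefix = Counter()  # number of length-two paths starting with (a, b)
    for (a, b, c), n in triple.items():
        prefix[(a, b)] += n

    ratios = {}
    for (a, b, c), n in triple.items():
        expected = prefix[(a, b)] * edge[(b, c)] / out[b]
        ratios[(a, b, c)] = n / expected
    return ratios
```

A ratio well above or below 1 marks a path whose frequency is not explained by the graph's edge statistics alone, which is the kind of order dependence the abstract refers to.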
A Size-Free CLT for Poisson Multinomials and its Applications
An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the
sum of $n$ independent random vectors supported on the set
$\mathcal{B}_k = \{e_1, \ldots, e_k\}$ of standard basis vectors in
$\mathbb{R}^k$. We show that any $(n,k)$-PMD is
$\mathrm{poly}(k/\sigma)$-close in total
variation distance to the (appropriately discretized) multi-dimensional
Gaussian with the same first two moments, removing the dependence on $n$ from
the Central Limit Theorem of Valiant and Valiant. Interestingly, our CLT is
obtained by bootstrapping the Valiant-Valiant CLT itself through the structural
characterization of PMDs shown in recent work by Daskalakis, Kamath, and
Tzamos. In turn, our stronger CLT can be leveraged to obtain an efficient PTAS
for approximate Nash equilibria in anonymous games, significantly improving the
state of the art, and matching qualitatively the running time dependence on
$n$ and $1/\varepsilon$ of the best known algorithm for two-strategy anonymous
games. Our new CLT also enables the construction of covers for the set of
$(n,k)$-PMDs, which are proper and whose size is shown to be essentially
optimal. Our cover construction combines our CLT with the Shapley-Folkman
theorem and recent sparsification results for Laplacian matrices by Batson,
Spielman, and Srivastava. Our cover size lower bound is based on an algebraic
geometric construction. Finally, leveraging the structural properties of the
Fourier spectrum of PMDs we show that these distributions can be learned from
$\tilde{O}_k(1/\varepsilon^2)$ samples in $\mathrm{poly}_k(1/\varepsilon)$-time,
removing the quasi-polynomial dependence of the running time on
$1/\varepsilon$ from the algorithm of Daskalakis, Kamath, and Tzamos.
Comment: To appear in STOC 201
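The definition of a PMD given in this abstract can be made concrete with a short sketch. Here n denotes the number of independent summands and k the dimension; each summand is a random standard basis vector drawn from its own probability row. The function names `sample_pmd` and `pmd_mean_and_covariance` are hypothetical, and the moment formulas are the standard ones for sums of independent categorical vectors rather than anything taken from the paper.

```python
import random


def sample_pmd(prob_rows, rng):
    """Draw one sample from the PMD defined by prob_rows.

    prob_rows[i][j] is the probability that the i-th summand equals the
    j-th standard basis vector. The sample is the vector of counts.
    """
    k = len(prob_rows[0])
    counts = [0] * k
    for p in prob_rows:
        j = rng.choices(range(k), weights=p)[0]
        counts[j] += 1
    return counts


def pmd_mean_and_covariance(prob_rows):
    """First two moments of the PMD: sums of per-summand means and covariances.

    Each summand has mean p and covariance diag(p) - p p^T.
    """
    k = len(prob_rows[0])
    mean = [sum(p[j] for p in prob_rows) for j in range(k)]
    cov = [[sum((p[a] if a == b else 0.0) - p[a] * p[b] for p in prob_rows)
            for b in range(k)] for a in range(k)]
    return mean, cov
```

The mean and covariance returned by the second function are the two moments that a Gaussian approximation of the kind discussed above would have to match.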
Extracting tag hierarchies
Tagging items with descriptive annotations or keywords is a very natural way
to compress and highlight information about the properties of the given entity.
Over the years several methods have been proposed for extracting a hierarchy
between the tags for systems with a "flat", egalitarian organization of the
tags, which is very common when the tags correspond to free words given by
numerous independent people. Here we present a complete framework for automated
tag hierarchy extraction based on tag occurrence statistics. Along with
proposing new algorithms, we also introduce several quality measures that
enable detailed comparison of competing approaches from different
aspects. Furthermore, we set up a synthetic, computer-generated benchmark
that provides a versatile tool for testing, with a couple of tunable parameters
capable of generating a wide range of test beds. Besides the computer-generated
input, we also use real data in our studies, including a biological example with
a pre-defined hierarchy between the tags. The encouraging similarity between
the pre-defined and reconstructed hierarchy, as well as the seemingly
meaningful hierarchies obtained for other real systems indicate that tag
hierarchy extraction is a very promising direction for further research with a
great potential for practical applications.
Comment: 25 pages with 21 pages of supporting information, 25 figures
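As a rough illustration of what extracting a hierarchy from tag occurrence statistics can look like, the sketch below assigns to each tag a parent chosen among the more frequent tags it co-occurs with. This heuristic is an assumption made for illustration and is not one of the algorithms proposed in the paper; the name `cooccurrence_parents` and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_parents(tagged_items):
    """Map each tag to a parent tag, or None for tags without a candidate.

    Assumed heuristic: the parent of a tag is the strictly more frequent tag
    it co-occurs with most often; ties go to the less frequent candidate.
    """
    freq = Counter()  # number of items carrying each tag
    pair = Counter()  # number of items carrying both tags of an ordered pair
    for tags in tagged_items:
        unique = sorted(set(tags))
        freq.update(unique)
        for a, b in combinations(unique, 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1

    parents = {}
    for tag in freq:
        candidates = [
            (pair[(tag, other)], -freq[other], other)
            for other in freq
            if other != tag
            and freq[other] > freq[tag]
            and pair[(tag, other)] > 0
        ]
        parents[tag] = max(candidates)[2] if candidates else None
    return parents
```

For the items `["animal", "dog"]`, `["animal", "cat"]`, `["animal", "dog", "puppy"]`, and `["animal"]`, this yields "animal" as the root, "dog" and "cat" below it, and "puppy" below "dog".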
Distributed field estimation in wireless sensor networks
This work addresses the problem of distributed estimation of a physical field of interest through a wireless sensor network
- …