Dual Averaging Method for Online Graph-structured Sparsity

Author: Bahmani Sohail
Bottou Léon
Chen Feng
Chen Lin
Duchi John
Gao Xiand
Hegde Chinmay
Johnson David S
Kingma Diederik P
Langford John
Qian Jing
Xiao Lin
Zhou Pan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 25/05/2019

Online learning algorithms update models with one sample per iteration, which makes them efficient for processing large-scale datasets and useful for detecting malicious events on the fly for social benefit, such as disease outbreaks and traffic congestion. However, existing algorithms for graph-structured models focus on the offline setting and the least-squares loss and cannot handle the online setting, while methods designed for the online setting cannot be directly applied to complex (usually non-convex) graph-structured sparsity models. To address these limitations, we propose a new algorithm for graph-structured sparsity constrained problems in the online setting, which we call GraphDA. The key step in GraphDA is to project both the averaged gradient (in the dual space) and the primal variables (in the primal space) onto lower-dimensional subspaces, thus capturing the graph-structured sparsity effectively. Furthermore, the objective functions are assumed to be generally convex, so that different losses can be handled in online learning settings. To the best of our knowledge, GraphDA is the first online learning algorithm for graph-structure constrained optimization problems. To validate our method, we conduct extensive experiments on both benchmark graph and real-world graph datasets. Our experimental results show that, compared to other baseline methods, GraphDA not only improves classification performance but also captures graph-structured features more effectively, and hence offers stronger interpretability.

Comment: 11 pages, 14 figures
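The two-projection idea in this abstract can be illustrated with a minimal sketch. This is not the authors' GraphDA implementation: the graph-structured projection is replaced here by plain top-k magnitude thresholding, and the logistic loss, the quadratic regularizer, the step scaling `gamma`, and all function names are assumptions chosen only for illustration.

```python
import math

def top_k_support(vec, k):
    """Indices of the k largest-magnitude entries.
    Assumption: a stand-in for a graph-structured projection operator."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return set(order[:k])

def project(vec, support):
    """Keep entries on the support, zero out the rest."""
    return [v if i in support else 0.0 for i, v in enumerate(vec)]

def sparse_dual_averaging(samples, dim, k, gamma=1.0):
    """Online dual averaging with a sparsity projection in both spaces.
    samples: iterable of (x, y) with x a list of floats and y in {-1, +1}."""
    w = [0.0] * dim
    grad_sum = [0.0] * dim
    for t, (x, y) in enumerate(samples, start=1):
        # Gradient of the logistic loss at the current w (one sample per step).
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y / (1.0 + math.exp(min(margin, 50.0)))
        for i in range(dim):
            grad_sum[i] += coeff * x[i]
        avg = [g / t for g in grad_sum]
        # Projection of the averaged gradient (dual space).
        dual = project(avg, top_k_support(avg, k))
        # Dual averaging step with h(w) = 0.5 * ||w||^2 and beta_t = gamma * sqrt(t).
        w = [-(math.sqrt(t) / gamma) * d for d in dual]
        # Projection of the primal variables (primal space).
        w = project(w, top_k_support(w, k))
    return w

if __name__ == "__main__":
    data = [([1.0, 0.0, 0.0], 1)] * 5
    print(sparse_dual_averaging(data, dim=3, k=1))
```

With top-k thresholding the two projections select the same support, so the second one is redundant here; they would differ only with a genuinely graph-structured projection operator, which this sketch does not attempt.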

arXiv.org e-Print Archive

Crossref

Scipedia

HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

Author: Casiraghi Giona
Eliassi-Rad Tina
LaRock Timothy
Nanumyan Vahan
Scholtes Ingo
Schweitzer Frank
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 29/01/2020

The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.

Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020)
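To make the idea of an analytical test for anomalous frequencies concrete, here is a simplified sketch. It is not the HYPA method itself: it scores single transitions rather than higher-order paths, and the null model (a hypergeometric draw in which each observed edge gets weight out-strength times in-strength) is an assumption made for illustration. The names `hypergeom_cdf` and `edge_scores` are hypothetical.

```python
from math import comb

def hypergeom_cdf(x, N, K, n):
    """P(X <= x) for X ~ Hypergeometric(population N, K marked items, n draws)."""
    lo = max(0, n - (N - K))
    hi = min(n, K, x)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1)) / total

def edge_scores(edge_counts):
    """Score each observed transition by how extreme its count is under the null.
    edge_counts: dict mapping (source, target) -> positive integer count.
    Scores near 1 suggest over-representation, scores near 0 under-representation."""
    out_w, in_w = {}, {}
    for (u, v), f in edge_counts.items():
        out_w[u] = out_w.get(u, 0) + f
        in_w[v] = in_w.get(v, 0) + f
    m = sum(edge_counts.values())  # number of observed transitions (draws)
    # Assumed null weights: product of out- and in-strength, observed edges only.
    xi = {(u, v): out_w[u] * in_w[v] for (u, v) in edge_counts}
    N = sum(xi.values())
    return {e: hypergeom_cdf(f, N, xi[e], m) for e, f in edge_counts.items()}

if __name__ == "__main__":
    counts = {("a", "b"): 10, ("a", "c"): 1, ("b", "c"): 1, ("c", "a"): 1}
    for edge, score in edge_scores(counts).items():
        print(edge, round(score, 4))
```

Weighting edges by node strengths is what lets such a test account for skewed degree statistics, which a raw frequency threshold ignores.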

arXiv.org e-Print Archive

Crossref

A Size-Free CLT for Poisson Multinomials and its Applications

Author: Daskalakis Konstantinos
De Anindya
Kamath Gautam
Kamath Gautam Chetan
Tzamos Christos
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2016

An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the sum of $n$ independent random vectors supported on the set ${\cal B}_k=\{e_1,\ldots,e_k\}$ of standard basis vectors in $\mathbb{R}^k$. We show that any $(n,k)$-PMD is ${\rm poly}\left({k\over \sigma}\right)$-close in total variation distance to the (appropriately discretized) multi-dimensional Gaussian with the same first two moments, removing the dependence on $n$ from the Central Limit Theorem of Valiant and Valiant. Interestingly, our CLT is obtained by bootstrapping the Valiant-Valiant CLT itself through the structural characterization of PMDs shown in recent work by Daskalakis, Kamath, and Tzamos. In turn, our stronger CLT can be leveraged to obtain an efficient PTAS for approximate Nash equilibria in anonymous games, significantly improving the state of the art, and matching qualitatively the running time dependence on $n$ and $1/\varepsilon$ of the best known algorithm for two-strategy anonymous games. Our new CLT also enables the construction of covers for the set of $(n,k)$-PMDs, which are proper and whose size is shown to be essentially optimal. Our cover construction combines our CLT with the Shapley-Folkman theorem and recent sparsification results for Laplacian matrices by Batson, Spielman, and Srivastava. Our cover size lower bound is based on an algebraic geometric construction. Finally, leveraging the structural properties of the Fourier spectrum of PMDs we show that these distributions can be learned from $O_k(1/\varepsilon^2)$ samples in ${\rm poly}_k(1/\varepsilon)$-time, removing the quasi-polynomial dependence of the running time on $1/\varepsilon$ from the algorithm of Daskalakis, Kamath, and Tzamos.

Comment: To appear in STOC 2016
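As a small illustration of the objects in this abstract, the sketch below draws one sample from a PMD and computes the first two moments that the matching Gaussian would share. It only illustrates the definition and does not implement the paper's CLT, cover construction, or learning algorithm. The function names and the input format (a list of `n` probability vectors of length `k`) are assumptions.

```python
import random

def pmd_moments(probs):
    """Mean vector and covariance matrix of the sum of independent categorical
    vectors, where probs[i][j] = P(X_i = e_j)."""
    k = len(probs[0])
    mean = [sum(p[j] for p in probs) for j in range(k)]
    # Each summand has covariance diag(p) - p p^T; independence lets these add.
    cov = [
        [sum((p[a] if a == b else 0.0) - p[a] * p[b] for p in probs)
         for b in range(k)]
        for a in range(k)
    ]
    return mean, cov

def sample_pmd(probs, rng):
    """One draw from the PMD: a count vector of length k whose entries sum to n."""
    k = len(probs[0])
    counts = [0] * k
    for p in probs:
        j = rng.choices(range(k), weights=p)[0]
        counts[j] += 1
    return counts

if __name__ == "__main__":
    probs = [[0.5, 0.5], [1.0, 0.0], [0.2, 0.8]]
    print(pmd_moments(probs))
    print(sample_pmd(probs, random.Random(0)))
```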

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Extracting tag hierarchies

Author: Palla Gergely
Pollner Péter
Tibély Gergely
Vicsek Tamás
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Tagging items with descriptive annotations or keywords is a very natural way to compress and highlight information about the properties of the given entity. Over the years, several methods have been proposed for extracting a hierarchy between the tags for systems with a "flat", egalitarian organization of the tags, which is very common when the tags correspond to free words given by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we also introduce different quality measures enabling the detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer-generated benchmark providing a versatile tool for testing, with a couple of tunable parameters capable of generating a wide range of test beds. Besides the computer-generated input, we also use real data in our studies, including a biological example with a pre-defined hierarchy between the tags. The encouraging similarity between the pre-defined and reconstructed hierarchy, as well as the seemingly meaningful hierarchies obtained for other real systems, indicate that tag hierarchy extraction is a very promising direction for further research with great potential for practical applications.

Comment: 25 pages with 21 pages of supporting information, 25 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

ELTE Digital Institutional Repository (EDIT)

FigShare

Distributed field estimation in wireless sensor networks

Author: Battisti Timothy
Publication date: 01/12/2010

This work takes into account the problem of distributed estimation of a physical field of interest through a wireless sensor network

Pubblicazioni Aperte Digitali Interateneo Sapienza

Archivio della ricerca- Università di Roma La Sapienza