Targeted matrix completion
Matrix completion is a problem that arises in many data-analysis settings
where the input consists of a partially-observed matrix (e.g., recommender
systems, traffic matrix analysis etc.). Classical approaches to matrix
completion assume that the input partially-observed matrix is low rank. The
success of these methods depends on the number of observed entries and the rank
of the matrix; the larger the rank, the more entries need to be observed in
order to accurately complete the matrix. In this paper, we deal with matrices
that are not necessarily low rank themselves, but rather they contain low-rank
submatrices. We propose Targeted, which is a general framework for completing
such matrices. In this framework, we first extract the low-rank submatrices and
then apply a matrix-completion algorithm to these low-rank submatrices as well
as the remainder matrix separately. Although for the completion itself we use
state-of-the-art completion methods, our results demonstrate that Targeted
achieves significantly smaller reconstruction errors than other classical
matrix-completion methods. One of the key technical contributions of the paper
lies in the identification of the low-rank submatrices from the input
partially-observed matrices.
Comment: Proceedings of the 2017 SIAM International Conference on Data Mining
(SDM
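The classical low-rank approach that this abstract contrasts itself with can be illustrated by a minimal sketch. It assumes NumPy and uses iterative truncated-SVD imputation, a generic technique chosen here for brevity; it is not the Targeted framework, and the function name `complete_low_rank` and its parameters are hypothetical.

```python
import numpy as np

def complete_low_rank(observed, mask, rank, n_iters=200):
    """Fill unobserved entries by repeated rank-`rank` SVD truncation.

    `mask` is True where an entry is observed. Generic illustration only,
    not the method of the paper.
    """
    mask = np.asarray(mask, dtype=bool)
    filled = np.where(mask, observed, 0.0)  # start with zeros in the gaps
    for _ in range(n_iters):
        u, s, vt = np.linalg.svd(filled, full_matrices=False)
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Keep observed entries fixed; update only the missing ones.
        filled = np.where(mask, observed, approx)
    return filled

# Rank-1 toy matrix with a single hidden entry at position (0, 0).
truth = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
mask = np.ones_like(truth, dtype=bool)
mask[0, 0] = False
estimate = complete_low_rank(np.where(mask, truth, 0.0), mask, rank=1)
```

On a matrix that is not low rank as a whole, the same routine could be applied separately to each extracted low-rank submatrix and to the remainder, which is the general idea the abstract describes.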
Pricing Algorithms For a Two-sided Internet Advertisement Market
The Google AdSense Program is a successful internet advertisement program where Google places contextual adverts on third-party websites and shares the resulting revenue with each publisher. Advertisers have budgets and bid on ad slots, while publishers set reserve prices for the ad slots on their websites. Following previous modelling efforts, we model the program as a two-sided market with advertisers on one side and publishers on the other. We show a reduction from the Generalised Assignment Problem (GAP) to the problem of computing the revenue-maximising allocation and pricing of publisher slots under a first-price auction. GAP is APX-hard, but a (1-1/e)-approximation is known. We compute truthful and revenue-maximising prices and an allocation of ad slots to advertisers under a second-price auction. The auctioneer's revenue is within a factor of (1-1/e) of the second-price optimal
Matrix completion with queries
In many applications, e.g., recommender systems and traffic monitoring, the
data comes in the form of a matrix that is only partially observed and low
rank. A fundamental data-analysis task for these datasets is matrix completion,
where the goal is to accurately infer the entries missing from the matrix. Even
when the data satisfies the low-rank assumption, classical matrix-completion
methods may output completions with significant error -- in that the
reconstructed matrix differs significantly from the true underlying matrix.
Often, this is due to the fact that the information contained in the observed
entries is insufficient. In this work, we address this problem by proposing an
active version of matrix completion, where queries can be made to the true
underlying matrix. Subsequently, we design Order&Extend, which is the first
algorithm to unify a matrix-completion approach and a querying strategy into a
single algorithm. Order&Extend is able to identify and alleviate insufficient
information by judiciously querying a small number of additional entries. In an
extensive experimental evaluation on real-world datasets, we demonstrate that
our algorithm is efficient and is able to accurately reconstruct the true
matrix while asking only a small number of queries.
Comment: Proceedings of the 21st ACM SIGKDD International Conference on
Knowledge Discovery and Data Minin
Markov Chain Monitoring
In networking applications, one often wishes to obtain estimates about the
number of objects at different parts of the network (e.g., the number of cars
at an intersection of a road network or the number of packets expected to reach
a node in a computer network) by monitoring the traffic in a small number of
network nodes or edges. We formalize this task by defining the 'Markov Chain
Monitoring' problem.
Given an initial distribution of items over the nodes of a Markov chain, we
wish to estimate the distribution of items at subsequent times. We do this by
asking a limited number of queries that retrieve, for example, how many items
transitioned to a specific node or over a specific edge at a particular time.
We consider different types of queries, each defining a different variant of
the Markov chain monitoring problem. For each variant, we design efficient algorithms
for choosing the queries that make our estimates as accurate as possible. In
our experiments with synthetic and real datasets we demonstrate the efficiency
and the efficacy of our algorithms in a variety of settings.
Comment: 13 pages, 10 figures, 1 tabl
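The quantity being estimated in this setting can be illustrated with a small sketch. It assumes NumPy and a known row-stochastic transition matrix, and it only propagates expected item counts; the query-selection algorithms of the paper are not shown. The names `propagate` and `expected_edge_flow` are hypothetical.

```python
import numpy as np

def propagate(initial_counts, transition, steps):
    """Return expected item counts per node at times 0..steps (one row per time)."""
    x = np.asarray(initial_counts, dtype=float)
    P = np.asarray(transition, dtype=float)  # P[i, j] = Pr(move from i to j)
    history = [x]
    for _ in range(steps):
        x = x @ P  # one step of the chain applied to the expected counts
        history.append(x)
    return np.vstack(history)

def expected_edge_flow(counts_at_t, transition, i, j):
    """Expected number of items crossing edge (i, j) in the next step."""
    return counts_at_t[i] * transition[i][j]

# Toy 3-node chain with all 100 items starting at node 0.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]
history = propagate([100, 0, 0], P, steps=2)
flow_01 = expected_edge_flow(history[1], P, 0, 1)
```

A node query or an edge query, in the sense described above, would replace one of these expected values with an observed one.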
Team Formation for Scheduling Educational Material in Massive Online Classes
Whether teaching in a classroom or a Massive Online Open Course, it is crucial
to present the material in a way that benefits the audience as a whole. We
identify two important tasks to solve towards this objective: (1) group students
so that they can maximally benefit from peer interaction and (2) find an optimal
schedule of the educational material for each group. Thus, in this paper, we
solve the problem of team formation and content scheduling for education. Given
a time frame d, a set of students S with their need to learn different
activities T, and the number k of desired groups, we study the problem
of finding k groups of students. The goal is to teach students within time frame
d such that their potential for learning is maximized and find the best
schedule for each group. We show this problem to be NP-hard and develop a
polynomial-time algorithm for it. We show our algorithm to be effective on both
synthetic and real data. For our experiments, we use real data on
students' grades in a Computer Science department. As part of our contribution,
we release a semi-synthetic dataset that mimics the properties of the real
data
A Divide-and-Conquer Algorithm for Betweenness Centrality
The problem of efficiently computing the betweenness centrality of nodes has
been researched extensively. To date, the best known exact and centralized
algorithm for this task is an algorithm proposed in 2001 by Brandes. The
contribution of our paper is Brandes++, an algorithm for exact efficient
computation of betweenness centrality. The crux of our algorithm is that we
create a sketch of the graph, that we call the skeleton, by replacing subgraphs
with simpler graph structures. Depending on the underlying graph structure,
using this skeleton and by keeping appropriate summaries, Brandes++ can
achieve significantly lower running times in its computations. Extensive
experimental evaluation on real-life datasets demonstrates the efficacy of our
algorithm for different types of graphs. We release our code for the benefit of the
research community.
Comment: Shorter version of this paper appeared in SIAM Data Mining 201
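For reference, the 2001 Brandes baseline mentioned in this abstract can be sketched as follows for unweighted, undirected graphs. This is the standard textbook formulation, not Brandes++; the skeleton construction is not described in the abstract and is not attempted here. The function name and the adjacency-dictionary input format are assumptions.

```python
from collections import deque

def brandes_betweenness(adjacency):
    """Exact betweenness for an unweighted, undirected graph {node: [neighbors]}."""
    centrality = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Phase 1: BFS from `source`, counting shortest paths (sigma).
        stack = []
        predecessors = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:  # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    predecessors[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in adjacency}
        while stack:
            w = stack.pop()
            for v in predecessors[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                centrality[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2.0 for v, c in centrality.items()}

# Path graph a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = brandes_betweenness(path)
```

Each source requires one full traversal of the graph, which is the per-source cost that a sketch-based approach such as the one described above aims to reduce.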
Community-aware network sparsification
Network sparsification aims to reduce the number of edges of a network while maintaining its structural properties; such properties include shortest paths, cuts, spectral measures, or network modularity. Sparsification has multiple applications, such as speeding up graph-mining algorithms, graph visualization, and identifying the important network edges.
In this paper, we consider a novel formulation of the network-sparsification problem. In addition to the network, we also consider as input a set of communities. The goal is to sparsify the network so as to preserve the network structure with respect to the given communities. We introduce two variants of the community-aware sparsification problem, leading to sparsifiers that satisfy different community-connectedness properties. From the technical point of view, we prove hardness results and devise effective approximation algorithms. Our experimental results on a large collection of datasets demonstrate the effectiveness of our algorithms.
https://epubs.siam.org/doi/10.1137/1.9781611974973.48
Accepted manuscrip