The Fastest Mixing Markov Process on a Graph and a Connection to a Maximum Variance Unfolding Problem
We consider a Markov process on a connected graph, with edges labeled with transition rates between the adjacent vertices. The distribution of the Markov process converges to the uniform distribution at a rate determined by the second smallest eigenvalue lambda_2 of the Laplacian of the weighted graph. In this paper we consider the problem of assigning transition rates to the edges so as to maximize lambda_2 subject to a linear constraint on the rates. This is the problem of finding the fastest mixing Markov process (FMMP) on the graph. We show that the FMMP problem is a convex optimization problem, which can in turn be expressed as a semidefinite program, and therefore effectively solved numerically. We formulate a dual of the FMMP problem and show that it has a natural geometric interpretation as a maximum variance unfolding (MVU) problem, i.e., the problem of choosing a set of points to be as far apart as possible, measured by their variance, while respecting local distance constraints. This MVU problem is closely related to a problem recently proposed by Weinberger and Saul as a method for "unfolding" high-dimensional data that lies on a low-dimensional manifold. The duality between the FMMP and MVU problems sheds light on both problems, and allows us to characterize and, in some cases, find optimal solutions.
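The quantity being maximized in the abstract above, lambda_2 of the weighted graph Laplacian, can be evaluated for any given rate assignment with a few lines of NumPy. The following is only a minimal sketch: the function names `weighted_laplacian` and `lambda2` and the edge-list representation are assumptions made for illustration, and the code evaluates lambda_2 for fixed rates rather than solving the FMMP semidefinite program.

```python
import numpy as np

def weighted_laplacian(n, edges, rates):
    """Build the Laplacian of an undirected graph on n vertices.

    edges is a list of (i, j) vertex pairs and rates the matching list of
    nonnegative transition rates (hypothetical input format).
    """
    L = np.zeros((n, n))
    for (i, j), w in zip(edges, rates):
        L[i, i] += w
        L[j, j] += w
        L[i, j] -= w
        L[j, i] -= w
    return L

def lambda2(n, edges, rates):
    """Return the second smallest eigenvalue of the weighted Laplacian."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(weighted_laplacian(n, edges, rates))
    return eigenvalues[1]

# Path graph on three vertices with unit rates; lambda_2 is 1 up to rounding.
path_value = lambda2(3, [(0, 1), (1, 2)], [1.0, 1.0])
```

Comparing `lambda2` across rate assignments that satisfy the same linear constraint gives a quick numerical check on candidate solutions.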
On the Sample Complexity of Subspace Learning
A large number of algorithms in machine learning, from principal component
analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral
embedding and support estimation methods, rely on estimating a linear subspace
from samples. In this paper we introduce a general formulation of this problem
and derive novel learning error estimates. Our results rely on natural
assumptions on the spectral properties of the covariance operator associated to
the data distribution, and hold for a wide class of metrics between
subspaces. As special cases, we discuss sharp error estimates for the
reconstruction properties of PCA and spectral support estimation. Key to our
analysis is an operator theoretic approach that has broad applicability to
spectral learning methods. Comment: Extended version of conference paper
Minimizing Polarization and Disagreement in Social Networks
The rise of social media and online social networks has been a disruptive
force in society. Opinions are increasingly shaped by interactions on online
social media, and social phenomena including disagreement and polarization are
now tightly woven into everyday life. In this work we initiate the study of the
following question: given agents, each with its own initial opinion that
reflects its core value on a topic, and an opinion dynamics model, what is the
structure of a social network that minimizes {\em polarization} and {\em
disagreement} simultaneously?
This question is central to recommender systems: should a recommender system
prefer a link suggestion between two online users with similar mindsets in
order to keep disagreement low, or between two users with different opinions in
order to expose each to the other's viewpoint of the world, and decrease
overall levels of polarization? Our contributions include a mathematical
formalization of this question as an optimization problem and an exact,
time-efficient algorithm. We also prove that there always exists a network with
edges that is a approximation to the optimum.
For a fixed graph, we additionally show how to optimize our objective function
over the agents' innate opinions in polynomial time.
We perform an empirical study of our proposed methods on synthetic and
real-world data that verify their value as mining tools to better understand
the trade-off between disagreement and polarization. We find that there is a
lot of space to reduce both polarization and disagreement in real-world
networks; for instance, on a Reddit network where users exchange comments on
politics, our methods achieve a -fold reduction in polarization
and disagreement. Comment: 19 pages (accepted, WWW 2018)
A Duality View of Spectral Methods for Dimensionality Reduction
We present a unified duality view of several recently emerged spectral methods for nonlinear dimensionality reduction, including Isomap, locally linear embedding, Laplacian eigenmaps, and maximum variance unfolding. We discuss the duality theory for the maximum variance unfolding problem, and show that other methods are directly related to either its primal formulation or its dual formulation, or can be interpreted from the optimality conditions. This duality framework reveals close connections between these seemingly quite different algorithms. In particular, it resolves the myth about these methods in using either the top eigenvectors of a dense matrix, or the bottom eigenvectors of a sparse matrix: these two eigenspaces are exactly aligned at primal-dual optimality.
Subsampling Algorithms for Semidefinite Programming
We derive a stochastic gradient algorithm for semidefinite optimization using
randomization techniques. The algorithm uses subsampling to reduce the
computational cost of each iteration and the subsampling ratio explicitly
controls granularity, i.e. the tradeoff between cost per iteration and total
number of iterations. Furthermore, the total computational cost is directly
proportional to the complexity (i.e. rank) of the solution. We study numerical
performance on some large-scale problems arising in statistical learning. Comment: Final version, to appear in Stochastic Systems
Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs
Given a graph where vertices represent alternatives and arcs represent
pairwise comparison data, the statistical ranking problem is to find a
potential function, defined on the vertices, such that the gradient of the
potential function agrees with the pairwise comparisons. Our goal in this paper
is to develop a method for collecting data for which the least squares
estimator for the ranking problem has maximal Fisher information. Our approach,
based on experimental design, is to view data collection as a bi-level
optimization problem where the inner problem is the ranking problem and the
outer problem is to identify data which maximizes the informativeness of the
ranking. Under certain assumptions, the data collection problem decouples,
reducing to a problem of finding multigraphs with large algebraic connectivity.
This reduction of the data collection problem to graph-theoretic questions is
one of the primary contributions of this work. As an application, we study the
Yahoo! Movie user rating dataset and demonstrate that the addition of a small
number of well-chosen pairwise comparisons can significantly increase the
Fisher informativeness of the ranking. As another application, we study the
2011-12 NCAA football schedule and propose schedules with the same number of
games which are significantly more informative. Using spectral clustering
methods to identify highly-connected communities within the division, we argue
that the NCAA could improve its notoriously poor rankings by simply scheduling
more out-of-conference games. Comment: 31 pages, 10 figures, 3 tables
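The ranking problem described in the abstract above can be sketched as an ordinary least squares fit on the comparison graph. The example below is an illustration under stated assumptions, not the authors' method: the function name `least_squares_ranking`, the `(i, j, margin)` input format, and the sign convention that a comparison measures x_j - x_i are all hypothetical. It also returns the algebraic connectivity of the comparison multigraph, the quantity the abstract says the data collection problem reduces to under certain assumptions.

```python
import numpy as np

def least_squares_ranking(n, comparisons):
    """Fit a potential x on n alternatives from pairwise comparisons.

    comparisons is a list of (i, j, margin) triples, read here as noisy
    measurements of x[j] - x[i] (an assumed convention).
    """
    B = np.zeros((len(comparisons), n))  # arc-vertex incidence matrix
    y = np.zeros(len(comparisons))
    for row, (i, j, margin) in enumerate(comparisons):
        B[row, i] = -1.0
        B[row, j] = 1.0
        y[row] = margin
    # The potential is defined only up to an additive constant; lstsq
    # returns the minimum-norm solution, whose entries sum to zero on a
    # connected graph.
    x, *_ = np.linalg.lstsq(B, y, rcond=None)
    # B.T @ B is the Laplacian of the comparison multigraph; its second
    # smallest eigenvalue is the algebraic connectivity.
    connectivity = np.linalg.eigvalsh(B.T @ B)[1]
    return x, connectivity

# Three alternatives with mutually consistent comparisons.
scores, connectivity = least_squares_ranking(
    3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.0)]
)
```

Repeating a pair in `comparisons` adds a parallel arc, so the same function covers multigraphs.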
Distributed Maximum Likelihood Sensor Network Localization
We propose a class of convex relaxations to solve the sensor network
localization problem, based on a maximum likelihood (ML) formulation. This
class, as well as the tightness of the relaxations, depends on the noise
probability density function (PDF) of the collected measurements. We derive a
computationally efficient edge-based version of this ML convex relaxation class
and we design a distributed algorithm that enables the sensor nodes to solve
these edge-based convex programs locally by communicating only with their close
neighbors. This algorithm relies on the alternating direction method of
multipliers (ADMM); it converges to the centralized solution, can run
asynchronously, and is computation error-resilient. Finally, we compare our
proposed distributed scheme with other available methods, both analytically and
numerically, and we argue the added value of ADMM, especially for large-scale
networks.