Search CORE

31 research outputs found

The Fastest Mixing Markov Process on a Graph and a Connection to a Maximum Variance Unfolding Problem

Author: Boyd Stephen
Diaconis Persi
Sun Jun
Xiao Lin
Publication venue: 'The Japan Society for Industrial and Applied Mathematics'
Publication date: 01/01/2006
Field of study

We consider a Markov process on a connected graph, with edges labeled with transition rates between the adjacent vertices. The distribution of the Markov process converges to the uniform distribution at a rate determined by the second smallest eigenvalue lambda_2 of the Laplacian of the weighted graph. In this paper we consider the problem of assigning transition rates to the edges so as to maximize lambda_2 subject to a linear constraint on the rates. This is the problem of finding the fastest mixing Markov process (FMMP) on the graph. We show that the FMMP problem is a convex optimization problem, which can in turn be expressed as a semidefinite program, and therefore effectively solved numerically. We formulate a dual of the FMMP problem and show that it has a natural geometric interpretation as a maximum variance unfolding (MVU) problem, , the problem of choosing a set of points to be as far apart as possible, measured by their variance, while respecting local distance constraints. This MVU problem is closely related to a problem recently proposed by Weinberger and Saul as a method for "unfolding" high-dimensional data that lies on a low-dimensional manifold. The duality between the FMMP and MVU problems sheds light on both problems, and allows us to characterize and, in some cases, find optimal solutions

CiteSeerX

Caltech Authors

On the Sample Complexity of Subspace Learning

Author: Canas Guille D.
Rosasco Lorenzo
Rudi Alessandro
Publication venue
Publication date: 01/01/2013
Field of study

A large number of algorithms in machine learning, from principal component analysis (PCA), and its non-linear (kernel) extensions, to more recent spectral embedding and support estimation methods, rely on estimating a linear subspace from samples. In this paper we introduce a general formulation of this problem and derive novel learning error estimates. Our results rely on natural assumptions on the spectral properties of the covariance operator associated to the data distribu- tion, and hold for a wide class of metrics between subspaces. As special cases, we discuss sharp error estimates for the reconstruction properties of PCA and spectral support estimation. Key to our analysis is an operator theoretic approach that has broad applicability to spectral learning methods.Comment: Extendend Version of conference pape

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Genova

Minimizing Polarization and Disagreement in Social Networks

Author: Musco Cameron
Musco Christopher
Tsourakakis Charalampos E.
Publication venue
Publication date: 28/12/2017
Field of study

The rise of social media and online social networks has been a disruptive force in society. Opinions are increasingly shaped by interactions on online social media, and social phenomena including disagreement and polarization are now tightly woven into everyday life. In this work we initiate the study of the following question: given

n

agents, each with its own initial opinion that reflects its core value on a topic, and an opinion dynamics model, what is the structure of a social network that minimizes {\em polarization} and {\em disagreement} simultaneously? This question is central to recommender systems: should a recommender system prefer a link suggestion between two online users with similar mindsets in order to keep disagreement low, or between two users with different opinions in order to expose each to the other's viewpoint of the world, and decrease overall levels of polarization? Our contributions include a mathematical formalization of this question as an optimization problem and an exact, time-efficient algorithm. We also prove that there always exists a network with

O(n/\epsilon^2)

edges that is a

(1+\epsilon)

approximation to the optimum. For a fixed graph, we additionally show how to optimize our objective function over the agents' innate opinions in polynomial time. We perform an empirical study of our proposed methods on synthetic and real-world data that verify their value as mining tools to better understand the trade-off between of disagreement and polarization. We find that there is a lot of space to reduce both polarization and disagreement in real-world networks; for instance, on a Reddit network where users exchange comments on politics, our methods achieve a

\sim 60\,000

-fold reduction in polarization and disagreement.Comment: 19 pages (accepted, WWW 2018

arXiv.org e-Print Archive

Crossref

A Duality View of Spectral Methods for Dimensionality Reduction

Author: Jun Sun
Lin Xiao
Stephen Boyd
Publication venue
Publication date: 01/01/2006
Field of study

We present a unified duality view of several recently emerged spectral methods for nonlinear dimensionality reduction, including Isomap, locally linear embedding, Laplacian eigenmaps, and maximum variance unfolding. We discuss the duality theory for the maximum variance unfolding problem, and show that other methods are directly related to either its primal formulation or its dual formulation, or can be interpreted from the optimality conditions. This duality framework reveals close connections between these seemingly quite different algorithms. In particular, it resolves the myth about these methods in using either the top eigenvectors of a dense matrix, or the bottom eigenvectors of a sparse matrix — these two eigenspaces are exactly aligned at primaldual optimality

CiteSeerX

Crossref

Caltech Authors

Subsampling Algorithms for Semidefinite Programming

Author: d'Aspremont Alexandre
Publication venue
Publication date: 01/01/2011
Field of study

We derive a stochastic gradient algorithm for semidefinite optimization using randomization techniques. The algorithm uses subsampling to reduce the computational cost of each iteration and the subsampling ratio explicitly controls granularity, i.e. the tradeoff between cost per iteration and total number of iterations. Furthermore, the total computational cost is directly proportional to the complexity (i.e. rank) of the solution. We study numerical performance on some large-scale problems arising in statistical learning.Comment: Final version, to appear in Stochastic System

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Directory of Open Access Journals

Optimal Data Collection For Informative Rankings Expose Well-Connected Graphs

Author: Brune Christoph
Osher Stanley J.
Osting Braxton
Publication venue
Publication date: 01/01/2014
Field of study

Given a graph where vertices represent alternatives and arcs represent pairwise comparison data, the statistical ranking problem is to find a potential function, defined on the vertices, such that the gradient of the potential function agrees with the pairwise comparisons. Our goal in this paper is to develop a method for collecting data for which the least squares estimator for the ranking problem has maximal Fisher information. Our approach, based on experimental design, is to view data collection as a bi-level optimization problem where the inner problem is the ranking problem and the outer problem is to identify data which maximizes the informativeness of the ranking. Under certain assumptions, the data collection problem decouples, reducing to a problem of finding multigraphs with large algebraic connectivity. This reduction of the data collection problem to graph-theoretic questions is one of the primary contributions of this work. As an application, we study the Yahoo! Movie user rating dataset and demonstrate that the addition of a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking. As another application, we study the 2011-12 NCAA football schedule and propose schedules with the same number of games which are significantly more informative. Using spectral clustering methods to identify highly-connected communities within the division, we argue that the NCAA could improve its notoriously poor rankings by simply scheduling more out-of-conference games.Comment: 31 pages, 10 figures, 3 table

arXiv.org e-Print Archive

CiteSeerX

University of Twente Research Information

Distributed Maximum Likelihood Sensor Network Localization

Author: Leus Geert
Simonetto Andrea
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/12/2013
Field of study

We propose a class of convex relaxations to solve the sensor network localization problem, based on a maximum likelihood (ML) formulation. This class, as well as the tightness of the relaxations, depends on the noise probability density function (PDF) of the collected measurements. We derive a computational efficient edge-based version of this ML convex relaxation class and we design a distributed algorithm that enables the sensor nodes to solve these edge-based convex programs locally by communicating only with their close neighbors. This algorithm relies on the alternating direction method of multipliers (ADMM), it converges to the centralized solution, it can run asynchronously, and it is computation error-resilient. Finally, we compare our proposed distributed scheme with other available methods, both analytically and numerically, and we argue the added value of ADMM, especially for large-scale networks

arXiv.org e-Print Archive

CiteSeerX