1,136 research outputs found
Stochastic Block Coordinate Frank-Wolfe Algorithm for Large-Scale Biological Network Alignment
With increasingly "big" data available in biomedical research, deriving
accurate and reproducible biology knowledge from such big data imposes enormous
computational challenges. In this paper, motivated by recently developed
stochastic block coordinate algorithms, we propose a highly scalable randomized
block coordinate Frank-Wolfe algorithm for convex optimization with general
compact convex constraints, which has diverse applications in analyzing
biomedical data for better understanding cellular and disease mechanisms. We
focus on implementing the derived stochastic block coordinate algorithm to
align protein-protein interaction networks for identifying conserved functional
pathways based on the IsoRank framework. Our derived stochastic block
coordinate Frank-Wolfe (SBCFW) algorithm has a convergence guarantee and
naturally leads to decreased computational cost (time and space) for each
iteration. Our experiments for querying conserved functional protein complexes
in yeast networks confirm the effectiveness of this technique for analyzing
large-scale biological networks.
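To make the idea of a randomized block coordinate Frank-Wolfe update concrete, here is a minimal sketch on a toy problem. It is not the SBCFW algorithm or the IsoRank objective from the paper: the quadratic objective, the product-of-simplices constraint, the step size 2n/(k + 2n), and the names `block_coordinate_frank_wolfe` and `objective` are all assumptions chosen for illustration. Each iteration touches only one randomly chosen block, which is where the reduced per-iteration cost comes from.

```python
import numpy as np


def objective(x, target):
    """Toy objective (an assumption): 0.5 * ||x - target||_F^2."""
    return 0.5 * np.sum((x - target) ** 2)


def block_coordinate_frank_wolfe(target, num_iters=5000, seed=0):
    """Minimize the toy objective with every row of x on the probability simplex.

    Each row is one block. Per iteration, one block is sampled uniformly at
    random and only that block's gradient and iterate are touched.
    """
    rng = np.random.default_rng(seed)
    n_blocks, dim = target.shape
    x = np.full((n_blocks, dim), 1.0 / dim)  # feasible start: uniform rows

    for k in range(num_iters):
        i = rng.integers(n_blocks)           # sample one block
        grad_i = x[i] - target[i]            # gradient restricted to block i

        # Linear minimization over the simplex: the best vertex is the
        # coordinate with the smallest gradient entry.
        vertex = np.zeros(dim)
        vertex[np.argmin(grad_i)] = 1.0

        # Step size commonly used for block coordinate Frank-Wolfe.
        gamma = 2.0 * n_blocks / (k + 2.0 * n_blocks)
        x[i] = (1.0 - gamma) * x[i] + gamma * vertex  # convex combination

    return x
```

Because every update is a convex combination of feasible points, the iterate stays feasible without any projection step.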
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed. Comment: 13 figures, 35 references
Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures
In this paper, we study the problem of speeding up the Frank-Wolfe
algorithm, a conditional gradient method. We develop and
employ two novel inner product search data structures, improving the prior
fastest algorithm in [Shrivastava, Song and Xu, NeurIPS 2021].
* The first data structure uses low-dimensional random projection to reduce
the problem to a lower dimension, then uses efficient inner product data
structure. It has preprocessing time and
per iteration cost for small constant .
* The second data structure leverages the recent development in adaptive
inner product search data structure that can output estimations to all inner
products. It has preprocessing time and per iteration cost
.
The first algorithm improves the state-of-the-art (with preprocessing time
and per iteration cost ) in all
cases, while the second one provides an even faster preprocessing time and is
suitable when the number of iterations is small.
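The link between Frank-Wolfe and inner product search can be seen in a small sketch. When the feasible set is the convex hull of a finite set of atoms, the linear minimization step is a maximum inner product search with the negative gradient as the query. The code below does that search by exact brute force, which costs one pass over all atoms per iteration; it does not implement either data structure from the paper. The quadratic objective and the name `frank_wolfe_over_atoms` are assumptions for illustration.

```python
import numpy as np


def frank_wolfe_over_atoms(atoms, target, num_iters=200):
    """Minimize 0.5 * ||w - target||^2 over the convex hull of the rows of `atoms`.

    The linear minimization oracle is an exact maximum inner product search,
    done here by scanning every atom (an illustrative baseline only).
    """
    w = atoms[0].astype(float)
    for t in range(num_iters):
        grad = w - target
        # The atom minimizing <grad, a> is the atom maximizing <-grad, a>.
        scores = atoms @ (-grad)
        best_atom = atoms[np.argmax(scores)]
        gamma = 2.0 / (t + 2.0)              # standard Frank-Wolfe step size
        w = (1.0 - gamma) * w + gamma * best_atom
    return w
```

Replacing the `atoms @ (-grad)` scan with an approximate search structure is the kind of change that trades preprocessing time for a cheaper per-iteration cost.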
Sparsity-based algorithms for inverse problems
Inverse problems are problems where we want to estimate the values of certain parameters of a system given observations of the system. Such problems occur in several areas of science and engineering. Inverse problems are often ill-posed, which means that the observations of the system do not uniquely define the parameters we seek to estimate, or that the solution is highly sensitive to small changes in the observations. To solve such problems, therefore, we need to make use of additional knowledge about the system at hand. One such form of prior information is given by the notion of sparsity. Sparsity refers to the knowledge that the solution to the inverse problem can be expressed as a combination of a few terms. The sparsity of a solution can be controlled explicitly or implicitly. An explicit way to induce sparsity is to minimize the number of non-zero terms in the solution. Implicit use of sparsity can be made, for example, by making adjustments to the algorithm used to arrive at the solution.
In this thesis we studied various inverse problems that arise in different application areas, such as tomographic imaging and equation learning for biology, and showed how ideas of sparsity can be used in each case to design effective algorithms to solve such problems.
Financial support was provided by the European Union's Horizon 2020 Research and Innovation programme under the Marie Sklodowska-Curie grant agreement no. 765604.
Number theory, Algebra and Geometry
Pattern Recognition on Random Graphs
The field of pattern recognition developed significantly in the 1960s, and the field of random graph inference has enjoyed much recent progress in both theory and application. This dissertation focuses on pattern recognition in the context of a particular family of random graphs, namely the stochastic blockmodels, from the two main perspectives of single graph inference and joint graph inference.
Single graph inference is the performance of statistical inference on one single observed graph. Given a single graph realized from a stochastic blockmodel, we here consider the specific exploitation tasks of vertex classification, clustering, and nomination.
Given an observed graph, vertex classification is the identification of the block labels of test vertices after learning from the training vertices. We propose a robust vertex classifier, which utilizes a representation of a test vertex as a sparse combination of the training vertices. Our proposed classifier is demonstrated to be robust against data contamination, and has superior performance over classical spectral-embedding classifiers in simulation and real data experiments.
Vertex clustering groups vertices based on their similarities. We present a model-based clustering algorithm for graphs drawn from a stochastic blockmodel, and illustrate its usefulness on a case study in online advertising. We demonstrate the practical value of our vertex clustering method for efficiently delivering internet advertisements.
Under the stochastic blockmodel framework, suppose one block is of particular interest. The task of vertex nomination is to create a nomination list so that vertices from the group of interest are concentrated abundantly near the top of the list. We present several vertex nomination schemes, and propose a vertex nomination scheme that is scalable for large graphs. We demonstrate the effectiveness of our methodology on simulation and real datasets.
Next, we consider joint graph inference, which involves the joint space of multiple graphs; in this dissertation, we specifically consider joint graph inference on two graphs. Given two graphs, we consider the tasks of seeded graph matching for large graphs and joint vertex classification.
Graph matching is the task of aligning two graphs so as to minimize the number of edge disagreements between them. We propose a scalable graph matching algorithm, which uses a divide-and-conquer approach to scale the state-of-the-art seeded graph matching algorithm to big graph data. We prove theoretical performance guarantees, and demonstrate desired properties such as scalability, robustness, accuracy and runtime in both simulated data and human brain connectome data.
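The objective in this definition can be written down directly. The sketch below counts edge disagreements for a given vertex correspondence and, for very small graphs only, finds a best correspondence by trying every permutation. It is not the seeded or divide-and-conquer algorithm described in the dissertation; the function names and the restriction to undirected simple graphs given as symmetric 0/1 adjacency matrices are assumptions for illustration.

```python
from itertools import permutations

import numpy as np


def edge_disagreements(adj_a, adj_b, perm):
    """Count vertex pairs that are an edge in exactly one of the two graphs.

    perm[i] is the vertex of graph B matched to vertex i of graph A.
    Both adjacency matrices are assumed symmetric with zero diagonal.
    """
    relabeled_b = adj_b[np.ix_(perm, perm)]  # entry (i, j) = B[perm[i], perm[j]]
    # Each disagreeing pair appears twice in a symmetric matrix.
    return int(np.sum(adj_a != relabeled_b) // 2)


def brute_force_match(adj_a, adj_b):
    """Exhaustive search over all permutations; feasible only for tiny graphs."""
    n = adj_a.shape[0]
    best_perm = min(
        permutations(range(n)),
        key=lambda p: edge_disagreements(adj_a, adj_b, list(p)),
    )
    return list(best_perm), edge_disagreements(adj_a, adj_b, list(best_perm))
```

The exhaustive search visits n! permutations, which is why scalable methods rely on relaxations, seeds, or divide-and-conquer strategies instead.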
Within the joint graph inference framework, we present a case study on the paired chemical and electrical Caenorhabditis elegans neural connectomes. Motivated by the success of seeded graph matching on the paired connectomes, we propose joint vertex classification on the paired connectomes. We demonstrate that joint inference on the paired connectomes yields more accurate results than single inference on either connectome. This serves as a first step toward a methodological and quantitative approach for understanding the coexistent significance of the chemical and electrical connectomes.
International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains, in particular, the scientific program both in survey style and in full detail, as well as information on the social program, the venue, special meetings, and more.
Graph inference and graph matching problems: theory and algorithms
Committee: Alex Bronstein (Tel Aviv University), Marcelo Lanzilotta (Universidad de la República), Gonzalo Mateos (University of Rochester), Gadiel Seroussi (Universidad de la República)
Almost every field has problems related to graphs or networks. From natural examples in physics and mathematics to applications in medicine and signal processing, graphs are either a very powerful tool or a very rich object of interest. In this thesis we address two classes of graph-related problems.
First, we focus on graph inference problems, which consist of estimating a graph or network from a dataset. In this part of the manuscript, we modify the existing formulations of the inference problem to incorporate prior topological information about the graph, and to jointly infer several graphs in a collaborative way. We apply these techniques to infer genetic regulation networks, brain connectivity patterns, and economy-related (stock market) networks. We also present a new problem, which consists of estimating mobility patterns from highly asynchronous and incomplete data. We give a first formulation of the problem with its corresponding optimization, and present results for airplane routes in the United States and taxi mobility patterns in New York.
The second class consists of the so-called graph matching problems. In this type of problem two graphs are given, and the objective is to find the best alignment between them. This problem is of great interest from both an algorithmic and a theoretical point of view, in addition to its important applications. Its interest and difficulty lie in the combinatorial nature of the problem: the cost of searching over all possible permutations grows exponentially with the number of nodes, and hence quickly becomes intractable even for small graphs. First, we focus on the algorithmic aspect of the graph matching problem. We present two methods based on relaxations of the discrete optimization problem. The first is inspired by ideas from the sparse modeling community, and the second is based on a theorem presented in this manuscript. The importance of these methods is illustrated with several applications.
Finally, we address some theoretical aspects of graph matching and other related problems. The main question tackled in the last chapter is the following: when do the graph matching problem and its convex relaxation have the same solution? A probabilistic approach is given first, showing that, asymptotically, the most common convex relaxation fails, while a non-convex relaxation succeeds with probability one if the graphs to be matched are sufficiently correlated, exhibiting a phase-transition type of behavior. A deterministic approach is also presented, stating conditions on the eigenvectors and eigenvalues of the adjacency matrices that guarantee the correctness of the convex relaxation solution. Other results and conjectures relating the spectrum and symmetry of a graph are presented as well.