    Stochastic Block Coordinate Frank-Wolfe Algorithm for Large-Scale Biological Network Alignment

    With increasingly "big" data available in biomedical research, deriving accurate and reproducible biological knowledge from such data poses enormous computational challenges. In this paper, motivated by recently developed stochastic block coordinate algorithms, we propose a highly scalable randomized block coordinate Frank-Wolfe algorithm for convex optimization with general compact convex constraints, which has diverse applications in analyzing biomedical data to better understand cellular and disease mechanisms. We focus on applying the derived stochastic block coordinate algorithm to align protein-protein interaction networks and identify conserved functional pathways based on the IsoRank framework. The derived stochastic block coordinate Frank-Wolfe (SBCFW) algorithm has a convergence guarantee and naturally reduces the computational cost (time and space) of each iteration. Our experiments on querying conserved functional protein complexes in yeast networks confirm the effectiveness of this technique for analyzing large-scale biological networks.
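
    The block coordinate idea can be illustrated with a minimal sketch. It is not the paper's IsoRank implementation: the least-squares objective, the product-of-simplices constraint set, the step size 2n/(k+2n), and the function name block_coordinate_frank_wolfe are all assumptions chosen to keep the example small. Each iteration touches one randomly chosen block only, which is where the per-iteration savings come from.

```python
import numpy as np

def block_coordinate_frank_wolfe(A, b, n_blocks, block_size, n_iters=2000, seed=0):
    """Minimize 0.5 * ||A x - b||^2 where each block of x lies on a probability simplex.

    Illustrative assumptions: least-squares objective, simplex blocks,
    and the step size gamma = 2n / (k + 2n) with n = n_blocks.
    """
    rng = np.random.default_rng(seed)
    # Feasible start: every block is the uniform distribution.
    x = np.full(n_blocks * block_size, 1.0 / block_size)
    residual = A @ x - b
    for k in range(n_iters):
        i = rng.integers(n_blocks)                  # pick one block at random
        sl = slice(i * block_size, (i + 1) * block_size)
        grad_i = A[:, sl].T @ residual              # gradient restricted to block i
        # Linear minimization over a simplex is attained at a vertex.
        s = np.zeros(block_size)
        s[np.argmin(grad_i)] = 1.0
        gamma = 2.0 * n_blocks / (k + 2.0 * n_blocks)
        direction = s - x[sl]
        x[sl] += gamma * direction                  # convex combination keeps x feasible
        residual += gamma * (A[:, sl] @ direction)  # update the residual incrementally
    return x
```

    Only the columns of A belonging to the sampled block are read in each iteration, so the cost per step scales with the block size rather than with the full dimension.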

    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. In particular, it contains the scientific program both in survey style and in full detail, as well as information on the social program, the venue, special meetings, and more.

    Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

    In this paper, we study the problem of speeding up Frank-Wolfe, a conditional gradient method for optimization. We develop and employ two novel inner product search data structures, improving on the prior fastest algorithm in [Shrivastava, Song and Xu, NeurIPS 2021].

    * The first data structure uses low-dimensional random projection to reduce the problem to a lower dimension, then uses an efficient inner product data structure. It has preprocessing time $\tilde O(nd^{\omega-1}+dn^{1+o(1)})$ and per-iteration cost $\tilde O(d+n^\rho)$ for a small constant $\rho$.
    * The second data structure leverages recent developments in adaptive inner product search data structures that can output estimates of all inner products. It has preprocessing time $\tilde O(nd)$ and per-iteration cost $\tilde O(d+n)$.

    The first algorithm improves on the state of the art (with preprocessing time $\tilde O(d^2n^{1+o(1)})$ and per-iteration cost $\tilde O(dn^\rho)$) in all cases, while the second one provides an even faster preprocessing time and is suitable when the number of iterations is small.
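
    The random-projection idea can be sketched as follows. This is not the paper's data structure: the class name ProjectedLMO, the Gaussian sketching matrix, and the brute-force scan over the projected atoms are assumptions made for illustration, and the sketch does not attain the running times quoted above. It only shows how the Frank-Wolfe linear minimization step over the convex hull of n atoms becomes an inner product search that can be answered approximately in a lower dimension.

```python
import numpy as np

class ProjectedLMO:
    """Approximate linear minimization oracle over conv(atoms) via random projection.

    Hypothetical helper for illustration; it scans all projected atoms
    instead of using a sublinear inner product search structure.
    """

    def __init__(self, atoms, sketch_dim, seed=0):
        rng = np.random.default_rng(seed)
        d = atoms.shape[1]
        # Gaussian sketch scaled so inner products are preserved in expectation.
        self.S = rng.standard_normal((sketch_dim, d)) / np.sqrt(sketch_dim)
        # Preprocessing: project every atom once.
        self.sketched_atoms = atoms @ self.S.T

    def query(self, grad):
        """Return the index of the atom with the smallest estimated <atom, grad>."""
        scores = self.sketched_atoms @ (self.S @ grad)
        return int(np.argmin(scores))
```

    Because the inner products are only estimated, the returned atom is an approximate minimizer; a practical method would have to account for that error in its analysis.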

    Sparsity-based algorithms for inverse problems

    Inverse problems are problems where we want to estimate the values of certain parameters of a system given observations of the system. Such problems occur in several areas of science and engineering. Inverse problems are often ill-posed, which means that the observations of the system do not uniquely define the parameters we seek to estimate, or that the solution is highly sensitive to small changes in the observations. To solve such problems, therefore, we need to make use of additional knowledge about the system at hand. One such form of prior information is given by the notion of sparsity. Sparsity refers to the knowledge that the solution to the inverse problem can be expressed as a combination of a few terms. The sparsity of a solution can be controlled explicitly or implicitly. An explicit way to induce sparsity is to minimize the number of non-zero terms in the solution. Sparsity can be used implicitly, for example, by making adjustments to the algorithm used to arrive at the solution. In this thesis we studied various inverse problems that arise in different application areas, such as tomographic imaging and equation learning for biology, and showed how ideas of sparsity can be used in each case to design effective algorithms to solve such problems. Financial support was provided by the European Union's Horizon 2020 Research and Innovation programme under the Marie Sklodowska-Curie grant agreement no. 765604.

    Graph inference and graph matching problems: theory and algorithms

    Tribunal: Alex Bronstein (Tel Aviv University), Marcelo Lanzilotta (Universidad de la República), Gonzalo Mateos (University of Rochester), Gadiel Seroussi (Universidad de la República). Almost every field has some problems related to graphs or networks. From natural examples in physics and mathematics to applications in medicine and signal processing, graphs are either a very powerful tool or a very rich object of interest. In this thesis we address two classes of graph-related problems. First, we focus on graph inference problems, which consist of estimating a graph or network from a dataset. In this part of the manuscript, we modify the existing formulations of the inference problem to incorporate prior topological information about the graph, and to jointly infer several graphs in a collaborative way. We apply these techniques to infer genetic regulation networks, brain connectivity patterns, and economy-related networks. We also present a new problem, which consists of estimating mobility patterns from highly asynchronous and incomplete data. We give a first formulation of the problem with its corresponding optimization, and present results for airplane routes and New York taxi mobility patterns. The second class consists of the so-called graph matching problems. In this type of problem two graphs are given, and the objective is to find the best alignment between them. This problem is of great interest from both an algorithmic and a theoretical point of view, in addition to its very important applications. Its interest and difficulty lie in the combinatorial nature of the problem: the cost of searching among all possible permutations grows exponentially with the number of nodes, and hence becomes intractable even for small graphs. First, we focus on the algorithmic aspect of the graph matching problem. We present two methods based on relaxations of the discrete optimization problem. The first one is inspired by ideas from the sparse modeling community, and the second one is based on a theorem presented in this manuscript. The importance of these methods is illustrated with several applications. Finally, we address some theoretical aspects of graph matching and other related problems. The main question tackled in the last chapter is the following: when do the graph matching problem and its convex relaxation have the same solution? A probabilistic approach is given first, showing that, asymptotically, the most common convex relaxation fails, while a non-convex relaxation succeeds with probability one if the graphs to be matched are correlated enough, showing a phase-transition type of behavior. A deterministic approach is then presented, stating conditions on the eigenvectors and eigenvalues of the adjacency matrix that guarantee the correctness of the convex relaxation solution. Other results and conjectures relating the spectrum and symmetry of a graph are presented as well.
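
    The abstract does not state the formulation it works with. A common way to write the graph matching problem and its convex relaxation, assumed here for illustration, uses adjacency matrices $A$ and $B$ of two graphs on $n$ nodes, the set $\mathcal{P}$ of $n \times n$ permutation matrices, and the set $\mathcal{D}$ of doubly stochastic matrices (the convex hull of $\mathcal{P}$):

```latex
\[
\min_{P \in \mathcal{P}} \; \|AP - PB\|_F^2
\qquad \text{relaxed to} \qquad
\min_{P \in \mathcal{D}} \; \|AP - PB\|_F^2,
\]
\[
\mathcal{D} = \bigl\{ P \in \mathbb{R}^{n \times n} : P \ge 0,\; P\mathbf{1} = \mathbf{1},\; P^{\top}\mathbf{1} = \mathbf{1} \bigr\}.
\]
```

    Under this assumed formulation, the question of when both problems have the same solution asks whether the minimizer over $\mathcal{D}$ is itself a permutation matrix.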

    On the power of message passing for learning on graph-structured data

    This thesis proposes novel approaches for machine learning on irregularly structured input data such as graphs, point clouds and manifolds. Specifically, we break with the regularity restriction of conventional deep learning techniques, and propose solutions for designing, implementing and scaling up deep end-to-end representation learning on graph-structured data, known as Graph Neural Networks (GNNs). GNNs capture local graph structure and feature information by following a neural message passing scheme, in which node representations are recursively updated in a trainable and purely local fashion. In this thesis, we demonstrate the generality of message passing through a unified framework suitable for a wide range of operators and learning tasks. Specifically, we analyze the limitations and inherent weaknesses of GNNs and propose efficient solutions to overcome them, both theoretically and in practice, e.g., by conditioning messages via continuous B-spline kernels, by utilizing hierarchical message passing, or by leveraging positional encodings. In addition, we ensure that our proposed methods scale naturally to large input domains. In particular, we propose novel methods to fully eliminate the exponentially increasing dependency of nodes over layers inherent to message passing GNNs. Lastly, we introduce PyTorch Geometric, a deep learning library for implementing and working with graph-based neural network building blocks, built upon PyTorch.
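
    A single message passing step can be sketched in plain NumPy. This is not the PyTorch Geometric API and not one of the operators proposed in the thesis: the function name message_passing_layer, the mean aggregation, the two weight matrices, and the ReLU update are assumptions chosen for a minimal example. Each node averages the features of its in-neighbors and combines the result with its own features.

```python
import numpy as np

def message_passing_layer(x, edge_index, weight_self, weight_neigh):
    """One mean-aggregation message passing step (illustrative, hypothetical helper).

    x:          (num_nodes, in_dim) float array of node features
    edge_index: pair (src, dst) of integer arrays, one entry per directed edge
    """
    src, dst = edge_index
    num_nodes = x.shape[0]
    # Message: each edge carries the feature vector of its source node.
    aggregated = np.zeros_like(x)
    np.add.at(aggregated, dst, x[src])       # sum incoming messages per target node
    in_degree = np.bincount(dst, minlength=num_nodes).reshape(-1, 1)
    aggregated = aggregated / np.maximum(in_degree, 1)  # mean; isolated nodes stay zero
    # Update: combine own features with the aggregated neighborhood, then apply ReLU.
    return np.maximum(x @ weight_self + aggregated @ weight_neigh, 0.0)
```

    Stacking k such layers lets information travel k hops, which is also why the number of nodes a representation depends on can grow quickly with depth.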

    Module Identification for Biological Networks

    Advances in high-throughput techniques have enabled researchers to produce large-scale data on molecular interactions. Systematic analysis of these large-scale interactome datasets based on their graph representations has the potential to yield a better understanding of the functional organization of the corresponding biological systems. One way to chart the underlying cellular functional organization is to identify functional modules in these biological networks. However, module identification for biological networks faces several challenges. First, unlike in social and computer networks, molecules work together with different interaction patterns, and groups of molecules working together may have different sizes. Second, the degrees of nodes in biological networks follow a power-law distribution, which means that there are many nodes with very low degrees and few nodes with high degrees. Third, molecular interaction data contain a large number of false positives and false negatives. In this dissertation, we propose computational algorithms to overcome those challenges. To identify functional modules based on interaction patterns, we develop efficient algorithms based on the concept of block modeling. We propose a subgradient Frank-Wolfe algorithm with a path generation method to identify functional modules and recognize the functional organization of biological networks. Additionally, inspired by random walks on networks, we propose a novel two-hop random walk strategy to detect fine-size functional modules based on interaction patterns. To overcome the degree heterogeneity problem, we propose an algorithm to identify functional modules whose topological structure is both well separated from the rest of the network and densely connected. To minimize the impact of noisy interactions in biological networks, we propose methods to detect conserved functional modules across multiple biological networks by integrating topological and orthology information from the different networks. We compare each algorithm we developed with state-of-the-art algorithms on several biological networks. The comparison results on known gold-standard biological function annotations show that our methods can enhance the accuracy of predicting protein complexes and protein functions.