
    Stochastic Block Coordinate Frank-Wolfe Algorithm for Large-Scale Biological Network Alignment

    With increasingly "big" data available in biomedical research, deriving accurate and reproducible biological knowledge from such data imposes enormous computational challenges. In this paper, motivated by recently developed stochastic block coordinate algorithms, we propose a highly scalable randomized block coordinate Frank-Wolfe algorithm for convex optimization with general compact convex constraints, which has diverse applications in analyzing biomedical data for better understanding cellular and disease mechanisms. We focus on implementing the derived stochastic block coordinate algorithm to align protein-protein interaction networks for identifying conserved functional pathways based on the IsoRank framework. Our derived stochastic block coordinate Frank-Wolfe (SBCFW) algorithm has a convergence guarantee and naturally reduces the computational cost (time and space) of each iteration. Our experiments on querying conserved functional protein complexes in yeast networks confirm the effectiveness of this technique for analyzing large-scale biological networks.
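
    To make the idea of a randomized block coordinate Frank-Wolfe step concrete, here is a minimal sketch on a toy problem. It is not the SBCFW implementation from the paper: the objective (a separable least-squares fit), the constraint set (one probability simplex per block), the function name, and the step size 2n/(k + 2n), borrowed from the standard block coordinate Frank-Wolfe analysis, are all assumptions chosen for illustration.

```python
import random


def block_coordinate_frank_wolfe(targets, iters=20000, seed=0):
    """Toy randomized block coordinate Frank-Wolfe (illustrative only).

    Minimises sum_i 0.5 * ||x_i - targets[i]||^2 where every block x_i is
    constrained to its own probability simplex. Each iteration touches a
    single randomly chosen block, so the per-iteration cost depends on the
    block size rather than on the full problem size.
    """
    rng = random.Random(seed)
    n = len(targets)
    # Start every block at the first vertex of its simplex.
    x = [[1.0] + [0.0] * (len(t) - 1) for t in targets]
    for k in range(iters):
        i = rng.randrange(n)  # sample one block uniformly at random
        # Gradient of the objective restricted to block i.
        grad = [xv - tv for xv, tv in zip(x[i], targets[i])]
        # Linear minimisation over a simplex: the vertex with the smallest
        # gradient entry.
        j = min(range(len(grad)), key=grad.__getitem__)
        # Assumed step size from the standard block coordinate analysis.
        gamma = 2.0 * n / (k + 2.0 * n)
        # Convex combination of the current block and the chosen vertex.
        x[i] = [(1.0 - gamma) * v for v in x[i]]
        x[i][j] += gamma
    return x
```

    Calling `block_coordinate_frank_wolfe([[0.2, 0.3, 0.5], [0.6, 0.4]])` should return blocks that stay on their simplices and approach the targets. Only the sampled block is read and written in each iteration, which is the source of the reduced per-iteration time and memory mentioned in the abstract.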

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references.

    Accelerating Frank-Wolfe Algorithm using Low-Dimensional and Adaptive Data Structures

    In this paper, we study the problem of speeding up Frank-Wolfe, a conditional gradient method for optimization. We develop and employ two novel inner product search data structures, improving on the prior fastest algorithm in [Shrivastava, Song and Xu, NeurIPS 2021]. The first data structure uses low-dimensional random projection to reduce the problem to a lower dimension, then uses an efficient inner product data structure. It has preprocessing time $\tilde O(nd^{\omega-1}+dn^{1+o(1)})$ and per-iteration cost $\tilde O(d+n^\rho)$ for a small constant $\rho$. The second data structure leverages recent developments in adaptive inner product search data structures that can output estimates of all inner products. It has preprocessing time $\tilde O(nd)$ and per-iteration cost $\tilde O(d+n)$. The first algorithm improves on the state of the art (with preprocessing time $\tilde O(d^2n^{1+o(1)})$ and per-iteration cost $\tilde O(dn^\rho)$) in all cases, while the second one provides an even faster preprocessing time and is suitable when the number of iterations is small.
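
    For context on why inner product search matters here: when the feasible set is the convex hull of n atoms in d dimensions, the linear minimization step of Frank-Wolfe amounts to finding the atom with the smallest inner product against the gradient. The sketch below uses an exhaustive O(nd) scan for that step, which is the baseline cost such data structures aim to reduce. It does not implement either data structure from the paper; the function names, the least-squares objective, and the 2/(k + 2) step size are assumptions made for illustration.

```python
def linear_scan_lmo(atoms, grad):
    """Index of the atom minimising <grad, atom>, found by an O(n*d) scan."""
    return min(
        range(len(atoms)),
        key=lambda i: sum(g * a for g, a in zip(grad, atoms[i])),
    )


def frank_wolfe_over_atoms(atoms, target, iters=5000):
    """Toy Frank-Wolfe: minimise 0.5 * ||x - target||^2 over conv(atoms)."""
    x = list(atoms[0])  # start at an arbitrary atom
    for k in range(iters):
        grad = [xi - ti for xi, ti in zip(x, target)]
        # The inner product search happens here, once per iteration.
        s = atoms[linear_scan_lmo(atoms, grad)]
        gamma = 2.0 / (k + 2.0)  # classical open-loop step size
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x
```

    With the four corners of the unit square as atoms and the target (0.25, 0.5), the iterate should approach the target. Swapping `linear_scan_lmo` for an approximate search structure changes only the per-iteration cost, not the outer loop.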

    Sparsity-based algorithms for inverse problems

    Inverse problems are problems where we want to estimate the values of certain parameters of a system given observations of the system. Such problems occur in several areas of science and engineering. Inverse problems are often ill-posed, which means that the observations of the system do not uniquely define the parameters we seek to estimate, or that the solution is highly sensitive to small changes in the observation. In order to solve such problems, therefore, we need to make use of additional knowledge about the system at hand. One such piece of prior information is given by the notion of sparsity. Sparsity refers to the knowledge that the solution to the inverse problem can be expressed as a combination of a few terms. The sparsity of a solution can be controlled explicitly or implicitly. An explicit way to induce sparsity is to minimize the number of non-zero terms in the solution. Implicit use of sparsity can be made, for example, by making adjustments to the algorithm used to arrive at the solution. In this thesis we studied various inverse problems that arise in different application areas, such as tomographic imaging and equation learning for biology, and showed how ideas of sparsity can be used in each case to design effective algorithms to solve such problems. Financial support was provided by the European Union's Horizon 2020 Research and Innovation programme under the Marie Sklodowska-Curie grant agreement no. 765604.

    Pattern Recognition on Random Graphs

    The field of pattern recognition developed significantly in the 1960s, and the field of random graph inference has enjoyed much recent progress in both theory and application. This dissertation focuses on pattern recognition in the context of a particular family of random graphs, namely the stochastic blockmodels, from the two main perspectives of single graph inference and joint graph inference. Single graph inference is the performance of statistical inference on one single observed graph. Given a single graph realized from a stochastic blockmodel, we here consider the specific exploitation tasks of vertex classification, clustering, and nomination. Given an observed graph, vertex classification is the identification of the block labels of test vertices after learning from the training vertices. We propose a robust vertex classifier, which utilizes a representation of a test vertex as a sparse combination of the training vertices. Our proposed classifier is demonstrated to be robust against data contamination, and has superior performance over classical spectral-embedding classifiers in simulation and real data experiments. Vertex clustering groups vertices based on their similarities. We present a model-based clustering algorithm for graphs drawn from a stochastic blockmodel, and illustrate its usefulness on a case study in online advertising. We demonstrate the practical value of our vertex clustering method for efficiently delivering internet advertisements. Under the stochastic blockmodel framework, suppose one block is of particular interest. The task of vertex nomination is to create a nomination list so that vertices from the group of interest are concentrated abundantly near the top of the list. We present several vertex nomination schemes, and propose a vertex nomination scheme that is scalable for large graphs. We demonstrate the effectiveness of our methodology on simulation and real datasets. 
Next, we consider joint graph inference, which involves the joint space of multiple graphs; in this dissertation, we specifically consider joint graph inference on two graphs. Given two graphs, we consider the tasks of seeded graph matching for large graphs and joint vertex classification. Graph matching is the task of aligning two graphs so as to minimize the number of edge disagreements between them. We propose a scalable graph matching algorithm, which uses a divide-and-conquer approach to scale the state-of-the-art seeded graph matching algorithm to big graph data. We prove theoretical performance guarantees, and demonstrate desired properties such as scalability, robustness, accuracy and runtime in both simulated data and human brain connectome data. Within the joint graph inference framework, we present a case study on the paired chemical and electrical Caenorhabditis elegans neural connectomes. Motivated by the success of seeded graph matching on the paired connectomes, we propose joint vertex classification on the paired connectomes. We demonstrate that joint inference on the paired connectomes yields more accurate results than single inference on either connectome. This serves as a first step for providing a methodological and quantitative approach for understanding the coexistent significance of the chemical and electrical connectomes.
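
    The graph matching objective described above, minimizing the number of edge disagreements under a vertex alignment, can be written down in a few lines. The sketch below is only a brute-force illustration for tiny undirected graphs given as 0/1 adjacency matrices; it is not the scalable seeded algorithm from the dissertation, and the function names are hypothetical.

```python
from itertools import permutations


def edge_disagreements(a, b, perm):
    """Count vertex pairs whose adjacency differs between the two graphs.

    Vertex i of graph a is aligned with vertex perm[i] of graph b.
    """
    n = len(a)
    return sum(
        a[i][j] != b[perm[i]][perm[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def brute_force_match(a, b):
    """Try every alignment; the cost is factorial in the number of vertices."""
    return min(
        permutations(range(len(a))),
        key=lambda p: edge_disagreements(a, b, p),
    )
```

    For two three-vertex paths whose middle vertices carry different labels, the returned alignment maps one middle vertex to the other and leaves no disagreements. The exhaustive search over permutations is what makes the exact problem impractical beyond very small graphs, which motivates relaxations and divide-and-conquer schemes.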

    International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book

    The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions. This book comprises the full conference program. It contains, in particular, the scientific program, both in survey style and in full detail, along with information on the social program, the venue, special meetings, and more.

    Graph inference and graph matching problems: theory and algorithms

    Tribunal: Alex Bronstein (Tel Aviv University), Marcelo Lanzilotta (Universidad de la República), Gonzalo Mateos (University of Rochester), Gadiel Seroussi (Universidad de la República). Almost every field has some problems related to graphs or networks. From natural examples in physics and mathematics to applications in medicine and signal processing, graphs are either a very powerful tool or a very rich object of interest. In this thesis we address two classes of graph-related problems. First, we focus on graph-inference problems, which consist of estimating a graph or network from a dataset. In this part of the manuscript, we modify the existing formulations of the inference problem to incorporate prior topological information about the graph, and to jointly infer several graphs in a collaborative way. We apply these techniques to infer genetic regulation networks, brain connectivity patterns, and economy-related networks. We also present a new problem, which consists of the estimation of mobility patterns from highly asynchronous and incomplete data. We give a first formulation of the problem with its corresponding optimization, and present results for airplane routes and New York taxi mobility patterns. The second class consists of the so-called graph matching problems. In this type of problem, two graphs are given, and the objective is to find the best alignment between them. This problem is of great interest from both an algorithmic and a theoretical point of view, in addition to its very important applications. Its interest and difficulty lie in the combinatorial nature of the problem: the cost of searching among all the possible permutations grows exponentially with the number of nodes, and hence becomes intractable even for small graphs. First, we focus on the algorithmic aspect of the graph matching problem. We present two methods based on relaxations of the discrete optimization problem.
The first one is inspired by ideas from the sparse modeling community, and the second one is based on a theorem presented in this manuscript. The importance of these methods is illustrated with several applications. Finally, we address some theoretical aspects of graph matching and other related problems. The main question tackled in the last chapter is the following: when do the graph matching problem and its convex relaxation have the same solution? A probabilistic approach is first given, showing that, asymptotically, the most common convex relaxation fails, while a non-convex relaxation succeeds with probability one if the graphs to be matched are correlated enough, showing a phase-transition type of behavior. On the other hand, a deterministic approach is presented, stating conditions on the eigenvectors and eigenvalues of the adjacency matrix for guaranteeing the correctness of the convex relaxation solution. Other results and conjectures relating the spectrum and symmetry of a graph are presented as well.