Graph Representation Learning with Box Embeddings
Graphs are ubiquitous data structures, present in many machine-learning tasks such as link prediction of products and node classification of scientific papers. As gradient descent drives the training of most modern machine-learning architectures, encoding graph-structured data in a differentiable representation is essential to make use of this data. Most approaches encode graph structure in Euclidean space; however, directed edges are then non-trivial to model. The naive solution is to represent each node with separate source and target vectors, but this decouples the representation, making it harder for the model to capture information along longer paths in the graph.
In this dissertation, we propose to model graphs by representing each node as a \textit{box} (a Cartesian product of intervals), where directed edges are captured by the relative containment of one box in another. We prove that the proposed box embeddings are expressive enough to represent any \emph{directed acyclic graph}. We also perform rigorous empirical evaluations of vector, hyperbolic, and region-based geometric representations on several families of synthetic and real-world directed graphs. Extensive experimental results suggest that box containment allows transitive relationships to be modeled easily. We further propose t-Box, a variant of box embeddings that learns the temperature jointly during training. t-Box uses a learned smoothing parameter to achieve better representational capacity than vector models in low dimensions, while also avoiding the performance saturation common to other geometric models in high dimensions.
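The core idea above, reading a directed edge u → v as box(v) lying inside box(u), can be sketched in a few lines. This is an illustrative toy (the `Box` class and example coordinates are assumptions for exposition, not the dissertation's implementation):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Box:
    """A box is a Cartesian product of intervals: one (min, max) pair per dimension."""
    intervals: List[Tuple[float, float]]

def contains(outer: Box, inner: Box) -> bool:
    """A directed edge u -> v is modeled as box(v) being contained in box(u)."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer.intervals, inner.intervals))

# Toy DAG a -> b -> c. Containment is transitive, so a -> c is implied by geometry.
a = Box([(0.0, 1.0), (0.0, 1.0)])
b = Box([(0.1, 0.9), (0.2, 0.8)])
c = Box([(0.2, 0.5), (0.3, 0.6)])

assert contains(a, b) and contains(b, c)
assert contains(a, c)      # transitive edge comes for free
assert not contains(c, a)  # the reverse edge is not represented
```

Note how the transitive closure is captured without any extra parameters; this is the inductive bias toward transitivity referred to below.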
Though promising, modeling directed graphs that contain both cycles and some element of transitivity, two properties common in real-world settings, is challenging. Box embeddings, which can be thought of as representing the graph as an intersection over some learned super-graphs, have a natural inductive bias toward modeling transitivity, but (as we prove) cannot model cycles. To address this issue, we propose binary code box embeddings, where a learned binary code selects a subset of graphs for intersection. We explore several variants, including global binary codes (amounting to a union over intersections) and per-vertex binary codes (allowing greater flexibility), as well as methods of regularization. Theoretical and empirical results show that the proposed models not only preserve a useful inductive bias toward transitivity but also have sufficient representational capacity to model arbitrary graphs, including graphs with cycles.
Lastly, we discuss the case where box embeddings are not free parameters but are produced by functions. In particular, we explore whether neural networks can map node features into the box space. This is critical in many real-world scenarios: on the one hand, graphs are sparse, and the majority of vertices have only a few connections or are completely isolated; on the other hand, there may exist rich node features, such as attributes and descriptions, that could be useful for prediction tasks. The experimental analysis highlights both the effectiveness and the limitations of multi-layer-perceptron-based encoders under different circumstances.
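A minimal sketch of such a feature-to-box encoder, under stated assumptions: a single linear layer stands in for the multi-layer perceptron, and the (center, softplus-offset) parametrization of a box is one common choice, not necessarily the one used in the dissertation:

```python
import math
import random

def softplus(x: float) -> float:
    """Smooth positive map; keeps every box side-length strictly positive."""
    return math.log1p(math.exp(x))

def linear(x, W, b):
    """A single linear layer, standing in for a full MLP in this sketch."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def encode_box(features, W, b):
    """Map a node-feature vector to a box: the first half of the output is the
    box center, the second half parameterizes per-dimension side lengths."""
    out = linear(features, W, b)
    d = len(out) // 2
    center, raw = out[:d], out[d:]
    return [(c - softplus(r), c + softplus(r)) for c, r in zip(center, raw)]

random.seed(0)
in_dim, box_dim = 4, 2
W = [[random.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(2 * box_dim)]
b = [0.0] * (2 * box_dim)

box = encode_box([0.3, -1.2, 0.8, 0.1], W, b)
assert len(box) == box_dim
assert all(lo < hi for lo, hi in box)  # every interval is non-degenerate
```

Because the box is a function of the features, isolated or rarely connected vertices still receive a meaningful embedding, which is the motivation given in the paragraph above.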
Algorithms for Topology Awareness in Sensor Networks
This work deals with algorithmic and geometric challenges in wireless sensor networks (WSNs). Classical algorithm theory, in which a single processor executes one sequential program with access to the complete data of the problem at hand, does not suit the needs of WSNs. Instead, we need distributed protocols in which nodes collaboratively solve problems that are too complex for a single node. First, we analyze a localization problem in which the nodes obtain a sense of the network topology and their position in it. Computing coordinates in a global coordinate system is NP-hard in almost all relevant variants, so we present a completely new approach instead: the network builds clusters and constructs an abstract graph that closely reflects the topology of the network region. The resulting topology awareness suits the needs of some applications much better than the coordinate-based approach. In the second part, we present a novel flow problem that adds battery constraints to dynamic network flows. Given a time horizon, we seek a flow from source to sink that maximizes the total amount of delivered data. As there is no prior work on this problem, we also analyze it in a centralized setting, prove complexity results for several variants, and present approximation schemes. The third part introduces the WSN simulator Shawn. By letting the user choose among different geometric communication models and data structures for the resulting graph, Shawn can adapt to many different setups, including mobile ones. Due to its design, Shawn is much faster than comparable simulation environments.
Book of Abstracts of the Sixth SIAM Workshop on Combinatorial Scientific Computing
Book of Abstracts of CSC14, edited by Bora Uçar. The Sixth SIAM Workshop on Combinatorial Scientific Computing, CSC14, was organized at the Ecole Normale Supérieure de Lyon, France, from 21st to 23rd July 2014. This two-and-a-half-day event marked the sixth in a series that started ten years ago in San Francisco, USA. The focus of CSC14 was on combinatorial mathematics and algorithms in high-performance computing, broadly interpreted. The workshop featured three invited talks, 27 contributed talks, and eight poster presentations. The invited talks focused on two fields of research: randomized algorithms for numerical linear algebra and network analysis. The contributed talks and the posters targeted modeling, analysis, bisection, clustering, and partitioning of graphs, applied in the context of networks, sparse matrix factorizations, iterative solvers, fast multipole methods, automatic differentiation, high-performance computing, and linear programming. The workshop was held at the premises of the LIP laboratory of ENS Lyon and was generously supported by the LABEX MILYON (ANR-10-LABX-0070, Université de Lyon, within the program ''Investissements d'Avenir'' ANR-11-IDEX-0007 operated by the French National Research Agency) and by SIAM.
Computational methods in protein structure comparison and analysis of protein interaction networks
Proteins are versatile biological macromolecules that perform numerous functions in a living organism. For example, proteins catalyze chemical reactions, store and transport various small molecules, and are involved in transmitting nerve signals. As the number of completely sequenced genomes grows, we are faced with the important but daunting task of assigning function to proteins encoded by newly sequenced genomes. In this thesis we contribute to this effort by developing computational methods for which one use is to facilitate protein function assignment.
Functional annotation of a newly discovered protein can often be transferred from that of evolutionarily related proteins of known function. However, distantly related proteins can still only be detected by the most accurate protein structure alignment methods. As these methods are computationally expensive, they are combined with less accurate but fast methods to allow large-scale comparative studies. In this thesis we propose a general framework to define a family of protein structure comparison methods that reduce protein structure comparison to distance computation between high-dimensional vectors and therefore are extremely fast.
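One generic way to realize such a reduction, used here purely as an illustrative stand-in for the framework described above (the distance-histogram fingerprint and its parameters are assumptions, not the thesis's actual method): summarize each structure as a fixed-length vector, after which comparison is just a vector distance.

```python
import math
from itertools import combinations

def distance_histogram(coords, bins=8, max_dist=40.0):
    """Fingerprint a structure (e.g. C-alpha coordinates) as a normalized
    histogram of pairwise residue distances: a fixed-length vector that can
    be compared without any structural alignment."""
    hist = [0.0] * bins
    pairs = list(combinations(coords, 2))
    for p, q in pairs:
        d = math.dist(p, q)
        hist[min(int(d / max_dist * bins), bins - 1)] += 1.0
    return [h / max(len(pairs), 1) for h in hist]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Two toy "structures": comparing them is now a cheap vector distance.
s1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
s2 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
assert euclidean(distance_histogram(s1), distance_histogram(s1)) == 0.0
assert 0.0 < euclidean(distance_histogram(s1), distance_histogram(s2)) < 0.5
```

The speed-up comes from the fact that the expensive part (the fingerprint) is computed once per structure, while each pairwise comparison costs only a vector distance.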
Interactions among proteins can be detected through the use of several mature experimental techniques. These interactions are routinely represented by a graph, called a protein interaction network, with nodes representing the proteins and edges representing the interactions between the proteins. In this thesis we present two computational studies that explore the connection between the topology of protein interaction networks and protein biological function.
Unfortunately, protein interaction networks do not explicitly capture an important aspect of protein interactions, their dynamic nature. In this thesis, we present an automatic method that relies on graph theoretic tools for chordal and cograph graph families to extract dynamic properties of protein interactions from the network topology.
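The chordal characterization underlying such tools can be stated via perfect elimination orderings. The following sketch checks this property by brute force over all orderings (feasible only for tiny illustrative graphs, and not the method used in the thesis):

```python
from itertools import permutations

def is_perfect_elimination_ordering(adj, order):
    """`order` is a perfect elimination ordering iff, for every vertex, its
    neighbours that appear later in the order form a clique."""
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        for i, a in enumerate(later):
            if any(b not in adj[a] for b in later[i + 1:]):
                return False
    return True

def is_chordal(adj):
    """A graph is chordal iff some perfect elimination ordering exists.
    Brute force over all orderings; for exposition only."""
    return any(is_perfect_elimination_ordering(adj, o) for o in permutations(adj))

c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}              # 4-cycle: not chordal
c4_chord = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}  # chord 0-2 added
assert not is_chordal(c4)
assert is_chordal(c4_chord)
```

Efficient linear-time recognition algorithms (e.g. lexicographic BFS) exist for real protein interaction networks; the brute-force version above only makes the defining property concrete.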
An intriguing question in the analysis of biological networks is whether biological characteristics of a protein, such as essentiality, can be explained by its placement in the network. In this thesis we analyze protein interaction networks for Saccharomyces cerevisiae to identify the main topological determinant of essentiality and to provide a biological explanation for the connection between the network topology and essentiality.
Graph Neural Networks for Link Prediction with Subgraph Sketching
Many Graph Neural Networks (GNNs) perform poorly compared to simple
heuristics on Link Prediction (LP) tasks. This is due to limitations in
expressive power such as the inability to count triangles (the backbone of most
LP heuristics) and because they cannot distinguish automorphic nodes (those
having identical structural roles). Both expressiveness issues can be
alleviated by learning link (rather than node) representations and
incorporating structural features such as triangle counts. Since explicit link
representations are often prohibitively expensive, recent works resorted to
subgraph-based methods, which have achieved state-of-the-art performance for
LP, but suffer from poor efficiency due to high levels of redundancy between
subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link
prediction. Based on our analysis, we propose a novel full-graph GNN called
ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as
messages to approximate the key components of SGNNs without explicit subgraph
construction. ELPH is provably more expressive than Message Passing GNNs
(MPNNs). It outperforms existing SGNN models on many standard LP benchmarks
while being orders of magnitude faster. However, it shares the common GNN
limitation that it is only efficient when the dataset fits in GPU memory.
Accordingly, we develop a highly scalable model, called BUDDY, which uses
feature precomputation to circumvent this limitation without sacrificing
predictive performance. Our experiments show that BUDDY also outperforms SGNNs
on standard LP benchmarks while being highly scalable and faster than ELPH.
Comment: 29 pages, 19 figures, 6 appendices
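The idea of passing constant-size neighbourhood sketches instead of explicit subgraphs can be illustrated with plain MinHash. This is a generic sketch of the principle, not ELPH's actual implementation (which, per the abstract, approximates several SGNN components, and the seed count and set sizes below are made up):

```python
import random

def minhash_signature(neighbourhood, seeds):
    """One minimum per hash seed: a constant-size sketch of a node's
    neighbourhood that can be passed around like an ordinary message."""
    return [min(hash((seed, x)) for x in neighbourhood) for seed in seeds]

def jaccard_estimate(sig_u, sig_v):
    """The fraction of matching minima estimates the Jaccard similarity
    of the two underlying neighbourhoods."""
    return sum(a == b for a, b in zip(sig_u, sig_v)) / len(sig_u)

random.seed(0)
seeds = [random.getrandbits(32) for _ in range(256)]

n_u = set(range(0, 60))   # neighbourhood of node u
n_v = set(range(30, 90))  # neighbourhood of node v; true Jaccard = 30/90

est = jaccard_estimate(minhash_signature(n_u, seeds), minhash_signature(n_v, seeds))
assert abs(est - 30 / 90) < 0.15  # close to the true overlap, no subgraph built
```

Structural link features such as common-neighbour counts follow from such overlap estimates, which is how explicit subgraph construction, and its redundancy, can be avoided.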
Proceedings of the 8th Cologne-Twente Workshop on Graphs and Combinatorial Optimization
The Cologne-Twente Workshop (CTW) on Graphs and Combinatorial Optimization started off as a series of workshops organized bi-annually by either Köln University or Twente University. As its importance grew over time, it re-centered its geographical focus by including northern Italy (CTW04 in Menaggio, on Lake Como, and CTW08 in Gargnano, on Lake Garda). This year, CTW (in its eighth edition) will be staged in France for the first time: more precisely in the heart of Paris, at the Conservatoire National d'Arts et Métiers (CNAM), between 2nd and 4th June 2009, by a mixed organizing committee with members from LIX, Ecole Polytechnique and CEDRIC, CNAM.
An Integer Programming approach to Bayesian Network Structure Learning
We study the problem of learning a Bayesian network structure from data using an Integer Programming approach. We review existing approaches, and in particular some recent works that formulate the problem as an Integer Programming model. After discussing some weaknesses of the existing approaches, we propose an alternative solution based on a statistical sparsification of the search space. Results show that our approach is promising, especially for large networks.
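The search space the IP model encodes can be made concrete on a toy instance. The sketch below enumerates it by brute force: each node picks one candidate parent set subject to acyclicity, maximizing a sum of local scores. The numeric scores are made up for illustration (in practice they are decomposable scores such as BIC or BDeu computed from data), and restricting the `candidates` lists is where a statistical sparsification of the search space would act:

```python
from itertools import product

# Hypothetical local scores (node, parent_set) -> score; illustrative numbers only.
score = {
    ("A", ()): -10.0, ("A", ("B",)): -9.5, ("A", ("C",)): -9.8,
    ("B", ()): -10.0, ("B", ("A",)): -8.0, ("B", ("C",)): -9.9,
    ("C", ()): -10.0, ("C", ("A",)): -9.7, ("C", ("B",)): -8.5,
}
nodes = ["A", "B", "C"]

def is_acyclic(parents):
    """Peel off nodes whose parents have all been removed (sources first);
    if we get stuck, the parent graph contains a cycle."""
    remaining = set(parents)
    while remaining:
        removable = [v for v in remaining
                     if all(p not in remaining for p in parents[v])]
        if not removable:
            return False
        remaining.difference_update(removable)
    return True

# The IP model uses one 0/1 variable per (node, parent set) plus acyclicity
# constraints; this brute-force search walks the same space explicitly.
candidates = {v: [ps for (u, ps) in score if u == v] for v in nodes}
best = max(
    (dict(zip(nodes, choice))
     for choice in product(*(candidates[v] for v in nodes))
     if is_acyclic(dict(zip(nodes, choice)))),
    key=lambda a: sum(score[(v, a[v])] for v in nodes),
)
assert best == {"A": (), "B": ("A",), "C": ("B",)}
```

Note that the cycle A -> B -> A would pair the two best local scores for A and B, but the acyclicity constraint forces A to keep the empty parent set; this is exactly the coupling that makes the problem hard and the IP formulation attractive.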