1,457 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
Mining Butterflies in Streaming Graphs
This thesis introduces two main-memory systems sGrapp and sGradd for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, initially, the growth patterns of bipartite streaming graphs are mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection.
sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale and temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves mean absolute percentage error up to 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution and a processing throughput of 1.5 million data records per second. The throughput and estimation error of sGrapp are 160x higher and 0.02x lower than baselines. sGradd demonstrates an improving performance over time, achieves zero false detection rates when there is not any drift and when drift is already detected, and detects sequential drifts in zero to a few seconds after their occurrence regardless of drift intervals
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, denovo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform, LigNFam enables users to interactively explore
similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly,
two deep-learning-based binding site comparison tools were developed, competing with
the state-of-the-art over benchmark datasets. The models have the ability to predict offtarget interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster
Geometric Learning on Graph Structured Data
Graphs provide a ubiquitous and universal data structure that can be applied in many domains such as social networks, biology, chemistry, physics, and computer science. In this thesis we focus on two fundamental paradigms in graph learning: representation learning and similarity learning over graph-structured data. Graph representation learning aims to learn embeddings for nodes by integrating topological and feature information of a graph. Graph similarity learning brings into play with similarity functions that allow to compute similarity between pairs of graphs in a vector space. We address several challenging issues in these two paradigms, designing powerful, yet efficient and theoretical guaranteed machine learning models that can leverage rich topological structural properties of real-world graphs.
This thesis is structured into two parts. In the first part of the thesis, we will present how to develop powerful Graph Neural Networks (GNNs) for graph representation learning from three different perspectives: (1) spatial GNNs, (2) spectral GNNs, and (3) diffusion GNNs. We will discuss the model architecture, representational power, and convergence properties of these GNN models. Specifically, we first study how to develop expressive, yet efficient and simple message-passing aggregation schemes that can go beyond the Weisfeiler-Leman test (1-WL). We propose a generalized message-passing framework by incorporating graph structural properties into an aggregation scheme. Then, we introduce a new local isomorphism hierarchy on neighborhood subgraphs. We further develop a novel neural model, namely GraphSNN, and theoretically prove that this model is more expressive than the 1-WL test. After that, we study how to build an effective and efficient graph convolution model with spectral graph filters. In this study, we propose a spectral GNN model, called DFNets, which incorporates a novel spectral graph filter, namely feedback-looped filters. As a result, this model can provide better localization on neighborhood while achieving fast convergence and linear memory requirements. Finally, we study how to capture the rich topological information of a graph using graph diffusion. We propose a novel GNN architecture with dynamic PageRank, based on a learnable transition matrix. We explore two variants of this GNN architecture: forward-euler solution and invariable feature solution, and theoretically prove that our forward-euler GNN architecture is guaranteed with the convergence to a stationary distribution.
In the second part of this thesis, we will introduce a new optimal transport distance metric on graphs in a regularized learning framework for graph kernels. This optimal transport distance metric can preserve both local and global structures between graphs during the transport, in addition to preserving features and their local variations. Furthermore, we propose two strongly convex regularization terms to theoretically guarantee the convergence and numerical stability in finding an optimal assignment between graphs. One regularization term is used to regularize a Wasserstein distance between graphs in the same ground space. This helps to preserve the local clustering structure on graphs by relaxing the optimal transport problem to be a cluster-to-cluster assignment between locally connected vertices. The other regularization term is used to regularize a Gromov-Wasserstein distance between graphs across different ground spaces based on degree-entropy KL divergence. This helps to improve the matching robustness of an optimal alignment to preserve the global connectivity structure of graphs. We have evaluated our optimal transport-based graph kernel using different benchmark tasks. The experimental results show that our models considerably outperform all the state-of-the-art methods in all benchmark tasks
Planar Disjoint Paths, Treewidth, and Kernels
In the Planar Disjoint Paths problem, one is given an undirected planar graph
with a set of vertex pairs and the task is to find pairwise
vertex-disjoint paths such that the -th path connects to . We
study the problem through the lens of kernelization, aiming at efficiently
reducing the input size in terms of a parameter. We show that Planar Disjoint
Paths does not admit a polynomial kernel when parameterized by unless coNP
NP/poly, resolving an open problem by [Bodlaender, Thomass{\'e},
Yeo, ESA'09]. Moreover, we rule out the existence of a polynomial Turing kernel
unless the WK-hierarchy collapses. Our reduction carries over to the setting of
edge-disjoint paths, where the kernelization status remained open even in
general graphs.
On the positive side, we present a polynomial kernel for Planar Disjoint
Paths parameterized by , where denotes the treewidth of the input
graph. As a consequence of both our results, we rule out the possibility of a
polynomial-time (Turing) treewidth reduction to under the same
assumptions. To the best of our knowledge, this is the first hardness result of
this kind. Finally, combining our kernel with the known techniques [Adler,
Kolliopoulos, Krause, Lokshtanov, Saurabh, Thilikos, JCTB'17; Schrijver,
SICOMP'94] yields an alternative (and arguably simpler) proof that Planar
Disjoint Paths can be solved in time , matching the
result of [Lokshtanov, Misra, Pilipczuk, Saurabh, Zehavi, STOC'20].Comment: To appear at FOCS'23, 82 pages, 30 figure
Extending Orthogonal Planar Graph Drawings Is Fixed-Parameter Tractable
The task of finding an extension to a given partial drawing of a graph while adhering to constraints on the representation has been extensively studied in the literature, with well-known results providing efficient algorithms for fundamental representations such as planar and beyond-planar topological drawings. In this paper, we consider the extension problem for bend-minimal orthogonal drawings of planar graphs, which is among the most fundamental geometric graph drawing representations. While the problem was known to be NP-hard, it is natural to consider the case where only a small part of the graph is still to be drawn. Here, we establish the fixed-parameter tractability of the problem when parameterized by the size of the missing subgraph. Our algorithm is based on multiple novel ingredients which intertwine geometric and combinatorial arguments. These include the identification of a new graph representation of bend-equivalent regions for vertex placement in the plane, establishing a bound on the treewidth of this auxiliary graph, and a global point-grid that allows us to discretize the possible placement of bends and vertices into locally bounded subgrids for each of the above regions
- …