2,503 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
LIPIcs, Volume 251, ITCS 2023, Complete Volum
Factor Graph Neural Networks
In recent years, we have witnessed a surge of Graph Neural Networks (GNNs),
most of which can learn powerful representations in an end-to-end fashion with
great success in many real-world applications. They have resemblance to
Probabilistic Graphical Models (PGMs), but break free from some limitations of
PGMs. By aiming to provide expressive methods for representation learning
instead of computing marginals or most likely configurations, GNNs provide
flexibility in the choice of information flowing rules while maintaining good
performance. Despite their success and inspirations, they lack efficient ways
to represent and learn higher-order relations among variables/nodes. More
expressive higher-order GNNs which operate on k-tuples of nodes need increased
computational resources in order to process higher-order tensors. We propose
Factor Graph Neural Networks (FGNNs) to effectively capture higher-order
relations for inference and learning. To do so, we first derive an efficient
approximate Sum-Product loopy belief propagation inference algorithm for
discrete higher-order PGMs. We then neuralize the novel message passing scheme
into a Factor Graph Neural Network (FGNN) module by allowing richer
representations of the message update rules; this facilitates both efficient
inference and powerful end-to-end learning. We further show that with a
suitable choice of message aggregation operators, our FGNN is also able to
represent Max-Product belief propagation, providing a single family of
architecture that can represent both Max and Sum-Product loopy belief
propagation. Our extensive experimental evaluation on synthetic as well as real
datasets demonstrates the potential of the proposed model.Comment: Accepted by JML
Subgroup discovery for structured target concepts
The main object of study in this thesis is subgroup discovery, a theoretical framework for finding subgroups in data—i.e., named sub-populations— whose behaviour with respect to a specified target concept is exceptional when compared to the rest of the dataset. This is a powerful tool that conveys crucial information to a human audience, but despite past advances has been limited to simple target concepts. In this work we propose algorithms that bring this framework to novel application domains. We introduce the concept of representative subgroups, which we use not only to ensure the fairness of a sub-population with regard to a sensitive trait, such as race or gender, but also to go beyond known trends in the data. For entities with additional relational information that can be encoded as a graph, we introduce a novel measure of robust connectedness which improves on established alternative measures of density; we then provide a method that uses this measure to discover which named sub-populations are more well-connected. Our contributions within subgroup discovery crescent with the introduction of kernelised subgroup discovery: a novel framework that enables the discovery of subgroups on i.i.d. target concepts with virtually any kind of structure. Importantly, our framework additionally provides a concrete and efficient tool that works out-of-the-box without any modification, apart from specifying the Gramian of a positive definite kernel. To use within kernelised subgroup discovery, but also on any other kind of kernel method, we additionally introduce a novel random walk graph kernel. Our kernel allows the fine tuning of the alignment between the vertices of the two compared graphs, during the count of the random walks, while we also propose meaningful structure-aware vertex labels to utilise this new capability. With these contributions we thoroughly extend the applicability of subgroup discovery and ultimately re-define it as a kernel method.Der Hauptgegenstand dieser Arbeit ist die Subgruppenentdeckung (Subgroup Discovery), ein theoretischer Rahmen für das Auffinden von Subgruppen in Daten—d. h. benannte Teilpopulationen—deren Verhalten in Bezug auf ein bestimmtes Targetkonzept im Vergleich zum Rest des Datensatzes außergewöhnlich ist. Es handelt sich hierbei um ein leistungsfähiges Instrument, das einem menschlichen Publikum wichtige Informationen vermittelt. Allerdings ist es trotz bisherigen Fortschritte auf einfache Targetkonzepte beschränkt. In dieser Arbeit schlagen wir Algorithmen vor, die diesen Rahmen auf neuartige Anwendungsbereiche übertragen. Wir führen das Konzept der repräsentativen Untergruppen ein, mit dem wir nicht nur die Fairness einer Teilpopulation in Bezug auf ein sensibles Merkmal wie Rasse oder Geschlecht sicherstellen, sondern auch über bekannte Trends in den Daten hinausgehen können. Für Entitäten mit zusätzlicher relationalen Information, die als Graph kodiert werden kann, führen wir ein neuartiges Maß für robuste Verbundenheit ein, das die etablierten alternativen Dichtemaße verbessert; anschließend stellen wir eine Methode bereit, die dieses Maß verwendet, um herauszufinden, welche benannte Teilpopulationen besser verbunden sind. Unsere Beiträge in diesem Rahmen gipfeln in der Einführung der kernelisierten Subgruppenentdeckung: ein neuartiger Rahmen, der die Entdeckung von Subgruppen für u.i.v. Targetkonzepten mit praktisch jeder Art von Struktur ermöglicht. Wichtigerweise, unser Rahmen bereitstellt zusätzlich ein konkretes und effizientes Werkzeug, das ohne jegliche Modifikation funktioniert, abgesehen von der Angabe des Gramian eines positiv definitiven Kernels. Für den Einsatz innerhalb der kernelisierten Subgruppentdeckung, aber auch für jede andere Art von Kernel-Methode, führen wir zusätzlich einen neuartigen Random-Walk-Graph-Kernel ein. Unser Kernel ermöglicht die Feinabstimmung der Ausrichtung zwischen den Eckpunkten der beiden unter-Vergleich-gestelltenen Graphen während der Zählung der Random Walks, während wir auch sinnvolle strukturbewusste Vertex-Labels vorschlagen, um diese neue Fähigkeit zu nutzen. Mit diesen Beiträgen erweitern wir die Anwendbarkeit der Subgruppentdeckung gründlich und definieren wir sie im Endeffekt als Kernel-Methode neu
Dynamic Price Incentivization for Carbon Emission Reduction using Quantum Optimization
Demand Side Response (DSR) is a strategy that enables consumers to actively
participate in managing electricity demand. It aims to alleviate strain on the
grid during high demand and promote a more balanced and efficient use of
electricity resources. We implement DSR through discount scheduling, which
involves offering discrete price incentives to consumers to adjust their
electricity consumption patterns. Since we tailor the discounts to individual
customers' consumption, the Discount Scheduling Problem (DSP) becomes a large
combinatorial optimization task. Consequently, we adopt a hybrid quantum
computing approach, using D-Wave's Leap Hybrid Cloud. We observe an indication
that Leap performs better compared to Gurobi, a classical general-purpose
optimizer, in our test setup. Furthermore, we propose a specialized
decomposition algorithm for the DSP that significantly reduces the problem
size, while maintaining an exceptional solution quality. We use a mix of
synthetic data, generated based on real-world data, and real data to benchmark
the performance of the different approaches
Structured Semidefinite Programming for Recovering Structured Preconditioners
We develop a general framework for finding approximately-optimal
preconditioners for solving linear systems. Leveraging this framework we obtain
improved runtimes for fundamental preconditioning and linear system solving
problems including the following. We give an algorithm which, given positive
definite with
nonzero entries, computes an -optimal
diagonal preconditioner in time , where is the
optimal condition number of the rescaled matrix. We give an algorithm which,
given that is either the pseudoinverse
of a graph Laplacian matrix or a constant spectral approximation of one, solves
linear systems in in time. Our diagonal
preconditioning results improve state-of-the-art runtimes of
attained by general-purpose semidefinite programming, and our solvers improve
state-of-the-art runtimes of where is the
current matrix multiplication constant. We attain our results via new
algorithms for a class of semidefinite programs (SDPs) we call
matrix-dictionary approximation SDPs, which we leverage to solve an associated
problem we call matrix-dictionary recovery.Comment: Merge of arXiv:1812.06295 and arXiv:2008.0172
Recommended from our members
Foundations of Node Representation Learning
Low-dimensional node representations, also called node embeddings, are a cornerstone in the modeling and analysis of complex networks. In recent years, advances in deep learning have spurred development of novel neural network-inspired methods for learning node representations which have largely surpassed classical \u27spectral\u27 embeddings in performance. Yet little work asks the central questions of this thesis: Why do these novel deep methods outperform their classical predecessors, and what are their limitations?
We pursue several paths to answering these questions. To further our understanding of deep embedding methods, we explore their relationship with spectral methods, which are better understood, and show that some popular deep methods are equivalent to spectral methods in a certain natural limit. We also introduce the problem of inverting node embeddings in order to probe what information they contain. Further, we propose a simple, non-deep method for node representation learning, and find it to often be competitive with modern deep graph networks in downstream performance.
To better understand the limitations of node embeddings, we prove some upper and lower bounds on their capabilities. Most notably, we prove that node embeddings are capable of exact low-dimensional representation of networks with bounded max degree or arboricity, and we further show that a simple algorithm can find such exact embeddings for real-world networks. By contrast, we also prove inherent bounds on random graph models, including those derived from node embeddings, to capture key structural properties of networks without simply memorizing a given graph
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, denovo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform, LigNFam enables users to interactively explore
similarity relationships between ‘drug like’ molecules and the proteins they bind. Secondly,
two deep-learning-based binding site comparison tools were developed, competing with
the state-of-the-art over benchmark datasets. The models have the ability to predict offtarget interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster
Tur\'{a}n's Theorem Through Algorithmic Lens
The fundamental theorem of Tur\'{a}n from Extremal Graph Theory determines
the exact bound on the number of edges in an -vertex graph that
does not contain a clique of size . We establish an interesting link
between Extremal Graph Theory and Algorithms by providing a simple compression
algorithm that in linear time reduces the problem of finding a clique of size
in an -vertex graph with edges, where , to the problem of finding a maximum clique in a graph on at most
vertices. This also gives us an algorithm deciding in time whether has a clique of size . As a byproduct of the new
compression algorithm, we give an algorithm that in time decides whether a graph contains an independent set of size at least
. Here is the average vertex degree of the graph . The
multivariate complexity analysis based on ETH indicates that the asymptotical
dependence on several parameters in the running times of our algorithms is
tight
- …