109 research outputs found

    Largest Weight Common Subtree Embeddings with Distance Penalties

    Get PDF
    The largest common embeddable subtree problem asks for the largest possible tree embeddable into two input trees and generalizes the classical maximum common subtree problem. Several variants of the problem in labeled and unlabeled rooted trees have been studied, e.g., for the comparison of evolutionary trees. We consider a generalization, where the sought embedding is maximal with regard to a weight function on pairs of labels. We support rooted and unrooted trees with vertex and edge labels as well as distance penalties for skipping vertices. This variant is important for many applications such as the comparison of chemical structures and evolutionary trees. Our algorithm computes the solution from a series of bipartite matching instances, which are solved efficiently by exploiting their structural relation and imbalance. Our analysis shows that our approach improves or matches the running time of the formally best algorithms for several problem variants. Specifically, we obtain a running time of O(|T| |T\u27|Delta) for two rooted or unrooted trees T and T\u27, where Delta=min{Delta(T),Delta(T\u27)} with Delta(X) the maximum degree of X. If the weights are integral and at most C, we obtain a running time of O(|T| |T\u27|sqrt Delta log (C min{|T|,|T\u27|})) for rooted trees

    Tree comparison: enumeration and application to cheminformatics

    Get PDF
    Graphs are a well-known data structure used in many application domains that rely on relationships between individual entities. Examples are social networks, where the users may be in friendship with each other, road networks, where one-way or bidirectional roads connect crossings, and work package assignments, where workers are assigned to tasks. In chem- and bioinformatics, molecules are often represented as molecular graphs, where vertices represent atoms, and bonds between them are represented by edges connecting the vertices. Since there is an ever-increasing amount of data that can be treated as graphs, fast algorithms are needed to compare such graphs. A well-researched concept to compare two graphs is the maximum common subgraph. On the one hand, this allows finding substructures that are common to both input graphs. On the other hand, we can derive a similarity score from the maximum common subgraph. A practical application is rational drug design which involves molecular similarity searches. In this thesis, we study the maximum common subgraph problem, which entails finding a largest graph, which is isomorphic to subgraphs of two input graphs. We focus on restrictions that allow polynomial-time algorithms with a low exponent. An example is the maximum common subtree of two input trees. We succeed in improving the previously best-known time bound. Additionally, we provide a lower time bound under certain assumptions. We study a generalization of the maximum common subtree problem, the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. This problem is motivated by the application to cheminformatics. First, the vast majority of drugs modeled as molecular graphs is outerplanar, and second, the blocks correspond to the ring structures and the bridges to atom chains or linkers. If we allow disconnected common subgraphs, the problem becomes NP-hard even for trees as input. We propose a second generalization of the maximum common subtree problem, which allows skipping vertices in the input trees while maintaining polynomial running time. Since a maximum common subgraph is not unique in general, we investigate the problem to enumerate all maximum solutions. We do this for both the maximum common subtree problem and the block-and-bridge preserving maximum common induced subgraph problem between outerplanar graphs. An arising subproblem which we analyze is the enumeration of maximum weight matchings in bipartite graphs. We support a weight function between the vertices and edges for all proposed common subgraph methods in this thesis. Thus the objective is to compute a common subgraph of maximum weight. The weights may be integral or real-valued, including negative values. A special case of using such a weight function is computing common subgraph isomorphisms between labeled graphs, where labels between mapped vertices and edges must be equal. An experimental study evaluates the practical running times and the usefulness of our block-and-bridge preserving maximum common induced subgraph algorithm against state of the art algorithms

    Metabolic Network Alignments and their Applications

    Get PDF
    The accumulation of high-throughput genomic and proteomic data allows for the reconstruction of the increasingly large and complex metabolic networks. In order to analyze the accumulated data and reconstructed networks, it is critical to identify network patterns and evolutionary relations between metabolic networks. But even finding similar networks becomes computationally challenging. The dissertation addresses these challenges with discrete optimization and the corresponding algorithmic techniques. Based on the property of the gene duplication and function sharing in biological network,we have formulated the network alignment problem which asks the optimal vertex-to-vertex mapping allowing path contraction, vertex deletion, and vertex insertions. We have proposed the first polynomial time algorithm for aligning an acyclic metabolic pattern pathway with an arbitrary metabolic network. We also have proposed a polynomial-time algorithm for patterns with small treewidth and implemented it for series-parallel patterns which are commonly found among metabolic networks. We have developed the metabolic network alignment tool for free public use. We have performed pairwise mapping of all pathways among five organisms and found a set of statistically significant pathway similarities. We also have applied the network alignment to identifying inconsistency, inferring missing enzymes, and finding potential candidates

    Online Duet between Metric Embeddings and Minimum-Weight Perfect Matchings

    Full text link
    Low-distortional metric embeddings are a crucial component in the modern algorithmic toolkit. In an online metric embedding, points arrive sequentially and the goal is to embed them into a simple space irrevocably, while minimizing the distortion. Our first result is a deterministic online embedding of a general metric into Euclidean space with distortion O(logn)min{logΦ,n}O(\log n)\cdot\min\{\sqrt{\log\Phi},\sqrt{n}\} (or, O(d)min{logΦ,n}O(d)\cdot\min\{\sqrt{\log\Phi},\sqrt{n}\} if the metric has doubling dimension dd), solving a conjecture by Newman and Rabinovich (2020), and quadratically improving the dependence on the aspect ratio Φ\Phi from Indyk et al.\ (2010). Our second result is a stochastic embedding of a metric space into trees with expected distortion O(dlogΦ)O(d\cdot \log\Phi), generalizing previous results (Indyk et al.\ (2010), Bartal et al.\ (2020)). Next, we study the \emph{online minimum-weight perfect matching} problem, where a sequence of 2n2n metric points arrive in pairs, and one has to maintain a perfect matching at all times. We allow recourse (as otherwise the order of arrival determines the matching). The goal is to return a perfect matching that approximates the \emph{minimum-weight} perfect matching at all times, while minimizing the recourse. Our third result is a randomized algorithm with competitive ratio O(dlogΦ)O(d\cdot \log \Phi) and recourse O(logΦ)O(\log \Phi) against an oblivious adversary, this result is obtained via our new stochastic online embedding. Our fourth result is a deterministic algorithm against an adaptive adversary, using O(log2n)O(\log^2 n) recourse, that maintains a matching of weight at most O(logn)O(\log n) times the weight of the MST, i.e., a matching of lightness O(logn)O(\log n). We complement our upper bounds with a strategy for an oblivious adversary that, with recourse rr, establishes a lower bound of Ω(lognrlogr)\Omega(\frac{\log n}{r \log r}) for both competitive ratio and lightness.Comment: 53 pages, 8 figures, to be presented at the ACM-SIAM Symposium on Discrete Algorithms (SODA24

    Discovering dynamics and parameters of nonlinear oscillatory and chaotic systems from partial observations

    Full text link
    Despite rapid progress in live-imaging techniques, many complex biophysical and biochemical systems remain only partially observable, thus posing the challenge to identify valid theoretical models and estimate their parameters from an incomplete set of experimentally accessible time series. Here, we combine sensitivity methods and VoteFair popularity ranking to construct an automated hidden dynamics inference framework that can discover predictive nonlinear dynamical models for both observable and latent variables from noise-corrupted incomplete data in oscillatory and chaotic systems. After validating the framework for prototypical FitzHugh-Nagumo oscillations, we demonstrate its applicability to experimental data from squid neuron activity measurements and Belousov-Zhabotinsky (BZ) reactions, as well as to the Lorenz system in the chaotic regime.Comment: 37 pages, 18 figure

    Fine-grained Complexity Meets IP = PSPACE

    Full text link
    In this paper we study the fine-grained complexity of finding exact and approximate solutions to problems in P. Our main contribution is showing reductions from exact to approximate solution for a host of such problems. As one (notable) example, we show that the Closest-LCS-Pair problem (Given two sets of strings AA and BB, compute exactly the maximum LCS(a,b)\textsf{LCS}(a, b) with (a,b)A×B(a, b) \in A \times B) is equivalent to its approximation version (under near-linear time reductions, and with a constant approximation factor). More generally, we identify a class of problems, which we call BP-Pair-Class, comprising both exact and approximate solutions, and show that they are all equivalent under near-linear time reductions. Exploring this class and its properties, we also show: \bullet Under the NC-SETH assumption (a significantly more relaxed assumption than SETH), solving any of the problems in this class requires essentially quadratic time. \bullet Modest improvements on the running time of known algorithms (shaving log factors) would imply that NEXP is not in non-uniform NC1\textsf{NC}^1. \bullet Finally, we leverage our techniques to show new barriers for deterministic approximation algorithms for LCS. At the heart of these new results is a deep connection between interactive proof systems for bounded-space computations and the fine-grained complexity of exact and approximate solutions to problems in P. In particular, our results build on the proof techniques from the classical IP = PSPACE result

    A Comprehensive Survey on Deep Graph Representation Learning

    Full text link
    Graph representation learning aims to effectively encode high-dimensional sparse graph-structured data into low-dimensional dense vectors, which is a fundamental task that has been widely studied in a range of fields, including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes in the graph can still maintain a relatively close distance, thereby preserving the structural information between the nodes in the graph. However, this is sub-optimal due to: (i) traditional methods have limited model capacity which limits the learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; (iii) representation learning and downstream tasks are dependent on each other which should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, there exist a large number of deep graph representation learning techniques have been proposed in the past decade, especially graph neural networks. In this survey, we conduct a comprehensive survey on current deep graph representation learning algorithms by proposing a new taxonomy of existing state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by the ways of graph neural network architectures and the most recent advanced learning paradigms. Moreover, this survey also provides the practical and promising applications of deep graph representation learning. Last but not least, we state new perspectives and suggest challenging directions which deserve further investigations in the future

    Dependency Encoding for Relation Extraction

    Get PDF
    The surge in information in the form of textual data demands automated systems to extract structured information from unstructured data. Relation extraction plays a key role in the process, with the aim of extracting semantic relations between entities in a text. Since dependency parse trees are capable of capturing the grammatical structure of sentences, this thesis experiments with different encodings of the dependency parse tree to distinguish different semantic relationships. Experiments are conducted on three different data sets that vary in domain and complexity and experimented with varying encoding schemas that can be grouped into two. The first group focuses on encoding the structure of the dependency parse tree with a Deep Graph Convolution Neural Network (DGCNN). The second group focuses on encoding the linguistic features obtained from the dependency parse tree with classical machine learning models such as Random Forest, Support Vector Machine, and Feed-Forward Network, and deep models such as BERT and Transformer encoder stack. The objective of this thesis is not to achieve state-of-the-art (SOTA) performance, rather to evaluate how dependency parse tree based linguistic features perform on different encoding schemas, including deep transformer-based models, on the relation extraction task. The results of the experiments show that these features on certain data sets being less computationally demanding are competitive for complex language models such as BERT, and incorporating them externally to BERT improves the performance rather than confounding
    corecore