126 research outputs found

    Heuristic Algorithms for the Maximum Colorful Subtree Problem

    Get PDF
    In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this computational task can be formulated as the Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem - and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured. Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem. Unfortunately, as is often the case in bioinformatics, the structure of the (chemically) true solution is not known to us; therefore we can only evaluate against the optimal solution of an instance. Evaluating the quality of a heuristic based on scores can be misleading: Even a slightly suboptimal solution can be structurally very different from the optimal solution, but it is the structure of a solution and not its score that is relevant for the downstream analysis. To this end, we propose a different evaluation setup: Given a set of candidate instances of which exactly one is known to be correct, the heuristic in question solves each instance to the best of its ability, producing a score for each instance, which is then used to rank the instances. We then evaluate whether the correct instance is ranked highly by the heuristic. We find that one particular heuristic consistently ranks the correct instance in a top position. We also find that the scores of the best heuristic solutions are very close to the optimal score; in contrast, the structure of the solutions can deviate significantly from the optimal structures. Integrating the heuristic allowed us to speed up computations in practice by a factor of 100-fold

    Towards de novo identification of metabolites by analyzing tandem mass spectra

    Get PDF
    Mass spectrometry is among the most widely used technologies in proteomics and metabolomics. For metabolites, de novo interpretation of spectra is even more important than for protein data, because metabolite spectra databases cover only a small fraction of naturally occurring metabolites. In this work, we analyze a method for fully automated de novo identification of metabolites from tandem mass spectra. Mass spectrometry data is usually assumed to be insufficient for identification of molecular structures, so we want to estimate the molecular formula of the unknown metabolite, a crucial step for its identification. This is achieved by calculating the possible formulas of the fragment peaks and then reconstructing the most likely fragmentation tree from this information. We present tests on real mass spectra showing that our algorithms solve the reconstruction problem suitably fast and provide excellent results: For all 32 test compounds the correct solution was among the top five suggestions, for 26 compounds the first suggestion of the exact algorithm was correct

    Novel methods for the analysis of small molecule fragmentation mass spectra

    Get PDF
    The identification of small molecules, such as metabolites, in a high throughput manner plays an important in many research areas. Mass spectrometry (MS) is one of the predominant analysis technologies and is much more sensitive than nuclear magnetic resonance spectroscopy. Fragmentation of the molecules is used to obtain information beyond its mass. Gas chromatography-MS is one of the oldest and most widespread techniques for the analysis of small molecules. Commonly, the molecule is fragmented using electron ionization (EI). Using this technique, the molecular ion peak is often barely visible in the mass spectrum or even absent. We present a method to calculate fragmentation trees from high mass accuracy EI spectra, which annotate the peaks in the mass spectrum with molecular formulas of fragments and explain relevant fragmentation pathways. Fragmentation trees enable the identification of the molecular ion and its molecular formula if the molecular ion is present in the spectrum. The method works even if the molecular ion is of very low abundance. MS experts confirm that the calculated trees correspond very well to known fragmentation mechanisms.Using pairwise local alignments of fragmentation trees, structural and chemical similarities to already-known molecules can be determined. In order to compare a fragmentation tree of an unknown metabolite to a huge database of fragmentation trees, fast algorithms for solving the tree alignment problem are required. Unfortunately the alignment of unordered trees, such as fragmentation trees, is NP-hard. We present three exact algorithms for the problem. Evaluation of our methods showed that thousands of alignments can be computed in a matter of minutes. Both the computation and the comparison of fragmentation trees are rule-free approaches that require no chemical knowledge about the unknown molecule and thus will be very helpful in the automated analysis of metabolites that are not included in common libraries

    The Graph Motif problem parameterized by the structure of the input graph

    Full text link
    The Graph Motif problem was introduced in 2006 in the context of biological networks. It consists of deciding whether or not a multiset of colors occurs in a connected subgraph of a vertex-colored graph. Graph Motif has been mostly analyzed from the standpoint of parameterized complexity. The main parameters which came into consideration were the size of the multiset and the number of colors. Though, in the many applications of Graph Motif, the input graph originates from real-life and has structure. Motivated by this prosaic observation, we systematically study its complexity relatively to graph structural parameters. For a wide range of parameters, we give new or improved FPT algorithms, or show that the problem remains intractable. For the FPT cases, we also give some kernelization lower bounds as well as some ETH-based lower bounds on the worst case running time. Interestingly, we establish that Graph Motif is W[1]-hard (while in W[P]) for parameter max leaf number, which is, to the best of our knowledge, the first problem to behave this way.Comment: 24 pages, accepted in DAM, conference version in IPEC 201

    Upper and lower bounds for finding connected motifs in vertex-colored graphs

    Get PDF
    International audienceWe study the problem of finding occurrences of motifs in vertex-colored graphs, where a motif is a multiset of colors, and an occurrence of a motif is a subset of connected vertices whose multiset of colors equals the motif. This problem is a natural graph-theoretic pattern matching variant where we are not interested in the actual structure of the occurrence of the pattern, we only require it to preserve the very basic topological requirement of connectedness. We give two positive results and three negative results that together give an extensive picture of tractable and intractable instances of the problem

    Knapsack: Connectedness, Path, and Shortest-Path

    Full text link
    We study the knapsack problem with graph theoretic constraints. That is, we assume that there exists a graph structure on the set of items of knapsack and the solution also needs to satisfy certain graph theoretic properties on top of knapsack constraints. In particular, we need to compute in the connected knapsack problem a connected subset of items which has maximum value subject to the size of knapsack constraint. We show that this problem is strongly NP-complete even for graphs of maximum degree four and NP-complete even for star graphs. On the other hand, we develop an algorithm running in time O(2twlogtwpoly(min{s2,d2}))O\left(2^{tw\log tw}\cdot\text{poly}(\min\{s^2,d^2\})\right) where tw,s,dtw,s,d are respectively treewidth of the graph, size, and target value of the knapsack. We further exhibit a (1ϵ)(1-\epsilon) factor approximation algorithm running in time O(2twlogtwpoly(n,1/ϵ))O\left(2^{tw\log tw}\cdot\text{poly}(n,1/\epsilon)\right) for every ϵ>0\epsilon>0. We show similar results for several other graph theoretic properties, namely path and shortest-path under the problem names path-knapsack and shortestpath-knapsack. Our results seems to indicate that connected-knapsack is computationally hardest followed by path-knapsack and shortestpath-knapsack.Comment: Under revie

    Parallelizing Maximal Clique Enumeration on GPUs

    Full text link
    We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which has limited scalability because of the explosion in the number of tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing depth-first traversal of independent subtrees in parallel. Since MCE suffers from high load imbalance and memory capacity requirements, we propose a worker list for dynamic load balancing, as well as partial induced subgraphs and a compact representation of excluded vertex sets to regulate memory consumption. Our evaluation shows that our GPU implementation on a single GPU outperforms the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x (up to 16.7x), and scales efficiently to multiple GPUs. Our code has been open-sourced to enable further research on accelerating MCE
    corecore