Heuristic Algorithms for the Maximum Colorful Subtree Problem
In metabolomics, small molecules are structurally elucidated using tandem mass spectrometry (MS/MS); this computational task can be formulated as the Maximum Colorful Subtree problem, which is NP-hard. Unfortunately, data from a single metabolite requires us to solve hundreds or thousands of instances of this problem - and in a single Liquid Chromatography MS/MS run, hundreds or thousands of metabolites are measured.
Here, we comprehensively evaluate the performance of several heuristic algorithms for the problem. Unfortunately, as is often the case in bioinformatics, the structure of the (chemically) true solution is not known to us; therefore we can only evaluate against the optimal solution of an instance. Evaluating the quality of a heuristic based on scores can be misleading: Even a slightly suboptimal solution can be structurally very different from the optimal solution, but it is the structure of a solution and not its score that is relevant for the downstream analysis. To this end, we propose a different evaluation setup: Given a set of candidate instances of which exactly one is known to be correct, the heuristic in question solves each instance to the best of its ability, producing a score for each instance, which is then used to rank the instances. We then evaluate whether the correct instance is ranked highly by the heuristic.
We find that one particular heuristic consistently ranks the correct instance in a top position. We also find that the scores of the best heuristic solutions are very close to the optimal score; in contrast, the structure of the solutions can deviate significantly from the optimal structures. Integrating the heuristic allowed us to speed up computations in practice by a factor of 100.
Towards de novo identification of metabolites by analyzing tandem mass spectra
Mass spectrometry is among the most widely used technologies in
proteomics and metabolomics. For metabolites, de novo interpretation of
spectra is even more important than for protein data, because metabolite
spectra databases cover only a small fraction of naturally occurring
metabolites. In this work, we analyze a method for fully automated de
novo identification of metabolites from tandem mass spectra. Mass
spectrometry data is usually assumed to be insufficient for
identification of molecular structures, so we want to estimate the
molecular formula of the unknown metabolite, a crucial step for
its identification. This is achieved by calculating the possible
formulas of the fragment peaks and then reconstructing the most likely
fragmentation tree from this information. We present tests on real mass
spectra showing that our algorithms solve the reconstruction problem
suitably fast and provide excellent results: for all 32 test compounds
the correct solution was among the top five suggestions, and for 26
compounds the first suggestion of the exact algorithm was correct.
Novel methods for the analysis of small molecule fragmentation mass spectra
The identification of small molecules, such as metabolites, in a high-throughput manner plays an important role in many research areas. Mass spectrometry (MS) is one of the predominant analysis technologies and is much more sensitive than nuclear magnetic resonance spectroscopy. Fragmentation of a molecule is used to obtain information beyond its mass. Gas chromatography-MS is one of the oldest and most widespread techniques for the analysis of small molecules. Commonly, the molecule is fragmented using electron ionization (EI). With this technique, the molecular ion peak is often barely visible in the mass spectrum or even absent. We present a method to calculate fragmentation trees from high mass accuracy EI spectra, which annotate the peaks in the mass spectrum with molecular formulas of fragments and explain relevant fragmentation pathways. Fragmentation trees enable the identification of the molecular ion and its molecular formula if the molecular ion is present in the spectrum. The method works even if the molecular ion is of very low abundance. MS experts confirm that the calculated trees correspond very well to known fragmentation mechanisms.
Using pairwise local alignments of fragmentation trees, structural and chemical similarities to already-known molecules can be determined. To compare the fragmentation tree of an unknown metabolite to a huge database of fragmentation trees, fast algorithms for solving the tree alignment problem are required. Unfortunately, the alignment of unordered trees, such as fragmentation trees, is NP-hard. We present three exact algorithms for the problem. Evaluation of our methods showed that thousands of alignments can be computed in a matter of minutes.
Both the computation and the comparison of fragmentation trees are rule-free approaches that require no chemical knowledge about the unknown molecule and thus will be very helpful in the automated analysis of metabolites that are not included in common libraries.
The Graph Motif problem parameterized by the structure of the input graph
The Graph Motif problem was introduced in 2006 in the context of biological
networks. It consists of deciding whether or not a multiset of colors occurs in
a connected subgraph of a vertex-colored graph. Graph Motif has been mostly
analyzed from the standpoint of parameterized complexity. The main parameters
which came into consideration were the size of the multiset and the number of
colors. However, in many applications of Graph Motif, the input graph
originates from real life and has structure. Motivated by this prosaic
observation, we systematically study its complexity relative to graph
structural parameters. For a wide range of parameters, we give new or improved
FPT algorithms, or show that the problem remains intractable. For the FPT
cases, we also give some kernelization lower bounds as well as some ETH-based
lower bounds on the worst case running time. Interestingly, we establish that
Graph Motif is W[1]-hard (while in W[P]) for parameter max leaf number, which
is, to the best of our knowledge, the first problem to behave this way.
Comment: 24 pages, accepted in DAM, conference version in IPEC 201
Upper and lower bounds for finding connected motifs in vertex-colored graphs
We study the problem of finding occurrences of motifs in vertex-colored graphs, where a motif is a multiset of colors, and an occurrence of a motif is a subset of connected vertices whose multiset of colors equals the motif. This problem is a natural graph-theoretic pattern matching variant in which we are not interested in the actual structure of the occurrence of the pattern; we only require it to preserve the very basic topological requirement of connectedness. We give two positive results and three negative results that together give an extensive picture of tractable and intractable instances of the problem.
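The definition of a motif occurrence above can be made concrete with a small brute-force check. This is only an illustrative sketch, not one of the algorithms from the paper: it enumerates every vertex subset of the right size, so it is exponential in general. The function names `is_connected` and `find_motif_occurrence`, and the dictionary-based graph representation, are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adj):
    """Return True if `vertices` induce a connected subgraph of `adj`."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def find_motif_occurrence(adj, color, motif):
    """Brute-force search for a connected vertex set whose multiset of
    colors equals `motif`. Returns the vertex set, or None if none exists.

    adj:   dict mapping each vertex to a set of neighbours (hypothetical format)
    color: dict mapping each vertex to its color
    motif: iterable of colors, repetitions allowed (a multiset)
    """
    target = Counter(motif)
    k = sum(target.values())
    for subset in combinations(sorted(adj), k):
        colors_match = Counter(color[v] for v in subset) == target
        if colors_match and is_connected(subset, adj):
            return set(subset)
    return None


# Path graph 1 - 2 - 3 - 4 with colors r, g, b, r.
adj = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
color = {1: "r", 2: "g", 3: "b", 4: "r"}
occurrence = find_motif_occurrence(adj, color, ["r", "g"])
```

In the example, the motif {r, r} has no occurrence even though two vertices carry color r, because vertices 1 and 4 are not connected to each other.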
Knapsack: Connectedness, Path, and Shortest-Path
We study the knapsack problem with graph theoretic constraints. That is, we
assume that there exists a graph structure on the set of items of knapsack and
the solution also needs to satisfy certain graph theoretic properties on top of
knapsack constraints. In particular, in the connected knapsack problem we need
to compute a connected subset of items of maximum value subject to the knapsack
size constraint. We show that this problem is strongly
NP-complete even for graphs of maximum degree four and NP-complete even for
star graphs. On the other hand, we develop an algorithm whose running time is
expressed in terms of the treewidth of the graph, the size, and the target value
of the knapsack. We further exhibit an approximation algorithm for the problem.
We show similar results for several other graph theoretic
properties, namely path and shortest-path under the problem names path-knapsack
and shortestpath-knapsack. Our results seem to indicate that
connected-knapsack is computationally hardest, followed by path-knapsack and
shortestpath-knapsack.
Comment: Under review
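To illustrate the connected knapsack problem stated above, the following sketch solves tiny instances by exhaustive search. It is not the treewidth-based or approximation algorithm from the abstract; it simply tries every subset of items, which is feasible only for very small graphs. The function name `connected_knapsack` and the input format are hypothetical choices for this example.

```python
from itertools import combinations


def _induces_connected(vertices, adj):
    """Return True if the non-empty vertex set induces a connected subgraph."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def connected_knapsack(adj, weight, value, capacity):
    """Exhaustive search for a connected item set of maximum total value
    whose total weight does not exceed `capacity`.

    Returns (best_value, best_set); (0, set()) if no item fits.
    """
    best_value, best_set = 0, set()
    items = sorted(adj)
    for k in range(1, len(items) + 1):
        for subset in combinations(items, k):
            if sum(weight[v] for v in subset) > capacity:
                continue
            if not _induces_connected(subset, adj):
                continue
            total = sum(value[v] for v in subset)
            if total > best_value:
                best_value, best_set = total, set(subset)
    return best_value, best_set


# Star graph: centre 0 with leaves 1, 2, 3.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
weight = {0: 1, 1: 2, 2: 2, 3: 3}
value = {0: 1, 1: 5, 2: 4, 3: 6}
result = connected_knapsack(adj, weight, value, capacity=5)
```

On this star graph the unconstrained knapsack would pick leaves 1 and 3 (value 11), but that set is not connected; the connectivity requirement forces any set of two or more items to include the centre.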
Parallelizing Maximal Clique Enumeration on GPUs
We present a GPU solution for exact maximal clique enumeration (MCE) that
performs a search tree traversal following the Bron-Kerbosch algorithm. Prior
works on parallelizing MCE on GPUs perform a breadth-first traversal of the
tree, which has limited scalability because of the explosion in the number of
tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing
depth-first traversal of independent subtrees in parallel. Since MCE suffers
from high load imbalance and memory capacity requirements, we propose a worker
list for dynamic load balancing, as well as partial induced subgraphs and a
compact representation of excluded vertex sets to regulate memory consumption.
Our evaluation shows that our GPU implementation on a single GPU outperforms
the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x
(up to 16.7x), and scales efficiently to multiple GPUs. Our code has been
open-sourced to enable further research on accelerating MCE.
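For orientation, the search tree traversal that the abstract refers to can be sketched sequentially. The code below is a plain CPU version of the Bron-Kerbosch algorithm with pivoting, written for illustration only; it does not reproduce the GPU implementation, the worker list, or the memory optimizations described above. The function name `bron_kerbosch` and the adjacency-set input format are assumptions.

```python
def bron_kerbosch(adj):
    """Enumerate all maximal cliques of an undirected graph.

    adj: dict mapping each vertex to the set of its neighbours
         (no self-loops; hypothetical input format for this sketch).
    Returns a list of frozensets, one per maximal clique.
    """
    if not adj:
        return []
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend r,
        # x: vertices already explored (excluded) for this r.
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Choose a pivot with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            # Each recursive call explores one subtree of the search tree.
            expand(r | {v}, p & adj[v], x & adj[v])
            p.remove(v)
            x.add(v)

    expand(set(), set(adj), set())
    return cliques


# Triangle 1-2-3 with a pendant edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
maximal_cliques = bron_kerbosch(graph)
```

Each call to `expand` corresponds to one node of the search tree, and the recursive calls made for different vertices `v` explore separate subtrees; the depth-first order here is the sequential counterpart of the traversal the abstract proposes to run in parallel.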