11 research outputs found
Solving Hard Computational Problems Efficiently: Asymptotic Parametric Complexity 3-Coloring Algorithm
Many practical problems in almost all scientific and technological
disciplines have been classified as computationally hard (NP-hard or even
NP-complete). In life sciences, combinatorial optimization problems frequently
arise in molecular biology, e.g., genome sequencing; global alignment of
multiple genomes; identifying siblings or discovery of dysregulated pathways.In
almost all of these problems, there is the need for proving a hypothesis about
certain property of an object that can be present only when it adopts some
particular admissible structure (an NP-certificate) or be absent (no admissible
structure), however, none of the standard approaches can discard the hypothesis
when no solution can be found, since none can provide a proof that there is no
admissible structure. This article presents an algorithm that introduces a
novel type of solution method to "efficiently" solve the graph 3-coloring
problem; an NP-complete problem. The proposed method provides certificates
(proofs) in both cases: present or absent, so it is possible to accept or
reject the hypothesis on the basis of a rigorous proof. It provides exact
solutions and is polynomial-time (i.e., efficient) however parametric. The only
requirement is sufficient computational power, which is controlled by the
parameter . Nevertheless, here it is proved that the
probability of requiring a value of to obtain a solution for a
random graph decreases exponentially: , making
tractable almost all problem instances. Thorough experimental analyses were
performed. The algorithm was tested on random graphs, planar graphs and
4-regular planar graphs. The obtained experimental results are in accordance
with the theoretical expected results.Comment: Working pape
Evaluation of ILP-based approaches for partitioning into colorful components
The NP-hard Colorful Components problem is a graph partitioning problem on vertex-colored graphs. We identify a new application of Colorful Components in the correction of Wikipedia interlanguage links, and describe and compare three exact and two heuristic approaches. In particular, we devise two ILP formulations, one based on Hitting Set and one based on Clique Partition. Furthermore, we use the recently proposed implicit hitting set framework [Karp, JCSS 2011; Chandrasekaran et al., SODA 2011] to solve Colorful Components. Finally, we study a move-based and a merge-based heuristic for Colorful Components. We can optimally solve Colorful Components for Wikipedia link correction data; while the Clique Partition-based ILP outperforms the other two exact approaches, the implicit hitting set is a simple and competitive alternative. The merge-based heuristic is very accurate and outperforms the move-based one. The above results for Wikipedia data are confirmed by experiments with synthetic instances
Yippelia: Triggering Deep Property Violations in Hardware Designs through Symbolic Execution
We in Yippelia attempt to automatically identify deep bugs in hardware designs by symbolically exploring hardware designs for one clock cycle and then stitching the generated simple paths to form a multi-cycle path from the reset state to the buggy state. Compared to a state-of-the-art symbolic execution engine, Yippelia has an average speedup of at least four orders of magnitude on finding deep bugs on the up-down counter hardware design.Bachelor of Scienc
Mapping the proteome with data-driven methods: A cycle of measurement, modeling, hypothesis generation, and engineering
The living cell exhibits emergence of complex behavior and its modeling requires a systemic, integrative approach if we are to thoroughly understand and harness it. The work in this thesis has had the more narrow aim of quantitatively characterizing and mapping the proteome using data-driven methods, as proteins perform most functional and structural roles within the cell. Covered are the different parts of the cycle from improving quantification methods, to deriving protein features relying on their primary structure, predicting the protein content solely from sequence data, and, finally, to developing theoretical protein engineering tools, leading back to experiment.\ua0\ua0\ua0\ua0 High-throughput mass spectrometry platforms provide detailed snapshots of a cell\u27s protein content, which can be mined towards understanding how the phenotype arises from genotype and the interplay between the various properties of the constituent proteins. However, these large and dense data present an increased analysis challenge and current methods capture only a small fraction of signal. The first part of my work has involved tackling these issues with the implementation of a GPU-accelerated and distributed signal decomposition pipeline, making factorization of large proteomics scans feasible and efficient. The pipeline yields individual analyte signals spanning the majority of acquired signal, enabling high precision quantification and further analytical tasks.\ua0\ua0\ua0 Having such detailed snapshots of the proteome enables a multitude of undertakings. One application has been to use a deep neural network model to learn the amino acid sequence determinants of temperature adaptation, in the form of reusable deep model features. More generally, systemic quantities may be predicted from the information encoded in sequence by evolutionary pressure. Two studies taking inspiration from natural language processing have sought to learn the grammars behind the languages of expression, in one case predicting mRNA levels from DNA sequence, and in the other protein abundance from amino acid sequence. These two models helped build a quantitative understanding of the central dogma and, furthermore, in combination yielded an improved predictor of protein amount. Finally, a mathematical framework relying on the embedded space of a deep model has been constructed to assist guided mutation of proteins towards optimizing their abundance