11 research outputs found

    Solving Hard Computational Problems Efficiently: Asymptotic Parametric Complexity 3-Coloring Algorithm

    Many practical problems in almost all scientific and technological disciplines have been classified as computationally hard (NP-hard or even NP-complete). In the life sciences, combinatorial optimization problems frequently arise in molecular biology, e.g., genome sequencing, global alignment of multiple genomes, identifying siblings, or discovery of dysregulated pathways. In almost all of these problems, there is a need to prove a hypothesis about a certain property of an object that can be present only when the object adopts some particular admissible structure (an NP-certificate) or be absent (no admissible structure). However, none of the standard approaches can discard the hypothesis when no solution can be found, since none can provide a proof that there is no admissible structure. This article presents an algorithm that introduces a novel type of solution method to "efficiently" solve the graph 3-coloring problem, an NP-complete problem. The proposed method provides certificates (proofs) in both cases, present or absent, so it is possible to accept or reject the hypothesis on the basis of a rigorous proof. It provides exact solutions and is polynomial-time (i.e., efficient), albeit parametric. The only requirement is sufficient computational power, which is controlled by the parameter $\alpha\in\mathbb{N}$. Nevertheless, here it is proved that the probability of requiring a value of $\alpha>k$ to obtain a solution for a random graph decreases exponentially: $P(\alpha>k) \leq 2^{-(k+1)}$, making almost all problem instances tractable. Thorough experimental analyses were performed. The algorithm was tested on random graphs, planar graphs and 4-regular planar graphs. The experimental results obtained are in accordance with the expected theoretical results. Comment: Working paper
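
    As a rough illustration of two ideas in this abstract, the sketch below evaluates the stated tail bound $2^{-(k+1)}$ and checks a "present" certificate, i.e., a proposed 3-coloring. It is not the paper's algorithm: the function names `alpha_tail_bound` and `is_valid_3_coloring`, the edge-list input, and the choice to start $k$ at 0 are assumptions made here for illustration only.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def alpha_tail_bound(k: int) -> float:
    """Upper bound on P(alpha > k) quoted in the abstract: 2^-(k+1).

    Hypothetical helper; starting k at 0 is an assumption of this sketch.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return 2.0 ** -(k + 1)


def is_valid_3_coloring(edges: Iterable[Edge], coloring: Dict[Hashable, int]) -> bool:
    """Check a 'present' certificate: every edge joins two different colors.

    Colors are encoded as 0, 1, 2 (an encoding chosen for this sketch).
    """
    if any(c not in (0, 1, 2) for c in coloring.values()):
        return False
    return all(
        u in coloring and v in coloring and coloring[u] != coloring[v]
        for u, v in edges
    )


if __name__ == "__main__":
    # The bound halves with each increment of k.
    for k in range(5):
        print(k, alpha_tail_bound(k))

    triangle = [(0, 1), (1, 2), (0, 2)]
    print(is_valid_3_coloring(triangle, {0: 0, 1: 1, 2: 2}))
```

    Verifying a coloring takes time linear in the number of edges, which is what makes it an NP-certificate. The "absent" certificate described in the abstract is the harder part and is not sketched here.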

    Evaluation of ILP-based approaches for partitioning into colorful components

    The NP-hard Colorful Components problem is a graph partitioning problem on vertex-colored graphs. We identify a new application of Colorful Components in the correction of Wikipedia interlanguage links, and describe and compare three exact and two heuristic approaches. In particular, we devise two ILP formulations, one based on Hitting Set and one based on Clique Partition. Furthermore, we use the recently proposed implicit hitting set framework [Karp, JCSS 2011; Chandrasekaran et al., SODA 2011] to solve Colorful Components. Finally, we study a move-based and a merge-based heuristic for Colorful Components. We can optimally solve Colorful Components for Wikipedia link correction data; while the Clique Partition-based ILP outperforms the other two exact approaches, the implicit hitting set is a simple and competitive alternative. The merge-based heuristic is very accurate and outperforms the move-based one. The above results for Wikipedia data are confirmed by experiments with synthetic instances.
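
    The abstract does not spell out the feasibility condition. In the usual formulation, edges are deleted until no connected component contains two vertices of the same color. The sketch below only checks that condition for a given graph; it is not one of the paper's ILP or heuristic methods, and the name `is_colorful` and the language-code colors in the usage example are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def is_colorful(
    vertices: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
    color: Dict[Hashable, Hashable],
) -> bool:
    """Return True if no connected component repeats a color."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen = set()
    for start in vertices:
        if start in seen:
            continue
        # Depth-first search over one component, tracking the colors used.
        used = set()
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            if color[u] in used:
                return False  # same color twice in one component
            used.add(color[u])
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return True


if __name__ == "__main__":
    # Hypothetical interlanguage-link example: colors are language codes.
    color = {"a": "en", "b": "de", "c": "en"}
    print(is_colorful("abc", [("a", "b"), ("b", "c")], color))
    print(is_colorful("abc", [("a", "b")], color))
```

    In the usage example, deleting the edge between `b` and `c` separates the two `en` vertices, so the remaining graph passes the check.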

    Yippelia: Triggering Deep Property Violations in Hardware Designs through Symbolic Execution

    Yippelia attempts to automatically identify deep bugs in hardware designs by symbolically exploring a design for one clock cycle and then stitching the generated simple paths together to form a multi-cycle path from the reset state to the buggy state. Compared to a state-of-the-art symbolic execution engine, Yippelia achieves an average speedup of at least four orders of magnitude in finding deep bugs in the up-down counter hardware design. Bachelor of Science

    Mapping the proteome with data-driven methods: A cycle of measurement, modeling, hypothesis generation, and engineering

    The living cell exhibits emergence of complex behavior and its modeling requires a systemic, integrative approach if we are to thoroughly understand and harness it. The work in this thesis has had the more narrow aim of quantitatively characterizing and mapping the proteome using data-driven methods, as proteins perform most functional and structural roles within the cell. Covered are the different parts of the cycle from improving quantification methods, to deriving protein features relying on their primary structure, predicting the protein content solely from sequence data, and, finally, to developing theoretical protein engineering tools, leading back to experiment.
    High-throughput mass spectrometry platforms provide detailed snapshots of a cell's protein content, which can be mined towards understanding how the phenotype arises from genotype and the interplay between the various properties of the constituent proteins. However, these large and dense data present an increased analysis challenge and current methods capture only a small fraction of signal. The first part of my work has involved tackling these issues with the implementation of a GPU-accelerated and distributed signal decomposition pipeline, making factorization of large proteomics scans feasible and efficient. The pipeline yields individual analyte signals spanning the majority of acquired signal, enabling high precision quantification and further analytical tasks.
    Having such detailed snapshots of the proteome enables a multitude of undertakings. One application has been to use a deep neural network model to learn the amino acid sequence determinants of temperature adaptation, in the form of reusable deep model features. More generally, systemic quantities may be predicted from the information encoded in sequence by evolutionary pressure.
    Two studies taking inspiration from natural language processing have sought to learn the grammars behind the languages of expression, in one case predicting mRNA levels from DNA sequence, and in the other protein abundance from amino acid sequence. These two models helped build a quantitative understanding of the central dogma and, furthermore, in combination yielded an improved predictor of protein amount. Finally, a mathematical framework relying on the embedded space of a deep model has been constructed to assist guided mutation of proteins towards optimizing their abundance.