9 research outputs found

    Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code

    Get PDF
    Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide’s state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. We abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97× performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25× improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop’s filters with our lifted implementations, giving 1.12× speedup without affecting the user experience.United States. Dept. of Energy (Award DE-SC0005288)United States. Dept. of Energy (Award DE-SC0008923)United States. Defense Advanced Research Projects Agency (Agreement FA8759-14-2-0009)MIT Energy Initiative (Fellowship

    Arithmetic Expression Construction

    Get PDF
    When can nn given numbers be combined using arithmetic operators from a given subset of {+,,×,÷}\{+, -, \times, \div\} to obtain a given target number? We study three variations of this problem of Arithmetic Expression Construction: when the expression (1) is unconstrained; (2) has a specified pattern of parentheses and operators (and only the numbers need to be assigned to blanks); or (3) must match a specified ordering of the numbers (but the operators and parenthesization are free). For each of these variants, and many of the subsets of {+,,×,÷}\{+,-,\times,\div\}, we prove the problem NP-complete, sometimes in the weak sense and sometimes in the strong sense. Most of these proofs make use of a "rational function framework" which proves equivalence of these problems for values in rational functions with values in positive integers.Comment: 36 pages, 5 figures. Full version of paper accepted to 31st International Symposium on Algorithms and Computation (ISAAC 2020

    Commensal compiler for high-performance stream programming

    No full text
    Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.Cataloged from PDF version of thesis.Includes bibliographical references (pages 57-63).There are domain-specific libraries for many domains, enabling rapid and cost-effective development of complex applications. On the other hand, domain-specific languages are rare despite the performance advantages of compilation. We believe the reason is the multiple orders-of-magnitude higher cost of building a compiler compared to building a library. We propose commensal compilation, a new strategy for compiling embedded domain-specific languages by reusing the massive investment in modern language virtual machine platforms. Commensal compilers use the host language's front-end, use an autotuner instead of optimization heuristics, and use host platform APIs that enable back-end optimizations by the host platform JIT. The cost of implementing a commensal compiler is only the cost of implementing the domain-specific optimizations. We demonstrate the concept by implementing a commensal compiler for the stream programming language StreamJIT atop the Java platform. The StreamJIT commensal compiler takes advantage of the structure of stream programs to find the right amount of parallelism for a given machine and program. Our compiler achieves 2.4 times better performance than StreamIt's native code (via GCC) compiler with considerably less implementation effort.by Jeffrey Bosboom.S.M

    Mario Kart Is Hard

    No full text
    Nintendo’s Mario Kart is perhaps the most popular racing video game franchise. Players race alone or against opponents to finish in the fastest time possible. Players can also use items to attack and defend from other racers. We prove two hardness results for generalized Mario Kart: deciding whether a driver can finish a course alone in some given time is NP-hard, and deciding whether a player can beat an opponent in a race is PSPACE-hard

    Even 1 × n Edge-Matching and Jigsaw Puzzles are Really Hard

    No full text
    We prove the computational intractability of rotating and placing n square tiles into a 1 × n array such that adjacent tiles are compatible-either equal edge colors, as in edge-matching puzzles, or matching tab/pocket shapes, as in jigsaw puzzles. Beyond basic NP-hardness, we prove that it is NP-hard even to approximately maximize the number of placed tiles (allowing blanks), while satisfying the compatibility constraint between nonblank tiles, within a factor of 0.9999999702 (On the other hand, there is an easy 1/2 -approximation). This is the first (correct) proof of inapproximability for edge-matching and jigsaw puzzles. Along the way, we prove NP-hardness of distinguishing, for a directed graph on n nodes, between having a Hamiltonian path (length n - 1) and having at most 0.999999284(n - 1) edges that form a vertex-disjoint union of paths. We use this gap hardness and gap-preserving reductions to establish similar gap hardness for 1 × n jigsaw and edge-matching puzzles. Keywords: edge-matching puzzles; jigsaw puzzles; computational complexity; hardness of approximatio

    Who witnesses the witness? Finding witnesses in the witness is hard and sometimes impossible

    No full text
    We analyze the computational complexity of the many types of pencil-and-paper-style puzzles featured in the 2016 puzzle video game The Witness. In all puzzles, the goal is to draw a path in a rectangular grid graph from a start vertex to a destination vertex. The different puzzle types place different constraints on the path: preventing some edges from being visited (broken edges); forcing some edges or vertices to be visited (hexagons); forcing some cells to have certain numbers of incident path edges (triangles); or forcing the regions formed by the path to be partially monochromatic (squares), have exactly two special cells (stars), or be singly covered by given shapes (polyominoes) and/or negatively counting shapes (antipolyominoes). We show that any one of these clue types (except the first) is enough to make path finding NP-complete ("witnesses exist but are hard to find"), even for rectangular boards. Furthermore, we show that a final clue type (antibody), which necessarily "cancels" the effect of another clue in the same region, makes path finding Σ2-complete ("witnesses do not exist"), even with a single antibody (combined with many anti/polyominoes), and the problem gets no harder with many antibodies

    Who witnesses The Witness? Finding witnesses in The Witness is hard and sometimes impossible

    No full text
    We analyze the computational complexity of the many types of pencil-and-paper-style puzzles featured in the 2016 puzzle video game The Witness. In all puzzles, the goal is to draw a simple path in a rectangular grid graph from a start vertex to a destination vertex. The different puzzle types place different constraints on the path: preventing some edges from being visited (broken edges); forcing some edges or vertices to be visited (hexagons); forcing some cells to have certain numbers of incident path edges (triangles); or forcing the regions formed by the path to be partially monochromatic (squares), have exactly two special cells (stars), or be singly covered by given shapes (polyominoes) and/or negatively counting shapes (antipolyominoes). We show that any one of these clue types (except the first) is enough to make path finding NP-complete (“witnesses exist but are hard to find”), even for rectangular boards. Furthermore, we show that a final clue type (antibody), which necessarily “cancels” the effect of another clue in the same region, makes path finding Σ2-complete (“witnesses do not exist”), even with a single antibody (combined with many anti/polyominoes), and the problem gets no harder with many antibodies. On the positive side, we give a polynomial-time algorithm for monomino clues, by reducing to hexagon clues on the boundary of the puzzle, even in the presence of broken edges, and solving “subset Hamiltonian path” for terminals on the boundary of an embedded planar graph in polynomial time

    Edge Matching with Inequalities, Triangles, Unknown Shape, and Two Players

    No full text
    © 2020 Information Processing Society of Japan. We analyze the computational complexity of several new variants of edge-matching puzzles. First we analyze inequality (instead of equality) constraints between adjacent tiles, proving the problem NP-complete for strict inequalities but polynomial-time solvable for nonstrict inequalities. Second we analyze three types of triangular edge matching, of which one is polynomial-time solvable and the other two are NP-complete; all three are #P-complete. Third we analyze the case where no target shape is specified and we merely want to place the (square) tiles so that edges match exactly; this problem is NP-complete. Fourth we consider four 2-player games based on 1×n edge matching, all four of which are PSPACE-complete. Most of our NP-hardness reductions are parsimonious, newly proving #P and ASP-completeness for, e.g., 1 × n edge matching. Along the way, we prove #P-and ASP-completeness of planar 3-regular directed Hamiltonicity; we provide linear-time algorithms to find antidirected and forbidden-transition Eulerian paths; and we characterize the complexity of new partizan variants of the Geography game on graphs
    corecore