21 research outputs found

    Solving Parity Games in Scala

    Parity games are two-player games, played on directed graphs, whose nodes are labeled with priorities. Along a play, the maximal priority occurring infinitely often determines the winner. In the last two decades, a variety of algorithms and successive optimizations have been proposed. The majority of them have been implemented in PGSolver, written in OCaml, which the community has adopted as the de facto platform for solving parity games efficiently and for evaluating the algorithms' performance in several specific cases. PGSolver includes the Zielonka Recursive Algorithm, which has been shown to perform better than the others on randomly generated games. However, even for arenas with a few thousand nodes (especially over dense graphs), it requires minutes to solve the corresponding game. In this paper, we thoroughly revisit the implementation of the recursive algorithm, introducing several improvements and making use of the Scala programming language. These choices have proved very successful, gaining up to two orders of magnitude in running time.
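
As a rough illustration of the recursive algorithm named in the abstract above, here is a minimal Python sketch of the textbook Zielonka recursion. It is not the paper's Scala implementation and includes none of its improvements. The function names, the dictionary-based game encoding, and the convention that player 0 wins when the maximal recurring priority is even are all assumptions made for this example; the sketch also assumes every node has at least one successor.

```python
def attractor(nodes, owner, succ, target, player):
    """Nodes of the subgame `nodes` from which `player` can force a visit to `target`."""
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v in nodes - attr:
            outs = [w for w in succ[v] if w in nodes]
            if owner[v] == player:
                # The player picks the move: one edge into the attractor suffices.
                pulled = any(w in attr for w in outs)
            else:
                # The opponent picks the move: every edge must lead into the attractor.
                pulled = bool(outs) and all(w in attr for w in outs)
            if pulled:
                attr.add(v)
                changed = True
    return attr


def zielonka(nodes, owner, priority, succ):
    """Return (win0, win1), the winning regions of players 0 and 1.

    Assumed convention: player 0 wins a play if and only if the maximal
    priority occurring infinitely often is even.
    """
    if not nodes:
        return set(), set()
    d = max(priority[v] for v in nodes)
    p = d % 2                       # player favoured by the top priority
    top = {v for v in nodes if priority[v] == d}
    a = attractor(nodes, owner, succ, top, p)
    win = list(zielonka(nodes - a, owner, priority, succ))
    if not win[1 - p]:
        # The opponent wins nowhere in the remainder, so p wins everywhere.
        result = [set(), set()]
        result[p] = set(nodes)
        return result[0], result[1]
    # Otherwise remove the opponent's attractor of its winning region and recurse.
    b = attractor(nodes, owner, succ, win[1 - p], 1 - p)
    win = list(zielonka(nodes - b, owner, priority, succ))
    win[1 - p] = win[1 - p] | b
    return win[0], win[1]


# Hypothetical two-node game: each player can stay on a self-loop it likes.
owner = {0: 0, 1: 1}
priority = {0: 2, 1: 1}
succ = {0: [0, 1], 1: [1, 0]}
win0, win1 = zielonka({0, 1}, owner, priority, succ)
```

In this toy game, player 0 keeps looping on node 0 (priority 2, even) and player 1 keeps looping on node 1 (priority 1, odd), so each node is won by its owner.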

    Memory Usage Inference for Object-Oriented Programs

    We present a type-based approach to statically derive symbolic closed-form formulae that characterize the bounds of heap memory usage of programs written in object-oriented languages. Given a program with size and alias annotations, our inference system will compute the amount of memory required by the methods to execute successfully as well as the amount of memory released when methods return. The obtained analysis results are useful for networked devices with limited computational resources as well as embedded software. Singapore-MIT Alliance (SMA)

    ccTSA: A Coverage-Centric Threaded Sequence Assembler

    De novo sequencing, a process to find the whole genome or the regions of a species without references, requires much higher computational power compared to mapped sequencing with references. The advent and continuous evolution of next-generation sequencing technologies further stress the demands of high-throughput processing of myriads of short DNA fragments. Recently announced sequence assemblers, such as Velvet, SOAPdenovo, and ABySS, all exploit parallelism to meet these computational demands since contemporary computer systems primarily rely on scaling the number of computing cores to improve performance. However, most of them are not tailored to exploit the full potential of these systems, leading to suboptimal performance. In this paper, we present ccTSA, a parallel sequence assembler that utilizes coverage to prune k-mers, find preferred edges, and resolve conflicts in preferred edges between k-mers. We minimize computation dependencies between threads to effectively parallelize k-mer processing. We also judiciously allocate and reuse memory space in order to lower memory usage and further improve sequencing speed. The results of ccTSA are compelling: it runs several times faster than other assemblers while providing comparable quality values such as N50.
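
For readers unfamiliar with the N50 quality value mentioned at the end of the abstract above, the following Python sketch computes it under the common definition: the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the sample lengths are assumptions for this example and are not taken from ccTSA.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating covered bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")


# Hypothetical contig lengths; the total is 54, so half is 27.
example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example)
```

Here the three longest contigs (10 + 9 + 8 = 27) reach half of the total, so the N50 of the sample is 8.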

    The Unexpected Efficiency of Bin Packing Algorithms for Dynamic Storage Allocation in the Wild: An Intellectual Abstract

    Recent work has shown that viewing allocators as black-box 2DBP solvers bears meaning. For instance, there exists a 2DBP-based fragmentation metric which often correlates monotonically with maximum resident set size (RSS). Given the field's indeterminacy with respect to fragmentation definitions, as well as the immense value of physical memory savings, we are motivated to set allocator-generated placements against their 2DBP-devised, makespan-optimizing counterparts. Of course, allocators must operate online while 2DBP algorithms work on complete request traces; but since both sides optimize criteria related to minimizing memory wastage, the idea of studying their relationship preserves its intellectual--and practical--interest. Unfortunately no implementations of 2DBP algorithms for DSA are available. This paper presents a first, though partial, implementation of the state-of-the-art. We validate its functionality by comparing its outputs' makespan to the theoretical upper bound provided by the original authors. Along the way, we identify and document key details to assist analogous future efforts. Our experiments comprise 4 modern allocators and 8 real application workloads. We make several notable observations on our empirical evidence: in terms of makespan, allocators outperform Robson's worst-case lower bound 93.75% of the time. In 87.5% of cases, GNU's malloc implementation demonstrates equivalent or superior performance to the 2DBP state-of-the-art, despite the second operating offline. Most surprisingly, the 2DBP algorithm proves competent in terms of fragmentation, producing up to 2.46x better solutions. Future research can leverage such insights towards memory-targeting optimizations. Comment: 13 pages, 10 figures, 3 tables. To appear in ISMM '2
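
To make the makespan comparison in the abstract above more concrete, here is a small Python sketch of how a placement can be scored when dynamic storage allocation is viewed as packing rectangles in a time-by-address plane. The `Block` type, its field names, and the sample trace are hypothetical and do not come from the paper. The makespan-to-load ratio shown is only one simple waste indicator, not the fragmentation metric the authors use.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    start: int   # allocation time (inclusive)
    end: int     # free time (exclusive)
    size: int    # bytes requested
    offset: int  # address chosen by the allocator or packing algorithm


def makespan(blocks: List[Block]) -> int:
    """Highest address ever occupied: the memory the placement needs."""
    return max(b.offset + b.size for b in blocks)


def max_load(blocks: List[Block]) -> int:
    """Peak sum of live sizes: a lower bound on any placement's makespan."""
    events = []
    for b in blocks:
        events.append((b.start, b.size))
        events.append((b.end, -b.size))
    events.sort()  # at equal times, frees (negative deltas) sort first
    load = peak = 0
    for _, delta in events:
        load += delta
        peak = max(peak, load)
    return peak


def is_valid(blocks: List[Block]) -> bool:
    """True if no two blocks overlap in both lifetime and address range."""
    for i, a in enumerate(blocks):
        for b in blocks[i + 1:]:
            in_time = a.start < b.end and b.start < a.end
            in_addr = a.offset < b.offset + b.size and b.offset < a.offset + a.size
            if in_time and in_addr:
                return False
    return True


# Hypothetical trace: the third block could reuse address 0 but is placed at 6.
wasteful = [Block(0, 4, 4, 0), Block(2, 6, 2, 4), Block(4, 8, 4, 6)]
compact = [Block(0, 4, 4, 0), Block(2, 6, 2, 4), Block(4, 8, 4, 0)]
```

Both placements are valid and share a peak load of 6, but the first has a makespan of 10 while the second reaches the lower bound of 6.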

    HALO: Post-Link Heap-Layout Optimisation

    Today, general-purpose memory allocators dominate the landscape of dynamic memory management. While these solutions can provide reasonably good behaviour across a wide range of workloads, it is an unfortunate reality that their behaviour for any particular workload can be highly suboptimal. By catering primarily to average and worst-case usage patterns, these allocators deny programs the advantages of domain-specific optimisations, and thus may inadvertently place data in a manner that hinders performance, generating unnecessary cache misses and load stalls. To help alleviate these issues, we propose HALO: a post-link profile-guided optimisation tool that can improve the layout of heap data to reduce cache misses automatically. Profiling the target binary to understand how allocations made in different contexts are related, we specialise memory-management routines to allocate groups of related objects from separate pools to increase their spatial locality. Unlike other solutions of its kind, HALO employs novel grouping and identification algorithms which allow it to create tight-knit allocation groups using the entire call stack and to identify these efficiently at runtime. Evaluation of HALO on contemporary out-of-order hardware demonstrates speedups of up to 28% over jemalloc, out-performing a state-of-the-art data placement technique from the literature.

    Cooperative cache scrubbing

    Managing the limited resources of power and memory bandwidth while improving performance on multicore hardware is challenging. In particular, more cores demand more memory bandwidth, and multi-threaded applications increasingly stress memory systems, leading to more energy consumption. However, we demonstrate that not all memory traffic is necessary. For modern Java programs, 10 to 60% of DRAM writes are useless, because the data on these lines are dead: the program is guaranteed to never read them again. Furthermore, reading memory only to immediately zero initialize it wastes bandwidth. We propose a software/hardware cooperative solution: the memory manager communicates dead and zero lines with cache scrubbing instructions. We show how scrubbing instructions satisfy MESI cache coherence protocol invariants and demonstrate them in a Java Virtual Machine and multicore simulator. Scrubbing reduces average DRAM traffic by 59%, total DRAM energy by 14%, and dynamic DRAM energy by 57% on a range of configurations. Cooperative software/hardware cache scrubbing reduces memory bandwidth and improves energy efficiency, two critical problems in modern systems.