
    Solving Parity Games in Scala

    Parity games are two-player games, played on directed graphs, whose nodes are labeled with priorities. Along a play, the maximal priority occurring infinitely often determines the winner. In the last two decades, a variety of algorithms and successive optimizations have been proposed. The majority of them have been implemented in PGSolver, written in OCaml, which the community has adopted as the de facto platform for solving parity games efficiently and for evaluating their performance in several specific cases. PGSolver includes the Zielonka Recursive Algorithm, which has been shown to perform better than the others on randomly generated games. However, even for arenas with a few thousand nodes (especially over dense graphs), it requires minutes to solve the corresponding game. In this paper, we deeply revisit the implementation of the recursive algorithm, introducing several improvements and making use of the Scala programming language. These choices have proved very successful, gaining up to two orders of magnitude in running time.
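
    The recursive algorithm the abstract refers to is Zielonka's: remove the attractor of the highest-priority nodes, solve the remaining subgame, and recurse on the opponent's winning region if it is non-empty. The sketch below illustrates that structure in Scala on a toy three-node game; the Game representation and names are illustrative assumptions, not the paper's optimized implementation.

        object ZielonkaSketch {
          // Nodes are Ints; owner(v) is 0 or 1; priority(v) >= 0; succ(v) are the out-edges.
          final case class Game(nodes: Set[Int], owner: Int => Int, priority: Int => Int, succ: Int => Set[Int])

          // Attractor of `target` for `player`: nodes from which `player` can force the play into `target`.
          def attractor(g: Game, player: Int, target: Set[Int]): Set[Int] = {
            var attr = target
            var changed = true
            while (changed) {
              changed = false
              for (v <- g.nodes if !attr(v)) {
                val succIn = g.succ(v).filter(g.nodes)
                val pulled =
                  if (g.owner(v) == player) succIn.exists(attr)
                  else succIn.nonEmpty && succIn.forall(attr)
                if (pulled) { attr += v; changed = true }
              }
            }
            attr
          }

          // Returns the pair of winning regions (for player 0, for player 1).
          def solve(g: Game): (Set[Int], Set[Int]) = {
            if (g.nodes.isEmpty) return (Set.empty[Int], Set.empty[Int])
            val d      = g.nodes.map(g.priority).max
            val player = d % 2                                  // player favoured by the maximal priority
            val a      = attractor(g, player, g.nodes.filter(v => g.priority(v) == d))
            val (w0, w1) = solve(g.copy(nodes = g.nodes -- a))
            val opponentWins = if (player == 0) w1 else w0
            if (opponentWins.isEmpty) {
              if (player == 0) (g.nodes, Set.empty[Int]) else (Set.empty[Int], g.nodes)
            } else {
              val b = attractor(g, 1 - player, opponentWins)
              val (w0b, w1b) = solve(g.copy(nodes = g.nodes -- b))
              if (player == 0) (w0b, w1b ++ b) else (w0b ++ b, w1b)
            }
          }

          def main(args: Array[String]): Unit = {
            // Toy game: node -> (owner, priority, successors).
            val spec = Map(0 -> (0, 2, Set(1)), 1 -> (1, 1, Set(0, 2)), 2 -> (0, 0, Set(2)))
            val g = Game(spec.keySet, v => spec(v)._1, v => spec(v)._2, v => spec(v)._3)
            println(solve(g))   // prints the winning regions of player 0 and player 1
          }
        }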

    Memory Usage Inference for Object-Oriented Programs

    We present a type-based approach to statically derive symbolic closed-form formulae that characterize the bounds on heap memory usage of programs written in object-oriented languages. Given a program with size and alias annotations, our inference system computes the amount of memory required by each method to execute successfully as well as the amount of memory released when the method returns. The obtained analysis results are useful for networked devices with limited computational resources as well as for embedded software.
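
    To give a flavour of what such bounds express, the toy Scala sketch below composes per-method memory effects as a pair (peak memory required, net memory retained on return). This composition rule and the concrete numbers are illustrative assumptions, not the paper's inference system, which works symbolically over size annotations.

        object MemEffectSketch {
          // `peak`: memory needed to run the method; `net`: memory still held when it returns.
          final case class Effect(peak: Int, net: Int) {
            // Sequential composition: the second step starts after the first's net allocation.
            def andThen(that: Effect): Effect =
              Effect(math.max(this.peak, this.net + that.peak), this.net + that.net)
          }

          def main(args: Array[String]): Unit = {
            val buildList  = Effect(peak = 10, net = 10) // allocates and keeps 10 cells
            val scratchUse = Effect(peak = 5,  net = 0)  // allocates 5 temporary cells, frees them all
            val whole = buildList.andThen(scratchUse)
            println(whole) // Effect(15,10): 15 cells needed to run, 10 still held on return
          }
        }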

    ccTSA: A Coverage-Centric Threaded Sequence Assembler

    De novo sequencing, a process to find the whole genome or regions of a species without references, requires much more computational power than mapped sequencing with references. The advent and continuous evolution of next-generation sequencing technologies further stress the demand for high-throughput processing of myriad short DNA fragments. Recently announced sequence assemblers, such as Velvet, SOAPdenovo, and ABySS, all exploit parallelism to meet these computational demands, since contemporary computer systems primarily rely on scaling the number of computing cores to improve performance. However, most of them are not tailored to exploit the full potential of these systems, leading to suboptimal performance. In this paper, we present ccTSA, a parallel sequence assembler that utilizes coverage to prune k-mers, find preferred edges, and resolve conflicts in preferred edges between k-mers. We minimize computation dependencies between threads to effectively parallelize k-mer processing. We also judiciously allocate and reuse memory space in order to lower memory usage and further improve sequencing speed. The results of ccTSA are compelling: it runs several times faster than other assemblers while providing comparable quality metrics such as N50.
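
    As a rough illustration of the coverage-based pruning step mentioned above, the Scala sketch below counts k-mers across reads and discards those seen fewer than a threshold number of times; the k value, threshold, and reads are illustrative assumptions rather than ccTSA's actual parameters or code.

        object KmerPruneSketch {
          // Count every k-mer occurring in the reads.
          def countKmers(reads: Seq[String], k: Int): Map[String, Int] =
            reads
              .flatMap(r => r.sliding(k).filter(_.length == k))
              .groupBy(identity)
              .view.mapValues(_.size).toMap

          // Keep only k-mers whose coverage (occurrence count) meets the threshold,
          // discarding k-mers that likely stem from sequencing errors.
          def prune(counts: Map[String, Int], minCoverage: Int): Map[String, Int] =
            counts.filter { case (_, c) => c >= minCoverage }

          def main(args: Array[String]): Unit = {
            val reads  = Seq("ACGTACGT", "CGTACGTA", "ACGTTCGT") // toy reads
            val counts = countKmers(reads, k = 4)
            println(prune(counts, minCoverage = 2))              // k-mers seen at least twice
          }
        }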

    The Unexpected Efficiency of Bin Packing Algorithms for Dynamic Storage Allocation in the Wild: An Intellectual Abstract

    Recent work has shown that viewing allocators as black-box 2DBP solvers bears meaning. For instance, there exists a 2DBP-based fragmentation metric which often correlates monotonically with maximum resident set size (RSS). Given the field's indeterminacy with respect to fragmentation definitions, as well as the immense value of physical memory savings, we are motivated to set allocator-generated placements against their 2DBP-devised, makespan-optimizing counterparts. Of course, allocators must operate online while 2DBP algorithms work on complete request traces; but since both sides optimize criteria related to minimizing memory wastage, the idea of studying their relationship preserves its intellectual and practical interest. Unfortunately, no implementations of 2DBP algorithms for DSA are available. This paper presents a first, though partial, implementation of the state of the art. We validate its functionality by comparing its outputs' makespan to the theoretical upper bound provided by the original authors. Along the way, we identify and document key details to assist analogous future efforts. Our experiments comprise 4 modern allocators and 8 real application workloads. We make several notable observations on our empirical evidence: in terms of makespan, allocators outperform Robson's worst-case lower bound 93.75% of the time. In 87.5% of cases, GNU's malloc implementation demonstrates equivalent or superior performance to the 2DBP state of the art, despite the latter operating offline. Most surprisingly, the 2DBP algorithm proves competent in terms of fragmentation, producing up to 2.46x better solutions. Future research can leverage such insights towards memory-targeting optimizations.
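
    To make the 2DBP framing concrete: each allocation is a rectangle whose horizontal extent is its lifetime and whose height is its size, a placement assigns each rectangle a base offset, and makespan is the highest address the placement ever touches. The Scala sketch below computes that makespan alongside the peak live footprint that any placement must accommodate; the example jobs and offsets are illustrative assumptions, not data or code from the paper.

        object DsaAs2dbpSketch {
          final case class Alloc(allocTime: Int, freeTime: Int, size: Int, offset: Int)

          // Highest address ever occupied by the placement (its "makespan").
          def makespan(placement: Seq[Alloc]): Int =
            placement.map(a => a.offset + a.size).maxOption.getOrElse(0)

          // Peak of the live footprint over time: a lower bound no placement can beat.
          def peakLiveBytes(placement: Seq[Alloc]): Int = {
            val times = placement.flatMap(a => Seq(a.allocTime, a.freeTime)).distinct
            times.map(t => placement.filter(a => a.allocTime <= t && t < a.freeTime).map(_.size).sum)
                 .maxOption.getOrElse(0)
          }

          def main(args: Array[String]): Unit = {
            val placement = Seq(
              Alloc(allocTime = 0, freeTime = 4, size = 64, offset = 0),
              Alloc(allocTime = 1, freeTime = 2, size = 32, offset = 64),
              Alloc(allocTime = 2, freeTime = 5, size = 32, offset = 64)  // reuses the freed slot
            )
            // Fragmentation-style comparisons contrast these two quantities.
            println(s"makespan = ${makespan(placement)}, peak live = ${peakLiveBytes(placement)}")
          }
        }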

    HALO: Post-Link Heap-Layout Optimisation

    Today, general-purpose memory allocators dominate the landscape of dynamic memory management. While these solutions can provide reasonably good behaviour across a wide range of workloads, it is an unfortunate reality that their behaviour for any particular workload can be highly suboptimal. By catering primarily to average and worst-case usage patterns, these allocators deny programs the advantages of domain-specific optimisations, and thus may inadvertently place data in a manner that hinders performance, generating unnecessary cache misses and load stalls. To help alleviate these issues, we propose HALO: a post-link profile-guided optimisation tool that can improve the layout of heap data to reduce cache misses automatically. Profiling the target binary to understand how allocations made in different contexts are related, we specialise memory-management routines to allocate groups of related objects from separate pools to increase their spatial locality. Unlike other solutions of its kind, HALO employs novel grouping and identification algorithms which allow it to create tight-knit allocation groups using the entire call stack and to identify these efficiently at runtime. Evaluation of HALO on contemporary out-of-order hardware demonstrates speedups of up to 28% over jemalloc, outperforming a state-of-the-art data placement technique from the literature.
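
    The core idea above is that allocations sharing a call-stack context are steered into their own pool. The Scala sketch below models that grouping at a high level; the context identifiers, group table, and pool representation are illustrative assumptions and say nothing about HALO's actual post-link rewriting of native allocation routines.

        import scala.collection.mutable

        object ContextPoolsSketch {
          // Profile-derived mapping: call-stack identifier -> allocation group (hypothetical values).
          val contextToGroup: Map[Long, Int] = Map(0x1111L -> 0, 0x2222L -> 0, 0x3333L -> 1)

          // One pool per group; unknown contexts fall back to a default pool (group -1).
          private val pools = mutable.Map.empty[Int, mutable.ArrayBuffer[Array[Byte]]]

          def allocate(callStackId: Long, size: Int): Array[Byte] = {
            val group = contextToGroup.getOrElse(callStackId, -1)
            val pool  = pools.getOrElseUpdate(group, mutable.ArrayBuffer.empty)
            val block = new Array[Byte](size)   // stands in for carving space out of the group's pool
            pool += block
            block
          }

          def main(args: Array[String]): Unit = {
            allocate(0x1111L, 64)
            allocate(0x2222L, 64)   // same group as 0x1111: shares a pool, hence spatial locality
            allocate(0x3333L, 128)  // different group, different pool
            pools.foreach { case (g, p) => println(s"group $g holds ${p.size} objects") }
          }
        }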

    Cooperative cache scrubbing

    Managing the limited resources of power and memory bandwidth while improving performance on multicore hardware is challenging. In particular, more cores demand more memory bandwidth, and multi-threaded applications increasingly stress memory systems, leading to more energy consumption. However, we demonstrate that not all memory traffic is necessary. For modern Java programs, 10% to 60% of DRAM writes are useless, because the data on these lines are dead: the program is guaranteed to never read them again. Furthermore, reading memory only to immediately zero-initialize it wastes bandwidth. We propose a software/hardware cooperative solution: the memory manager communicates dead and zero lines with cache scrubbing instructions. We show how scrubbing instructions satisfy MESI cache coherence protocol invariants and demonstrate them in a Java Virtual Machine and multicore simulator. Scrubbing reduces average DRAM traffic by 59%, total DRAM energy by 14%, and dynamic DRAM energy by 57% on a range of configurations. Cooperative software/hardware cache scrubbing reduces memory bandwidth and improves energy efficiency, two critical problems in modern systems.
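
    The software half of that cooperation is straightforward to picture: once the memory manager knows a region is dead, it walks the region cache line by cache line and issues a scrub hint so those dirty lines are never written back to DRAM. The Scala sketch below models that walk, with a stub standing in for the proposed hardware instruction; the line size, addresses, and names are illustrative assumptions.

        object CacheScrubbingSketch {
          val CacheLineBytes = 64L

          // Placeholder for the proposed ISA-level scrub instruction (invalidate without writeback).
          def scrubLine(address: Long): Unit =
            println(f"scrub line at 0x$address%x")

          // Called by the memory manager once a region is known to be dead (never read again).
          def scrubDeadRegion(start: Long, sizeBytes: Long): Unit = {
            val first = start - (start % CacheLineBytes)   // align down to a cache-line boundary
            val end   = start + sizeBytes
            var line  = first
            while (line < end) {
              scrubLine(line)
              line += CacheLineBytes
            }
          }

          def main(args: Array[String]): Unit =
            scrubDeadRegion(start = 0x10020L, sizeBytes = 256L) // e.g. a reclaimed nursery chunk
        }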