Efficient Set Sharing Using ZBDDs

Author: C. Meinel
J. Dean
J. Whaley
K. Muthukumar
M. Berndl
M. Codish
M. Méndez-Lojo
R.E. Bryant
S.I. Minato
T. Yoneda
X. Leroy
É. Payet
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Set sharing is an abstract domain in which each concrete object is represented by the set of local variables from which it might be reachable. It is a useful abstraction for detecting parallelism opportunities, since it contains definite information about which variables do not share in memory, i.e., about when the memory regions reachable from those variables are disjoint. Set sharing is a more precise alternative to pair sharing, in which each domain element is a set of all pairs of local variables from which a common object may be reachable. However, the exponential complexity of some set sharing operations has limited its wider application. This work introduces an efficient implementation of the set sharing domain using Zero-suppressed Binary Decision Diagrams (ZBDDs). Because ZBDDs were designed to represent sets of combinations (i.e., sets of sets), they naturally represent elements of the set sharing domain. We show how to synthesize the operations needed in the set sharing transfer functions from basic ZBDD operations. For some of the operations, we devise custom ZBDD algorithms that perform better in practice. We also compare our implementation of the abstract domain with an efficient, compact, bit set-based alternative, and show that the ZBDD version scales better in terms of both memory usage and running time.
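
As a rough illustration of the relationship between set sharing and pair sharing described in this abstract, the sketch below models a set sharing element as a plain Python set of sets and projects it onto pairs. It does not use ZBDDs and is not the paper's implementation; the names `pair_sharing`, `may_share`, and the example state are hypothetical.

```python
from itertools import combinations

def pair_sharing(set_sharing):
    """Project a set sharing element onto the variable pairs that may share."""
    pairs = set()
    for group in set_sharing:
        # Every two variables in the same sharing group may reach a common object.
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs

def may_share(set_sharing, x, y):
    """True if some sharing group contains both x and y."""
    return any(x in group and y in group for group in set_sharing)

# Hypothetical abstract state over local variables x, y, z:
# some objects are reachable from both x and y, others only from z.
state = frozenset({frozenset({"x", "y"}), frozenset({"z"})})

shared_pairs = pair_sharing(state)
x_and_z_independent = not may_share(state, "x", "z")
```

In this example `x` and `z` never appear in the same group, so the regions reachable from them are definitely disjoint, which is the kind of fact a parallelizer could exploit.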

Crossref

Archivo Digital UPM

Shared Arrangements: practical inter-query sharing for streaming dataflows

Author: Lattuada Andrea
McSherry Frank
Roscoe Timothy
Schwarzkopf Malte
Publication date: 01/06/2020

Current systems for data-parallel, incremental processing and view maintenance over high-rate streams isolate the execution of independent queries. This creates unwanted redundancy and overhead in the presence of concurrent incrementally maintained queries: each query must independently maintain the same indexed state over the same input streams, and new queries must build this state from scratch before they can begin to emit their first results. This paper introduces shared arrangements: indexed views of maintained state that allow concurrent queries to reuse the same in-memory state without compromising data-parallel performance and scaling. We implement shared arrangements in a modern stream processor and show order-of-magnitude improvements in query response time and resource consumption for interactive queries against high-throughput streams, while also significantly improving performance in other domains including business analytics, graph processing, and program analysis.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Dynamic resource allocation scheme for distributed heterogeneous computer systems

Author: Liu Howard T.
Silvester John A.
Publication date: 09/07/1991

This invention relates to resource allocation in computer systems, and more particularly, to a method and associated apparatus for shortening response time and improving efficiency of a heterogeneous distributed networked computer system by reallocating the jobs queued up for busy nodes to idle or less-busy nodes. In accordance with the algorithm (SIDA for short), the load-sharing is initiated by the server device in a manner such that extra overhead is not imposed on the system during heavily-loaded conditions. The algorithm employed in the present invention uses a dual-mode, server-initiated approach. Jobs are transferred from heavily burdened nodes (i.e., over a high threshold limit) to lightly burdened nodes at the initiation of the receiving node when: (1) a job finishes at a node which is burdened below a pre-established threshold level, or (2) a node is idle for a period of time as established by a wakeup timer at the node. The invention uses a combination of the local queue length and the local service rate ratio at each node as the workload indicator.
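
The two triggers described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not say how queue length and service rate are combined, so the `load` formula (queue length divided by service rate), the choice of donor, and all names here are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    service_rate: float  # assumed: speed relative to a reference node
    queue: deque = field(default_factory=deque)

    def load(self):
        # Assumed workload indicator: queue length scaled by service rate.
        return len(self.queue) / self.service_rate

def request_job(receiver, nodes, low, high):
    """Receiver-initiated transfer: pull one job from a node loaded above `high`."""
    if receiver.load() >= low:
        return None  # receiver is not lightly loaded; do nothing
    donors = [n for n in nodes if n is not receiver and n.load() > high]
    if not donors:
        return None
    donor = max(donors, key=Node.load)  # assumed policy: most loaded donor
    job = donor.queue.pop()             # take the most recently queued job
    receiver.queue.append(job)
    return job

def on_job_finished(node, nodes, low, high):
    """Mode 1: a job finishes at a node that may now be below the low threshold."""
    node.queue.popleft()
    return request_job(node, nodes, low, high)

def on_wakeup_timer(node, nodes, low, high):
    """Mode 2: the wakeup timer fires while the node is idle."""
    if not node.queue:
        return request_job(node, nodes, low, high)
    return None

# Hypothetical two-node system: an idle fast node and a backed-up slow node.
fast = Node("fast", 2.0)
slow = Node("slow", 1.0, deque(["j1", "j2", "j3", "j4"]))
nodes = [fast, slow]
first = on_wakeup_timer(fast, nodes, low=1.0, high=2.0)
second = on_job_finished(fast, nodes, low=1.0, high=2.0)
third = on_wakeup_timer(fast, nodes, low=1.0, high=2.0)
```

Because only lightly loaded nodes ever issue requests, busy nodes spend no effort probing for help, which matches the abstract's point about avoiding extra overhead under heavy load.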

NASA Technical Reports Server

Independent AND-parallel implementation of narrowing

Author: Hermenegildo Manuel V.
Kuchen Herbert
Moreno Navarro Juan José
Publication venue: Facultad de Informática (UPM)
Publication date: 01/01/1992

We present a parallel graph narrowing machine, which is used to implement a functional logic language on a shared memory multiprocessor. It is an extension of an abstract machine for a purely functional language. The result is a programmed graph reduction machine which integrates the mechanisms of unification, backtracking, and independent and-parallelism. In the machine, the subexpressions of an expression can run in parallel. In the case of backtracking, the structure of an expression is used to avoid the reevaluation of subexpressions as far as possible. Deterministic computations are detected. Their results are maintained and need not be reevaluated after backtracking.

CiteSeerX

Archivo Digital UPM

HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA

Author: Benini Luca
Capotondi Alessandro
Kurth Andreas
Marongiu Andrea
Vogel Pirmin
Publication date: 01/01/2017

Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient processing capabilities. While leading companies successfully advance their HESoC products, research lags behind due to the challenges of building a prototyping platform that unites an industry-standard host processor with an open research PMCA architecture. In this work we introduce HERO, an FPGA-based research platform that combines a PMCA composed of clusters of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor. The PMCA architecture mapped on the FPGA is silicon-proven, scalable, configurable, and fully modifiable. HERO includes a complete software stack that consists of a heterogeneous cross-compilation toolchain with support for OpenMP accelerator programming, a Linux driver, and runtime libraries for both host and PMCA. HERO is designed to facilitate rapid exploration on all software and hardware layers: run-time behavior can be accurately analyzed by tracing events, and modifications can be validated through fully automated hardware and software builds and executed tests. We demonstrate the usefulness of HERO by means of case studies from our research.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia