Inherently workload-balanced clustered microarchitecture

Author: Abella Ferrer Jaume
González Colás Antonio María
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposing targets, since improving one usually degrades the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that the results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, minimizing communication penalties is favored when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, a 15% speedup is achieved on average for FP programs. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

GLB: Lifeline-based Global Load Balancing library in X10

Author: Grove David
Herta Benjamin
Kamada Tomio
Saraswat Vijay
Takeuchi Mikio
Tardieu Olivier
Zhang Wei
Publication date: 19/12/2013

We present GLB, a programming model and an associated implementation that can handle a wide range of irregular parallel programming problems running over large-scale distributed systems. GLB is applicable both to problems that are easily load-balanced via static scheduling and to problems that are hard to statically load balance. GLB hides the intricate synchronizations (e.g., inter-node communication, initialization and startup, load balancing, termination and result collection) from the users. GLB internally uses a version of the lifeline graph based work-stealing algorithm proposed by Saraswat et al. Users of GLB are simply required to write several pieces of sequential code that comply with the GLB interface. GLB then schedules and orchestrates the parallel execution of the code correctly and efficiently at scale. We have applied GLB to two representative benchmarks: Betweenness Centrality (BC) and Unbalanced Tree Search (UTS). Of these, BC can be statically load-balanced whereas UTS cannot. In either case, GLB scales well, achieving nearly linear speedup on different computer architectures (Power, Blue Gene/Q, and K), up to 16K cores.

arXiv.org e-Print Archive

CiteSeerX

Packet Transactions: High-level Programming for Line-Rate Switches

Author: Alizadeh Mohammad
Balakrishnan Hari
Budiu Mihai
Cheung Alvin
Kim Changhoon
Licking Steve
McKeown Nick
Sivaraman Anirudh
Varghese George
Publication date: 29/01/2016

Many algorithms for congestion control, scheduling, network measurement, active queue management, security, and load balancing require custom processing of packets as they traverse the data plane of a network switch. To run at line rate, these data-plane algorithms must be in hardware. With today's switch hardware, algorithms cannot be changed, nor new algorithms installed, after a switch has been built. This paper shows how to program data-plane algorithms in a high-level language and compile those programs into low-level microcode that can run on emerging programmable line-rate switching chipsets. The key challenge is that these algorithms create and modify algorithmic state. The key idea to achieve line-rate programmability for stateful algorithms is the notion of a packet transaction: a sequential code block that is atomic and isolated from other such code blocks. We have developed this idea in Domino, a C-like imperative language to express data-plane algorithms. We show with many examples that Domino provides a convenient and natural way to express sophisticated data-plane algorithms, and show that these algorithms can be run at line rate with modest estimated die-area overhead. Comment: 16 pages
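The notion of a packet transaction can be illustrated with a small sketch. The code below is Python, not Domino, and it is not taken from the paper: the `Packet` and `FlowletSwitch` names, the flowlet-style load-balancing rule, and the round-robin hop choice are all assumptions made for illustration. What it tries to show is the shape of the idea: one sequential block per packet that reads and updates persistent switch state, run to completion before the next packet is handled.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    flow_id: int
    arrival: int          # arrival time in arbitrary ticks
    next_hop: int = -1    # filled in by the transaction


class FlowletSwitch:
    """Hypothetical stateful load balancer, written as one per-packet block."""

    def __init__(self, num_hops: int, gap: int):
        self.num_hops = num_hops
        self.gap = gap            # idle time that starts a new flowlet
        self.last_time = {}       # per-flow state: last arrival time
        self.saved_hop = {}       # per-flow state: hop chosen for the flowlet
        self.counter = 0          # assumed round-robin hop selector

    def transaction(self, pkt: Packet) -> Packet:
        # The whole body is meant to be atomic: no other packet observes
        # the state between the reads and the writes below.
        last = self.last_time.get(pkt.flow_id)
        if last is None or pkt.arrival - last > self.gap:
            self.saved_hop[pkt.flow_id] = self.counter % self.num_hops
            self.counter += 1
        pkt.next_hop = self.saved_hop[pkt.flow_id]
        self.last_time[pkt.flow_id] = pkt.arrival
        return pkt


if __name__ == "__main__":
    sw = FlowletSwitch(num_hops=2, gap=5)
    for t in (0, 1, 10):
        print(t, sw.transaction(Packet(flow_id=1, arrival=t)).next_hop)
```

In this sketch atomicity comes for free because packets are processed one at a time in a single thread; per the abstract, the compiler's job is to preserve that sequential semantics when the block is mapped to line-rate hardware.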

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Average treatment effect estimation via random recursive partitioning

Author: Iacus Stefano
Porro Giuseppe
Publication date: 01/01/2004

A new matching method is proposed for the estimation of the average treatment effect of social policy interventions (e.g., training programs or health care measures). Given an outcome variable, a treatment and a set of pre-treatment covariates, the method is based on the examination of random recursive partitions of the space of covariates using regression trees. A regression tree is grown either on the treated or on the untreated individuals only, using as response variable a random permutation of the indexes $1, \dots, n$ ($n$ being the number of units involved), while the indexes for the other group are predicted using this tree. The procedure is replicated in order to rule out the effect of specific permutations. The average treatment effect is estimated in each tree by matching treated and untreated in the same terminal nodes. The final estimator of the average treatment effect is obtained by averaging over all the trees grown. The method does not require any specific model assumption apart from the tree's complexity, which does not affect the estimator though. We show that this method is both an instrument to check whether two samples can be matched (by any method) and, when this is feasible, a way to obtain reliable estimates of the average treatment effect. We further propose a graphical tool to inspect the quality of the match. The method has been applied to the National Supported Work Demonstration data, previously analyzed by Lalonde (1986) and others
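The procedure described above can be sketched in a few lines of Python. This is a simplified reading of the abstract and not the authors' implementation: the function names, the greedy squared-error splitting rule, the `min_leaf` stopping rule, growing the tree on the treated units only, and weighting each terminal node by its number of treated units are all assumptions made for illustration.

```python
import random
from statistics import mean


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(X, y, idx, min_leaf):
    """Greedy regression tree on rows `idx`; a leaf is an empty dict."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({X[i][j] for i in idx})[:-1]:
            left = [i for i in idx if X[i][j] <= t]
            right = [i for i in idx if X[i][j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return {}
    _, j, t, left, right = best
    return {"j": j, "t": t,
            "lo": grow(X, y, left, min_leaf),
            "hi": grow(X, y, right, min_leaf)}


def leaf_of(node, x):
    """Identify the terminal node that covariate vector `x` falls into."""
    while node:
        node = node["lo"] if x[node["j"]] <= node["t"] else node["hi"]
    return id(node)


def rrp_effect(X, treated, outcome, n_trees=50, min_leaf=2, seed=0):
    rng = random.Random(seed)
    t_idx = [i for i, d in enumerate(treated) if d]
    estimates = []
    for _ in range(n_trees):
        # Response variable: a random permutation of 1..n over treated units.
        perm = list(range(1, len(t_idx) + 1))
        rng.shuffle(perm)
        tree = grow(X, dict(zip(t_idx, perm)), t_idx, min_leaf)
        cells = {}
        for i, x in enumerate(X):
            cell = cells.setdefault(leaf_of(tree, x), ([], []))
            cell[0 if treated[i] else 1].append(outcome[i])
        num = den = 0
        for t_out, c_out in cells.values():
            if t_out and c_out:  # match only inside shared terminal nodes
                num += len(t_out) * (mean(t_out) - mean(c_out))
                den += len(t_out)
        if den:
            estimates.append(num / den)
    return mean(estimates)  # average over all trees grown
```

Because the response is a random permutation, each tree is in effect a random partition of the covariate space, and repeating the procedure averages out the influence of any single permutation.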

arXiv.org e-Print Archive

CiteSeerX

Compression by Contracting Straight-Line Programs

Author: Ganardi Moses
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th Annual European Symposium on Algorithms (ESA 2021)
Publication date: 01/01/2021

In grammar-based compression a string is represented by a context-free grammar, also called a straight-line program (SLP), that generates only that string. We refine a recent balancing result stating that one can transform an SLP of size $g$ in linear time into an equivalent SLP of size $O(g)$ so that the height of the unique derivation tree is $O(\log N)$, where $N$ is the length of the represented string (FOCS 2019). We introduce a new class of balanced SLPs, called contracting SLPs, where for every rule $A \to \beta_1 \dots \beta_k$ the string length of every variable $\beta_i$ on the right-hand side is smaller by a constant factor than the string length of $A$. In particular, the derivation tree of a contracting SLP has the property that every subtree has logarithmic height in its leaf size. We show that a given SLP of size $g$ can be transformed in linear time into an equivalent contracting SLP of size $O(g)$ with rules of constant length. We present an application to the navigation problem in compressed unranked trees, represented by forest straight-line programs (FSLPs). We extend a linear space data structure by Reh and Sieber (2020) by the operation of moving to the $i$-th child in time $O(\log d)$, where $d$ is the degree of the current node. Contracting SLPs are also applied to the finger search problem over SLP-compressed strings where one wants to access positions near to a pre-specified finger position, ideally in
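
The contracting property defined in this abstract is easy to test on a small grammar. The Python sketch below is illustrative only: the dictionary encoding of rules, the function names, and the default constant factor of 1/2 are assumptions, and the code only checks the property; it does not perform the linear-time transformation the paper describes.

```python
def slp_lengths(rules):
    """Expanded string length of every variable.

    `rules` maps a variable to the list of symbols on its right-hand side;
    any symbol that is not a key of `rules` counts as a terminal of length 1.
    Plain recursion is used, so very deep grammars would need an iterative
    version.
    """
    lengths = {}

    def length(sym):
        if sym not in rules:
            return 1
        if sym not in lengths:
            lengths[sym] = sum(length(s) for s in rules[sym])
        return lengths[sym]

    for var in rules:
        length(var)
    return lengths


def is_contracting(rules, factor=0.5):
    """True if every variable on a right-hand side expands to at most
    `factor` times the length of the rule's left-hand side."""
    lengths = slp_lengths(rules)
    for var, rhs in rules.items():
        for sym in rhs:
            if sym in rules and lengths[sym] > factor * lengths[var]:
                return False
    return True


# Balanced grammar for "abababab": each variable halves the length.
balanced = {"S": ["A", "A"], "A": ["B", "B"], "B": ["a", "b"]}
# Chain-shaped grammar for "abab": X2 is almost as long as X3.
chain = {"X1": ["a", "b"], "X2": ["X1", "a"], "X3": ["X2", "b"]}

print(is_contracting(balanced))  # True
print(is_contracting(chain))     # False
```

The chain grammar fails the check because its derivation tree is as tall as the string is long, which is the kind of imbalance that balanced and contracting SLPs rule out.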