5,208 research outputs found

A Parallel semantics for normal logic programs plus time

Author: Clayton Roger
Cleary John G.
Utting Mark
Publication venue: University of Waikato, Department of Computer Science
Publication date: 11/11/2013

It is proposed that Normal Logic Programs with an explicit time ordering are a suitable basis for a general purpose parallel programming language. Examples show that such a language can accept real-time external inputs and outputs, and mimic assignment, all without departing from its pure logical semantics. This paper describes a fully incremental bottom-up interpreter that supports a wide range of parallel execution strategies and can extract significant potential parallelism from programs with complex dependencies.

Research Commons@Waikato

Energy-Efficient Multiprocessor Scheduling for Flow Time and Makespan

Author: Agrawal
Albers
Andrew
Bansal
Becchetti
Bender
Blumofe
Borodin
Boyd
Brecht
Brooks
Chan
Chen
Deng
Edmonds
Fox
Greiner
Grunwald
Hardy
He
Herbert
Hongyang Sun
Im
Irani
Jaffe
Kalyanasundaram
Kim
Lam
Mudge
Pruhs
Robert
Rui Fan
Shmoys
Sun
Trick
Weiser
Wen-Jing Hsu
Yao
Yuxiong He
Zhang
Zhao
Publication venue: 'Elsevier BV'
Publication date: 19/01/2014

We consider energy-efficient scheduling on multiprocessors, where the speed of each processor can be individually scaled, and a processor consumes power $s^{\alpha}$ when running at speed $s$, for $\alpha>1$. A scheduling algorithm needs to decide at any time both processor allocations and processor speeds for a set of parallel jobs with time-varying parallelism. The objective is to minimize the sum of the total energy consumption and a certain performance metric, which in this paper includes total flow time and makespan. For both objectives, we present instantaneous parallelism clairvoyant (IP-clairvoyant) algorithms that are aware of the instantaneous parallelism of the jobs at any time but not their future characteristics, such as remaining parallelism and work. For total flow time plus energy, we present an $O(1)$-competitive algorithm, which significantly improves upon the best known non-clairvoyant algorithm and is the first constant competitive result on multiprocessor speed scaling for parallel jobs. In the case of makespan plus energy, which is considered for the first time in the literature, we present an $O(\ln^{1-1/\alpha}P)$-competitive algorithm, where $P$ is the total number of processors. We show that this algorithm is asymptotically optimal by providing a matching lower bound. In addition, we also study non-clairvoyant scheduling for total flow time plus energy, and present an algorithm that is $O(\ln P)$-competitive for jobs with arbitrary release time and $O(\ln^{1/\alpha}P)$-competitive for jobs with identical release time. Finally, we prove an $\Omega(\ln^{1/\alpha}P)$ lower bound on the competitive ratio of any non-clairvoyant algorithm, matching the upper bound of our algorithm for jobs with identical release time.
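
The power model in this abstract can be made concrete with a small sketch. It assumes the simplest possible setting, which is not the paper's: one sequential job of size `work` running alone on one processor at a constant speed. The job then takes time work/speed and consumes energy speed^alpha * work/speed. The function names are hypothetical and chosen only for illustration.

```python
def energy(work: float, speed: float, alpha: float) -> float:
    """Energy used to process `work` units at constant `speed`, with power = speed**alpha."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    duration = work / speed          # time the processor is busy
    return speed ** alpha * duration # power integrated over that time


def flow_plus_energy(work: float, speed: float, alpha: float) -> float:
    """Objective for a single job released at time 0: flow time plus energy."""
    return work / speed + energy(work, speed, alpha)


def best_constant_speed(alpha: float) -> float:
    """Speed minimizing work/s + work*s**(alpha-1); found by setting the derivative to zero."""
    if alpha <= 1:
        raise ValueError("the model assumes alpha > 1")
    return (alpha - 1.0) ** (-1.0 / alpha)


if __name__ == "__main__":
    alpha = 3.0
    s_star = best_constant_speed(alpha)
    for s in (0.5 * s_star, s_star, 2.0 * s_star):
        print(f"speed={s:.3f}  flow+energy={flow_plus_energy(1.0, s, alpha):.3f}")
```

Running faster shortens the flow time but raises the energy term superlinearly, which is why the combined objective has an interior minimum. The algorithms described in the abstract must make this trade-off online, across many processors, and for jobs whose parallelism changes over time.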

arXiv.org e-Print Archive

Crossref

Distributed Training Large-Scale Deep Architectures

Author: Chang Edward Y.
Chen Chun-Yen
Chou Chun-Nan
Lin Ting-Wei
Sung Cheng-Lung
Tsao Chia-Chin
Tung Kuan-Chieh
Wu Jui-Lin
Zou Shang-Xuan
Publication date: 10/08/2017

Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing the system approach to speed up large-scale training. Via lessons learned from our routine benchmarking effort, we first identify bottlenecks and overheads that hinder data parallelism. We then devise guidelines that help practitioners to configure an effective system and fine-tune parameters to achieve desired speedup. Specifically, we develop a procedure for setting minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training.

arXiv.org e-Print Archive

Crossref

Processor Allocation for Optimistic Parallelization of Irregular Programs

Author: A. Braunstein
D. Freedman
F. Versaci
H.D. Friedman
J. Jensen
J. Reinders
K. Agrawal
K. Georgiou
K. Pingali
L.J. Guibas
L.S. Blackford
M. Frigo
M. Püschel
P. An
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Optimistic parallelization is a promising approach for the parallelization of irregular algorithms: potentially interfering tasks are launched dynamically, and the runtime system detects conflicts between concurrent activities, aborting and rolling back conflicting tasks. However, parallelism in irregular algorithms is very complex. In a regular algorithm like dense matrix multiplication, the amount of parallelism can usually be expressed as a function of the problem size, so it is reasonably straightforward to determine how many processors should be allocated to execute a regular algorithm of a certain size (this is called the processor allocation problem). In contrast, parallelism in irregular algorithms can be a function of input parameters, and the amount of parallelism can vary dramatically during the execution of the irregular algorithm. Therefore, the processor allocation problem for irregular algorithms is very difficult. In this paper, we describe the first systematic strategy for addressing this problem. Our approach is based on a construct called the conflict graph, which (i) provides insight into the amount of parallelism that can be extracted from an irregular algorithm, and (ii) can be used to address the processor allocation problem for irregular algorithms. We show that this problem is related to a generalization of the unfriendly seating problem and, by extending Turán's theorem, we obtain a worst-case class of problems for optimistic parallelization, which we use to derive a lower bound on the exploitable parallelism. Finally, using some theoretically derived properties and some experimental facts, we design a quick and stable control strategy for solving the processor allocation problem heuristically. Comment: 12 pages, 3 figures, extended version of SPAA 2011 brief announcement
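
The conflict-graph idea can be illustrated with a minimal sketch. It is not the paper's control strategy. It assumes that tasks are integer ids, that `conflicts` maps each task to the set of tasks it interferes with (a symmetric relation), and that a launched task commits only if none of its conflicting neighbours has already committed; otherwise it is aborted. All names are hypothetical.

```python
from __future__ import annotations


def committed_tasks(conflicts: dict[int, set[int]], launch_order: list[int]) -> set[int]:
    """Simulate one optimistic round: tasks are tried in `launch_order`,
    and a task commits only if it conflicts with no task committed so far."""
    committed: set[int] = set()
    for task in launch_order:
        neighbours = conflicts.get(task, set())
        if neighbours.isdisjoint(committed):
            committed.add(task)  # no conflict detected: the task commits
        # otherwise the task is aborted and rolled back (not modelled here)
    return committed


if __name__ == "__main__":
    # Path-shaped conflict graph: 0 -- 1 -- 2
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    print(committed_tasks(path, [0, 1, 2]))  # tasks 0 and 2 commit, 1 aborts
    print(committed_tasks(path, [1, 0, 2]))  # only task 1 commits
```

The committed set is always an independent set of the conflict graph, and its size depends on the launch order. This is the kind of variability that makes choosing the number of processors hard for irregular algorithms.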

arXiv.org e-Print Archive

Crossref

MapReduce and Streaming Algorithms for Diversity Maximization in Metric Spaces of Bounded Doubling Dimension

Author: Ceccarello Matteo
Pietracaprina Andrea
Pucci Geppino
Upfal Eli
Publication date: 01/01/2017

Given a dataset of points in a metric space and an integer $k$, a diversity maximization problem requires determining a subset of $k$ points maximizing some diversity objective measure, e.g., the minimum or the average distance between two points in the subset. Diversity maximization is computationally hard, hence only approximate solutions can be hoped for. Although its applications are mainly in massive data analysis, most of the past research on diversity maximization focused on the sequential setting. In this work we present space and pass/round-efficient diversity maximization algorithms for the Streaming and MapReduce models and analyze their approximation guarantees for the relevant class of metric spaces of bounded doubling dimension. Like other approaches in the literature, our algorithms rely on the determination of high-quality core-sets, i.e., (much) smaller subsets of the input which contain good approximations to the optimal solution for the whole input. For a variety of diversity objective functions, our algorithms attain an $(\alpha+\epsilon)$-approximation ratio, for any constant $\epsilon>0$, where $\alpha$ is the best approximation ratio achieved by a polynomial-time, linear-space sequential algorithm for the same diversity objective. This improves substantially over the approximation ratios attainable in Streaming and MapReduce by state-of-the-art algorithms for general metric spaces. We provide extensive experimental evidence of the effectiveness of our algorithms on both real world and synthetic datasets, scaling up to over a billion points. Comment: Extended version of http://www.vldb.org/pvldb/vol10/p469-ceccarello.pdf, PVLDB Volume 10, No. 5, January 201
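
As a point of reference for the "minimum distance" objective mentioned above, the following is a minimal sequential sketch of greedy farthest-first selection, a standard heuristic for choosing well-separated points. It is not the Streaming or MapReduce core-set construction from the paper. It assumes Euclidean points given as tuples, and the function names are hypothetical.

```python
import math


def farthest_first(points: list[tuple[float, ...]], k: int) -> list[int]:
    """Return indices of k points chosen greedily: start from point 0, then
    repeatedly add the point farthest from everything selected so far."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    selected = [0]
    # dist[i] = distance from point i to its nearest selected point
    dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=dist.__getitem__)
        selected.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return selected


def min_pairwise_distance(points: list[tuple[float, ...]], idx: list[int]) -> float:
    """Diversity of a selection of at least two points: the smallest pairwise distance."""
    return min(
        math.dist(points[a], points[b])
        for i, a in enumerate(idx)
        for b in idx[i + 1:]
    )


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
    chosen = farthest_first(pts, 3)
    print(chosen, min_pairwise_distance(pts, chosen))
```

Each round scans every point, so this sketch needs the whole dataset in memory. The core-set approach in the abstract addresses that limitation by first reducing the input to a much smaller subset.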

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Padova

Independence in CLP Languages

Author: APT K.
BRUYNOOGHE EIRA
BUENO F.
CABEZA D.
CONERY J. S.
DEGROOT D.
GARC ANDA
HARIDI S.
HERMENEGILDO M.
JAFFAR J.
JANSON S.
KALE L. V.
KELLY A.
Kim Marriott
Manuel Hermenegildo
MARRIOTT K.
María García de la Banda
PATERSON M. S.
PEREIRA L. M.
PUEBLA G.
UEDA K.
WARREN D.
WARREN R.
WINSBOROUGH W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2000

Studying independence of goals has proven very useful in the context of logic programming. In particular, it has provided a formal basis for powerful automatic parallelization tools, since independence ensures that two goals may be evaluated in parallel while preserving correctness and efficiency. We extend the concept of independence to constraint logic programs (CLP) and prove that it also ensures the correctness and efficiency of the parallel evaluation of independent goals. Independence for CLP languages is more complex than for logic programming as search space preservation is necessary but no longer sufficient for ensuring correctness and efficiency. Two additional issues arise. The first is that the cost of constraint solving may depend upon the order in which constraints are encountered. The second is the need to handle dynamic scheduling. We clarify these issues by proposing various types of search independence and constraint solver independence, and show how they can be combined to allow different optimizations, from parallelism to intelligent backtracking. Sufficient conditions for independence which can be evaluated "a priori" at run-time are also proposed. Our study also yields new insights into independence in logic programming languages. In particular, we show that search space preservation is not only a sufficient but also a necessary condition for ensuring correctness and efficiency of parallel execution.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM