Search CORE

14,484 research outputs found

Efficient Online Scheduling in Distributed Stream Data Processing Systems

Author: Li Teng
Publication venue: SURFACE at Syracuse University
Publication date: 13/05/2018
Field of study

General-purpose Distributed Stream Data Processing Systems (DSDPSs) have attracted extensive attention from industry and academia in recent years. They are capable of processing unbounded big streams of continuous data in a distributed and real (or near-real) time manner. A fundamental problem in a DSDPS is the scheduling problem, i.e., assigning threads (carrying workload) to workers/machines with the objective of minimizing average end-to-end tuple processing time (or simply tuple processing time). A widely-used solution is to distribute workload over machines in the cluster in a round-robin manner, which is obviously not efficient due to the lack of consideration for communication delay among processes/machines. A scheduling solution makes a significant impact on the average tuple processing time. However, their relationship is very subtle and complicated. It does not even seem possible to have a mathematical programming formulation for the scheduling problem if its objective is to directly minimize the average tuple processing time. In this dissertation, we first propose a model-based approach that accurately models the correlation between a scheduling solution and its objective value (i.e. average tuple processing time) for a given scheduling solution according to the topology of the application graph and runtime statistics. A predictive scheduling algorithm is then presented, which as- signs tasks (threads) to machines under the guidance of the proposed model. This approach achieves an average of 24.9% improvement over Storm’s default scheduler. However, the model-based approach still has its limitations: the model may not be able to fully capture the features of a DSDPS; prediction may not be accurate enough; and a large amount of high-dimensional data may lead to high overhead. To address the limitations, we develop a model-free approach that can learn to control a DSDPS from its experience rather than adopting accurate and mathematically solvable system models, just as a human learns a skill (such as cooking, driving, swimming, etc.). Recent breakthrough of Deep Reinforcement Learning (DRL) provides a promising approach for enabling effective model-free control. The proposed DRL-based model-free approach minimizes the average end-to-end tuple processing time by jointly learning the system environment via collecting very limited runtime statistics and making decisions under the guidance of powerful Deep Neural Networks (DNNs). This approach achieves great performance improvement over the current practice and the state-of-the-art model-based approach. Moreover, there is still room for improvement for the above model-free approach: For the above model-free approach and most existing methods, a user specifies the number of threads for an application in advance without knowing much about runtime needs, which, however, remains unchanged during runtime. This could severely affect the performance of a DSDPS. Therefore, we further develop another model-free approach using DRL, EXTRA, which enables the dynamic use of a variable number of threads at runtime. It has been shown by extensive experimental results, by adding this new feature, EXTRA can achieve further performance improvement and greater flexibility on scheduling

Syracuse University Research Facility and Collaborative Environment

Porting Decision Tree Algorithms to Multicore using FastFlow

Author: A.C. Sodan
I. Park
J.E. Gehrke
J.R. Quinlan
K. Asanovic
M. Aldinucci
M. Cole
M. Coppola
M. Joshi
M. Vanneschi
M. Zaki
M.K. Sreenivas
R. Jin
R.D. Blumofe
S. Ruggieri
S. Ruggieri
T. Lim
W. Thies
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it is able to exploit up to 7X speedup on an Intel dual-quad core machine.Comment: 18 pages + cove

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

UnipiEprints

Accelerating MCMC via Parallel Predictive Prefetching

Author: Adams Ryan P.
Angelino Elaine
Kohler Eddie
Seltzer Margo
Waterland Amos
Publication venue
Publication date: 27/03/2014
Field of study

We present a general framework for accelerating a large class of widely used Markov chain Monte Carlo (MCMC) algorithms. Our approach exploits fast, iterative approximations to the target density to speculatively evaluate many potential future steps of the chain in parallel. The approach can accelerate computation of the target distribution of a Bayesian inference problem, without compromising exactness, by exploiting subsets of data. It takes advantage of whatever parallel resources are available, but produces results exactly equivalent to standard serial execution. In the initial burn-in phase of chain evaluation, it achieves speedup over serial evaluation that is close to linear in the number of available cores

arXiv.org e-Print Archive

CiteSeerX

On Designing Multicore-aware Simulators for Biological Systems

Author: Aldinucci Marco
Coppo Mario
Damiani Ferruccio
Drocco Maurizio
Torquati Massimo
Troina Angelo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 13/10/2010
Field of study

The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It often is an enlightening technique, which may however result in being computational expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas of derived from parameter sweep). Proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments) that is carried out according to proposed solutions by way of the FastFlow programming framework making possible fast development and efficient execution on multi-cores.Comment: 19 pages + cover pag

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio della Ricerca - Università di Pisa

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Institutional Research Information System University of Turin