476 research outputs found

Virtual Cluster Management for Analysis of Geographically Distributed and Immovable Data

Author: Luo Yuan
Publication venue: [Bloomington, Ind.] : Indiana University
Publication date: 01/08/2015

Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015

Scenarios exist in the era of Big Data where computational analysis needs to utilize widely distributed and remote compute clusters, especially when the data sources are sensitive or extremely large, and thus unable to move. A large dataset in Malaysia could be ecologically sensitive, for instance, and unable to be moved outside the country's boundaries. Controlling an analysis experiment in this virtual cluster setting can be difficult on multiple levels: with setup and control, with managing the behavior of the virtual cluster, and with interoperability issues across the compute clusters. Further, datasets can be distributed among clusters, or even across data centers, so that it becomes critical to utilize data locality information to optimize the performance of data-intensive jobs. Finally, datasets are increasingly sensitive and tied to certain administrative boundaries, though once the data has been processed, the aggregated or statistical result can be shared across the boundaries. This dissertation addresses management and control of a widely distributed virtual cluster having sensitive or otherwise immovable data sets through a controller. The Virtual Cluster Controller (VCC) gives control back to the researcher. It creates virtual clusters across multiple cloud platforms. In recognition of sensitive data, it can establish a single network overlay over widely distributed clusters. We define a novel class of data, notably immovable data that we call "pinned data", where the data is treated as a first-class citizen instead of being moved to where it is needed. We draw from our earlier work with a hierarchical data processing model, Hierarchical MapReduce (HMR), to process geographically distributed data, some of which is pinned. The applications implemented in HMR use an extended MapReduce model in which computations are expressed as three functions: Map, Reduce, and GlobalReduce. Further, by facilitating information sharing among resources, applications, and data, the overall performance is improved. Experimental results show that the overhead of VCC is minimal. HMR outperforms the traditional MapReduce model when processing a particular class of applications. The evaluations also show that information sharing between resources and applications through the VCC shortens the hierarchical data processing time, as well as satisfying the constraints on the pinned data.
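The three-function model named in the abstract (Map, Reduce, GlobalReduce) can be illustrated with a minimal in-memory sketch. This is not the dissertation's implementation: the word-count workload, the function names (`map_fn`, `reduce_fn`, `global_reduce_fn`, `run_hierarchical`), and the dictionary standing in for clusters are all assumptions made for illustration. The point it shows is that raw records never leave their cluster; only per-cluster aggregates reach the global step.

```python
from collections import defaultdict

def map_fn(record):
    # Map: emit (word, 1) for every word in a record (assumed workload).
    for word in record.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    # Reduce: aggregate values for one key inside a single cluster.
    return key, sum(values)

def global_reduce_fn(key, partials):
    # GlobalReduce: combine the per-cluster aggregates for one key.
    return key, sum(partials)

def run_local_mapreduce(records):
    # Runs Map and Reduce entirely within one cluster.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

def run_hierarchical(clusters):
    # Only the reduced (aggregated) output of each cluster is collected here;
    # the raw, possibly pinned, records stay where they are.
    partials = defaultdict(list)
    for records in clusters.values():
        for key, value in run_local_mapreduce(records).items():
            partials[key].append(value)
    return dict(global_reduce_fn(k, vs) for k, vs in partials.items())

# Hypothetical clusters, each holding its own local records.
clusters = {
    "cluster_a": ["pinned data stays local", "data locality matters"],
    "cluster_b": ["aggregated data can be shared"],
}
result = run_hierarchical(clusters)
print(result["data"])
```

In a real deployment each call to `run_local_mapreduce` would be a separate MapReduce job on its own cluster; the loop here only stands in for that dispatch.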

IUScholarWorks (University of Indiana)

Energy-Efficient Flow Scheduling and Routing with Hard Deadlines in Data Center Networks

Author: Liu Zhiyong
Ren Shaolei
Vasilakos Athanasios V.
Wang Lin
Zhang Fa
Zheng Kai
Publication date: 29/05/2014

The power consumption of the enormous number of network devices in data centers has emerged as a big concern to data center operators. Despite many traffic-engineering-based solutions, very little attention has been paid to performance-guaranteed energy saving schemes. In this paper, we propose a novel energy-saving model for data center networks by scheduling and routing "deadline-constrained flows", where the transmission of every flow has to be accomplished before a rigorous deadline, this being the most critical requirement in production data center networks. Based on speed scaling and power-down energy saving strategies for network devices, we aim to explore the most energy-efficient way of scheduling and routing flows on the network, as well as determining the transmission speed for every flow. We consider two general versions of the problem. For the version with only flow scheduling, where routes of flows are pre-given, we show that it can be solved in polynomial time and we develop an optimal combinatorial algorithm for it. For the version with joint flow scheduling and routing, we prove that it is strongly NP-hard and cannot have a Fully Polynomial-Time Approximation Scheme (FPTAS) unless P=NP. Based on a relaxation and randomized rounding technique, we provide an efficient approximation algorithm which can guarantee a provable performance ratio with respect to a polynomial of the total number of flows.

Comment: 11 pages, accepted by ICDCS'1

arXiv.org e-Print Archive

Crossref

Distributed evolutionary algorithms and their models: A survey of the state-of-the-art

Author: Alba
Anglano
Apolloni
Bai
Bollini
Bouvry
Branke
Burczynski
Burczyński
Cahon
Cantu-Paz
Cantú-Paz
Chatzimilioudis
Chen
Creput
Danoy
Davis
de Toro Negro
Dean
Deb
Decraene
Desell
Dorronsoro
Du
Dubreuil
Durillo
Epitropakis
Escuela
Ewald
Fok
Folino
Gagné
Garcia-Arenas
García-Arenas
Giacobini
Goh
Goldberg
Gong
Gonzalez
Herrera
Hidalgo
Hosseini
Iimura
Ishimizu
Ismail
Jin
Jing-Jing Li
Johar
Jun Zhang
Kattan
Kirley
Kwok
Laredo
Li
Liang
Lienig
Lim
Liu
Llora
Lorion
Manfrin
McNabb
Melab
Mendiburu
Merelo
Merelo-Guervos
Merelo-Guervós
Michel
Mostaghim
Mussi
Nebro
Nesmachnow
Nojima
Ordeshook
Pedemonte
Pendharkar
Pierreval
Piriyakumar
Potter
Qingfu Zhang
Ray
Robilliard
Roy
Ruiz-Andino
Said
Sarma
Schutte
Schönfisch
Scriven
Sefrioui
Seredynski
Sherry
Soca
Starzynski
Stützle
Su
Subbu
Suganthan
Tagawa
Tan
Tasoulis
Tomassini
Umbarkar
Van Veldhuizen
Verma
Veronese
Vlachogiannis
Weber
Wei-Neng Chen
Whitley
Wickramasinghe
Wu
Xiong
Xu
Yang
Yu
Yue-Jiao Gong
Yun Li
Zhang
Zhao
Zhi-Hui Zhan
Zhou
Zhu
Publication venue: Elsevier BV
Publication date: 11/05/2015

The increasing complexity of real-world optimization problems raises new challenges to evolutionary computation. Responding to these challenges, distributed evolutionary computation has received considerable attention over the past decade. This article provides a comprehensive survey of the state-of-the-art distributed evolutionary algorithms and models, which have been classified into two groups according to their task division mechanism. Population-distributed models are presented with master-slave, island, cellular, hierarchical, and pool architectures, which parallelize an evolution task at population, individual, or operation levels. Dimension-distributed models include coevolution and multi-agent models, which focus on dimension reduction. Insights into the models, such as synchronization, homogeneity, communication, topology, speedup, advantages, and disadvantages, are also presented and discussed. The study of these models helps guide future development of different and/or improved algorithms. Also highlighted are recent hotspots in this area, including cloud and MapReduce-based implementations, GPU and CUDA-based implementations, distributed evolutionary multiobjective optimization, and real-world applications. Further, a number of future research directions are discussed, with the conclusion that the development of distributed evolutionary computation will continue to flourish.
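Of the population-distributed architectures listed in the abstract, the island model is perhaps the easiest to sketch. The example below is a single-process illustration under stated assumptions, not code from the survey: the sphere objective, the Gaussian mutation, the ring topology, the migration interval, and every function name and parameter value are hypothetical choices. Each island evolves its own subpopulation, and every few generations the best individual of one island replaces the worst individual of its ring neighbour.

```python
import random

def sphere(x):
    # Assumed objective: minimize the sum of squares.
    return sum(v * v for v in x)

def evolve(island, rng, sigma=0.3):
    # One generation: keep the elite, fill the rest with mutated
    # winners of binary tournaments.
    elite = min(island, key=sphere)
    children = [elite]
    while len(children) < len(island):
        a, b = rng.sample(island, 2)
        parent = a if sphere(a) <= sphere(b) else b
        children.append([v + rng.gauss(0, sigma) for v in parent])
    return children

def migrate(islands):
    # Ring topology: island i receives a copy of the best individual
    # of island i-1, which overwrites island i's worst individual.
    bests = [min(isl, key=sphere) for isl in islands]
    for i, isl in enumerate(islands):
        worst = max(range(len(isl)), key=lambda j: sphere(isl[j]))
        isl[worst] = list(bests[i - 1])

def run(num_islands=4, pop=10, dim=5, gens=50, interval=10, seed=0):
    rng = random.Random(seed)
    islands = [
        [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop)]
        for _ in range(num_islands)
    ]
    initial_best = min(sphere(x) for isl in islands for x in isl)
    for gen in range(1, gens + 1):
        islands = [evolve(isl, rng) for isl in islands]
        if gen % interval == 0:
            migrate(islands)
    final_best = min(sphere(x) for isl in islands for x in isl)
    return initial_best, final_best

initial_best, final_best = run()
print(initial_best, final_best)
```

In a distributed setting each island would run on its own node and `migrate` would become message passing between neighbours; the list comprehension over islands stands in for that parallelism here.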

University of Essex Research Repository

Crossref

Enlighten

An optimization framework for the capacity allocation and admission control of MapReduce jobs in cloud systems

Author: Ardagna D.
Ciavotta M.
Gianniti E.
Malekimajd M.
Passacantando M.
Rizzi A. M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Nowadays, we live in a Big Data world and many sectors of our economy are guided by data-driven decision processes. Big Data and Business Intelligence applications are facilitated by the MapReduce programming model, while, at the infrastructural layer, cloud computing offers flexible and cost-effective solutions for provisioning large clusters on demand. Capacity allocation in such systems, meant as the problem of providing computational power to support concurrent MapReduce applications in a cost-effective fashion, represents a challenge of paramount importance. In this paper we lay the foundation for a solution implementing admission control and capacity allocation for MapReduce jobs with a priori deadline guarantees. In particular, shared Hadoop 2.x clusters supporting batch and/or interactive jobs are targeted. We formulate a linear programming model able to minimize cloud resource costs and rejection penalties for the execution of jobs belonging to multiple classes with deadline guarantees. Scalability analyses demonstrated that the proposed method is able to determine the global optimal solution of the linear problem for systems including up to 10,000 classes in less than 1 s.

Archivio istituzionale della ricerca - Politecnico di Milano

Archivio della Ricerca - Università di Pisa