Search CORE

15 research outputs found

Allocating MapReduce workflows with deadlines to heterogeneous servers in a cloud data center

Author: Chu Dianhui
Li Xiaoping
Ruiz García Rubén
Wang Jia
Xu Hanchuan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2020

Total profit is one of the most important factors to be considered from the perspective of resource providers. In this paper, an original MapReduce workflow scheduling with deadline and data locality is proposed to maximize total profit of resource providers. A new workflow conversion based on dynamic programming and ChainMap/ChainReduce is designed to decrease transmission times among MapReduce jobs of workflows. A new deadline division considering execution time, float time and job level is proposed to obtain better deadlines of MapReduce jobs in workflows. With the adapted replica strategy in MapReduce workflow, a new task scheduling is proposed to improve data locality which assigns tasks to servers with the earliest completion time in order to ensure resource providers obtain more profit. Experimental results show that the proposed heuristic results in larger total profit than other adopted algorithms.

This work is supported by the National Key Research and Development Program of China (No. 2017YFB1400801), the National Natural Science Foundation of China (Nos. 61872077, 61832004) and Collaborative Innovation Center of Wireless Communications Technology. Rubén Ruiz is partly supported by the Spanish Ministry of Science, Innovation, and Universities, under the project "OPTEP-Port Terminal Operations Optimization" (No. RTI2018-094940-B-I00) financed with FEDER funds.

Wang, J.; Li, X.; Ruiz García, R.; Xu, H.; Chu, D. (2020). Allocating MapReduce workflows with deadlines to heterogeneous servers in a cloud data center. Service Oriented Computing and Applications. 14(2):101-118. https://doi.org/10.1007/s11761-020-00290-1

RiuNet

Data Set Construction Method for Intelligent Health Care and Its Application

Author: ZHANG Linyu, TU Zhiying, HANG Shaoshi, ZHANG Bolin, CHU Dianhui
Publication venue: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Publication date: 01/07/2022

The rapid development of Internet and computer technology makes it possible to improve smart health care services in today’s aging population. However, there are some data problems that seriously restrict the process of intelligence in the field of elderly care, such as the lack of real data, the interference of dirty data, and too few standard samples. To address the lack of data sets, this paper proposes a three-stage data set construction method based on machine learning, starting from small-sample data collected from community health care in a city. In the first stage, this paper designs a tree structure-based generation strategy to generate the basic attributes of the data set according to the distribution of the original data. In the second stage, this paper obtains the basic behavioral ability evaluation index of the samples with a naive Bayesian algorithm. In the third stage, this paper constructs a variety of multiple linear regression equations to get the high-order behavioral ability index and evaluation stage on the basis of the first two stages. In order to verify the effectiveness of the data set generated by the model for downstream tasks, this paper designs multiple rehabilitation training plan recommendation models based on the generated data with a neural network, and achieves 5 multi-classification tasks and 2 multi-label classification tasks. This paper verifies the authenticity and validity of the generated data through analysis of experimental results and expert knowledge.

Directory of Open Access Journals

Finding service compositions in complex homecare service network

Author: Chunshan Li
Dianhui Chu
Qi Wang
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2019

Crossref

Multiple Hidden Markov Model for Pathological Vessel Segmentation

Author: Deqiong Ding
Dianhui Chu
Xin Hu
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2018

One of the obstacles that prevent the accurate delineation of vessel boundaries is the presence of pathologies, which results in obscure boundaries and vessel-like structures. Targeting this limitation, we present a novel segmentation method based on multiple Hidden Markov Models. This method works with a vessel axis + cross-section model, which constrains the classifier around the vessel. The vessel axis constraint gives our method the potential to be both physiologically accurate and computationally effective. Focusing on pathological vessels, we reap the benefits of the redundant information embedded in multiple vessel-specific features and the good statistical properties coming with Hidden Markov Model, to cover the widest possible spectrum of complex situations. The performance of our method is evaluated on synthetic complex-structured datasets, where we achieve a 91% high overlap ratio. We also validate the proposed method on a real challenging case, segmentation of pathological abdominal arteries. The performance of our method is promising, since our method yields better results than two state-of-the-art methods on both synthetic datasets and real clinical datasets

Directory of Open Access Journals

Diabetes index evaluation framework based on data mining technology: a genetic factor involved solution for predicting diabetes risk

Author: Dianhui Chu
Mingqiang Song
Yao Wang
Publication venue: 'Inderscience Publishers'
Publication date: 01/01/2019

Crossref

A Method for Building Service Process Value Model Based on Process Mining

Author: Chen David
Chu Dianhui
Zacharewicz Grégory
Zhou Xuequan
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

With the emergence and development of servitization, more and more enterprises are turning from product focus to service focus. Service is customer-oriented and driven by customer requirements. Value in the service is the goal pursued by all actors, including service providers, customers and other participants. After introducing the materials and methods of services and service system, process modeling, and service value networks, combined with domain knowledge, this paper proposes a service process value model based on process mining to describe how the actors can create value cooperatively in the service process. The model focuses on the value creation of the service process and the value generated by activities in the process. We distinguish service process from business process, and describe two methods to build a service process value model by using process mining techniques. Considering that different actors have different perspectives on value, the dimension and granularity of value in service are defined. Then we describe the construction steps of the service process value model by one of the methods, and use a case study to demonstrate how to build the service process value model of a telephone repair service, from the event log to the business process model, then to the service process model, and finally to the service process value model. Moreover, we develop a new plug-in based on the α-algorithm for ProM to realize the model construction in the case study.

Hal-Diderot

MST-RNN: A Multi-Dimension Spatiotemporal Recurrent Neural Networks for Recommending the Next Point of Interest

Author: Chunshan Li
Dianhui Chu
Dongmei Li
Zhongya Zhang
Publication venue: 'MDPI AG'
Publication date: 27/05/2022

With the increasing popularity of location-aware Internet-of-Vehicle services, the next-Point-of-Interest (POI) recommendation has gained significant research interest, predicting where drivers will go next from their sequential movements. Many researchers have focused on this problem and proposed solutions. Machine learning-based methods (matrix factorization, Markov chain, and factorizing personalized Markov chain) focus on a POI sequential transition. However, they do not recommend the user’s position for the next few hours. Neural network-based methods can model user mobility behavior by learning the representations of the sequence data in the high-dimensional space. However, they just consider the influence from the spatiotemporal dimension and ignore many important influences, such as duration time at a POI (Point of Interest) and the semantic tags of the POIs. In this paper, we propose a novel method called multi-dimension spatial–temporal recurrent neural networks (MST-RNN), which extends the ST-RNN and exploits the duration time dimension and semantic tag dimension of POIs in each layer of neural networks. Experiments on real-world vehicle movement data show that the proposed MST-RNN is effective and clearly outperforms the state-of-the-art methods

Multidisciplinary Digital Publishing Institute

Axis-Guided Vessel Segmentation Using a Self-Constructing Cascade-AdaBoost-SVM Classifier

Author: Deqiong Ding
Dianhui Chu
Xin Hu
Yuanzhi Cheng
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2018

One major limiting factor that prevents the accurate delineation of vessel boundaries has been the presence of blurred boundaries and vessel-like structures. Overcoming this limitation is exactly what we are concerned about in this paper. We describe a very different segmentation method based on a cascade-AdaBoost-SVM classifier. This classifier works with a vessel axis + cross-section model, which constrains the classifier around the vessel. This has the potential to be both physiologically accurate and computationally effective. To further increase the segmentation accuracy, we organize the AdaBoost classifiers and the Support Vector Machine (SVM) classifiers in a cascade way. And we substitute the AdaBoost classifier with the SVM classifier under special circumstances to overcome the overfitting issue of the AdaBoost classifier. The performance of our method is evaluated on synthetic complex-structured datasets, where we obtain high overlap ratios, around 91%. We also validate the proposed method on one challenging case, segmentation of carotid arteries over real clinical datasets. The performance of our method is promising, since our method yields better results than two state-of-the-art methods on both synthetic datasets and real clinical datasets

Directory of Open Access Journals