Online Job Scheduling in Distributed Machine Learning Clusters

Authors: Bao Yixin, Li Zongpeng, Peng Yanghua, Wu Chuan
Publication date: 03/01/2018

Large-scale distributed machine learning systems are now deployed to support various analytics and intelligence services in IT firms. To train on a large dataset and derive a prediction/inference model, e.g., a deep neural network, multiple workers run in parallel to train on partitions of the input dataset and update shared model parameters. In a shared cluster handling multiple training jobs, a fundamental issue is how to efficiently schedule jobs and set the number of concurrent workers to run for each job, such that server resources are maximally utilized and model training can be completed in time. Targeting a distributed machine learning system using the parameter server framework, we design an online algorithm for scheduling the arriving jobs and adjusting the numbers of concurrent workers and parameter servers for each job over its course, to maximize the overall utility of all jobs, contingent on their completion times. Our online algorithm design utilizes a primal-dual framework coupled with efficient dual subroutines, achieving good long-term performance guarantees with polynomial time complexity. The practical effectiveness of the online algorithm is evaluated using trace-driven simulation and testbed experiments, which demonstrate that it outperforms commonly adopted scheduling algorithms in today's cloud systems.

arXiv.org e-Print Archive

Crossref

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Authors: Hu Hanpeng, Lin Haibin, Peng Yanghua, Su Junwei, Wu Chuan, Zhao Juntao, Zhu Yibo
Publication date: 17/11/2023

Deep Neural Networks (DNNs) have shown excellent performance in a wide range of machine learning applications. Knowing the latency of running a DNN model or tensor program on a specific device is useful in various tasks, such as DNN graph- or tensor-level optimization and device selection. Because the large space of DNN models and devices impedes direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices. However, none of the existing attempts has achieved a cost model that can accurately predict the performance of various tensor programs while supporting both training and inference accelerators. We propose CDMPP, an efficient tensor program latency prediction framework for both cross-model and cross-device prediction. We design an informative but efficient representation of tensor programs, called compact ASTs, and a pre-order-based positional encoding method, to capture the internal structure of tensor programs. We develop a domain-adaptation-inspired method to learn domain-invariant representations and devise a KMeans-based sampling algorithm, so that the predictor can learn from different domains (i.e., different DNN operators and devices). Our extensive experiments on a diverse range of DNN models and devices demonstrate that CDMPP significantly outperforms state-of-the-art baselines with 14.03% and 10.85% prediction error for cross-model and cross-device prediction, respectively, and one order of magnitude higher training efficiency. The implementation and the expanded dataset are available at https://github.com/joapolarbear/cdmpp. Comment: Accepted by EuroSys 202
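
The pre-order-based positional encoding mentioned in the abstract can be pictured with a minimal sketch. This is not CDMPP's implementation: the `Node` class, the node labels, and the choice to use the plain visit index as the "position" are all assumptions made for illustration. It only shows how a pre-order (root-first) walk assigns each tree node an index that reflects its place in the structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical AST node; labels are assumed unique in this sketch."""
    label: str
    children: List["Node"] = field(default_factory=list)


def preorder_positions(root: Node) -> Dict[str, int]:
    """Map each node label to its index in a pre-order (root-first) walk."""
    positions: Dict[str, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        positions[node.label] = len(positions)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return positions


# Illustrative tree only; the labels do not come from the paper.
tree = Node("loop_i", [
    Node("loop_j", [Node("load_A"), Node("load_B")]),
    Node("store_C"),
])
positions = preorder_positions(tree)
```

A real encoder would presumably turn such indices into vectors before feeding them to the predictor; that step is left out here because the abstract does not describe it.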

arXiv.org e-Print Archive

Optimization of ultrasound-assisted extraction by response surface methodology, antioxidant capacity, and tyrosinase inhibitory activity of anthocyanins from red rice bran

Authors: Shi Longlong, Wang Yujie, Xue Peng, Yang Xiushi, Sun Yanghua, Zhang Ruoyu, Zhao Lei
Publication venue: Wiley
Publication date: 01/01/2020

The anthocyanin contents of red rice bran were characterized by HPLC/MS. Response surface methodology was used to optimize the ultrasound-assisted extraction of red rice bran anthocyanins. The antioxidant activities were evaluated in terms of IC50. The tyrosinase inhibitory activities of the anthocyanin samples from red rice bran and of the standard substances were determined by a spectrophotometric method. According to the mass spectrometry information, the main component of the anthocyanins is paeoniflorin (m/z = 480). The optimized anthocyanin level was 5.80 mg/g under the following conditions: solid–liquid ratio of 1:17.46; ethanol concentration of 78.37%; ultrasonication time of 55.23 min; and pH of 2.31. The IC50 values of the DPPH radical scavenging and the superoxide anion scavenging activities of the sample were 53.51 and 2,375 μg/ml; those of the standard were 14.60 and 64.74 μg/ml; and those of vitamin C were 24.45 and 136.25 μg/ml, respectively. The IC50 values of the tyrosinase inhibition activities of the sample and vitamin C were 4.26 and 2.18 μg/ml, respectively. There is a significant difference (p < .05) between the activities of the three, which may be caused by the purity of the extract. Red rice bran anthocyanins have valuable research and development prospects as skin whiteners and healthcare products.
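
For readers unfamiliar with IC50 (the concentration at which 50% inhibition or scavenging is reached), the sketch below shows one common way to estimate it: linear interpolation between the two measured points that bracket 50%. The abstract does not say how the authors computed their IC50 values, so the method, the function name, and the dose-response numbers here are assumptions chosen purely for illustration.

```python
def ic50_by_interpolation(concentrations, inhibitions):
    """Estimate IC50 by linear interpolation between the points bracketing 50%.

    concentrations: doses (e.g. in ug/ml); inhibitions: percent inhibition.
    """
    points = sorted(zip(concentrations, inhibitions))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi != i_lo:
            return c_lo + (50.0 - i_lo) * (c_hi - c_lo) / (i_hi - i_lo)
    raise ValueError("50% inhibition is not bracketed by the data")


# Made-up dose-response values, not data from the paper.
doses = [10.0, 20.0, 40.0, 80.0]
inhibition = [20.0, 40.0, 60.0, 80.0]
estimate = ic50_by_interpolation(doses, inhibition)
```

A lower IC50 means a more potent sample, which is how the abstract's comparison between the extract, the standard, and vitamin C should be read.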

Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen (hbz)