Online Job Scheduling in Distributed Machine Learning Clusters
Nowadays large-scale distributed machine learning systems have been deployed
to support various analytics and intelligence services in IT firms. To train a
large dataset and derive the prediction/inference model, e.g., a deep neural
network, multiple workers are run in parallel to train partitions of the input
dataset, and update shared model parameters. In a shared cluster handling
multiple training jobs, a fundamental issue is how to efficiently schedule jobs
and set the number of concurrent workers to run for each job, such that server
resources are maximally utilized and model training can be completed in time.
Targeting a distributed machine learning system using the parameter server
framework, we design an online algorithm for scheduling the arriving jobs and
deciding the adjusted numbers of concurrent workers and parameter servers for
each job over its course, to maximize overall utility of all jobs, contingent
on their completion times. Our online algorithm design utilizes a primal-dual
framework coupled with efficient dual subroutines, achieving good long-term
performance guarantees with polynomial time complexity. Practical effectiveness
of the online algorithm is evaluated using trace-driven simulation and testbed
experiments, which demonstrate that it outperforms scheduling algorithms
commonly adopted in today's cloud systems.
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
Deep Neural Networks (DNNs) have shown excellent performance in a wide range
of machine learning applications. Knowing the latency of running a DNN model or
tensor program on a specific device is useful in various tasks, such as DNN
graph- or tensor-level optimization and device selection. Considering the large
space of DNN models and devices that impede direct profiling of all
combinations, recent efforts focus on building a predictor to model the
performance of DNN models on different devices. However, none of the existing
attempts have achieved a cost model that can accurately predict the performance
of various tensor programs while supporting both training and inference
accelerators. We propose CDMPP, an efficient tensor program latency prediction
framework for both cross-model and cross-device prediction. We design an
informative but efficient representation of tensor programs, called compact
ASTs, and a pre-order-based positional encoding method, to capture the internal
structure of tensor programs. We develop a domain-adaption-inspired method to
learn domain-invariant representations and devise a KMeans-based sampling
algorithm, for the predictor to learn from different domains (i.e., different
DNN operators and devices). Our extensive experiments on a diverse range of DNN
models and devices demonstrate that CDMPP significantly outperforms
state-of-the-art baselines with 14.03% and 10.85% prediction error for
cross-model and cross-device prediction, respectively, and one order of
magnitude higher training efficiency. The implementation and the expanded
dataset are available at https://github.com/joapolarbear/cdmpp.
Comment: Accepted by EuroSys 202
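As a rough illustration of what a pre-order-based positional index could look like, the sketch below assigns each node of a small tree its visiting order in a pre-order traversal. It is not taken from the CDMPP implementation: the `Node` class, the `preorder_positions` function, and the toy tree are hypothetical names and data introduced here, and the abstract does not say how the resulting positions are turned into an encoding.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical tree node standing in for an AST node of a tensor program."""
    label: str
    children: List["Node"] = field(default_factory=list)


def preorder_positions(root: Node) -> List[Tuple[int, str]]:
    """Return (position, label) pairs in pre-order (parent before children)."""
    out: List[Tuple[int, str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        # The position is simply the number of nodes visited so far.
        out.append((len(out), node.label))
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return out


# Toy tree (assumed for illustration): a loop containing a load, a multiply, and a store.
tree = Node("for", [Node("load"), Node("mul", [Node("a"), Node("b")]), Node("store")])
positions = preorder_positions(tree)
```

Here the parent always receives a smaller index than its descendants, and siblings are numbered left to right, so the index sequence retains some of the tree's structure even after the nodes are flattened into a list.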
Optimization of ultrasound-assisted extraction by response surface methodology, antioxidant capacity, and tyrosinase inhibitory activity of anthocyanins from red rice bran
The anthocyanin content of red rice bran was characterized by HPLC/MS. Response surface methodology was used to optimize the ultrasound-assisted extraction of red rice bran anthocyanins. The antioxidant activities were evaluated in terms of IC50. The tyrosinase inhibitory activities of the anthocyanin samples from red rice bran and of the standard substances were determined by a spectrophotometric method. According to the mass spectrometry data, the main anthocyanin component is paeoniflorin (m/z = 480). The optimized anthocyanin level was 5.80 mg/g under the following conditions: solid–liquid ratio of 1:17.46; ethanol concentration of 78.37%; ultrasonication time of 55.23 min; and pH of 2.31. The IC50 values for DPPH radical scavenging and superoxide anion scavenging were 53.51 and 2,375 μg/ml for the sample, 14.60 and 64.74 μg/ml for the standard, and 24.45 and 136.25 μg/ml for vitamin C, respectively. The IC50 values for tyrosinase inhibition by the sample and vitamin C were 4.26 and 2.18 μg/ml, respectively. The activities of the three differ significantly (p < .05), which may be caused by the purity of the extract. Red rice bran anthocyanins have valuable research and development prospects as skin whiteners and healthcare products.
Genome-wide association analyses in Han Chinese identify two new susceptibility loci for amyotrophic lateral sclerosis
DOI: 10.1038/ng.2627. Nature Genetics 45(6), 697-700.