3,654 research outputs found
Hyper-Scalable JSQ with Sparse Feedback
Load balancing algorithms play a vital role in enhancing performance in data
centers and cloud networks. Due to the massive size of these systems,
scalability challenges, and especially the communication overhead associated
with load balancing mechanisms, have emerged as major concerns. Motivated by
these issues, we introduce and analyze a novel class of load balancing schemes
where the various servers provide occasional queue updates to guide the load
assignment.
We show that the proposed schemes strongly outperform JSQ(d) strategies
with comparable communication overhead per job, and can achieve a vanishing
waiting time in the many-server limit with just one message per job, just like
the popular JIQ scheme. However, the proposed schemes are particularly geared
towards the sparse feedback regime with fewer than one message per job, where
they outperform corresponding sparsified JIQ versions.
We investigate fluid limits for synchronous updates as well as asynchronous
exponential update intervals. The fixed point of the fluid limit is identified
in the latter case, and used to derive the queue length distribution. We also
demonstrate that in the ultra-low feedback regime the mean stationary waiting
time tends to a constant in the synchronous case, but grows without bound in
the asynchronous case.
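The sparse-feedback idea above can be illustrated with a toy dispatcher that routes each job to the server with the smallest *locally stored* queue estimate, corrected only by occasional server updates. This is a minimal sketch of the general mechanism, not the authors' exact scheme; the class and method names (`SparseFeedbackDispatcher`, `receive_update`) are invented for the example.

```python
import random

class SparseFeedbackDispatcher:
    """Toy sketch: jobs go to the server with the smallest *estimated*
    queue; servers send only occasional true-queue updates (sparse
    feedback), so the dispatcher's estimates may be stale in between."""

    def __init__(self, num_servers):
        self.estimates = [0] * num_servers  # dispatcher-side queue estimates

    def assign(self):
        # Join the shortest *estimated* queue; break ties at random.
        shortest = min(self.estimates)
        server = random.choice(
            [i for i, q in enumerate(self.estimates) if q == shortest])
        self.estimates[server] += 1  # optimistic local bookkeeping
        return server

    def receive_update(self, server, true_queue_length):
        # An occasional message from a server corrects the stale estimate.
        self.estimates[server] = true_queue_length
```

With fewer than one update per job, the estimates drift between messages, which is exactly the regime the abstract analyzes via fluid limits.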
SAFA: A Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead
Federated learning (FL) has attracted increasing attention as a promising approach to powering a vast number of end devices with artificial intelligence. However, it is very challenging to guarantee the efficiency of FL given the unreliable nature of end devices, while the cost of device-server communication cannot be neglected. In this paper, we propose SAFA, a semi-asynchronous FL protocol, to address the problems in federated learning such as low round efficiency and poor convergence rate in extreme conditions (e.g., clients dropping offline frequently). We introduce novel designs in the steps of model distribution, client selection and global aggregation to mitigate the impacts of stragglers, crashes and model staleness, in order to boost efficiency and improve the quality of the global model. We have conducted extensive experiments with typical machine learning tasks. The results demonstrate that the proposed protocol is effective in terms of shortening federated round duration, reducing local resource wastage, and improving the accuracy of the global model at an acceptable communication cost.
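A minimal sketch of the semi-asynchronous idea, in the spirit of (but not identical to) SAFA's design: the server aggregates only cached client updates whose staleness is within a tolerated lag, so stragglers are skipped for a round rather than blocking it. The function name, the update records, and the plain averaging rule are assumptions for illustration, not SAFA's actual aggregation rule.

```python
def semi_async_aggregate(global_model, cached_updates, current_round, max_lag):
    """Illustrative semi-asynchronous aggregation: average only the cached
    client updates whose staleness (rounds since they were computed) is at
    most `max_lag`; anything older is left out of this round."""
    fresh = [u for u in cached_updates
             if current_round - u["round"] <= max_lag]
    if not fresh:
        return global_model  # no usable updates this round
    dim = len(global_model)
    # Plain coordinate-wise average of the fresh updates (a simplification;
    # a real protocol would typically weight by data size and staleness).
    return [sum(u["weights"][i] for u in fresh) / len(fresh)
            for i in range(dim)]
```

The staleness bound is the knob that trades round efficiency (never waiting for the slowest client) against model quality (excluding overly stale contributions).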
Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks
Training and deploying deep learning models in real-world applications
require processing large amounts of data. This is a challenging task when the
amount of data grows to a hundred terabytes, or even, petabyte-scale. We
introduce a hybrid distributed cloud framework with a unified view to multiple
clouds and an on-premise infrastructure for processing tasks using both CPU and
GPU compute instances at scale. The system implements a distributed file system
and failure-tolerant task processing scheduler, independent of the language and
Deep Learning framework used. It allows utilizing cheap but unstable cloud
resources to significantly reduce costs. We demonstrate the scalability of the
framework on running pre-processing, distributed training, hyperparameter
search and large-scale inference tasks utilizing 10,000 CPU cores and 300 GPU
instances with the overall processing power of 30 petaflops.
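One way a failure-tolerant scheduler can exploit cheap but unstable (e.g., preemptible) instances is to reassign a task to another worker when its instance disappears mid-run. The sketch below is a hypothetical simplification for illustration only, not the framework's actual scheduler API.

```python
def run_with_retries(task, workers, max_attempts=3):
    """Illustrative failure-tolerant execution: try the task on successive
    worker slots, treating a RuntimeError as an instance preemption and
    reassigning the task, up to `max_attempts` tries."""
    last_error = None
    for attempt in range(max_attempts):
        worker = workers[attempt % len(workers)]  # simple round-robin reassignment
        try:
            return worker(task)  # run the zero-argument task on this worker
        except RuntimeError as err:  # e.g., the cheap instance was preempted
            last_error = err
    raise last_error  # give up after max_attempts failures
```

Because preempted attempts are simply retried elsewhere, the scheduler can keep overall cost low by preferring unstable instances and paying only the occasional rerun.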
Asynchronously Trained Distributed Topographic Maps
Topographic feature maps are low-dimensional representations of data that
preserve spatial dependencies. Current methods of training such maps (e.g.,
self-organizing maps (SOM), generative topographic maps) require centralized
control and synchronous execution, which restricts scalability. We present an algorithm
that uses autonomous units to generate a feature map by distributed
asynchronous training. Unit autonomy is achieved by sparse interaction in time
and space through the combination of a distributed heuristic search and a
cascade-driven weight updating scheme governed by two rules: a unit i) adapts
when it receives either a sample, or the weight vector of a neighbor, and ii)
broadcasts its weight vector to its neighbors after adapting for a predefined
number of times. Thus, a vector update can trigger an avalanche of adaptation.
We map avalanching to a statistical mechanics model, which allows us to
parametrize the statistical properties of cascading. Using MNIST, we
empirically investigate the effect of the heuristic search accuracy and the
cascade parameters on map quality. We also provide empirical evidence that
algorithm complexity scales at most linearly with system size. The proposed
approach is found to perform comparably with similar methods in classification
tasks across multiple datasets.
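The two cascade rules can be sketched directly: a unit adapts whenever it receives a sample or a neighbor's weight vector, and broadcasts its own vector after a predefined number of adaptations, which can trigger an avalanche. This is a minimal toy version assuming a simple convex-combination adaptation step; the class name, the learning rate, and the broadcast counter are assumptions, not the paper's exact update.

```python
class Unit:
    """Toy autonomous unit implementing the two cascade rules:
    (i)  adapt on receiving a sample or a neighbor's weight vector;
    (ii) broadcast the weight vector to neighbors after every
         `broadcast_threshold` adaptations (possibly cascading)."""

    def __init__(self, weights, broadcast_threshold, lr=0.5):
        self.w = list(weights)
        self.k = broadcast_threshold   # adaptations between broadcasts
        self.lr = lr                   # assumed step size for the example
        self.adapt_count = 0
        self.neighbors = []

    def receive(self, vector):
        # Rule (i): move the weight vector toward any incoming vector.
        self.w = [wi + self.lr * (vi - wi) for wi, vi in zip(self.w, vector)]
        self.adapt_count += 1
        # Rule (ii): after k adaptations, broadcast to neighbors; each
        # neighbor's receive() may itself broadcast, forming an avalanche.
        if self.adapt_count % self.k == 0:
            for n in self.neighbors:
                n.receive(self.w)
```

The broadcast threshold plays the role of the cascade parameter: lowering it makes avalanches more frequent, which is the regime the statistical-mechanics mapping parametrizes.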
Automated design of boolean satisfiability solvers employing evolutionary computation
Modern society gives rise to complex problems which sometimes lend themselves to being transformed into Boolean satisfiability (SAT) decision problems; this thesis presents an example from the program understanding domain. Current conflict-driven clause learning (CDCL) SAT solvers employ all-purpose heuristics for making decisions when finding truth assignments for arbitrary logical expressions called SAT instances. The instances derived from a particular problem class exhibit a unique underlying structure which impacts a solver's effectiveness. Thus, tailoring the solver heuristics to a particular problem class can significantly enhance the solver's performance; however, manual specialization is very labor intensive. Automated development may apply hyper-heuristics to search program space by utilizing problem-derived building blocks. This thesis demonstrates the potential for genetic programming (GP) powered hyper-heuristic driven automated design of algorithms to create tailored CDCL solvers, in this case through custom variable scoring and learnt clause scoring heuristics, with significantly better performance on targeted classes of SAT problem instances. As the run-time of GP is often dominated by fitness evaluation, evaluating multiple offspring in parallel typically reduces the time incurred by fitness evaluation proportional to the number of parallel processing units. The naive synchronous approach requires an entire generation to be evaluated before progressing to the next generation; as such, heterogeneity in the evaluation times will degrade the performance gain, as parallel processing units will have to idle until the longest evaluation has completed. This thesis shows empirical evidence justifying the employment of an asynchronous parallel model for GP powered hyper-heuristics applied to SAT solver space, rather than the generational synchronous alternative, for gaining speed-ups in evolution time.
Additionally, this thesis explores the use of a multi-objective GP to reveal the trade-off surface between multiple CDCL attributes.
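The synchronous-versus-asynchronous contrast above can be sketched with a steady-state evaluation loop: as soon as any offspring's fitness evaluation finishes, a replacement is bred and submitted, so fast evaluations never idle behind the slowest one. This is a generic illustration of the asynchronous parallel model, not the thesis's actual GP system; `evaluate` and `breed` are caller-supplied stand-ins, and the sketch assumes `workers <= total_evals`.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def async_steady_state(evaluate, breed, pop_size, total_evals, workers=4):
    """Asynchronous steady-state evaluation sketch. `evaluate(individual)`
    returns (individual, fitness); `breed(population)` produces a new
    individual from the current (individual, fitness) list. Whenever any
    evaluation completes, a replacement is immediately submitted instead
    of waiting for a whole synchronous generation to finish."""
    population = []  # survivors as (individual, fitness), best first
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(evaluate, breed(population))
                   for _ in range(workers)}
        done_count = 0
        while done_count < total_evals:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                population.append(fut.result())
                population.sort(key=lambda p: p[1], reverse=True)
                del population[pop_size:]  # keep only the best pop_size
                done_count += 1
                # Submit a replacement unless the evaluation budget is spent.
                if done_count + len(pending) < total_evals:
                    pending.add(pool.submit(evaluate, breed(population)))
    return population
```

A synchronous variant would instead wait for all `workers` futures each round, so one slow SAT-solver evaluation would idle every other processing unit, which is exactly the heterogeneity penalty the thesis measures.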