3,654 research outputs found

    Hyper-Scalable JSQ with Sparse Feedback

    Load balancing algorithms play a vital role in enhancing performance in data centers and cloud networks. Due to the massive size of these systems, scalability challenges, and especially the communication overhead associated with load balancing mechanisms, have emerged as major concerns. Motivated by these issues, we introduce and analyze a novel class of load balancing schemes in which the various servers provide occasional queue updates to guide the load assignment. We show that the proposed schemes strongly outperform JSQ(d) strategies with comparable communication overhead per job, and can achieve a vanishing waiting time in the many-server limit with just one message per job, just like the popular JIQ scheme. However, the proposed schemes are particularly geared towards the sparse feedback regime with fewer than one message per job, where they outperform corresponding sparsified JIQ versions. We investigate fluid limits for synchronous updates as well as asynchronous exponential update intervals. The fixed point of the fluid limit is identified in the latter case and used to derive the queue length distribution. We also demonstrate that in the ultra-low feedback regime the mean stationary waiting time tends to a constant in the synchronous case, but grows without bound in the asynchronous case.
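    To make the scheme concrete, here is a minimal simulation sketch of the sparse-feedback idea: the dispatcher keeps local queue estimates, incrementing the estimate of each server it assigns a job to, while servers occasionally report their true queue length. All names are illustrative, and the per-job update probability P_UPDATE is a stand-in for the paper's synchronous or asynchronous update intervals.

        import random

        N_SERVERS = 100
        P_UPDATE = 0.5   # expected messages per job; < 1 means sparse feedback

        queues = [0] * N_SERVERS     # true queue lengths at the servers
        estimates = [0] * N_SERVERS  # dispatcher's local, possibly stale view

        def assign_job():
            # Join the shortest queue according to the dispatcher's estimates.
            i = min(range(N_SERVERS), key=lambda k: estimates[k])
            queues[i] += 1
            estimates[i] += 1  # optimistic local bookkeeping
            if random.random() < P_UPDATE:
                estimates[i] = queues[i]  # occasional exact queue update
            return i

        def depart(i):
            # A service completion at server i; the dispatcher is not notified.
            if queues[i] > 0:
                queues[i] -= 1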

    SAFA: a semi-asynchronous protocol for fast federated learning with low overhead

    Federated learning (FL) has attracted increasing attention as a promising approach to driving a vast number of end devices with artificial intelligence. However, it is very challenging to guarantee the efficiency of FL given the unreliable nature of end devices, while the cost of device-server communication cannot be neglected. In this paper, we propose SAFA, a semi-asynchronous FL protocol, to address problems in federated learning such as low round efficiency and poor convergence rate in extreme conditions (e.g., clients dropping offline frequently). We introduce novel designs in the steps of model distribution, client selection and global aggregation to mitigate the impacts of stragglers, crashes and model staleness, in order to boost efficiency and improve the quality of the global model. We have conducted extensive experiments with typical machine learning tasks. The results demonstrate that the proposed protocol is effective in terms of shortening federated round duration, reducing local resource wastage, and improving the accuracy of the global model at an acceptable communication cost.
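    As an illustration of the semi-asynchronous idea, the sketch below merges client updates whose base-model version is not too stale and forces stragglers beyond the threshold to resynchronize. This is a hedged reading of the concept rather than SAFA's actual design; the staleness tolerance TAU and the sample-count weighting are assumptions.

        import numpy as np

        TAU = 3  # assumed staleness tolerance, in global rounds

        def aggregate(global_model, global_round, client_updates):
            """Merge client updates that are not too stale (sketch only).

            client_updates: list of (weights, base_round, n_samples), where
            base_round is the global round the client started training from.
            """
            fresh = [(w, n) for w, r, n in client_updates
                     if global_round - r <= TAU]
            if not fresh:
                return global_model, list(range(len(client_updates)))
            total = sum(n for _, n in fresh)
            merged = sum(w * (n / total) for w, n in fresh)
            # Clients whose updates were too stale must re-download the model.
            resync = [i for i, (_, r, _) in enumerate(client_updates)
                      if global_round - r > TAU]
            return merged, resync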

    Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks

    Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes or even petabyte scale. We introduce a hybrid distributed cloud framework with a unified view of multiple clouds and an on-premise infrastructure for processing tasks using both CPU and GPU compute instances at scale. The system implements a distributed file system and a failure-tolerant task-processing scheduler, independent of the language and deep learning framework used. It makes it possible to utilize unstable, cheap cloud resources to significantly reduce costs. We demonstrate the scalability of the framework by running pre-processing, distributed training, hyperparameter search and large-scale inference tasks utilizing 10,000 CPU cores and 300 GPU instances, with an overall processing power of 30 petaflops.
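    The failure-tolerant use of unstable instances can be sketched as a work queue that simply requeues tasks whose workers are preempted, so every task eventually completes on a surviving worker. This is a minimal illustration under assumed names and a simulated preemption rate, not the framework's actual scheduler.

        import queue
        import random
        import threading

        MAX_RETRIES = 5
        tasks = queue.Queue()

        def worker(worker_id):
            # Cheap spot instances may vanish mid-task; on failure the task
            # is simply put back on the queue and retried elsewhere.
            while True:
                try:
                    task_id, attempt = tasks.get(timeout=1)
                except queue.Empty:
                    return
                try:
                    if random.random() < 0.3:  # simulated preemption
                        raise RuntimeError("instance preempted")
                    print(f"worker {worker_id} finished task {task_id}")
                except RuntimeError:
                    if attempt + 1 < MAX_RETRIES:
                        tasks.put((task_id, attempt + 1))  # retry the task
                finally:
                    tasks.task_done()

        for t in range(20):
            tasks.put((t, 0))
        threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()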

    Asynchronously Trained Distributed Topographic Maps

    Topographic feature maps are low-dimensional representations of data that preserve spatial dependencies. Current methods of training such maps (e.g. self-organizing maps (SOM), generative topographic maps) require centralized control and synchronous execution, which restricts scalability. We present an algorithm that uses N autonomous units to generate a feature map by distributed asynchronous training. Unit autonomy is achieved by sparse interaction in time and space through the combination of a distributed heuristic search and a cascade-driven weight-updating scheme governed by two rules: a unit i) adapts when it receives either a sample or the weight vector of a neighbor, and ii) broadcasts its weight vector to its neighbors after adapting a predefined number of times. Thus, a vector update can trigger an avalanche of adaptation. We map avalanching to a statistical mechanics model, which allows us to parametrize the statistical properties of cascading. Using MNIST, we empirically investigate the effect of the heuristic search accuracy and the cascade parameters on map quality. We also provide empirical evidence that algorithm complexity scales at most linearly with system size N. The proposed approach is found to perform comparably with similar methods in classification tasks across multiple datasets.
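    The two rules translate directly into a small sketch: each unit adapts whenever it receives a vector (a sample or a neighbor's weights) and broadcasts after a fixed number of adaptations, which is what lets a single update trigger an avalanche. The threshold THETA and learning rate ETA are assumed values, not the paper's.

        import numpy as np

        THETA = 5   # assumed broadcast threshold (adaptations per broadcast)
        ETA = 0.1   # assumed learning rate

        class Unit:
            # One autonomous map unit implementing the two rules (sketch).
            def __init__(self, dim, neighbors=()):
                self.w = np.random.rand(dim)  # weight vector
                self.neighbors = list(neighbors)
                self.count = 0  # adaptations since last broadcast

            def receive(self, v):
                # Rule i): adapt on receiving a sample or neighbor weights.
                self.w += ETA * (v - self.w)
                self.count += 1
                # Rule ii): after THETA adaptations, broadcast to neighbors,
                # possibly triggering a cascade of further adaptations.
                if self.count >= THETA:
                    self.count = 0
                    for u in self.neighbors:
                        u.receive(self.w.copy())

        # Hypothetical usage: two mutually neighboring units, MNIST-sized input.
        a, b = Unit(784), Unit(784)
        a.neighbors.append(b)
        b.neighbors.append(a)
        for _ in range(20):
            a.receive(np.random.rand(784))  # stand-in for a data sample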

    Automated design of Boolean satisfiability solvers employing evolutionary computation

    Modern society gives rise to complex problems which sometimes lend themselves to being transformed into Boolean satisfiability (SAT) decision problems; this thesis presents an example from the program understanding domain. Current conflict-driven clause learning (CDCL) SAT solvers employ all-purpose heuristics for making decisions when finding truth assignments for arbitrary logical expressions called SAT instances. The instances derived from a particular problem class exhibit a unique underlying structure which impacts a solver's effectiveness. Thus, tailoring the solver heuristics to a particular problem class can significantly enhance the solver's performance; however, manual specialization is very labor intensive. Automated development may apply hyper-heuristics to search program space by utilizing problem-derived building blocks. This thesis demonstrates the potential for genetic programming (GP) powered hyper-heuristic driven automated design of algorithms to create tailored CDCL solvers, in this case through custom variable scoring and learnt clause scoring heuristics, with significantly better performance on targeted classes of SAT problem instances. As the run-time of GP is often dominated by fitness evaluation, evaluating multiple offspring in parallel typically reduces the time incurred by fitness evaluation in proportion to the number of parallel processing units. The naive synchronous approach requires an entire generation to be evaluated before progressing to the next generation; as such, heterogeneity in the evaluation times degrades the performance gain, as parallel processing units must idle until the longest evaluation has completed. This thesis shows empirical evidence justifying the employment of an asynchronous parallel model for GP-powered hyper-heuristics applied to SAT solver space, rather than the generational synchronous alternative, for gaining speed-ups in evolution time. Additionally, this thesis explores the use of a multi-objective GP to reveal the trade-off surface between multiple CDCL attributes.
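    The synchronous-versus-asynchronous distinction can be illustrated with a short sketch: instead of waiting at a generation barrier, the evaluator below starts a new offspring evaluation the moment any worker finishes, so heterogeneous evaluation times never leave workers idle. The fitness function, variation operator and all parameters are placeholders, not the thesis's actual GP system.

        import concurrent.futures as cf
        import random
        import time

        def evaluate(genome):
            # Stand-in fitness with heterogeneous run times, as when each
            # genome is a candidate solver heuristic benchmarked on instances.
            time.sleep(random.uniform(0.1, 1.0))
            return sum(genome)

        def mutate(genome):
            # Placeholder variation operator; real GP varies program trees.
            return [g + random.choice([-1, 0, 1]) for g in genome]

        budget = 32      # total fitness evaluations
        population = []  # (genome, fitness) pairs
        with cf.ThreadPoolExecutor(max_workers=4) as pool:
            running = {}
            for _ in range(4):  # keep every processing unit busy
                g = [random.randint(0, 9) for _ in range(5)]
                running[pool.submit(evaluate, g)] = g
                budget -= 1
            while running:
                done, _ = cf.wait(running, return_when=cf.FIRST_COMPLETED)
                for fut in done:
                    g = running.pop(fut)
                    population.append((g, fut.result()))
                    if budget > 0:  # resubmit at once: no generation barrier
                        budget -= 1
                        child = mutate(max(population, key=lambda p: p[1])[0])
                        running[pool.submit(evaluate, child)] = child
        print(max(population, key=lambda p: p[1]))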