66,968 research outputs found

    Improving Automatic Parallel Training via Balanced Memory Workload Optimization

    Full text link
    Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts to design distributed training plans or limit parallelism combinations to a constrained search space. In this paper, we present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy. To effectively navigate this vast search space, we employ a decision tree approach for decomposition and pruning based on intuitive insights. We further utilize a dynamic programming search algorithm to derive the optimal plan. Moreover, to improve resource utilization and enhance system efficiency, we propose a bi-objective optimization workflow that focuses on workload balance. Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints. Across all tested scenarios, Galvatron-BMW consistently achieves superior system throughput, surpassing previous approaches that rely on limited parallelism strategies.Comment: arXiv admin note: substantial text overlap with arXiv:2211.1387

    The relevance of outsourcing and leagile strategies in performance optimization of an integrated process planning and scheduling

    Get PDF
    Over the past few years growing global competition has forced the manufacturing industries to upgrade their old production strategies with the modern day approaches. As a result, recent interest has been developed towards finding an appropriate policy that could enable them to compete with others, and facilitate them to emerge as a market winner. Keeping in mind the abovementioned facts, in this paper the authors have proposed an integrated process planning and scheduling model inheriting the salient features of outsourcing, and leagile principles to compete in the existing market scenario. The paper also proposes a model based on leagile principles, where the integrated planning management has been practiced. In the present work a scheduling problem has been considered and overall minimization of makespan has been aimed. The paper shows the relevance of both the strategies in performance enhancement of the industries, in terms of their reduced makespan. The authors have also proposed a new hybrid Enhanced Swift Converging Simulated Annealing (ESCSA) algorithm, to solve the complex real-time scheduling problems. The proposed algorithm inherits the prominent features of the Genetic Algorithm (GA), Simulated Annealing (SA), and the Fuzzy Logic Controller (FLC). The ESCSA algorithm reduces the makespan significantly in less computational time and number of iterations. The efficacy of the proposed algorithm has been shown by comparing the results with GA, SA, Tabu, and hybrid Tabu-SA optimization methods

    Adaptive multimodal continuous ant colony optimization

    Get PDF
    Seeking multiple optima simultaneously, which multimodal optimization aims at, has attracted increasing attention but remains challenging. Taking advantage of ant colony optimization algorithms in preserving high diversity, this paper intends to extend ant colony optimization algorithms to deal with multimodal optimization. First, combined with current niching methods, an adaptive multimodal continuous ant colony optimization algorithm is introduced. In this algorithm, an adaptive parameter adjustment is developed, which takes the difference among niches into consideration. Second, to accelerate convergence, a differential evolution mutation operator is alternatively utilized to build base vectors for ants to construct new solutions. Then, to enhance the exploitation, a local search scheme based on Gaussian distribution is self-adaptively performed around the seeds of niches. Together, the proposed algorithm affords a good balance between exploration and exploitation. Extensive experiments on 20 widely used benchmark multimodal functions are conducted to investigate the influence of each algorithmic component and results are compared with several state-of-the-art multimodal algorithms and winners of competitions on multimodal optimization. These comparisons demonstrate the competitive efficiency and effectiveness of the proposed algorithm, especially in dealing with complex problems with high numbers of local optima

    On controllability of neuronal networks with constraints on the average of control gains

    Get PDF
    Control gains play an important role in the control of a natural or a technical system since they reflect how much resource is required to optimize a certain control objective. This paper is concerned with the controllability of neuronal networks with constraints on the average value of the control gains injected in driver nodes, which are in accordance with engineering and biological backgrounds. In order to deal with the constraints on control gains, the controllability problem is transformed into a constrained optimization problem (COP). The introduction of the constraints on the control gains unavoidably leads to substantial difficulty in finding feasible as well as refining solutions. As such, a modified dynamic hybrid framework (MDyHF) is developed to solve this COP, based on an adaptive differential evolution and the concept of Pareto dominance. By comparing with statistical methods and several recently reported constrained optimization evolutionary algorithms (COEAs), we show that our proposed MDyHF is competitive and promising in studying the controllability of neuronal networks. Based on the MDyHF, we proceed to show the controlling regions under different levels of constraints. It is revealed that we should allocate the control gains economically when strong constraints are considered. In addition, it is found that as the constraints become more restrictive, the driver nodes are more likely to be selected from the nodes with a large degree. The results and methods presented in this paper will provide useful insights into developing new techniques to control a realistic complex network efficiently

    Improved dynamical particle swarm optimization method for structural dynamics

    Get PDF
    A methodology to the multiobjective structural design of buildings based on an improved particle swarm optimization algorithm is presented, which has proved to be very efficient and robust in nonlinear problems and when the optimization objectives are in conflict. In particular, the behaviour of the particle swarm optimization (PSO) classical algorithm is improved by dynamically adding autoadaptive mechanisms that enhance the exploration/exploitation trade-off and diversity of the proposed algorithm, avoiding getting trapped in local minima. A novel integrated optimization system was developed, called DI-PSO, to solve this problem which is able to control and even improve the structural behaviour under seismic excitations. In order to demonstrate the effectiveness of the proposed approach, the methodology is tested against some benchmark problems. Then a 3-story-building model is optimized under different objective cases, concluding that the improved multiobjective optimization methodology using DI-PSO is more efficient as compared with those designs obtained using single optimization.Peer ReviewedPostprint (published version
    • 

    corecore