5 research outputs found

Re-engineering the ant colony optimization for CMP architectures

Author: Cecilia-Canales José María
GARCÍA CARRASCO JOSE MANUEL
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2020

[EN] The ant colony optimization (ACO) is inspired by the behavior of real ants, and as a bioinspired method, its underlying computation is massively parallel by definition. This paper shows re-engineering strategies to migrate the ACO algorithm applied to the Traveling Salesman Problem to modern Intel-based multi- and many-core architectures in a step-by-step methodology. The paper provides detailed guidelines on how to optimize the algorithm for the intra-node (thread and vector) parallelization, showing the performance scalability along with the number of cores on different Intel architectures, reporting up to 5.5x speedup factor between the Intel Xeon Phi Knights Landing and Intel Xeon v2. Moreover, parallel efficiency is provided for all targeted architectures, finding that core load imbalance, memory bandwidth limitations, and NUMA effects on data placement are some of the key factors limiting performance. Finally, a distributed implementation is also presented, reaching up to 2.96x speedup factor when running the code on 3 nodes over the single-node counterpart version. In the latter case, the parallel efficiency is affected by the synchronization frequency, which also affects the quality of the solution found by the distributed implementation.

This work was partially supported by the Fundación Séneca, Agencia de Ciencia y Tecnología de la Región de Murcia under Project 20813/PI/18, and by Spanish Ministry of Science, Innovation and Universities as well as European Commission FEDER funds under Grants TIN2015-66972-C5-3-R, RTI2018-098156-B-C53, TIN2016-78799-P (AEI/FEDER, UE), and RTC-2017-6389-5. We acknowledge the excellent work done by Victor Montesinos while he was doing a research internship supported by the University of Murcia.

Cecilia-Canales, JM.; García Carrasco, JM. (2020). Re-engineering the ant colony optimization for CMP architectures. The Journal of Supercomputing (Online). 76(6):4581-4602.
https://doi.org/10.1007/s11227-019-02869-8
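
As a worked illustration of the distributed result quoted in the abstract above (2.96x speedup on 3 nodes), the sketch below computes parallel efficiency using the common textbook definition, speedup divided by the number of workers. That definition and the function name `parallel_efficiency` are assumptions made for illustration; the paper may compute efficiency differently.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Textbook parallel efficiency: achieved speedup / number of workers.

    Assumption: this is the common definition, not necessarily the exact
    formula used by the paper's authors.
    """
    if workers <= 0:
        raise ValueError("workers must be a positive integer")
    return speedup / workers


# Figures taken from the abstract: 2.96x speedup on 3 nodes.
print(f"{parallel_efficiency(2.96, 3):.3f}")  # prints 0.987
```

Under this definition, the reported distributed run would retain roughly 99% efficiency relative to ideal linear scaling across the three nodes.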

RiuNet

Accelerating supply chains with Ant Colony Optimization across range of hardware solutions

Author: Dzalbs I
Kalganova T
Publication venue: 'Elsevier BV'
Publication date: 22/01/2020

This pre-print, arXiv:2001.08102v1 [cs.NE], was subsequently published by Elsevier in Computers and Industrial Engineering, vol. 147, 106610, pp. 1-14, on 29 Jun 2020 and is available at https://doi.org/10.1016/j.cie.2020.106610

The Ant Colony algorithm has been applied to various optimization problems; however, most of the previous work on scaling and parallelism focuses on Travelling Salesman Problems (TSPs). Although useful for benchmarks and for comparing new ideas, the algorithmic dynamics do not always transfer to complex real-life problems, where additional meta-data is required during solution construction. This paper looks at a real-life outbound supply chain problem using Ant Colony Optimization (ACO) and its scaling dynamics with two parallel ACO architectures: Independent Ant Colonies (IAC) and Parallel Ants (PA). Results showed that PA was able to reach a higher solution quality in fewer iterations as the number of parallel instances increased. Furthermore, speed performance was measured across three different hardware solutions: a 16 core CPU, a 68 core Xeon Phi and up to 4 GeForce GPUs. State-of-the-art ACO vectorization techniques such as SS-Roulette were implemented using C++ and CUDA. Although GPUs are excellent for TSP, it was concluded that they are not suitable for the given supply chain problem due to the meta-data access footprint required. Furthermore, compared to their sequential counterpart, the vectorized CPU AVX2 implementation achieved a 25.4x speedup on CPU, while Xeon Phi with its AVX512 instruction set reached 148x on PA with Vectorized (PAwV). PAwV is therefore able to scale at least up to 1024 parallel instances on the supply chain network problem solved
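
For readers unfamiliar with the selection step that techniques such as SS-Roulette aim to vectorize, here is a minimal sketch of the classical sequential roulette-wheel (fitness-proportionate) selection used during ACO solution construction. It is not SS-Roulette and not the authors' code; the function name `roulette_select` and the example weights are hypothetical.

```python
import random
from itertools import accumulate


def roulette_select(weights, rng=random):
    """Return index i with probability weights[i] / sum(weights).

    Classical sequential roulette wheel, shown for illustration only.
    In ACO each weight would typically combine pheromone and heuristic
    information for one candidate move (an assumption of this sketch).
    """
    cumulative = list(accumulate(weights))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng.random() * total
    for index, bound in enumerate(cumulative):
        if threshold < bound:
            return index
    return len(weights) - 1  # guard against floating-point rounding


# Hypothetical desirabilities of three candidate moves.
choice = roulette_select([0.2, 0.5, 0.3], random.Random(42))
```

The linear scan over the cumulative sums is inherently sequential, which is one reason vectorized variants of this step are of interest on SIMD hardware.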

arXiv.org e-Print Archive

Brunel University Research Archive

METADOCK 2: a high-throughput parallel metaheuristic scheme for molecular docking

Author: Abellán José L.
Bueno-Crespo Andrés
Cecilia-Canales José María
Imbernón Baldomero
Pérez-Sánchez Horacio
Serrano Antonio
Publication venue: 'Oxford University Press (OUP)'
Publication date: 21/01/2020

[EN] Motivation: Molecular docking methods are extensively used to predict the interaction between protein-ligand systems in terms of structure and binding affinity, through the optimization of a physics-based scoring function. However, the computational requirements of these simulations grow exponentially with: (i) the global optimization procedure, (ii) the number and degrees of freedom of molecular conformations generated and (iii) the mathematical complexity of the scoring function.

Results: In this work, we introduce a novel molecular docking method named METADOCK 2, which incorporates several novel features, such as (i) a ligand-dependent blind docking approach that exhaustively scans the whole protein surface to detect novel allosteric sites, (ii) an optimization method to enable the use of a wide branch of metaheuristics and (iii) a heterogeneous implementation based on multicore CPUs and multiple graphics processing units. Two representative scoring functions implemented in METADOCK 2 are extensively evaluated in terms of computational performance and accuracy using several benchmarks (such as the well-known DUD) against AutoDock 4.2 and AutoDock Vina. Results place METADOCK 2 as an efficient and accurate docking methodology able to deal with complex systems where computational demands are staggering and which outperforms both AutoDock Vina and AutoDock 4.

This work was partially supported by the Fundación Séneca del Centro de Coordinación de la Investigación de la Región de Murcia [Projects 20813/PI/18, 20988/PI/18, 20524/PDC/18] and by the Spanish Ministry of Science, Innovation and Universities [TIN2016-78799-P (AEI/FEDER, UE), CTQ2017-87974-R]. The authors thankfully acknowledge the computer resources at CTE-POWER and the technical support provided by Barcelona Supercomputing Center - Centro Nacional de Supercomputación [RES-BCV2018-3-0008].

Imbernón, B.; Serrano, A.; Bueno-Crespo, A.; Abellán, JL.; Pérez-Sánchez, H.; Cecilia-Canales, JM. (2020).
METADOCK 2: a high-throughput parallel metaheuristic scheme for molecular docking. Bioinformatics. 1-6. https://doi.org/10.1093/bioinformatics/btz958

Crossref

RiuNet

OptPlatform: metaheuristic optimisation framework for solving complex real-world problems

Author: Dzalbs Ivars
Publication venue: Brunel University London
Publication date: 01/01/2021

This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.

We optimise daily, whether that is planning a round trip that visits the most attractions within a given holiday budget or just taking a train instead of driving a car in rush hour. Many problems like these are solved by individuals as part of a daily schedule, and they are effortless and straightforward. If we scale that to many individuals with many different schedules, as in a school timetable, we reach a point where solving by hand is no longer feasible or practical. In such instances, optimisation methods are used to obtain an optimal solution. In this thesis, a practical approach to optimisation has been taken by developing an optimisation platform with all the necessary tools for practitioners who are not necessarily familiar with the subject of optimisation. First, a high-performance metaheuristic optimisation framework (MOF) called OptPlatform is implemented, and its versatility and performance are evaluated across multiple benchmarks and real-world optimisation problems. Results show that OptPlatform outperforms competing MOFs in both solution quality and computation time. Second, the most suitable hardware platform for OptPlatform is determined by an in-depth analysis of Ant Colony Optimisation scaling across CPU, GPU and enterprise Xeon Phi. Contrary to the common benchmark problems used in the literature, the supply chain problem solved could not scale on GPUs. Third, a variety of metaheuristics are implemented in OptPlatform. Among them, a new metaheuristic based on the Imperialist Competitive Algorithm (ICA), called ICA with Independence and Constrained Assimilation (ICAwICA), is proposed. ICAwICA was tested on two different types of benchmark problems, and results show the versatile application of the algorithm, matching and in some cases outperforming the custom-tuned approaches. Finally, essential MOF features such as automatic algorithm selection and tuning, which are lacking in existing frameworks, are implemented in OptPlatform. Two novel approaches are proposed and compared to existing methods. Results indicate the superiority of the implemented tuning algorithms within a constrained tuning budget environment

Brunel University Research Archive