86 research outputs found

    A Highly Parallelized and Vectorized Implementation of Max-Min Ant System on Intel Xeon Phi

    Get PDF
    The increasing trend in processor design towards many-core architectures with wide vector processing units is largely motivated by the fact that single core performance has hit a ‘power wall’, meaning that performance gains are currently achievable only through increasingly parallel and vectorized execution models. Consequently, applications can only exploit the full performance of modern processors if they achieve high parallel and vector efficiencies. In this paper, we illustrate how this might be achieved for the well-established Ant Colony Optimization metaheuristic. We describe a highly parallel and vectorized variant of the Max-Min Ant System algorithm applied to the Traveling Salesman Problem, and present two novel vectorized algorithms for selecting cities during the tour construction phase. We present experimental results from an implementation on the Intel R Xeon PhiTM platform, which show that very high parallel and vector efficiencies are achieved, and significant speedups are obtained compared to both the reference serial implementation and the previous best Xeon Phi implementation available in the literature

    Accelerating supply chains with Ant Colony Optimization across range of hardware solutions

    Get PDF
    This pre-print, arXiv:2001.08102v1 [cs.NE], was published subsequently by Elsevier in Computers and Industrial Engineering, vol. 147, 106610, pp. 1-14 on 29 Jun 2020 and is available at https://doi.org/10.1016/j.cie.2020.106610Ant Colony algorithm has been applied to various optimization problems, however most of the previous work on scaling and parallelism focuses on Travelling Salesman Problems (TSPs). Although, useful for benchmarks and new idea comparison, the algorithmic dynamics does not always transfer to complex real-life problems, where additional meta-data is required during solution construction. This paper looks at real-life outbound supply chain problem using Ant Colony Optimization (ACO) and its scaling dynamics with two parallel ACO architectures - Independent Ant Colonies (IAC) and Parallel Ants (PA). Results showed that PA was able to reach a higher solution quality in fewer iterations as the number of parallel instances increased. Furthermore, speed performance was measured across three different hardware solutions - 16 core CPU, 68 core Xeon Phi and up to 4 Geforce GPUs. State of the art, ACO vectorization techniques such as SS-Roulette were implemented using C++ and CUDA. Although excellent for TSP, it was concluded that for the given supply chain problem GPUs are not suitable due to meta-data access footprint required. Furthermore, compared to their sequential counterpart, vectorized CPU AVX2 implementation achieved 25.4x speedup on CPU while Xeon Phi with its AVX512 instruction set reached 148x on PA with Vectorized (PAwV). PAwV is therefore able to scale at least up to 1024 parallel instances on the supply chain network problem solved

    Parallel Ant Colony Optimization: Algorithmic Models and Hardware Implementations

    Get PDF

    A Novel Fast Path Planning Approach for Mobile Devices using Hybrid Quantum Ant Colony Optimization Algorithm

    Full text link
    With IoT systems' increasing scale and complexity, maintenance of a large number of nodes using stationary devices is becoming increasingly difficult. Hence, mobile devices are being employed that can traverse through a set of target locations and provide the necessary services. In order to reduce energy consumption and time requirements, the devices are required to traverse following a Hamiltonian path. This problem can be formulated as a Travelling Salesman Problem (TSP), an NP-hard problem. Moreover, in emergency services, the devices must traverse in real-time, demanding speedy path planning from the TSP instance. Among the well-known optimization techniques for solving the TSP problem, Ant Colony Optimization has a good stronghold in providing good approximate solutions. Moreover, ACO not only provides near-optimal solutions for TSP instances but can also output optimal or near-optimal solutions for many other demanding hard optimization problems. However, to have a fast solution, the next node selection, which needs to consider all the neighbors for each selection, becomes a bottleneck in the path formation step. Moreover, classical computers are constrained to generate only pseudorandom numbers. Both these problems can be solved using quantum computing techniques, i.e., the next node can be selected with proper randomization, respecting the provided set of probabilities in just a single execution and single measurement of a quantum circuit. Simulation results of the proposed Hybrid Quantum Ant Colony Optimization algorithm on several TSP instances have shown promising results, thus expecting the proposed work to be important in implementing real-time path planning in quantum-enabled mobile devices

    Re-engineering the ant colony optimization for CMP architectures

    Full text link
    [EN] The ant colony optimization (ACO) is inspired by the behavior of real ants, and as a bioinspired method, its underlying computation is massively parallel by definition. This paper shows re-engineering strategies to migrate the ACO algorithm applied to the Traveling Salesman Problem to modern Intel-based multi- and many-core architectures in a step-by-step methodology. The paper provides detailed guidelines on how to optimize the algorithm for the intra-node (thread and vector) parallelization, showing the performance scalability along with the number of cores on different Intel architectures, reporting up to 5.5x speedup factor between the Intel Xeon Phi Knights Landing and Intel Xeon v2. Moreover, parallel efficiency is provided for all targeted architectures, finding that core load imbalance, memory bandwidth limitations, and NUMA effects on data placement are some of the key factors limiting performance. Finally, a distributed implementation is also presented, reaching up to 2.96x speedup factor when running the code on 3 nodes over the single-node counterpart version. In the latter case, the parallel efficiency is affected by the synchronization frequency, which also affects the quality of the solution found by the distributed implementation.This work was partially supported by the Fundación Séneca, Agencia de Ciencia y Tecnología de la Región de Murcia under Project 20813/PI/18, and by Spanish Ministry of Science, Innovation and Universities as well as European Commission FEDER funds under Grants TIN2015-66972-C5-3-R, RTI2018-098156-B-C53, TIN2016-78799-P (AEI/FEDER, UE), and RTC-2017-6389-5. We acknowledge the excellent work done by Victor Montesinos while he was doing a research internship supported by the University of Murcia.Cecilia-Canales, JM.; García Carrasco, JM. (2020). Re-engineering the ant colony optimization for CMP architectures. The Journal of Supercomputing (Online). 76(6):4581-4602. https://doi.org/10.1007/s11227-019-02869-8S45814602766Yang XS (2010) Nature-inspired metaheuristic algorithms. Luniver Press, LebanonAkila M, Anusha P, Sindhu M, Selvan Krishnasamy T (2017) Examination of PSO, GA-PSO and ACO algorithms for the design optimization of printed antennas. In: IEEE Applied Electromagnetics Conference (AEMC)Dorigo M, Stützle T (2004) Ant colony optimization. A bradford book. The MIT Press, CambridgeCecilia JM, García JM, Nisbet A, Amos M, Ujaldón M (2013) Enhancing data parallelism for ant colony optimization on GPUs. J Parallel Distrib Comput 73(1):42–51Dawson L, Stewart I (2013) Improving ant colony optimization performance on the GPU using CUDA. In: IEEE Conference on Evolutionary Computation, pp 1901–1908Llanes A, Cecilia JM, Sánchez A, García JM, Amos M, Ujaldón M (2016) Dynamic load balancing on heterogeneous clusters for parallel ant colony optimization. Cluster Comput 19(1):1–11Cecilia JM, Llanes A, Abellán JL, Gómez-Luna J, Chang L, Hwu WW (2018) High-throughput ant colony optimization on graphics processing units. J Parallel Distrib Comput 113:261–274Lloyd H, Amos M (2016) A Highly Parallelized and Vectorized Implementation of Max–Min Ant System on Intel Xeon Phi. In: IEEE computational intelligenceTirado F, Barrientos RJ, González P, Mora M (2017) Efficient exploitation of the Xeon Phi architecture for the ant colony optimization (ACO) metaheuristic. J Supercomput 73(11):5053–5070Montesinos V, García JM (2018) Vectorization strategies for ant colony optimization on intel architectures. Parallel Computing is Everywhere. IOS Press, Amsterdam, pp 400–409Lawler E, Lenstra J, Kan A, Shmoys D (1987) The Traveling salesman problem. Wiley, New YorkMontesinos V (June 2018) Performance analysis of ant colony optimization on intel architectures. Master’s Thesis, University of Murcia (Spain)Lloyd H, Amos M (2017) Analysis of independent roulette selection in parallel ant colony optimization. In: Genetic and Evolutionary Computation Conference, ACM, pp 19–26Dorigo M (1992) Optimization, learning and natural algorithms. Ph.D. Thesis, Politecnico di Milano, ItalyDuran A, Klemm M (2012) The intel many integrated core architecture. In: Internal Conference on High Performance Computing and Simulation (HPCS), pp 365–366The OpenMP API specification for parallel programming. URL: https://www.openmp.org . [Last accessed 14 June 2018]The Message Passing Interface (MPI) standard. URL: http://www.mcs.anl.gov/research/projects/mpi/ . [Last accessed 15 June 2018]Vladimirov A, Asai R (2016) Clustering modes in Knights landing processors: developer’s guide. Colfax international. URL: https://colfaxresearch.com/knl-numa/ . [Last accessed: 16 June 2018]Intel Developer Zone. URL: https://software.intel.com/en-us/modern-code . [Last accessed 02 Oct 2018]Pearce M (2018) What is code modernization? Intel developer zone. URL: http://software.intel.com/en-us/articles/what-is-code-modernization . [Last accessed 15 Feb 2018]Stützle T ACOTSP v1.03. Last accessed 15 Feb 2018. URL: http://iridia.ulb.ac.be/~mdorigo/ACO/downloads/ACOTSP-1.03.tgzReinelt G (1991) TSPLIB—a traveling salesman problem library. ORSA J Comput 3:376–384Crainic TG, Toulouse M (2003) Parallel strategies for meta-heuristics. State-of-the-art handbook in metaheuristics. Kluwer Academic Publishers, Dordrecht, pp 475–513Delévacq A, Delisle P, Gravel M, Krajecki M (2013) Parallel ant colony optimization on graphics processing units. J Parallel Distrib Comput 73(1):52–61Skinderowicz R (2016) The GPU-based parallel ant colony system. J Parallel Distrib Comput 98:48–60Zhou Y, He F, Hou N, Qiu Y (2018) Parallel ant colony optimization on multi-core SIMD CPUs. Future Gener Comput Syst 79:473–487Peake J, Amos M, Yiapanis P, Lloyd H (2018) Vectorized candidate set selection for parallel ant colony optimization. In: Genetic and Evolutionary Computation Conference, ACM, pp 1300–1306Stützle T (1998) Parallelization strategies for ant colony optimization. In: Eiben AE, Bäck T, Schoenauer M, Schwefel HP (eds) Parallel problem solving from nature—PPSN V. PPSN. Lecture Notes in Computer Science, vol 1498. Springer, Berlin, HeidelbergAbdelkafi O, Lepagnot J, Idoumghar L (2014) Multi-level parallelization for hybrid ACO. In: Siarry P, Idoumghar L, Lepagnot J (eds) Swarm Intelligence Based Optimization. ICSIBO 2014. Lecture Notes in Computer Science, vol 8472. Springer, ChamMichel R, Middendorf M (1998) An island model based ant system with lookahead for the shortest super sequence problem. In: Eiben AE, Bäck T, Schoenauer M, Schwefel HP (eds) Parallel problem solving from nature— PPSN V. PPSN. Lecture Notes in Computer Science, vol 1498. Springer, Berlin, HeidelbergChen L, Sun H, Wang S (2008) Parallel implementation of ant colony optimization on MPP. In: International Conference on Machine Learning and CyberneticsLin Y, Cai H, Xiao J, Zhang J (2007) Pseudo parallel ant colony optimization for continuous functions. In: International Conference on Natural Computatio

    Synthesis of reconfigurable high-performance multicore systems

    Full text link

    Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data

    Get PDF
    ANTECEDENTES: las metaheurísticas se utilizan ampliamente para resolver grandes problemas de optimización combinatoria en bioinformática debido al enorme conjunto de posibles soluciones. Dos problemas representativos son la selección de genes para la clasificación del cáncer y el agrupamiento de los datos de expresión génica. En la mayoría de los casos, estas metaheurísticas, así como otras técnicas no lineales, aplican una función de adecuación a cada solución posible con una población de tamaño limitado, y ese paso involucra latencias más altas que otras partes de los algoritmos, lo cual es la razón por la cual el tiempo de ejecución de las aplicaciones dependerá principalmente del tiempo de ejecución de la función de aptitud. Además, es habitual encontrar formulaciones aritméticas de punto flotante para las funciones de fitness. De esta manera, una paralelización cuidadosa de estas funciones utilizando la tecnología de hardware reconfigurable acelerará el cálculo, especialmente si se aplican en paralelo a varias soluciones de la población. RESULTADOS: una paralelización de grano fino de dos funciones de aptitud de punto flotante de diferentes complejidades y características involucradas en el biclustering de los datos de expresión génica y la selección de genes para la clasificación del cáncer permitió obtener mayores aceleraciones y cómputos de potencia reducida con respecto a los microprocesadores habituales. CONCLUSIONES: Los resultados muestran mejores rendimientos utilizando tecnología de hardware reconfigurable en lugar de los microprocesadores habituales, en términos de tiempo de consumo y consumo de energía, no solo debido a la paralelización de las operaciones aritméticas, sino también gracias a la evaluación de aptitud concurrente para varios individuos de la población en La metaheurística. Esta es una buena base para crear soluciones aceleradas y de bajo consumo de energía para escenarios informáticos intensivos.BACKGROUND: Metaheuristics are widely used to solve large combinatorial optimization problems in bioinformatics because of the huge set of possible solutions. Two representative problems are gene selection for cancer classification and biclustering of gene expression data. In most cases, these metaheuristics, as well as other non-linear techniques, apply a fitness function to each possible solution with a size-limited population, and that step involves higher latencies than other parts of the algorithms, which is the reason why the execution time of the applications will mainly depend on the execution time of the fitness function. In addition, it is usual to find floating-point arithmetic formulations for the fitness functions. This way, a careful parallelization of these functions using the reconfigurable hardware technology will accelerate the computation, specially if they are applied in parallel to several solutions of the population. RESULTS: A fine-grained parallelization of two floating-point fitness functions of different complexities and features involved in biclustering of gene expression data and gene selection for cancer classification allowed for obtaining higher speedups and power-reduced computation with regard to usual microprocessors. CONCLUSIONS: The results show better performances using reconfigurable hardware technology instead of usual microprocessors, in computing time and power consumption terms, not only because of the parallelization of the arithmetic operations, but also thanks to the concurrent fitness evaluation for several individuals of the population in the metaheuristic. This is a good basis for building accelerated and low-energy solutions for intensive computing scenarios.• Ministerio de Economía y Competitividad y Fondos FEDER. Contrato TIN2012-30685 (I+D+i) • Gobierno de Extremadura. Ayuda GR15011 para grupos TIC015 • CONICYT/FONDECYT/REGULAR/1160455. Beca para Ricardo Soto Guzmán • CONICYT/FONDECYT/REGULAR/1140897. Beca para Broderick CrawfordpeerReviewe

    Soft Computing Techiniques for the Protein Folding Problem on High Performance Computing Architectures

    Get PDF
    The protein-folding problem has been extensively studied during the last fifty years. The understanding of the dynamics of global shape of a protein and the influence on its biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to foresee the threedimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. We present in this revision work the past and last tendencies regarding protein folding simulations from both perspectives; hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem as well as which hardware platforms have been used for running this kind of Soft Computing techniques.This work is jointly supported by the FundaciónSéneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grant with reference TEC2012-37945-C02-02 and TIN2012-31345, by the Nils Coordinated Mobility under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within UCAM GPU educational and research centers.Ingeniería, Industria y Construcció
    corecore