
    A Novel Ant based Clustering of Gene Expression Data using MapReduce Framework

    Genes which exhibit similar patterns are often functionally related. Microarray technology provides a unique tool to examine how a cell's gene expression pattern changes under various conditions. Analyzing and interpreting these gene expression data is a challenging task. Clustering is one of the most useful and popular methods for extracting meaningful patterns from gene expression data. In this paper, a multi-colony ant-based clustering approach is proposed. The processing procedure is divided into two parts: the first is the construction of a minimum spanning tree from the gene expression data using a MapReduce version of ant colony optimization techniques; the second is clustering, which is done by cutting the costliest edges of the minimum spanning tree, followed by a one-step k-means clustering procedure. Applied to gene expression data files of different sizes over different numbers of processors, the proposed approach exhibits good scalability and accuracy.
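
    As a rough illustration of the clustering part described above (not the authors' MapReduce/ACO implementation), the sketch below builds a minimum spanning tree over pairwise distances with SciPy, cuts the k-1 costliest edges to obtain k components, and applies one k-means-style reassignment step. All data and names are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform

def mst_cut_clusters(X, k):
    """Cluster rows of X by cutting the k-1 costliest MST edges."""
    dist = squareform(pdist(X))                    # pairwise distances
    mst = minimum_spanning_tree(dist).toarray()
    edges = np.argwhere(mst > 0)                   # edge endpoints
    weights = mst[mst > 0]                         # matching edge costs
    for i in np.argsort(weights)[-(k - 1):]:       # drop the k-1 heaviest edges
        mst[tuple(edges[i])] = 0.0
    _, labels = connected_components(mst, directed=False)
    # one k-means step: reassign each point to its nearest component centroid
    centroids = np.array([X[labels == c].mean(axis=0) for c in range(k)])
    return np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)

# toy usage on random "expression" data: 100 genes, 20 conditions, 3 clusters
labels = mst_cut_clusters(np.random.rand(100, 20), k=3)
```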

    Multi-Objective Big Data Optimization with jMetal and Spark

    Big Data Optimization is the term used to refer to optimization problems which have to manage very large amounts of data. In this paper, we focus on the parallelization of metaheuristics with the Apache Spark cluster computing system for solving multi-objective Big Data Optimization problems. Our purpose is to study the influence of accessing data stored in the Hadoop File System (HDFS) in each evaluation step of a metaheuristic and to provide a software tool to solve these kinds of problems. This tool combines the jMetal multi-objective optimization framework with Apache Spark. We have carried out experiments to measure the performance of the proposed parallel infrastructure in an environment based on virtual machines in a local cluster comprising up to 100 cores. We obtained interesting results for computational effort and propose guidelines for tackling multi-objective Big Data Optimization problems. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
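
    The core mechanism studied in the paper, evaluating every solution of a population in parallel with Spark, can be sketched in a few lines of PySpark. This is not the jMetal integration itself: the two objectives below are Schaffer-style bi-objective test functions, the names are illustrative, and in the paper's setting the evaluation step would additionally read data from HDFS.

```python
from pyspark import SparkContext

def evaluate(solution):
    # Schaffer-style objectives; a Big Data problem would read HDFS data here
    f1 = sum(x ** 2 for x in solution)
    f2 = sum((x - 2.0) ** 2 for x in solution)
    return solution, (f1, f2)

sc = SparkContext(appName="mo-parallel-evaluation")
population = [[0.05 * i] for i in range(100)]          # toy population
# distribute the population and evaluate all solutions in parallel
evaluated = sc.parallelize(population).map(evaluate).collect()
sc.stop()
```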

    An Efficient Ant Colony Optimization Framework for HPC Environments

    Combinatorial optimization problems arise in many disciplines, both in the basic sciences and in applied fields such as engineering and economics. One of the most popular combinatorial optimization methods is the Ant Colony Optimization (ACO) metaheuristic. Its parallel nature makes it especially attractive for implementation and execution in High Performance Computing (HPC) environments. Here we present a novel parallel ACO strategy making use of efficient asynchronous decentralized cooperative mechanisms. This strategy seeks to fulfill two objectives: (i) acceleration of the computations by performing the ants' solution construction in parallel; (ii) convergence improvement through the stimulation of diversification in the search and cooperation between different colonies. The two main features of the proposal, decentralization and desynchronization, enable a more effective and efficient response in environments where resources are highly coupled. Examples of such infrastructures include traditional HPC clusters as well as newer distributed environments, such as cloud infrastructures or even local computer networks. The proposal has been evaluated using the popular Traveling Salesman Problem (TSP), a well-known NP-hard problem widely used in the literature to test combinatorial optimization methods. An exhaustive evaluation has been carried out using three medium and large size instances from the TSPLIB library, and the experiments show encouraging results with superlinear speedups compared to the sequential algorithm (e.g. speedups of 18 with 16 cores) and very good scalability (experiments were performed with up to 384 cores, improving execution time even at that scale). This work was supported by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 / AEI / 10.13039/501100011033), and by Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019–2022, ref. ED431G 2019/01; Consolidation Program of Competitive Reference Groups, ref. ED431C 2021/30). JRB acknowledges funding from the Ministry of Science and Innovation of Spain MCIN / AEI / 10.13039/501100011033 through grant PID2020-117271RB-C22 (BIODYNAMICS), and from MCIN / AEI / 10.13039/501100011033 and "ERDF A way of making Europe" through grant DPI2017-82896-C2-2-R (SYNBIOCONTROL). The authors also acknowledge the Galician Supercomputing Center (CESGA) for access to its facilities. Funding for open access publication: Universidade da Coruña/CISUG.
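
    For readers unfamiliar with the metaheuristic, the following compact, sequential sketch shows the basic ACO cycle on a random TSP instance: probabilistic tour construction biased by pheromone and inverse distance, evaporation, and deposit proportional to tour quality. It is the building block that the paper parallelizes; the asynchronous decentralized cooperation between colonies is not shown, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                             # toy instance size
coords = rng.random((n, 2))
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1) + np.eye(n)
tau = np.ones((n, n))                              # pheromone matrix
alpha, beta, rho, n_ants = 1.0, 2.0, 0.5, 10

def build_tour():
    tour = [rng.integers(n)]
    while len(tour) < n:
        i = tour[-1]
        mask = np.ones(n, bool)
        mask[tour] = False                         # exclude visited cities
        p = (tau[i] ** alpha) * ((1.0 / dist[i]) ** beta) * mask
        tour.append(rng.choice(n, p=p / p.sum()))
    return tour

for _ in range(100):                               # main ACO loop
    tours = [build_tour() for _ in range(n_ants)]
    lengths = [sum(dist[t[k], t[(k + 1) % n]] for k in range(n)) for t in tours]
    tau *= 1.0 - rho                               # evaporation
    for t, L in zip(tours, lengths):               # shorter tours deposit more
        for k in range(n):
            tau[t[k], t[(k + 1) % n]] += 1.0 / L
print(min(lengths))                                # best length, last iteration
```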

    Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures

    The protein-folding problem has been extensively studied during the last fifty years. Understanding the dynamics of the global shape of a protein and its influence on the protein's biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Different computational approaches have been developed by different researchers to predict the three-dimensional arrangement of a protein's atoms from its sequence. However, the computational complexity of this problem makes it mandatory to search for new models, novel algorithmic strategies, and hardware platforms that provide solutions in a reasonable time frame. In this review we present past and current trends in protein folding simulation from both perspectives: hardware and software. Of particular interest to us are the use of inexact solutions to this computationally hard problem as well as the hardware platforms that have been used for running these kinds of Soft Computing techniques. This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and the European Commission FEDER under grants TEC2012-37945-C02-02 and TIN2012-31345, and by the NILS Coordinated Mobility program under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within the UCAM GPU educational and research centers.

    Distributed evolutionary algorithms and their models: A survey of the state-of-the-art

    The increasing complexity of real-world optimization problems raises new challenges for evolutionary computation. Responding to these challenges, distributed evolutionary computation has received considerable attention over the past decade. This article provides a comprehensive survey of state-of-the-art distributed evolutionary algorithms and models, which are classified into two groups according to their task-division mechanism. Population-distributed models are presented with master-slave, island, cellular, hierarchical, and pool architectures, which parallelize an evolution task at the population, individual, or operation level. Dimension-distributed models include coevolution and multi-agent models, which focus on dimension reduction. Insights into the models, such as synchronization, homogeneity, communication, topology, speedup, and advantages and disadvantages, are also presented and discussed. The study of these models helps guide the future development of different and/or improved algorithms. Also highlighted are recent hotspots in this area, including cloud- and MapReduce-based implementations, GPU- and CUDA-based implementations, distributed evolutionary multi-objective optimization, and real-world applications. Further, a number of future research directions are discussed, with the conclusion that the development of distributed evolutionary computation will continue to flourish.
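
    Of the population-distributed models surveyed, the island model is the simplest to illustrate. The toy sketch below (illustrative parameters, a simple real-valued fitness) evolves several subpopulations independently and periodically migrates each island's best individual to its neighbor in a ring topology.

```python
import random

def fitness(ind):
    return -sum(x ** 2 for x in ind)               # maximize: optimum at origin

def evolve(pop):
    # one generation: binary tournament selection plus Gaussian mutation
    out = []
    for _ in pop:
        a, b = random.sample(pop, 2)
        parent = a if fitness(a) > fitness(b) else b
        out.append([x + random.gauss(0, 0.1) for x in parent])
    return out

islands = [[[random.uniform(-5, 5) for _ in range(10)] for _ in range(20)]
           for _ in range(4)]                      # 4 islands of 20 individuals
for gen in range(200):
    islands = [evolve(pop) for pop in islands]     # islands evolve independently
    if gen % 25 == 0:                              # periodic ring migration
        best = [max(pop, key=fitness) for pop in islands]
        for i, pop in enumerate(islands):
            pop[random.randrange(len(pop))] = best[i - 1]
best = max((ind for pop in islands for ind in pop), key=fitness)
print(fitness(best))
```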

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, and are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytical tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem that seeks a minimal subset of relevant features that maximizes classification accuracy and reduces computation. Traditional statistics-based feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm following a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge-management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
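
    To make the decomposition concrete, here is a much-simplified, single-machine sketch of the cooperative co-evolution idea for feature selection: the feature indices are split into groups, each group's bit mask is improved by mutation, and every candidate mask is evaluated in the context of the current best masks of the other groups. It uses a scikit-learn classifier as the wrapper fitness; the grouping, budget, and (1+1)-style search are illustrative simplifications, not the specific techniques reviewed in the article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=40, n_informative=8,
                           random_state=0)
groups = np.array_split(np.arange(40), 4)          # decompose into 4 subproblems
masks = [np.ones(len(g), dtype=bool) for g in groups]  # best mask per group
rng = np.random.default_rng(0)

def score(full_mask):
    # wrapper fitness: cross-validated accuracy on the selected features
    if not full_mask.any():
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=1000),
                           X[:, full_mask], y, cv=3).mean()

for _ in range(3):                                 # round-robin over groups
    for gi in range(len(groups)):
        for _ in range(10):                        # mutate this group's mask only
            cand = masks[gi].copy()
            cand[rng.integers(len(cand))] ^= True  # flip one feature bit
            trial = [cand if j == gi else masks[j] for j in range(len(groups))]
            if score(np.concatenate(trial)) >= score(np.concatenate(masks)):
                masks[gi] = cand                   # keep improvements in context
print(int(np.concatenate(masks).sum()), "features selected")
```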

    Methodology for modified whale optimization algorithm for solving appliances scheduling problem

    The Whale Optimization Algorithm (WOA) is considered one of the newest metaheuristic algorithms for solving NP-hard problems. WOA is known for its slow convergence, and its computational cost also increases exponentially with multiple objectives and a huge number of requests from n users. These constraints limit its ability to solve and optimize Demand Side Management (DSM) cases, such as the energy consumption of indoor comfort index parameters consisting of thermal comfort, air quality, humidity, and visual comfort. To address these issues, the proposed work will first justify and validate the constraints related to the appliance scheduling problem, and then propose a new model of the cluster-based multi-objective WOA with a multiple-restart strategy. To achieve these objectives, different initialization strategies and cluster-based approaches will be used to tune the main parameter of WOA, which helps control exploration and exploitation, under different MapReduce applications. The proposed model will be tested on a set of well-known test functions and finally applied to a real case project, i.e. the appliance scheduling problem. It is anticipated that the approach can expedite the convergence of the metaheuristic technique with quality solutions.
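
    For context, the standard (single-objective) WOA position update that the proposal builds on can be sketched as follows, on a toy sphere function: whales either encircle the best solution found, explore around a random whale, or follow a spiral (bubble-net) update. The clustered multi-objective variant, restart strategy, and MapReduce deployment proposed above are not shown; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_whales, iters, b = 10, 30, 200, 1.0
X = rng.uniform(-5, 5, (n_whales, dim))
f = lambda x: float(np.sum(x ** 2))                # toy sphere objective
best = min(X, key=f).copy()

for t in range(iters):
    a = 2.0 * (1 - t / iters)                      # 'a' decreases from 2 to 0
    for i in range(n_whales):
        A = 2 * a * rng.random(dim) - a
        C = 2 * rng.random(dim)
        if rng.random() < 0.5:
            if np.all(np.abs(A) < 1):              # exploit: encircle the best
                X[i] = best - A * np.abs(C * best - X[i])
            else:                                  # explore: follow a random whale
                rand = X[rng.integers(n_whales)]
                X[i] = rand - A * np.abs(C * rand - X[i])
        else:                                      # spiral (bubble-net) update
            l = rng.uniform(-1, 1)
            X[i] = np.abs(best - X[i]) * np.exp(b * l) * np.cos(2 * np.pi * l) + best
        if f(X[i]) < f(best):
            best = X[i].copy()
print(f(best))
```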

    Adaptive Dispatching of Tasks in the Cloud

    The increasingly wide application of Cloud Computing enables the consolidation of tens of thousands of applications in shared infrastructures. Thus, meeting the quality-of-service requirements of so many diverse applications in such shared resource environments has become a real challenge, especially since the characteristics and workloads of applications differ widely and may change over time. This paper presents an experimental system that can exploit a variety of online quality-of-service-aware adaptive task allocation schemes, and three such schemes are designed and compared: (i) a measurement-driven algorithm that uses reinforcement learning; (ii) a "sensible" allocation algorithm that assigns jobs to the sub-systems observed to provide lower response times; and (iii) an algorithm that splits the job arrival stream into sub-streams at rates computed from the hosts' processing capabilities. All of these schemes are compared via measurements among themselves and with a simple round-robin scheduler, on two experimental test-beds with homogeneous and heterogeneous hosts having different processing capacities. Comment: 10 pages, 9 figures.
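
    The third scheme lends itself to a very small illustration: if the hosts' processing capacities are known, the arrival stream can be split probabilistically in proportion to them. The sketch below shows only this weighted splitting with hypothetical capacities, not the reinforcement-learning or "sensible" allocators from the paper.

```python
import random

capacities = {"host-a": 4.0, "host-b": 2.0, "host-c": 1.0}  # hypothetical rates
hosts = list(capacities)
weights = [capacities[h] for h in hosts]

def dispatch():
    # route an arriving job to a host with probability proportional to capacity
    return random.choices(hosts, weights=weights)[0]

counts = {h: 0 for h in hosts}
for _ in range(10_000):
    counts[dispatch()] += 1
print(counts)                                      # expect roughly a 4:2:1 split
```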