Multi-Objective Big Data Optimization with jMetal and Spark
Big Data Optimization is the term used to refer to optimization problems that have to manage very large amounts of data. In this paper, we focus on parallelizing metaheuristics with the Apache Spark cluster computing system to solve multi-objective Big Data Optimization problems. Our purpose is to study the influence of accessing data stored in the Hadoop Distributed File System (HDFS) in each evaluation step of a metaheuristic, and to provide a software tool to solve these kinds of problems. This tool combines the jMetal multi-objective optimization framework with Apache Spark. We have carried out experiments to measure the performance of the proposed parallel infrastructure in an environment based on virtual machines in a local cluster comprising up to 100 cores. We obtained interesting results for computational effort and propose guidelines for tackling multi-objective Big Data Optimization problems.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
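The core pattern the paper describes, evaluating a metaheuristic's population in parallel on a cluster, can be sketched in miniature. This is a hedged illustration only: it uses Python's multiprocessing Pool as a stand-in for Spark's parallelize(...).map(...).collect(), and the two objective functions are invented for the example (the paper's evaluations would additionally read data from HDFS).

```python
from multiprocessing import Pool

def evaluate(solution):
    # Hypothetical bi-objective fitness for illustration only;
    # in the paper's setting each evaluation would also access HDFS data.
    f1 = sum(x * x for x in solution)
    f2 = sum((x - 2.0) ** 2 for x in solution)
    return (f1, f2)

def parallel_evaluate(population, workers=4):
    # Stand-in for Spark's parallelize(population).map(evaluate).collect()
    with Pool(workers) as pool:
        return pool.map(evaluate, population)

if __name__ == "__main__":
    population = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(parallel_evaluate(population))  # → [(0.0, 8.0), (2.0, 2.0), (8.0, 0.0)]
```

The design point the paper studies is exactly the cost hidden in `evaluate`: when each call must read from HDFS, the map step's data-access latency dominates the parallel speed-up.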
Load Forecasting Based Distribution System Network Reconfiguration-A Distributed Data-Driven Approach
In this paper, a short-term load forecasting based network reconfiguration approach is proposed in a parallel manner. Specifically, a support vector regression (SVR) based short-term load forecasting approach is designed to provide accurate load predictions and benefit the network reconfiguration. Because of the nonconvexity of the three-phase balanced optimal power flow, a second-order cone program (SOCP) based approach is used to relax the optimal power flow problem. Then, the alternating direction method of multipliers (ADMM) is used to compute the optimal power flow in a distributed manner. Considering the limited number of switches and the increasing computation capability, the proposed network reconfiguration is solved in a parallel way. The numerical results demonstrate the feasibility and effectiveness of the proposed approach.
Comment: 5 pages, preprint for Asilomar Conference on Signals, Systems, and Computers 201
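The distributed step the abstract relies on, ADMM, alternates local minimizations with a consensus averaging step. A minimal sketch of consensus ADMM on a toy problem (each agent holds a quadratic objective, not a power-flow model; the problem and parameters here are invented for illustration):

```python
def consensus_admm(local_targets, rho=1.0, iters=100):
    """Toy consensus ADMM: agent i minimizes (x - a_i)^2, and all agents
    must agree on a common value z. The optimum of the summed objective
    is the mean of the targets. In the paper, each agent would instead
    solve a local SOCP-relaxed optimal power flow subproblem."""
    n = len(local_targets)
    x = [0.0] * n          # local primal variables
    u = [0.0] * n          # scaled dual variables
    z = 0.0                # consensus variable
    for _ in range(iters):
        # Local x-updates (these run in parallel in a distributed solver)
        x = [(2.0 * a + rho * (z - ui)) / (2.0 + rho)
             for a, ui in zip(local_targets, u)]
        # Consensus z-update (a simple averaging/gather step)
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual updates
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z

print(consensus_admm([1.0, 2.0, 6.0]))  # → approx. 3.0 (the mean)
```

The structure is what matters: only `z` and the duals are exchanged between agents, which is why ADMM suits the distributed, data-driven setting the paper describes.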
Parallel particle swarm optimization based on Spark for academic paper co-authorship prediction
The particle swarm optimization (PSO) algorithm has been widely used in various optimization problems. Although PSO has been successful in many fields, solving optimization problems in big data applications often requires processing massive amounts of data, which traditional PSO on a single machine cannot handle. Several parallel PSO algorithms based on Spark have been proposed; however, almost all of them target numerical optimization problems, and few address big data optimization problems. In this paper, we propose a new Spark-based parallel PSO algorithm to predict the co-authorship of academic papers, which we formulate as an optimization problem over massive academic data. Experimental results show that the proposed parallel PSO achieves good prediction accuracy.
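For readers unfamiliar with the base algorithm, a minimal serial PSO is sketched below on the sphere benchmark. This is a generic textbook PSO, not the paper's algorithm: the inertia/acceleration parameters and the objective are illustrative, and a Spark-based variant would evaluate the swarm's fitness in a distributed map step rather than the inner loop shown here.

```python
import random

def pso_sphere(dim=2, swarm=20, iters=200, seed=0):
    """Minimal PSO minimizing the sphere function f(x) = sum(x_i^2)."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5           # inertia and acceleration weights
    f = lambda p: sum(x * x for x in p)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]          # personal bests
    gbest = min(pbest, key=f)[:]         # global best
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i][:]
                if f(pbest[i]) < f(gbest):
                    gbest = pbest[i][:]
    return gbest

best = pso_sphere()
print(best)  # converges close to the origin
```

In the paper's setting, each particle would encode parameters of the co-authorship prediction model, and fitness evaluation over the massive academic dataset is what the Spark map step parallelizes.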
Hadoop performance modeling and job optimization for big data analytics
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Big data has gained momentum in both academia and industry. The MapReduce model has emerged as a major computing model in support of big data analytics. Hadoop, an open source implementation of the MapReduce model, has been widely taken up by the community. Cloud service providers such as Amazon EC2 now support Hadoop user applications. However, a key challenge is that the cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for a job running in a public cloud. This thesis presents a Hadoop performance model that accurately estimates the execution duration of a job and further provisions the required amount of resources for the job to be completed within a deadline. The proposed model employs a Locally Weighted Linear Regression (LWLR) model to estimate the execution time of a job, and the Lagrange Multiplier technique for resource provisioning to satisfy user jobs with given deadlines. The performance of the proposed model is extensively evaluated on both an in-house Hadoop cluster and the Amazon EC2 cloud. Experimental results show that the proposed model is highly accurate in job execution estimation and that jobs are completed within the required deadlines when following the model's resource provisioning scheme. In addition, the Hadoop framework has over 190 configuration parameters, some of which have significant effects on the performance of a Hadoop job. Manually setting optimum values for these parameters is a challenging and time-consuming task. This thesis presents optimization work that enhances the performance of Hadoop by automatically tuning its parameter values. It employs the Gene Expression Programming (GEP) technique to build an objective function that represents the performance of a job and the correlations among the configuration parameters. For the optimization itself, Particle Swarm Optimization (PSO) is employed to automatically find optimal or near-optimal configuration settings. The performance of the proposed work is intensively evaluated on a Hadoop cluster, and the experimental results show that it enhances the performance of Hadoop significantly compared with the default settings.
Abdul Wali Khan University Marda
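The LWLR estimator the thesis builds on fits a separate weighted linear model around each query point, with nearby training points weighted more heavily. A one-dimensional sketch follows; the Gaussian bandwidth `tau` and the toy "input size vs. execution time" data are invented for illustration, not taken from the thesis.

```python
import math

def lwlr_predict(x_query, xs, ys, tau=1.0):
    """Locally Weighted Linear Regression (1-D sketch): fit a weighted
    line around the query point, with Gaussian weights of bandwidth tau."""
    w = [math.exp(-((x - x_query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    # Weighted means, then the weighted least-squares slope
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    num = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
    den = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    slope = num / den if den else 0.0
    return my + slope * (x_query - mx)

# Toy data: "execution time" growing roughly linearly with "input size"
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]
print(lwlr_predict(3.0, xs, ys))  # ≈ 6.0
```

The appeal for job-time estimation is that LWLR needs no single global model: each prediction adapts to the regime (input size, cluster configuration) nearest the query.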
Improved K-means clustering on Hadoop
Clustering is a partitioning method in which items with similar attributes are grouped together. As data grows rapidly, data analysis using clustering is becoming difficult. K-means is a traditional clustering method; it is easy to implement and scalable, but it suffers from local minima and is sensitive to the initial cluster centroids. Particle swarm optimization (PSO) is a clustering approach that mimics swarm behavior based on particle velocities, but it requires many iterations. We therefore use PSO to find the initial cluster centers and then use these centroids for K-means clustering running in parallel on Hadoop, which is suited to large databases. In this way we try to find global clusters within a limited number of iterations.
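The second stage of the pipeline, K-means seeded with externally chosen centroids, can be sketched as plain Lloyd's iterations. The seeds below are hard-coded for illustration (in the paper they would come from the PSO pre-step), and the sketch runs sequentially; the Hadoop version maps the assignment step and reduces the centroid update.

```python
def kmeans(points, centroids, iters=10):
    """Lloyd's K-means seeded with given centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: nearest centroid (map phase on Hadoop)
            i = min(range(len(centroids)),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Update step: recompute centroids (reduce phase on Hadoop)
        centroids = [
            [sum(dim) / len(c) for dim in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
print(kmeans(points, [(0.0, 0.0), (5.0, 5.0)]))
```

Good seeds matter precisely because Lloyd's updates only refine locally, which is the gap the PSO initialization is meant to close.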
Combining Technical Trading Rules Using Parallel Particle Swarm Optimization based on Hadoop
Technical trading rules have been utilized in the stock markets to make profit for more than a century. However, no single trading rule can ever be expected to predict the stock price trend accurately. In fact, many investors and fund managers make trading decisions by combining a bunch of technical indicators. In this paper, we consider the complex stock trading strategy, called Performance-based Reward Strategy (PRS), proposed by [1]. Instead of combining two classes of technical trading rules, we expand the scope to combine the seven most popular classes of trading rules in financial markets, resulting in a total of 1059 component trading rules. Each component rule is assigned a starting weight, and a reward/penalty mechanism based on the rules' recent profits is proposed to update their weights over time. To determine the best parameter values of PRS, we employ an improved time variant particle swarm optimization (TVPSO) algorithm with the objective of maximizing the annual net profit generated by PRS. Due to the large number of component rules and the swarm size, the optimization time is significant. A parallel PSO based on Hadoop, an open source implementation of the MapReduce parallel programming model, is employed to optimize PRS more efficiently. The experimental results show that PRS outperforms all of the component rules in the testing period.
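A reward/penalty weight update of the kind the abstract describes can be sketched in a few lines. This is a hedged illustration of the general mechanism, not the exact PRS update from the paper: the multiplicative reward/penalty factors and the renormalization are assumptions made for the example.

```python
def update_weights(weights, profits, reward=0.1, penalty=0.1):
    """Reward rules with positive recent profit, penalize losing rules,
    then renormalize so the weights sum to one. The exact PRS update
    in the paper may differ."""
    new = [w * (1.0 + reward) if p > 0 else w * (1.0 - penalty)
           for w, p in zip(weights, profits)]
    total = sum(new)
    return [w / total for w in new]

w = update_weights([0.5, 0.5], [+1.0, -1.0])
print(w)  # → [0.55, 0.45]: the profitable rule gains weight
```

Over repeated updates, weight concentrates on rules that have recently been profitable, which is the adaptive behavior PRS exploits.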
Dynamically Iterative MapReduce
MapReduce is a distributed and parallel computing model for data-intensive tasks with features of optimized scheduling, flexibility, high availability, and high manageability. MapReduce can work on various platforms; however, it is not well suited to iterative programs because performance may be lowered by frequent disk I/O operations. In order to improve system performance and resource utilization, we propose a novel MapReduce framework named Dynamically Iterative MapReduce (DIMR), which reduces the number of disk I/O operations and the consumption of network bandwidth by means of dynamic task allocation and a memory management mechanism. We show that DIMR is promising, with detailed discussions in this paper.
SCOPE: Scalable Composite Optimization for Learning on Spark
Many machine learning models, such as logistic regression (LR) and support vector machine (SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve large-scale composite optimization problems, and they have shown better performance than traditional batch methods. However, most of these DSO methods are not scalable enough. In this paper, we propose a novel DSO method, called scalable composite optimization for learning (SCOPE), and implement it on the fault-tolerant distributed platform Spark. SCOPE is both computation-efficient and communication-efficient. Theoretical analysis shows that SCOPE is convergent with a linear convergence rate when the objective function is convex. Furthermore, empirical results on real datasets show that SCOPE can outperform other state-of-the-art distributed learning methods on Spark, including both batch learning methods and DSO methods.
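To make "composite optimization problem" concrete, the sketch below minimizes an L2-regularized logistic regression, i.e. a smooth loss plus a regularizer, which is the class of objectives SCOPE targets. This is plain serial gradient descent on a toy dataset, standing in for (not reproducing) SCOPE's distributed, variance-reduced updates; the learning rate, regularization strength, and data are all invented for the example.

```python
import math

def train_logreg(data, lam=0.1, lr=0.5, epochs=200):
    """Gradient descent on f(w,b) = (lam/2)||w||^2 + avg logistic loss,
    with labels y in {-1, +1}."""
    w, b = [0.0] * len(data[0][0]), 0.0
    n = len(data)
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0   # gradient of the regularizer
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            coef = -y / (1.0 + math.exp(margin))  # d(loss)/d(score)
            gw = [g + coef * xi / n for g, xi in zip(gw, x)]
            gb += coef / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

# Toy 1-D data, separable around x = 2
data = [([0.0], -1), ([1.0], -1), ([3.0], 1), ([4.0], 1)]
w, b = train_logreg(data)
print(w, b)  # decision boundary near x = 2
```

SCOPE's contribution lies in how such updates are partitioned across Spark workers with little communication while keeping a linear convergence rate; the objective being optimized is the same.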
Parallel swarm intelligence strategies for large-scale clustering based on MapReduce with application to epigenetics of aging
Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, it becomes a challenging issue due to the huge amount of recently collected data, which makes conventional clustering algorithms inappropriate. The use of swarm intelligence algorithms has shown promising results when applied to data clustering of moderate size, thanks to their decentralized and self-organized behavior. However, these algorithms exhibit limited capabilities when large data sets are involved. In this paper, we develop a decentralized, distributed big data clustering solution using three swarm intelligence algorithms within the MapReduce framework. The developed framework allows cooperation between the three algorithms, namely particle swarm optimization, ant colony optimization, and artificial bee colony, to achieve largely scalable data partitioning through a migration strategy. The latter takes advantage of the combined exploration and exploitation capabilities of these algorithms to foster diversity. The framework is tested using the Amazon Elastic MapReduce service (EMR), deploying up to 192 computing nodes and 30 gigabytes of data. Parallel metrics such as speed-up, size-up, and scale-up are used to measure the elasticity and scalability of the framework. Our results are compared with counterpart big data clustering results and show a significant improvement in terms of time and convergence to good quality solutions. The developed model has been applied to epigenetics data clustering according to methylation features in CpG islands, gene bodies, and gene promoters in order to study the impact of epigenetics on aging. Experimental results reveal that DNA methylation changes slightly, and not aberrantly, with aging, corroborating previous studies.
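The migration strategy that lets the three algorithms cooperate can be sketched generically. The ring topology and replace-the-worst policy below are assumptions made for illustration; the paper's framework performs its migrations between MapReduce jobs, and the populations here are bare numbers rather than PSO/ACO/ABC solution encodings.

```python
def migrate(populations, fitness):
    """One migration step in a ring topology: each population sends a
    copy of its best individual to the next population, replacing that
    population's worst individual (fitness is minimized)."""
    bests = [min(pop, key=fitness) for pop in populations]
    for i, pop in enumerate(populations):
        incoming = bests[(i - 1) % len(populations)]  # neighbor's best
        worst = max(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = incoming
    return populations

f = lambda x: abs(x)  # toy objective: distance to zero
pops = [[5.0, 9.0], [1.0, 7.0], [3.0, 8.0]]
migrate(pops, f)
print(pops)  # each population now holds a copy of a neighbor's best
```

Exchanging only the best individuals keeps communication cheap while injecting each algorithm's strongest solutions into the others, which is how the framework fosters diversity across the three swarms.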