An evolutionary algorithm for online, resource constrained, multi-vehicle sensing mission planning
Mobile robotic platforms are an indispensable tool for various scientific and
industrial applications. Robots are used to undertake missions whose execution
is constrained by various factors, such as the allocated time or their
remaining energy. Existing solutions for resource constrained multi-robot
sensing mission planning provide optimal plans at a prohibitive computational
complexity for online application [1],[2],[3]. A heuristic approach exists for
an online, resource constrained sensing mission planning for a single vehicle
[4]. This work proposes a Genetic Algorithm (GA) based heuristic for the
Correlated Team Orienteering Problem (CTOP) that is used for planning sensing
and monitoring missions for robotic teams that operate under resource
constraints. The heuristic is compared against optimal Mixed Integer Quadratic
Programming (MIQP) solutions. Results show that, in the worst case, the
heuristic solution is within 5% of the optimal solution. The heuristic also
proved to be at least 300 times more time-efficient in the worst tested case,
and its execution took less than a second in the worst case, making it
suitable for online execution.
Comment: 8 pages, 5 figures, accepted for publication in Robotics and Automation Letters (RA-L)
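The abstract does not spell out the paper's CTOP encoding; as a hedged illustration of the general approach, the sketch below runs a small genetic algorithm over a budget-constrained site-selection problem, a simplified stand-in for resource-constrained sensing mission planning with hypothetical reward/cost inputs:

```python
import random

def ga_select_sites(rewards, costs, budget, pop_size=40, generations=100, seed=0):
    """Evolve a subset of sensing sites that maximizes collected reward
    under a resource budget (a knapsack-like stand-in for CTOP)."""
    rng = random.Random(seed)
    n = len(rewards)

    def fitness(bits):
        cost = sum(c for b, c in zip(bits, costs) if b)
        if cost > budget:          # infeasible individual: penalize by its cost
            return -cost
        return sum(r for b, r in zip(bits, rewards) if b)

    def tournament(pop):
        a, b = rng.sample(pop, 2)  # binary tournament selection
        return max(a, b, key=fitness)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = []
        for _ in range(pop_size):
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child[rng.randrange(n)] ^= 1       # single bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop + [best], key=fitness)  # keep the best-ever solution
    return best, fitness(best)
```

The real problem adds routing between sites and inter-site reward correlations, which this sketch deliberately omits.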
Compiler Transformations to Generate Reentrant C Programs to Assist Software Parallelization
As we move through the multi-core era into the many-core era it becomes obvious that thread-based programming is here to stay. This trend in the development of general purpose hardware is augmented by the fact that while writing sequential programs is considered a non-trivial task, writing parallel applications to take advantage of the advances in the number of cores in a processor severely complicates the process.
Writing parallel applications requires programs and functions to be reentrant. Therefore, we cannot use globals and statics. However, globals and statics are useful in certain contexts. Globals allow an easy programming mechanism to share data between several functions. Statics provide the only mechanism of data hiding in C for variables that are global in scope. Writing parallel programs restricts users from using globals and statics in their programs, as doing so would make the program non-reentrant.
Moreover, there is a large existing legacy code base of sequential programs that are non-reentrant, since they rely on statics and globals. Several of these sequential programs display significant amounts of data parallelism by operating on independent chunks of input data, and therefore can be easily converted into parallel versions to exploit multi-core processors. Indeed, several such programs have been manually converted into parallel versions. However, manually eliminating all globals and statics to make the program reentrant is tedious, time-consuming, and error-prone.
In this paper we describe a system to provide a semi-automated mechanism for users to still be able to use statics and globals in their programs, and to let the compiler automatically convert them into their semantically-equivalent reentrant versions, enabling their parallelization later.
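The transformation described above targets C, where globals and statics become thread-local or are threaded through as parameters. As a minimal cross-language sketch of the idea (not the paper's actual compiler pass), the Python fragment below contrasts a non-reentrant global counter with a per-thread version using `threading.local`, the analogue of C11 `_Thread_local` storage:

```python
import threading

# Non-reentrant version: a shared module-level global, as in legacy C code.
counter = 0

def bump_shared():
    global counter
    counter += 1            # races when called from several threads
    return counter

# Reentrant version: the transformation moves each global/static into
# per-thread storage; threading.local is Python's analogue of the
# compiler-generated thread-local copies.
_tls = threading.local()

def bump_reentrant():
    if not hasattr(_tls, "counter"):
        _tls.counter = 0    # each thread gets its own private copy
    _tls.counter += 1
    return _tls.counter
```

With per-thread copies, each thread counts independently, so the function can be called safely from data-parallel workers.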
Sort-based grouping and aggregation
Database query processing requires algorithms for duplicate removal,
grouping, and aggregation. Three algorithms exist: in-stream aggregation is
most efficient by far but requires sorted input; sort-based aggregation relies
on external merge sort; and hash aggregation relies on an in-memory hash table
plus hash partitioning to temporary storage. Cost-based query optimization
chooses which algorithm to use based on several factors including input and
output sizes, the sort order of the input, and the need for sorted output. For
example, hash-based aggregation is ideal for small output (e.g., TPC-H Query
1), whereas sorting the entire input and aggregating after sorting are
preferable when both aggregation input and output are large and the output
needs to be sorted for a subsequent operation such as a merge join.
Unfortunately, the size information required for a sound choice is often
inaccurate or unavailable during query optimization, leading to sub-optimal
algorithm choices. To address this challenge, this paper introduces a new
algorithm for sort-based duplicate removal, grouping, and aggregation. The new
algorithm always performs at least as well as both traditional hash-based and
traditional sort-based algorithms. It can serve as a system's only aggregation
algorithm for unsorted inputs, thus preventing erroneous algorithm choices.
Furthermore, the new algorithm produces sorted output that can speed up
subsequent operations. Google's F1 Query uses the new algorithm in production
workloads that aggregate petabytes of data every day.
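The paper's algorithm itself is not reproduced in this abstract; as a minimal illustration of the sort-based strategy it builds on, sorting first and then folding adjacent equal keys in a single pass both aggregates and yields sorted output for downstream operators:

```python
from itertools import groupby
from operator import itemgetter

def sort_based_aggregate(rows, key_idx=0, val_idx=1):
    """Group rows by key and sum values: sort, then aggregate in one pass.
    After sorting, equal keys are adjacent, so each group can be folded as
    it streams by (in-stream aggregation), and the output comes out sorted."""
    rows = sorted(rows, key=itemgetter(key_idx))  # an external merge sort in a real engine
    return [(k, sum(r[val_idx] for r in grp))
            for k, grp in groupby(rows, key=itemgetter(key_idx))]
```

A hash aggregation would instead fold rows into an in-memory table and emit unsorted groups, which is why the choice between the two depends on output size and ordering requirements.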
Hadoop performance modeling and job optimization for big data analytics
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Big data has gained momentum in both academia and industry. The MapReduce model has emerged as a major computing model in support of big data analytics. Hadoop, an open source implementation of the MapReduce model, has been widely taken up by the community, and cloud service providers such as Amazon EC2 now support Hadoop user applications. However, a key challenge is that the cloud service providers do not have a resource provisioning mechanism to satisfy user jobs with deadline requirements. Currently, it is solely the user's responsibility to estimate the required amount of resources for a job running in a public cloud. This thesis presents a Hadoop performance model that accurately estimates the execution duration of a job and further provisions the required amount of resources for a job to be completed within a deadline. The proposed model employs a Locally Weighted Linear Regression (LWLR) model to estimate the execution time of a job and the Lagrange multiplier technique for resource provisioning to satisfy user jobs with a given deadline. The performance of the proposed model is extensively evaluated both on an in-house Hadoop cluster and on the Amazon EC2 cloud. Experimental results show that the proposed model is highly accurate in job execution estimation, and that jobs are completed within the required deadlines when following the resource provisioning scheme of the proposed model. In addition, the Hadoop framework has over 190 configuration parameters, some of which have significant effects on the performance of a Hadoop job. Manually setting optimal values for these parameters is a challenging and time-consuming task. This thesis presents optimization work that enhances the performance of Hadoop by automatically tuning its parameter values.
It employs the Gene Expression Programming (GEP) technique to build an objective function that represents the performance of a job and the correlation among the configuration parameters. For the purpose of optimization, Particle Swarm Optimization (PSO) is employed to automatically find optimal or near-optimal configuration settings. The performance of the proposed work is intensively evaluated on a Hadoop cluster, and the experimental results show that it enhances the performance of Hadoop significantly compared with the default settings.
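As an illustrative sketch (not the thesis's actual multi-variable model), the following shows the Locally Weighted Linear Regression step in its simplest one-dimensional form, with a Gaussian kernel bandwidth `tau` as an assumed parameter:

```python
import math

def lwlr_predict(x_query, xs, ys, tau=1.0):
    """Locally Weighted Linear Regression for one query point (1-D case).
    Each training point is weighted by a Gaussian kernel of its distance
    to x_query; a weighted least-squares line is then fitted and evaluated."""
    w = [math.exp(-(x - x_query) ** 2 / (2 * tau ** 2)) for x in xs]
    # Weighted normal equations for y = b0 + b1 * x
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, xs))
    swy = sum(wi * y for wi, y in zip(w, ys))
    swxx = sum(wi * x * x for wi, x in zip(w, xs))
    swxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    det = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0 + b1 * x_query
```

The local weighting is what lets the model track a non-linear relationship between, say, input data size and job execution time, by fitting a fresh line near each query point.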
Fuzzy Differential Evolution Algorithm
The Differential Evolution (DE) algorithm is a powerful search technique for solving global optimization problems over continuous space. The search initialization of this algorithm does not adequately capture vague preliminary knowledge from the problem domain. This thesis proposes a novel Fuzzy Differential Evolution (FDE) algorithm as an alternative approach, in which vague information about the search space can be represented and used to deliver a more efficient search. The proposed FDE algorithm utilizes fuzzy set theory concepts to modify the search initialization and mutation components of the traditional DE algorithm. FDE, alongside other key DE features, is implemented in a convenient decision support system software package. Four benchmark functions are used to demonstrate the performance of the new FDE and its practical utility. Additionally, the application of the algorithm is illustrated through a water management case study. The new algorithm shows faster convergence on most of the benchmark functions.
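The fuzzy initialization and mutation components are the thesis's contribution and are not reproduced here; for reference, a minimal classic DE/rand/1/bin baseline, the algorithm that FDE modifies, can be sketched as:

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, iters=200, seed=1):
    """Classic DE/rand/1/bin minimizer. The FDE variant described above would
    additionally bias initialization and mutation with fuzzy membership
    functions (not shown here)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)          # force at least one mutated dim
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == jrand:   # binomial crossover
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])  # rand/1 mutation
                    lo, hi = bounds[j]
                    trial.append(min(max(v, lo), hi))
                else:
                    trial.append(pop[i][j])
            ft = f(trial)
            if ft <= fit[i]:                    # greedy one-to-one selection
                pop[i], fit[i] = trial, ft
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best]
```

Replacing the uniform initialization with samples drawn from fuzzy membership functions over the search space is where domain knowledge would enter.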
Balancer genetic algorithm: a novel task scheduling optimization approach in cloud computing
Task scheduling is one of the core issues in cloud computing. Tasks are heterogeneous and have intensive computational requirements. They need to be scheduled on Virtual Machines (VMs), the resources of a cloud environment. Due to the immensity of the search space of possible mappings of tasks to VMs, meta-heuristics have been introduced for task scheduling. In scheduling, makespan and load balancing are crucial Quality of Service (QoS) parameters. This research contributes a novel load-balancing scheduler, the Balancer Genetic Algorithm (BGA), which improves makespan and load balancing. Insufficient load balancing wastes resources, as some of them remain idle. BGA incorporates a load balancing mechanism in which the actual load, in terms of millions of instructions assigned to VMs, is considered. The need for multi-objective optimization to improve both load balancing and makespan is also emphasized. Skewed, normal and uniform distributions of workload and different batch sizes are used in the experimentation. BGA exhibits significant improvement in makespan, throughput and load balancing compared with various state-of-the-art approaches.
Improving Data Locality in Distributed Processing of Multi-Channel Remote Sensing Data with Potentially Large Stencils
Distributing multi-channel remote sensing data processing with potentially large stencils
is a difficult challenge. The goal of this master's thesis was to investigate the
performance impact of such processing on a distributed system and to evaluate whether
the total execution time can be improved by exploiting data locality or memory alignment.
The thesis also gives a brief overview of the current state of the art in distributed
processing of remote sensing data and points out why distributed computing will become
more important for it in the future. For the experimental part of this thesis, an application to process huge
arrays on a distributed system was implemented with DASH, a C++ Template Library for
Distributed Data Structures with Support for Hierarchical Locality for High Performance
Computing and Data-Driven Science. Based on the first results, an optimization model
was developed with the goal of reducing network traffic while initializing a distributed
data structure and executing computations on it with potentially large stencils. Furthermore,
software was implemented to estimate the memory layouts with the least network communication
cost for a given multi-channel remote sensing data processing workflow. The results of this
optimization were then executed and evaluated. They show that it is possible to improve the
initialization speed of a large image by 25% by taking brick locality into account. The
optimization model also generates valid decisions for the initialization of the PGAS memory
layouts. However, for a real implementation the optimization model has to be modified to
reflect implementation-dependent sources of overhead. This thesis presented some approaches
towards solving challenges of distributed computing that can be used for real-world remote
sensing imaging applications, and contributed towards solving the challenges of the modern
Big Data world for future scientific data exploitation.
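As a back-of-the-envelope illustration of why layout matters for large stencils (not the thesis's optimization model), the helper below counts the halo ("ghost") cells a block distribution must fetch from neighbouring units, assuming an evenly divisible grid:

```python
def halo_cells(grid_h, grid_w, tiles_y, tiles_x, radius):
    """Total halo (ghost) cells that must be fetched from neighbouring units
    when a grid_h x grid_w image is split into tiles_y x tiles_x blocks and
    a stencil needs `radius` cells in every direction past each tile edge."""
    th, tw = grid_h // tiles_y, grid_w // tiles_x
    total = 0
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            h = 0
            if ty > 0:           h += radius * tw  # fetch from tile above
            if ty < tiles_y - 1: h += radius * tw  # fetch from tile below
            if tx > 0:           h += radius * th  # fetch from tile left
            if tx < tiles_x - 1: h += radius * th  # fetch from tile right
            total += h
    return total
```

For a 1024x1024 image split 16 ways with a stencil radius of 8, 4x4 square tiles transfer 98,304 halo cells per sweep versus 245,760 for 16 row stripes, which is the kind of layout trade-off a communication-cost model can decide automatically.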
Development of Parameter Estimation Methods and Empirical Formulae for the Transient Storage Model for Analysis of Pollutant Mixing in Rivers
Thesis (Master's), Seoul National University Graduate School: College of Engineering, Department of Civil and Environmental Engineering, August 2019. Il Won Seo.
Analyses of solute transport and retention mechanisms are essential to manage water quality and river ecosystems. Tracer injection studies conducted to identify solute transport mechanisms report that concentration curves measured in natural streams have steep rising limbs and long tails. This phenomenon is due to the solute exchange process between transient storage zones and the main river stream. The transient storage model (TSM) is one of the most widely used models for describing solute transport in natural streams, taking the transient storage exchange process into consideration. Using this model requires calibration of four TSM parameters. Inverse modelling using breakthrough curves (BTCs) measured in tracer injection tests is the general method for TSM parameter calibration. However, it is not feasible to perform a tracer injection test for every parameter calibration. For this reason, empirical formulae based on hydraulic data, which are comparatively easier to obtain, have been proposed for parameter estimation. This study presents two methods for TSM parameter estimation. First, an inverse modelling method is proposed that employs the global optimization framework Shuffled Complex-Self Adaptive Hybrid EvoLution (SC-SAHEL), which incorporates well-known evolutionary algorithms from the water resource management field. Second, empirical equations for the TSM parameters are derived using the Multigene Genetic Programming (MGGP) based symbolic regression library GPTIPS and Principal Components Regression (PCR). In terms of general performance, the equations of this study were superior to previously published empirical equations. The resulting parameter estimation framework and empirical equations are practically applicable and are expected to be useful for determining TSM parameters with or without tracer test data.
Chapter 1. Introduction
1.1 Necessity and Background of Research
1.2 Objectives
Chapter 2. Theoretical Background
2.1 Transient Storage Model
2.1.1. Mechanisms of Transient Storage
2.1.2. Models Accounting for Transient Storage
2.1.2.1 The One Zone Transient Storage Model (1Z-TSM)
2.1.2.2 The Two Zone Transient Storage Model (2Z-TSM)
2.1.2.3 The Continuous Time Random Walk Approach (CTRW)
2.1.2.4 The Modified Advection Dispersion Model (MADE)
2.1.2.5 The Fractional Advection Dispersion Equation Model (FADE)
2.1.2.6 The Multirate Mass Transfer Model (MRMT)
2.1.2.7 The Advective Storage Path Model (ASP)
2.1.2.8 The Solute Transport in Rivers Model (STIR)
2.1.2.9 The Aggregate Dead Zone Model (ADZ)
2.2 Empirical Equations for Predicting Transient Storage Model Parameters
2.3 Parameter Estimation
2.3.1. The SC-SAHEL Framework
2.3.1.1 Modified Competitive Complex Evolution (MCCE)
2.3.1.2 Modified Frog Leaping (MFL)
2.3.1.3 Modified Grey Wolf Optimizer (GWO)
2.3.1.4 Modified Differential Evolution (DE)
2.4 Regression Method
2.4.1. The Multi-Gene Genetic Programming (MGGP)
2.4.1.1 The Simple Genetic Programming
2.4.1.2 Scaled Symbolic Regression via Multi-Gene Genetic Programming
2.4.2. Evolutionary Polynomial Regression (EPR)
2.4.2.1 Main Flow of EPR Procedure
Chapter 3. Model Development
3.1 Numerical Model
3.1.1. Model Validation
3.2 Merger of TSM-SC-SAHEL
3.3 Further Assessments for the Parameter Estimation Framework
3.3.1. Tracer Test Description
3.3.2. Grid Independency of Estimation
3.3.3. Choice of Optimization Setting
Chapter 4. Development of Formulae for Predicting TSM Parameters
4.1 Dimensional Analysis
4.2 Data Collection via Meta Analysis
4.3 Formulae Development
Chapter 5. Results and Discussion
5.1 Model Performances
5.2 Sensitivity Analysis
5.3 In-stream Application of Empirical Equations
Chapter 6. Conclusion
References
Appendix I. The mean, minimum, and maximum values of the model fitness value and number of evolutions using SC-SAHEL with single-EA and multi-EA
Appendix II. Dimensionless datasets used for development of empirical equations
Abstract in Korean
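The one-zone TSM referenced in the outline couples an advection-dispersion equation to a storage zone through a first-order exchange term. Dropping advection and dispersion, the exchange subsystem alone can be integrated with explicit Euler to show the relaxation that produces the long breakthrough-curve tails described above (an illustrative sketch, not the thesis's numerical model):

```python
def tsm_exchange(c0, cs0, alpha, area_ratio, dt, steps):
    """Explicit Euler integration of the storage-exchange terms of the
    one-zone TSM, with advection and dispersion dropped:
        dC/dt  = alpha * (Cs - C)
        dCs/dt = alpha * (A/As) * (C - Cs)
    where C is the main-channel and Cs the storage-zone concentration, and
    area_ratio = A/As. Mass A*C + As*Cs is conserved, and both zones relax
    to a common value at rate alpha*(1 + A/As)."""
    c, cs = c0, cs0
    history = [(c, cs)]
    for _ in range(steps):
        dc = alpha * (cs - c)
        dcs = alpha * area_ratio * (c - cs)
        c, cs = c + dt * dc, cs + dt * dcs
        history.append((c, cs))
    return history
```

The slow release of tracer back from the storage zone is what stretches the tail of the measured concentration curve; calibrating alpha and the area ratio against measured BTCs is the inverse problem the SC-SAHEL framework addresses.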
Forecasting Government Bond Spreads with Heuristic Models: Evidence from the Eurozone Periphery
This study investigates the predictability of European long-term government bond spreads through the application of heuristic and metaheuristic support vector regression (SVR) hybrid structures. Genetic, krill herd and sine–cosine algorithms are applied to the parameterization process of the SVR and locally weighted SVR (LSVR) methods. The inputs of the SVR models are selected from a large pool of linear and non-linear individual predictors. The statistical performance of the main models is evaluated against a random walk, an Autoregressive Moving Average, the best individual prediction model and the traditional SVR and LSVR structures. All models are applied to forecast daily and weekly government bond spreads of Greece, Ireland, Italy, Portugal and Spain over the sample period 2000–2017. The results show that the sine–cosine LSVR outperforms its counterparts in terms of statistical accuracy, while metaheuristic approaches seem to benefit the parameterization process more than heuristic ones.
Parallelizing Set Similarity Joins
One of today's major challenges in data science is to compare and relate data of similar nature. Using the join operation known from relational databases can help solve this problem. Given a collection of records, the join operation finds all pairs of records which fulfill a user-chosen predicate. Real-world problems can require complex predicates, such as similarity. A common way to measure similarity is with set similarity functions. In order to use set similarity functions as predicates, we assume records to be represented by sets of tokens. In this thesis, we focus on the set similarity join (SSJ) operation.
The amount of data to be processed today is typically large and grows continually. On the other hand, the SSJ is a compute-intensive operation. To cope with the increasing size of input data, additional means are needed to develop scalable implementations for SSJ. In this thesis, we focus on parallelization. We make the following three major contributions to SSJ.
First, we elaborate on the state of the art in parallelizing SSJ. We compare ten MapReduce-based approaches from the literature, both analytically and experimentally. Surprisingly, their main limitation is low scalability, caused by excessive and/or skewed data replication. None of the approaches can compute the join on large datasets.
Second, we leverage the abundant CPU parallelism of modern commodity hardware, which has not yet been considered for scaling SSJ. We propose a novel data-parallel multi-threaded SSJ. Our approach provides significant speedups compared to single-threaded execution.
Third, we propose a novel, highly scalable distributed SSJ approach. With a cost-based heuristic and a data-independent scaling mechanism, we avoid data replication and recomputation. A heuristic assigns similar shares of the compute costs to each node. Our approach significantly scales up the join execution and processes much larger datasets than all parallel approaches designed and implemented so far.
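As a minimal single-threaded reference for the operation being parallelized (not any of the thesis's algorithms), a Jaccard-threshold set similarity join with the standard length filter looks like this:

```python
def jaccard(a, b):
    """Jaccard similarity of two token sets: |a ∩ b| / |a ∪ b|."""
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)

def ssj(records, threshold):
    """Naive all-pairs set similarity join with a Jaccard predicate.
    Distributed approaches like the ones above partition this quadratic
    loop across nodes; the length filter skips pairs whose sizes already
    rule out reaching the threshold (Jaccard <= min(|a|,|b|)/max(|a|,|b|))."""
    out = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a, b = records[i], records[j]
            if min(len(a), len(b)) < threshold * max(len(a), len(b)):
                continue
            if jaccard(a, b) >= threshold:
                out.append((i, j))
    return out
```

The quadratic pair loop is exactly what makes data replication and skew so damaging at scale: naive partitioning schemes replicate each record to every partition it might match against.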