A Framework for Genetic Algorithms Based on Hadoop

Author: Ferrucci F
Kechadi M
others
Salza P
Sarro F
Publication date: 01/01/2013

Genetic Algorithms (GAs) are powerful metaheuristic techniques used in many real-world applications. The sequential execution of GAs requires considerable computational power in both time and resources. Nevertheless, GAs are naturally parallel, and accessing a parallel platform such as the Cloud is easy and cheap. Apache Hadoop is one of the common services that can be used for parallel applications. However, using Hadoop to develop a parallel version of GAs is not simple without dealing with its inner workings. Even though some sequential frameworks for GAs already exist, there is no framework supporting the development of GA applications that can be executed in parallel. This paper describes a framework for parallel GAs on the Hadoop platform, following the MapReduce paradigm. The main purpose of this framework is to allow the user to focus on the aspects of the GA that are specific to the problem to be addressed, with confidence that the task will be executed correctly on the Cloud with good performance. The framework has also been used to develop an application for the Feature Subset Selection problem. A preliminary analysis of the performance of the developed GA application has been carried out using three datasets and has shown very promising performance.
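To make the map/reduce split of a GA concrete, the following is a minimal single-process sketch, not the framework described in the abstract. It assumes a toy OneMax objective, and every name in it (`fitness`, `map_phase`, `reduce_phase`, `run_ga`) is hypothetical. The map step scores individuals independently, which is the part that distributes naturally; the reduce step applies selection, crossover, and mutation.

```python
import random


def fitness(individual):
    # Hypothetical toy objective (OneMax): count the 1 bits.
    return sum(individual)


def map_phase(population):
    # Map: score each individual independently of the others.
    return [(fitness(ind), ind) for ind in population]


def reduce_phase(scored, rng, mutation_rate=0.05):
    # Reduce: build the next generation from the scored individuals.
    def select():
        # Binary tournament: keep the fitter of two random individuals.
        a, b = rng.sample(scored, 2)
        return max(a, b, key=lambda pair: pair[0])[1]

    best = max(scored, key=lambda pair: pair[0])[1]
    children = [list(best)]  # elitism: the best individual survives unchanged
    while len(children) < len(scored):
        p1, p2 = select(), select()
        cut = rng.randrange(1, len(p1))  # one-point crossover
        child = p1[:cut] + p2[cut:]
        # Bit-flip mutation applied gene by gene.
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children


def run_ga(generations=30, pop_size=20, length=16, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population = reduce_phase(map_phase(population), rng)
    # Return the best (fitness, individual) pair of the final generation.
    return max(map_phase(population), key=lambda pair: pair[0])


if __name__ == "__main__":
    best_fitness, best_individual = run_ga()
    print(best_fitness, best_individual)
```

On a real cluster the map step would run on separate workers and the reduce step would receive their output through the shuffle; here both are ordinary function calls so the control flow stays visible.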

arXiv.org e-Print Archive

UCL Discovery

Archivio della Ricerca - Università di Salerno

Feature selection in high-dimensional dataset using MapReduce

Author: Bontempi Gianluca
Borgne Yann-Aël Le
Reggiani Claudio
Publication date: 07/09/2017

This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.

arXiv.org e-Print Archive

DI-fusion

Efficient Processing of k Nearest Neighbor Joins using MapReduce

Author: Chen Su
Lu Wei
Ooi Beng Chin
Shen Yanyan
Publication date: 01/01/2012

k nearest neighbor join (kNN join), designed to find k nearest neighbors from a dataset S for every object in another dataset R, is a primitive operation widely adopted by many data mining applications. As a combination of the k nearest neighbor query and the join operation, kNN join is an expensive operation. Given the increasing volume of data, it is difficult to perform a kNN join on a centralized machine efficiently. In this paper, we investigate how to perform kNN join using MapReduce, which is a well-accepted framework for data-intensive applications over clusters of computers. In brief, the mappers cluster objects into groups; the reducers perform the kNN join on each group of objects separately. We design an effective mapping mechanism that exploits pruning rules for distance filtering, and hence reduces both the shuffling and computational costs. To reduce the shuffling cost, we propose two approximate algorithms to minimize the number of replicas. Extensive experiments on our in-house cluster demonstrate that our proposed methods are efficient, robust and scalable. Comment: VLDB201
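The mapper/reducer division described above can be illustrated with a small single-process sketch. It is a simplification under stated assumptions, not the paper's algorithm: objects of R are grouped by their nearest pivot, and each group is joined against the whole of S, so the pruning rules and replica minimization from the abstract are left out. The pivots and all function names are hypothetical.

```python
import math
from collections import defaultdict


def nearest(r, S, k):
    # Brute-force k nearest neighbors of r within S (Euclidean distance).
    return sorted(S, key=lambda s: math.dist(r, s))[:k]


def map_phase(R, pivots):
    # Map: assign every object of R to the group of its closest pivot.
    groups = defaultdict(list)
    for r in R:
        idx = min(range(len(pivots)), key=lambda i: math.dist(r, pivots[i]))
        groups[idx].append(r)
    return groups


def reduce_phase(groups, S, k):
    # Reduce: each group is processed on its own.
    # Simplification: every group sees all of S (no distance-based pruning).
    result = {}
    for group in groups.values():
        for r in group:
            result[r] = nearest(r, S, k)
    return result


def knn_join(R, S, pivots, k):
    return reduce_phase(map_phase(R, pivots), S, k)


if __name__ == "__main__":
    R = [(0, 0), (10, 10)]
    S = [(1, 0), (0, 2), (9, 9), (10, 12), (5, 5)]
    print(knn_join(R, S, pivots=[(0, 0), (10, 10)], k=2))
```

Because every group still scans all of S, this version is exact but shuffles far more data than necessary; bounding which objects of S each group needs is where the pruning described in the abstract would apply.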

arXiv.org e-Print Archive

CiteSeerX

ScholarBank@NUS

Parallel and distributed Gröbner bases computation in JAS

Author: Kredel Heinz
Publication date: 01/01/2010

This paper considers parallel Gröbner bases algorithms on distributed memory parallel computers with multi-core compute nodes. We summarize three different Gröbner bases implementations: shared memory parallel, pure distributed memory parallel and distributed memory combined with shared memory parallelism. The last algorithm, called distributed hybrid, uses only one control communication channel between the master node and the worker nodes and keeps polynomials in shared memory on a node. The polynomials are transported asynchronously to the control-flow of the algorithm in a separate distributed data structure. The implementation is generic and works for all implemented (exact) fields. We present new performance measurements and discuss the performance of the algorithms. Comment: 14 pages, 8 tables, 13 figure

arXiv.org e-Print Archive

CiteSeerX

Two-Stage Eagle Strategy with Differential Evolution

Author: Deb Suash
Yang Xin-She
Publication date: 01/01/2012

The efficiency of an optimization process is largely determined by the search algorithm and its fundamental characteristics. In a given optimization, a single type of algorithm is used in most applications. In this paper, we will investigate the Eagle Strategy recently developed for global optimization, which uses a two-stage strategy by combining two different algorithms to improve the overall search efficiency. We will discuss this strategy with differential evolution and then evaluate their performance by solving real-world optimization problems such as pressure vessel and speed reducer design. Results suggest that we can reduce the computing effort by a factor of up to 10 in many applications.
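A two-stage search of this kind can be sketched as coarse global exploration followed by intensive refinement with differential evolution. The sketch below is illustrative only and makes several assumptions: stage 1 uses plain uniform random sampling as a stand-in for the exploration step, stage 2 is a standard DE/rand/1/bin loop, the objective is a toy sphere function, and all names and parameter values (`F`, `CR`, `samples`, `pop_size`) are hypothetical choices rather than values from the paper.

```python
import random


def sphere(x):
    # Hypothetical test objective with its minimum at the origin.
    return sum(v * v for v in x)


def differential_evolution(f, population, bounds, rng,
                           F=0.7, CR=0.9, generations=50):
    # Standard DE/rand/1/bin with greedy replacement.
    pop = [list(p) for p in population]
    dim = len(bounds)
    for _ in range(generations):
        for i in range(len(pop)):
            others = [p for j, p in enumerate(pop) if j != i]
            a, b, c = rng.sample(others, 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    lo, hi = bounds[d]
                    v = a[d] + F * (b[d] - c[d])
                    trial.append(min(max(v, lo), hi))  # clip to the bounds
                else:
                    trial.append(pop[i][d])
            if f(trial) <= f(pop[i]):
                pop[i] = trial
    return min(pop, key=f)


def two_stage_search(f, bounds, seed=0, samples=200, pop_size=10):
    rng = random.Random(seed)
    # Stage 1 (assumed stand-in): coarse exploration by uniform sampling.
    candidates = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(samples)]
    candidates.sort(key=f)
    # Stage 2: refine the most promising samples with differential evolution.
    best = differential_evolution(f, candidates[:pop_size], bounds, rng)
    return best, f(best)


if __name__ == "__main__":
    print(two_stage_search(sphere, [(-5.0, 5.0)] * 3))
```

Seeding the second stage with the best samples from the first is one simple way to connect the two stages; other couplings, such as switching between stages repeatedly, are equally plausible readings of a two-stage strategy.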

arXiv.org e-Print Archive

CiteSeerX

Crossref

Middlesex University Research Repository

Towards co-designed optimizations in parallel frameworks: A MapReduce case study

Author: Barrett Colin
Kotselidis Christos
Luján Mikel
Publication date: 01/01/2016

The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to maintain and delegate vital semantic information between layers in the stack. Software abstractions increase the semantic distance between an application and its generated code. However, parallel software frameworks contain inherent semantic information that general purpose compilers are not designed to exploit. This paper presents a case study demonstrating how the specific semantic information of the MapReduce paradigm can be exploited on multicore architectures. MR4J has been implemented in Java and evaluated against hand-optimized C and C++ equivalents. The initial observed results led to the design of a semantically aware optimizer that runs automatically without requiring modification to application code. The optimizer is able to speed up the execution of MR4J by up to 2.0x. The introduced optimization not only improves the performance of the generated code during the map phase, but also reduces the pressure on the garbage collector. This demonstrates how semantic information can be harnessed without sacrificing sound software engineering practices when using parallel software frameworks. Comment: 8 page

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Finding Top-k Dominance on Incomplete Big Data Using Map-Reduce Framework

Author: Ezatpoor Payam
Publication venue: Digital Scholarship@UNLV
Publication date: 01/05/2017

Incomplete data is one major kind of multi-dimensional dataset that has randomly distributed missing nodes in its dimensions. It is very difficult to retrieve information from this type of dataset when it becomes huge. Finding top-k dominant (TKD) values in this type of dataset is a challenging procedure. Some algorithms exist to enhance this process, but they are mostly efficient only when dealing with small incomplete datasets. One of the algorithms that makes the application of the TKD query possible is the Bitmap Index Guided (BIG) algorithm. This algorithm strongly improves the performance for incomplete data, but it is not originally capable of finding top-k dominant values in incomplete big data, nor is it designed to do so. Several other algorithms have been proposed to answer the TKD query, such as the Skyband Based and Upper Bound Based algorithms, but their performance is also questionable. Algorithms developed previously were among the first attempts to apply the TKD query to incomplete data; however, all of these had weak performance or were not compatible with incomplete data. This thesis proposes the MapReduced Enhanced Bitmap Index Guided Algorithm (MRBIG) for dealing with the aforementioned issues. MRBIG uses the MapReduce framework to enhance the performance of applying top-k dominance queries to huge incomplete datasets. The proposed approach applies MapReduce parallel computing across multiple computing nodes. The framework separates the tasks between several computing nodes that work independently and simultaneously to find the result. This method achieves up to two times faster processing in finding the TKD query result in comparison to previously presented algorithms.
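As a point of reference for what a TKD query computes, the following brute-force sketch counts dominated objects in a map step and sums the partial counts in a reduce step. It is not MRBIG and uses no bitmap index. It assumes a dominance definition commonly used for incomplete data (compare only the dimensions observed in both objects, smaller values are better, at least one strict improvement), with `None` marking a missing value; all names and the sample data are hypothetical.

```python
from collections import Counter
from itertools import chain


def dominates(a, b):
    # Compare only dimensions observed in both objects (None = missing).
    shared = [(x, y) for x, y in zip(a, b)
              if x is not None and y is not None]
    return (bool(shared)
            and all(x <= y for x, y in shared)
            and any(x < y for x, y in shared))


def map_phase(objects, chunk):
    # Map: for one chunk of the data, emit how many of its objects
    # each candidate dominates (a partial dominance score).
    for name, values in objects.items():
        count = sum(dominates(values, other)
                    for other_name, other in chunk.items()
                    if other_name != name)
        yield name, count


def reduce_phase(emitted, k):
    # Reduce: sum the partial scores and keep the k highest.
    totals = Counter()
    for name, count in emitted:
        totals[name] += count
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]


def top_k_dominating(objects, chunks, k):
    emitted = chain.from_iterable(map_phase(objects, c) for c in chunks)
    return reduce_phase(emitted, k)


if __name__ == "__main__":
    objects = {
        "a": (1, None, 2),
        "b": (2, 3, None),
        "c": (3, 4, 5),
        "d": (None, 1, 1),
    }
    chunks = [
        {n: objects[n] for n in ("a", "b")},
        {n: objects[n] for n in ("c", "d")},
    ]
    print(top_k_dominating(objects, chunks, k=2))
```

Splitting the compared-against side into chunks keeps each map call independent, which is what lets the counting run on separate nodes. The quadratic number of comparisons is the cost that index-based methods aim to reduce.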

University of Nevada, Las Vegas Repository

Techno-economic analysis of chemical looping combustion with humid air turbine power cycle

Author: Brandvoll
Fan
Ishida
Lyngfelt
Mattison
Parsons
Peltola
Publication venue: Elsevier BV
Publication date: 01/01/2014

Power generation from fossil fuel-fired power plants is the largest single source of CO₂ emission. CO₂ emission contributes to climate change. On the other hand, renewable energy is hindered by complex constraints in dealing with large scale application and high price. Power generation from fossil fuels with CO₂ capture is therefore necessary to meet the increasing energy demand and reduce the emission of CO₂. This paper presents a process simulation and economic analysis of chemical looping combustion (CLC) integrated with a humid air turbine (HAT) cycle for a natural gas-fired power plant with CO₂ capture. The study shows that the CLC–HAT including CO₂ capture has a thermal efficiency of 57% at an oxidizing temperature of 1200 °C and a reducer inlet temperature of 530 °C. The economic evaluation shows that the 50 MWth plant with a projected lifetime of 30 years will have a payback period of 7 years and 6 years for the conventional HAT and CLC–HAT cycles respectively. The analysis indicates that the CLC–HAT process has a high potential to be commercialised.

Repository@Hull - Worktribe

Crossref