35,820 research outputs found
Multi-Objective Big Data Optimization with jMetal and Spark
Big Data Optimization is the term used to refer to optimization problems which have to manage very large amounts of data. In this paper, we focus on the parallelization of metaheuristics with the Apache Spark cluster computing system for solving multi-objective Big Data Optimization problems. Our purpose is to study the influence of accessing data stored in the Hadoop File System (HDFS) in each evaluation step of a metaheuristic and to provide a software tool to solve these kinds of problems. This tool combines the jMetal multi-objective optimization framework with Apache Spark. We have carried out experiments to measure the performance of the proposed parallel infrastructure in an environment based on virtual machines in a local cluster comprising up to 100 cores. We obtained interesting results for computational e ort and propose guidelines to face multi-objective Big Data Optimization
problems.Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
A Reduction of the Elastic Net to Support Vector Machines with an Application to GPU Computing
The past years have witnessed many dedicated open-source projects that built
and maintain implementations of Support Vector Machines (SVM), parallelized for
GPU, multi-core CPUs and distributed systems. Up to this point, no comparable
effort has been made to parallelize the Elastic Net, despite its popularity in
many high impact applications, including genetics, neuroscience and systems
biology. The first contribution in this paper is of theoretical nature. We
establish a tight link between two seemingly different algorithms and prove
that Elastic Net regression can be reduced to SVM with squared hinge loss
classification. Our second contribution is to derive a practical algorithm
based on this reduction. The reduction enables us to utilize prior efforts in
speeding up and parallelizing SVMs to obtain a highly optimized and parallel
solver for the Elastic Net and Lasso. With a simple wrapper, consisting of only
11 lines of MATLAB code, we obtain an Elastic Net implementation that naturally
utilizes GPU and multi-core CPUs. We demonstrate on twelve real world data
sets, that our algorithm yields identical results as the popular (and highly
optimized) glmnet implementation but is one or several orders of magnitude
faster.Comment: 10 page
Introducing Molly: Distributed Memory Parallelization with LLVM
Programming for distributed memory machines has always been a tedious task,
but necessary because compilers have not been sufficiently able to optimize for
such machines themselves. Molly is an extension to the LLVM compiler toolchain
that is able to distribute and reorganize workload and data if the program is
organized in statically determined loop control-flows. These are represented as
polyhedral integer-point sets that allow program transformations applied on
them. Memory distribution and layout can be declared by the programmer as
needed and the necessary asynchronous MPI communication is generated
automatically. The primary motivation is to run Lattice QCD simulations on IBM
Blue Gene/Q supercomputers, but since the implementation is not yet completed,
this paper shows the capabilities on Conway's Game of Life
Big Data and the Internet of Things
Advances in sensing and computing capabilities are making it possible to
embed increasing computing power in small devices. This has enabled the sensing
devices not just to passively capture data at very high resolution but also to
take sophisticated actions in response. Combined with advances in
communication, this is resulting in an ecosystem of highly interconnected
devices referred to as the Internet of Things - IoT. In conjunction, the
advances in machine learning have allowed building models on this ever
increasing amounts of data. Consequently, devices all the way from heavy assets
such as aircraft engines to wearables such as health monitors can all now not
only generate massive amounts of data but can draw back on aggregate analytics
to "improve" their performance over time. Big data analytics has been
identified as a key enabler for the IoT. In this chapter, we discuss various
avenues of the IoT where big data analytics either is already making a
significant impact or is on the cusp of doing so. We also discuss social
implications and areas of concern.Comment: 33 pages. draft of upcoming book chapter in Japkowicz and Stefanowski
(eds.) Big Data Analysis: New algorithms for a new society, Springer Series
on Studies in Big Data, to appea
- …