Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures
The QR factorization and the SVD are two fundamental matrix decompositions
with applications throughout scientific computing and data analysis. For
matrices with many more rows than columns, so-called "tall-and-skinny
matrices," there is a numerically stable, efficient, communication-avoiding
algorithm for computing the QR factorization. It has been used in traditional
high performance computing and grid computing environments. For MapReduce
environments, existing methods to compute the QR decomposition use a
numerically unstable approach that relies on indirectly computing the Q factor.
In the best case, these methods require only two passes over the data. In this
paper, we describe how to compute a stable tall-and-skinny QR factorization on
a MapReduce architecture in only slightly more than 2 passes over the data. We
can compute the SVD with only a small change and no difference in performance.
We present a performance comparison between our new direct TSQR method, a
standard unstable implementation for MapReduce (Cholesky QR), and the classic
stable algorithm implemented for MapReduce (Householder QR). We find that our
new stable method has a large performance advantage over the Householder QR
method. This holds both in a theoretical performance model and in an
actual implementation.
A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures
Scientific problems that depend on processing large amounts of data require
overcoming challenges in multiple areas: managing large-scale data
distribution, co-placement and scheduling of data with compute resources, and
storing and transferring large volumes of data. We analyze the ecosystems of
the two prominent paradigms for data-intensive applications, hereafter referred
to as the high-performance computing and the Apache-Hadoop paradigm. We propose
a basis, common terminology and functional factors upon which to analyze the
two approaches of both paradigms. We discuss the concept of "Big Data Ogres"
and their facets as means of understanding and characterizing the most common
application workloads found across the two paradigms. We then discuss the
salient features of the two paradigms, and compare and contrast the two
approaches. Specifically, we examine common implementations and approaches of these
paradigms, shed light upon the reasons for their current "architecture" and
discuss some typical workloads that utilize them. In spite of the significant
software distinctions, we believe there is architectural similarity. We discuss
the potential integration of different implementations, across the different
levels and components. Our comparison progresses from a fully qualitative
examination of the two paradigms to a semi-quantitative methodology. We take a
simple and broadly used Ogre (K-means clustering) and characterize its performance
on a range of representative platforms, covering several implementations from
both paradigms. Our experiments provide an insight into the relative strengths
of the two paradigms. We propose that the set of Ogres will serve as a
benchmark to evaluate the two paradigms along different dimensions. Comment: 8 pages, 2 figures
Development of genetic algorithms using different MapReduce frameworks: MPI vs. Hadoop
MapReduce is a popular paradigm that lets non-specialist users run on large parallel computing platforms transparently. Hadoop is the most widely used implementation of this paradigm; indeed, for many users the words Hadoop and MapReduce are interchangeable.
However, other frameworks also implement this programming paradigm, such as MapReduce-MPI. Since optimization techniques can benefit greatly from this kind of data-intensive computing model, in this line of research we analyze the performance impact of developing genetic algorithms (GA) on different MapReduce frameworks (MRGA). Track: Agents and Intelligent Systems. Red de Universidades con Carreras en Informática