Benchmarking Hadoop Performance in the Cloud - An In-Depth Study of Resource Management and Energy Consumption
International audience. Virtualization technologies have proven their ability to deliver good performance in high performance computing (HPC). Over the last decade, big data tools have emerged with their own performance and infrastructure requirements. Drawing on their extensive experience in HPC, experts can readily evaluate the infrastructures used to run big data tools. This paper evaluates two virtualization technologies in the context of big data tools: we compare the performance and energy consumption of Docker containers and VMware, and benchmark Hadoop (JoshBaer, 2015) in both environments. First, the aim is to reduce the cost of deploying Hadoop in the cloud. Second, we discuss and analyze the assumptions learned from HPC experiments and their applicability in the big data context. Third, we provide the Hadoop community with an in-depth study of resource consumption as a function of the deployment environment. We conclude that Docker containers give better performance in most experiments, while energy consumption varies with the executed workload.
Scheduling Problem in Hadoop: Scheduling on Equivalent and Distributed Parallel Machines
International audience. Big data management tools are known for their ability to run a large number of jobs on huge volumes of data (on the order of several petabytes). As a result, these tools rely on large infrastructures capable of providing the required computing power so that processing completes in a reasonable time. In this paper, we focus on improving the performance of Hadoop, the reference open-source software for big data processing. In the first part, we model the scheduling problem using linear programming, evaluate the model, and thereby compute a lower bound for the offline version of the problem. We then propose a heuristic and evaluate it on medium-sized instances.
Virtualization Technologies for Hadoop-based applications
International audience. Today, consumers request virtual resources such as CPU, RAM, and disk from service providers (such as Amazon) and pay on a "pay-as-you-go" basis. Providers generally adopt virtualization technologies, which optimize resource usage and limit operating costs. Virtualization technologies fall into two categories. The first is heavy virtualization, based on the virtual machine (VM) concept: each VM emulates hardware and embeds its own operating system (OS), completely isolated from the host OS. The second is light virtualization, based on containers, which share the host OS kernel [5] while ensuring isolation. In this paper, we benchmark the performance and energy consumption of a Hadoop-based infrastructure under both virtualization technologies. First, we identify the points to be improved in Hadoop performance, with the goal of reducing the deployment cost in the cloud. Second, we provide the Hadoop community with an in-depth study of resource consumption as a function of the deployment environment. Our experiments compare Docker (light virtualization) and VMware® (heavy virtualization). We conclude that in most experiments the light technology offers better workload completion times and is better suited to Hadoop.
Offline Scheduling of Map and Reduce Tasks on Hadoop Systems
International audience. MapReduce is a model for managing massive quantities of data, based on the distributed, parallel execution of tasks over a cluster of machines. Hadoop is an implementation of the MapReduce model and is used to offer big data services in the cloud. In this paper, we present the scheduling problem on Hadoop systems. We focus on offline scheduling, formulate the problem as a mathematical model, and use a time-indexed formulation. We aim to take into account as many constraints of the MapReduce environment as possible. Solutions of the presented model can serve as a reference for online schedules on small and medium instances. Our work contributes to the problem definition: the constraints are based on observations and account for resource consumption, data locality, heterogeneous machines, and workflow management. This paper thus defines reference bounds for evaluating the online model.
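To illustrate what a time-indexed formulation looks like, here is a generic textbook version for non-preemptive scheduling of Map and Reduce tasks on $m$ identical machines with makespan minimization. It is a sketch under stated assumptions, not the paper's model, which additionally handles data locality, heterogeneous machines, and resource consumption. Assumed notation: $J$ is the set of tasks, $p_j$ the integer processing time of task $j$, $T$ a time horizon, $P$ the set of (Map task $k$, Reduce task $r$) pairs belonging to the same job, and $x_{jt} = 1$ if task $j$ starts at time $t$ (with $x_{jt} = 0$ for $t$ outside $\{0, \dots, T - p_j\}$).

```latex
\begin{align*}
\min\ & C_{\max} \\
\text{s.t.}\ & \sum_{t=0}^{T-p_j} x_{jt} = 1 && \forall j \in J \\
& \sum_{j \in J} \sum_{s=t-p_j+1}^{t} x_{js} \le m && \forall t \in \{0, \dots, T-1\} \\
& C_{\max} \ge \sum_{t=0}^{T-p_j} (t + p_j)\, x_{jt} && \forall j \in J \\
& \sum_{t=0}^{T-p_r} t\, x_{rt} \ge \sum_{t=0}^{T-p_k} (t + p_k)\, x_{kt} && \forall (k, r) \in P \\
& x_{jt} \in \{0, 1\} && \forall j \in J,\ \forall t
\end{align*}
```

The second constraint counts the tasks running at time $t$ (those that started in the last $p_j$ slots) and caps them at $m$; the fourth enforces that a Reduce task starts only after its job's Map task has completed. Solving the LP relaxation of such a model is one standard way to obtain lower bounds for the offline problem.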
Optimization of the Parallel Machine Scheduling Problem in Hadoop
National audience. This work focuses on improving the operation of Hadoop, a software framework for the distributed processing of large data volumes. Our objective is to optimize the scheduling of a set of jobs on a parallel-machine architecture, restricting ourselves to Map/Reduce jobs. Since the problem is NP-hard and the instances considered are large, we propose two heuristics based on list algorithms. The Map/Reduce model is a programming model introduced by Google in 2004 to simplify the development and execution of parallel applications. This model imposes a development framework: (1) A Map/Reduce job consists of two types of tasks, Map tasks and Reduce tasks, and Map tasks run before Reduce tasks. Map tasks perform the computations, while Reduce tasks aggregate their results. Implicitly, the output of the Map tasks is transferred over the network to the machines running the associated Reduce tasks. (2) Map and Reduce tasks cannot be preempted: if a task is interrupted, it is restarted as if it were running for the first time.
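The list-scheduling idea can be illustrated with a minimal Python sketch. It assumes identical machines, known task durations, non-preemptive tasks, and that every Reduce task of a job waits for all Map tasks of that same job. `Job` and `list_schedule` are hypothetical names, and the longest-processing-time-first (LPT) priority rule is an illustrative choice, not one of the two heuristics proposed in the paper.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical Map/Reduce job described only by its task durations."""
    name: str
    map_times: list[float]
    reduce_times: list[float] = field(default_factory=list)


def list_schedule(jobs: list[Job], m: int):
    """Greedy list scheduling on m identical machines (non-preemptive).

    Map tasks are placed first, longest first (LPT), each on the machine that
    becomes free earliest. A Reduce task may start only once every Map task of
    its own job has finished. Returns (makespan, schedule), where schedule
    entries are (job, kind, machine, start, end).
    """
    machines = [(0.0, i) for i in range(m)]  # heap of (time machine is free, machine id)
    heapq.heapify(machines)
    schedule = []
    map_done = {job.name: 0.0 for job in jobs}  # end of each job's Map phase

    # Map phase: all Map tasks, longest processing time first.
    map_tasks = sorted(((p, job.name) for job in jobs for p in job.map_times), reverse=True)
    for p, name in map_tasks:
        free, machine_id = heapq.heappop(machines)
        start, end = free, free + p
        schedule.append((name, "map", machine_id, start, end))
        map_done[name] = max(map_done[name], end)
        heapq.heappush(machines, (end, machine_id))

    # Reduce phase: earliest-free machine, respecting the Map -> Reduce precedence.
    reduce_tasks = sorted(((p, job.name) for job in jobs for p in job.reduce_times), reverse=True)
    for p, name in reduce_tasks:
        free, machine_id = heapq.heappop(machines)
        start = max(free, map_done[name])  # may leave the machine idle until Maps finish
        end = start + p
        schedule.append((name, "reduce", machine_id, start, end))
        heapq.heappush(machines, (end, machine_id))

    makespan = max((entry[4] for entry in schedule), default=0.0)
    return makespan, schedule
```

For example, `list_schedule([Job("A", [3, 2], [2]), Job("B", [4], [1])], m=2)` yields a makespan of 7.0: job A's Map phase ends at time 5, so its Reduce task cannot start earlier. Priority rules other than LPT slot into the two `sorted(...)` calls.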
A Back-reaction Induced Lower Bound on the Tensor-to-Scalar Ratio
There are large classes of inflationary models, particularly popular in the
context of string theory and brane world approaches to inflation, in which the
ratio of linearized tensor to scalar metric fluctuations is very small. In such
models, however, gravitational waves produced by scalar modes cannot be
neglected. We derive the lower bound on the tensor-to-scalar ratio by
considering the back-reaction of the scalar perturbations as a source of
gravitational waves. These results show that no cosmological model that is
compatible with a metric scalar amplitude of can have a ratio
of the tensor to scalar power spectra less than at
recombination and that higher-order terms lead to logarithmic growth for r
during radiation domination. Our lower bound also applies to non-inflationary
models which produce an almost scale-invariant spectrum of coherent
super-Hubble scale metric fluctuations.
Comment: 5 pages, version 3, minor changes from version
An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems
International audience. Graph edit distance (GED) is an error-tolerant matching technique that has emerged as a powerful and flexible graph matching paradigm for tasks in pattern recognition, machine learning, and data mining. It is the cost of the minimum-cost sequence of basic edit operations (insertion, deletion, and substitution of vertices and/or edges) that transforms one graph into another. A widely used method for exact GED computation is based on the A* algorithm. To overcome the high memory load of A*, which must store the pending solutions to be explored while traversing the search tree, we propose a depth-first GED algorithm that requires less memory and less search time. All possible solutions are evaluated without being explicitly enumerated: candidates are discarded using upper and lower bounds. A thorough experimental study on a publicly available database demonstrates empirically that our approach outperforms A*-based GED computation in terms of speed, accuracy, and classification rate.
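The depth-first, bound-based strategy can be sketched in Python as a generic branch-and-bound, not the authors' algorithm, whose bounds the abstract does not specify. Assumptions: undirected, unlabeled edges; unit costs for vertex insertion/deletion, edge insertion/deletion, and vertex relabeling; an initial upper bound from the trivial edit path (delete all of G1, insert all of G2); and a simple admissible lower bound, the difference in the number of still-unmatched vertices. `ged_dfs` is a hypothetical name.

```python
def ged_dfs(labels1: dict, edges1, labels2: dict, edges2) -> int:
    """Exact GED by depth-first branch and bound with unit edit costs.

    labels1/labels2 map each vertex to its label; edges are vertex pairs.
    Vertices of G1 are processed in order; each is mapped to an unused
    vertex of G2 (substitution) or to None (deletion).
    """
    nodes1, nodes2 = list(labels1), list(labels2)
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}

    # Upper bound: the trivial edit path (delete G1 entirely, insert G2 entirely).
    best = [len(nodes1) + len(e1) + len(nodes2) + len(e2)]

    def completion_cost(used) -> int:
        # Insert every unmatched G2 vertex and every G2 edge touching one.
        unused = {v for v in nodes2 if v not in used}
        return len(unused) + sum(1 for e in e2 if e & unused)

    def dfs(i: int, mapping: list, used: frozenset, cost: int) -> None:
        if i == len(nodes1):
            best[0] = min(best[0], cost + completion_cost(used))
            return
        u = nodes1[i]
        for v in [*(w for w in nodes2 if w not in used), None]:
            # Vertex cost: deletion (1) or substitution (1 if labels differ).
            step = 1 if v is None else int(labels1[u] != labels2[v])
            # Edge cost against every already-mapped vertex: 1 if exactly one side has the edge.
            for u_prev, v_prev in mapping:
                has1 = frozenset((u, u_prev)) in e1
                has2 = v is not None and v_prev is not None and frozenset((v, v_prev)) in e2
                step += int(has1 != has2)
            new_used = used if v is None else used | {v}
            # Lower bound: unmatched vertices on one side need at least this many insertions/deletions.
            bound = abs((len(nodes1) - i - 1) - (len(nodes2) - len(new_used)))
            if cost + step + bound < best[0]:  # prune branches that cannot beat the upper bound
                mapping.append((u, v))
                dfs(i + 1, mapping, new_used, cost + step)
                mapping.pop()

    dfs(0, [], frozenset(), 0)
    return best[0]
```

For instance, a triangle of three "C" vertices and a three-vertex "C" path are at distance 1 (one edge deletion). Only the current path of the search tree is kept in memory, which is the memory advantage over A* that the abstract describes; tighter lower bounds prune more branches.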
The Effects of Gravitational Back-Reaction on Cosmological Perturbations
Because of the non-linearity of the Einstein equations, the cosmological
fluctuations which are generated during inflation on a wide range of
wavelengths do not evolve independently. In particular, to second order in
perturbation theory, the first order fluctuations back-react both on the
background geometry and on the perturbations themselves. In this paper, the
gravitational back-reaction of long wavelength (super-Hubble) scalar metric
fluctuations on the perturbations themselves is investigated for a large class
of inflationary models. Specifically, the equations describing the evolution of
long wavelength cosmological metric and matter perturbations in an inflationary
universe are solved to second order in both the amplitude of the perturbations
and in the slow roll expansion parameter. Assuming that the linear fluctuations
have random phases, we show that the fractional correction to the power
spectrum due to the leading infrared back-reaction terms does not change the
shape of the spectrum. The amplitude of the effect is suppressed by the product
of the inflationary slow-roll parameter and the amplitude of the linear power
spectrum. The non-Gaussianity of the spectrum induced by back-reaction is
commented upon.
Comment: 9 pages
Brane-Antibrane Inflation in Orbifold and Orientifold Models
We analyse the cosmological implications of brane-antibrane systems in
string-theoretic orbifold and orientifold models. In a class of realistic
models, consistency conditions require branes and antibranes to be stuck at
different fixed points, and so their mutual attraction generates a potential
for one of the radii of the underlying torus or the 4D string dilaton. Assuming
that all other moduli have been fixed by string effects, we find that this
potential leads naturally to a period of cosmic inflation with the radion or
dilaton field as the inflaton. The slow-roll conditions are satisfied more
generically than if the branes were free to move within the space. The
appearance of tachyon fields at certain points in moduli space indicates the
onset of phase transitions to different non-BPS brane systems, providing ways
of ending inflation and reheating the corresponding observable brane universe.
In each case we find relations between the inflationary parameters and the
string scale to get the correct spectrum of density perturbations. In some
examples the small numbers required as inputs are no smaller than 0.01, and are
the same small quantities which are required to explain the gauge hierarchy.
Comment: 30 pages, 2 figures. Substantial changes from version 1. New
cosmological scenarios proposed, including the dilaton as the inflaton. Main
conclusions unchanged.
Branonium
We study the bound states of brane/antibrane systems by examining the motion
of a probe antibrane moving in the background fields of N source branes. The
classical system resembles the point-particle central force problem, and the
orbits can be solved by quadrature. Generically the antibrane has orbits which
are not closed on themselves. An important special case occurs for some
Dp-branes moving in three transverse dimensions, in which case the orbits may
be obtained in closed form, giving the standard conic sections but with a
nonstandard time evolution along the orbit. Somewhat surprisingly, in this case
the resulting elliptical orbits are exact solutions, and do not simply apply in
the limit of asymptotically-large separation or non-relativistic velocities.
The orbits eventually decay through the radiation of massless modes into the
bulk and onto the branes, and we estimate this decay time. Applications of
these orbits to cosmology are discussed in a companion paper.
Comment: 34 pages, LaTeX, 4 figures, uses JHEP
- …
