
    Benchmarking Hadoop Performance in the Cloud - An in Depth Study of Resource Management and Energy Consumption

    International audience. Virtualization technologies have proven capable of delivering good performance in high performance computing (HPC). Over the last decade, big data tools have emerged with their own performance and infrastructure requirements. With their broad experience in the HPC domain, experts can readily evaluate the infrastructures used to run big data tools. This paper evaluates two virtualization technologies in the context of big data tools. We compare the performance and energy consumption of two virtualization technologies (Docker containers and VMware) and benchmark the Hadoop software (JoshBaer, 2015) in these environments. First, the aim is to reduce the cost of deploying Hadoop in the cloud. Second, we discuss and analyze the assumptions learned from HPC experiments and their applicability in the big data context. Third, we provide the Hadoop community with an in-depth study of resource consumption depending on the deployment environment. We conclude that Docker containers give better performance in most experiments. In addition, energy consumption varies with the executed workload.

    Problème d'ordonnancement dans Hadoop, ordonnancement sur des machines parallèles équivalentes et distribuées

    International audience. Tools for managing large volumes of data are known for their ability to run a large number of jobs over enormous data volumes (on the order of several petabytes). These tools therefore rely on large infrastructures that can supply the required computing power so that processing completes in a reasonable time. In this paper, we focus on improving the performance of Hadoop, the reference open-source software for large-scale data processing. First, we model the scheduling problem using linear programming, evaluate the model, and thereby compute a lower bound for the offline version of the problem. We then propose a heuristic and evaluate the resulting solution on medium-sized instances.

    Virtualization Technologies for Hadoop-based applications

    International audience. Today, consumers request virtual resources such as CPU, RAM, and disk from service providers (like Amazon) and pay on a "pay-as-you-go" basis. Generally, the supervisors adopt virtualization technologies, which optimize resource usage and limit operating costs. Virtualization technologies fall into two categories. The first is heavy virtualization, based on the virtual machine (VM) concept: each VM emulates hardware and embeds its own operating system (OS), which is completely isolated from the host OS. The second is light virtualization, based on the management of containers. Containers share the host OS kernel [5] while ensuring isolation. In this paper, we benchmark the performance and energy consumption of a Hadoop-based infrastructure under the two virtualization technologies. First, we identify the points to be improved in Hadoop performance and then reduce the deployment cost on the cloud. Second, we provide the Hadoop community with an in-depth study of resource consumption depending on the deployment environment. Our experiments compare Docker (light virtualization) and VMware (heavy virtualization). We conclude that, in most experiments, the light technology offers better workload completion times and is better suited to use with Hadoop.

    Offline Scheduling of Map and Reduce Tasks on Hadoop Systems

    International audience. MapReduce is a model for managing massive quantities of data. It is based on the distributed and parallel execution of tasks over a cluster of machines. Hadoop is an implementation of the MapReduce model and is used to offer big data services in the cloud. In this paper, we present the scheduling problem on Hadoop systems. We focus on offline scheduling, express the problem as a mathematical model, and use a time-indexed formulation. We aim to consider as many constraints of the MapReduce environment as possible. Solutions of the presented model would serve as a reference for online schedules on small and medium instances. Our work is useful in terms of problem definition: the constraints are based on observations and take into account resource consumption, data locality, heterogeneous machines, and workflow management. The paper thus defines reference bounds for evaluating the online model.

    Optimisation du Problème d'Ordonnancement à Machines Parallèles dans Hadoop

    National audience. In this work, we aim to improve the operation of Hadoop, a software system for the distributed processing of large volumes of data. Our objective is to optimize the scheduling of a set of jobs on a parallel-machine architecture, restricting ourselves to jobs of the Map / Reduce type. Since the problem is NP-hard and the instances considered are large, we propose two solution heuristics based on list algorithms. The Map / Reduce model is a development model introduced by Google in 2004 whose goal is to ease the development and execution of parallel applications. This model imposes a development framework: (1) A Map / Reduce job consists of two types of tasks: Map tasks and Reduce tasks. Map tasks run before Reduce tasks. Map tasks perform the computations, while Reduce tasks aggregate their results. Implicitly, the output data of the Map tasks are transferred over the network to the machines running the associated Reduce tasks. (2) Map tasks and Reduce tasks cannot be interrupted: if a task is interrupted, it is restarted as if it were running for the first time.
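As a rough illustration of what a list-based heuristic for this setting can look like, here is a minimal Python sketch. It is not the authors' heuristic: the priority rule (largest total work first), the identical machines, and the names `list_schedule` and `jobs` are all assumptions made for the example. It only respects the two constraints stated in the abstract: the Reduce tasks of a job start after all of its Map tasks have finished, and tasks run without interruption.

```python
import heapq


def list_schedule(jobs, num_machines):
    """Greedy list scheduling of Map/Reduce jobs on identical machines.

    jobs: dict mapping a job name to (map_durations, reduce_durations).
    Returns (makespan, schedule), where schedule is a list of
    (job, phase, machine, start, end) tuples.
    """
    # Min-heap of (time at which the machine becomes free, machine id).
    machines = [(0, m) for m in range(num_machines)]
    heapq.heapify(machines)
    schedule = []

    # Assumed priority list: jobs with the largest total work go first.
    order = sorted(jobs, key=lambda j: -(sum(jobs[j][0]) + sum(jobs[j][1])))

    for job in order:
        maps, reduces = jobs[job]
        maps_done = 0  # completion time of the job's last Map task
        for duration in sorted(maps, reverse=True):
            free_at, m = heapq.heappop(machines)
            end = free_at + duration
            schedule.append((job, "map", m, free_at, end))
            maps_done = max(maps_done, end)
            heapq.heappush(machines, (end, m))
        for duration in sorted(reduces, reverse=True):
            free_at, m = heapq.heappop(machines)
            # A Reduce task waits until every Map task of its job is done.
            start = max(free_at, maps_done)
            end = start + duration
            schedule.append((job, "reduce", m, start, end))
            heapq.heappush(machines, (end, m))

    makespan = max((end for *_, end in schedule), default=0)
    return makespan, schedule


if __name__ == "__main__":
    example = {"A": ([3, 2], [2]), "B": ([4], [1])}
    print(list_schedule(example, 2)[0])
```

Each task is placed once on the earliest available machine and never split, so the schedule is non-preemptive by construction. A different priority rule would only change the `order` line.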

    A Back-reaction Induced Lower Bound on the Tensor-to-Scalar Ratio

    There are large classes of inflationary models, particularly popular in the context of string theory and brane world approaches to inflation, in which the ratio of linearized tensor to scalar metric fluctuations is very small. In such models, however, gravitational waves produced by scalar modes cannot be neglected. We derive the lower bound on the tensor-to-scalar ratio by considering the back-reaction of the scalar perturbations as a source of gravitational waves. These results show that no cosmological model that is compatible with a metric scalar amplitude of $\approx 10^{-5}$ can have a ratio of the tensor to scalar power spectra less than $\approx 10^{-8}$ at recombination and that higher-order terms lead to logarithmic growth for r during radiation domination. Our lower bound also applies to non-inflationary models which produce an almost scale-invariant spectrum of coherent super-Hubble scale metric fluctuations. Comment: 5 pages, version 3, minor changes from version

    An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems

    International audience. Graph edit distance is an error-tolerant matching technique that has emerged as a powerful and flexible graph matching paradigm for addressing different tasks in pattern recognition, machine learning and data mining; it represents the minimum-cost sequence of basic edit operations that transforms one graph into another by means of insertion, deletion and substitution of vertices and/or edges. A widely used method for exact graph edit distance computation is based on the A* algorithm. To overcome the high memory load A* incurs in storing pending solutions while traversing the search tree, we propose a depth-first graph edit distance algorithm which requires less memory and search time. All possible solutions are evaluated without explicitly enumerating them all. Candidates are discarded using a strategy based on upper and lower bounds. We present a thorough experimental study; experiments on a publicly available database empirically demonstrate that our approach is better than A*-based graph edit distance computation in terms of speed, accuracy and classification rate.
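The depth-first idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm from the paper: graphs are undirected with labeled vertices and unlabeled edges, every edit operation costs 1, and the bounds are deliberately crude (the upper bound starts at "delete everything, insert everything", and the lower bound only counts surplus vertices). The function name `graph_edit_distance` and its argument layout are hypothetical.

```python
def graph_edit_distance(labels1, edges1, labels2, edges2):
    """Exact graph edit distance with unit costs, by depth-first branch and bound.

    labels1, labels2: lists of vertex labels (vertex ids are list indices).
    edges1, edges2: iterables of undirected edges given as (u, v) pairs.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    n1, n2 = len(labels1), len(labels2)

    # Initial upper bound: delete all of graph 1, then insert all of graph 2.
    best = n1 + len(e1) + n2 + len(e2)
    mapping = []  # mapping[i] is the graph-2 vertex matched to vertex i, or None

    def search(cost):
        nonlocal best
        k = len(mapping)
        used = {v for v in mapping if v is not None}
        # Lower bound: surplus vertices on either side must be deleted or inserted.
        if cost + abs((n1 - k) - (n2 - len(used))) >= best:
            return
        if k == n1:
            # Insert every unused graph-2 vertex and every edge touching one.
            extra = (n2 - len(used)) + sum(1 for e in e2 if not e <= used)
            best = min(best, cost + extra)
            return
        for v in [u for u in range(n2) if u not in used] + [None]:
            # Vertex cost: deletion (None) or substitution with a different label.
            step = 1 if v is None or labels1[k] != labels2[v] else 0
            # Edge cost against every previously assigned vertex.
            for i, w in enumerate(mapping):
                in1 = frozenset((i, k)) in e1
                in2 = v is not None and w is not None and frozenset((w, v)) in e2
                step += in1 != in2
            mapping.append(v)
            search(cost + step)
            mapping.pop()

    search(0)
    return best
```

Only the current partial mapping is kept in memory, in contrast to a best-first search that stores all pending partial solutions. Tighter lower bounds would prune more of the tree but are left out to keep the sketch short.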

    The Effects of Gravitational Back-Reaction on Cosmological Perturbations

    Because of the non-linearity of the Einstein equations, the cosmological fluctuations which are generated during inflation on a wide range of wavelengths do not evolve independently. In particular, to second order in perturbation theory, the first order fluctuations back-react both on the background geometry and on the perturbations themselves. In this paper, the gravitational back-reaction of long wavelength (super-Hubble) scalar metric fluctuations on the perturbations themselves is investigated for a large class of inflationary models. Specifically, the equations describing the evolution of long wavelength cosmological metric and matter perturbations in an inflationary universe are solved to second order in both the amplitude of the perturbations and in the slow roll expansion parameter. Assuming that the linear fluctuations have random phases, we show that the fractional correction to the power spectrum due to the leading infrared back-reaction terms does not change the shape of the spectrum. The amplitude of the effect is suppressed by the product of the inflationary slow-roll parameter and the amplitude of the linear power spectrum. The non-gaussianity of the spectrum induced by back-reaction is commented upon. Comment: 9 page

    Brane-Antibrane Inflation in Orbifold and Orientifold Models

    We analyse the cosmological implications of brane-antibrane systems in string-theoretic orbifold and orientifold models. In a class of realistic models, consistency conditions require branes and antibranes to be stuck at different fixed points, and so their mutual attraction generates a potential for one of the radii of the underlying torus or the 4D string dilaton. Assuming that all other moduli have been fixed by string effects, we find that this potential leads naturally to a period of cosmic inflation with the radion or dilaton field as the inflaton. The slow-roll conditions are satisfied more generically than if the branes were free to move within the space. The appearance of tachyon fields at certain points in moduli space indicates the onset of phase transitions to different non-BPS brane systems, providing ways of ending inflation and reheating the corresponding observable brane universe. In each case we find relations between the inflationary parameters and the string scale to get the correct spectrum of density perturbations. In some examples the small numbers required as inputs are no smaller than 0.01, and are the same small quantities which are required to explain the gauge hierarchy. Comment: 30 pages, 2 figures. Substantial changes on version 1. New cosmological scenarios proposed including the dilaton as the inflaton. Main conclusions unchange

    Branonium

    We study the bound states of brane/antibrane systems by examining the motion of a probe antibrane moving in the background fields of N source branes. The classical system resembles the point-particle central force problem, and the orbits can be solved by quadrature. Generically the antibrane has orbits which are not closed on themselves. An important special case occurs for some Dp-branes moving in three transverse dimensions, in which case the orbits may be obtained in closed form, giving the standard conic sections but with a nonstandard time evolution along the orbit. Somewhat surprisingly, in this case the resulting elliptical orbits are exact solutions, and do not simply apply in the limit of asymptotically-large separation or non-relativistic velocities. The orbits eventually decay through the radiation of massless modes into the bulk and onto the branes, and we estimate this decay time. Applications of these orbits to cosmology are discussed in a companion paper. Comment: 34 pages, LaTeX, 4 figures, uses JHEP