3,555 research outputs found

    Monitoring Challenges and Approaches for P2P File-Sharing Systems

    Get PDF
    Since the release of Napster in 1999, P2P file-sharing has enjoyed a dramatic rise in popularity. A 2000 study by Plonka on the University of Wisconsin campus network found that file-sharing accounted for a comparable volume of traffic to HTTP, while a 2002 study by Saroiu et al. on the University of Washington campus network found that file-sharing accounted for more than treble the volume of Web traffic observed, thus affirming the significance of P2P in the context of Internet traffic. Empirical studies of P2P traffic are essential for supporting the design of next-generation P2P systems, informing the provisioning of network infrastructure and underpinning the policing of P2P systems. The latter is of particular significance as P2P file-sharing systems have been implicated in supporting criminal behaviour including copyright infringement and the distribution of illegal pornograph

    ALOJA: A benchmarking and predictive platform for big data performance analysis

    Get PDF
    The main goals of the ALOJA research project from BSC-MSR, are to explore and automate the characterization of cost-effectivenessof Big Data deployments. The development of the project over its first year, has resulted in a open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools to gather insights about system's cost-performance1. This article describes the evolution of the project's focus and research lines from over a year of continuously benchmarking Hadoop under dif- ferent configuration and deployments options, presents results, and dis cusses the motivation both technical and market-based of such changes. During this time, ALOJA's target has evolved from a previous low-level profiling of Hadoop runtime, passing through extensive benchmarking and evaluation of a large body of results via aggregation, to currently leveraging Predictive Analytics (PA) techniques. Modeling benchmark executions allow us to estimate the results of new or untested configu- rations or hardware set-ups automatically, by learning techniques from past observations saving in benchmarking time and costs.This work is partially supported the BSC-Microsoft Research Centre, the Span- ish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the Generalitat de Catalunya (2014-SGR-1051).Peer ReviewedPostprint (author's final draft

    Metascheduling of HPC Jobs in Day-Ahead Electricity Markets

    Full text link
    High performance grid computing is a key enabler of large scale collaborative computational science. With the promise of exascale computing, high performance grid systems are expected to incur electricity bills that grow super-linearly over time. In order to achieve cost effectiveness in these systems, it is essential for the scheduling algorithms to exploit electricity price variations, both in space and time, that are prevalent in the dynamic electricity price markets. In this paper, we present a metascheduling algorithm to optimize the placement of jobs in a compute grid which consumes electricity from the day-ahead wholesale market. We formulate the scheduling problem as a Minimum Cost Maximum Flow problem and leverage queue waiting time and electricity price predictions to accurately estimate the cost of job execution at a system. Using trace based simulation with real and synthetic workload traces, and real electricity price data sets, we demonstrate our approach on two currently operational grids, XSEDE and NorduGrid. Our experimental setup collectively constitute more than 433K processors spread across 58 compute systems in 17 geographically distributed locations. Experiments show that our approach simultaneously optimizes the total electricity cost and the average response time of the grid, without being unfair to users of the local batch systems.Comment: Appears in IEEE Transactions on Parallel and Distributed System

    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    Get PDF
    This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of cost-effectiveness on Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. That also enables model-based anomaly detection or efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.Peer ReviewedPostprint (published version

    System analysis of a Peer-to-Peer Video-on-Demand architecture : Kangaroo

    Get PDF
    Architectural design and deployment of Peer-to-Peer Video-on-Demand (P2PVoD) systems which support VCR functionalities is attracting the interest of an increasing number of research groups within the scientific community; especially due to the intrinsic characteristics of such systems and the benefits that peers could provide at reducing the server load. This work focuses on the performance analysis of a P2P-VoD system considering user behaviors obtained from real traces together with other synthetic user patterns. The experiments performed show that it is feasible to achieve a performance close to the best possible. Future work will consider monitoring the physical characteristics of the network in order to improve the design of different aspects of a VoD system.El disseny arquitectònic i el desplegament de sistemes de Vídeo sota Demanda "Peer-to-Peer" que soporten funcionalitats VCR està captant l'interès d'un nombre creixent de grups de recerca a la comunitat científica, degut especialment a les característiques intrínsiques dels mencionats sistemes i als beneficis que els peers podrien proporcionar a la reducció de la càrrega en el servidor. Aquest treball tracta l'anàlisi del rendiment d'un sistema P2P-VoD considerant el comportament d'usuaris obtingut amb traçes reals i amb patrons sintètics. Els experiments realitzats mostren que és viable assolir un rendiment proper al cas més óptim. Com treball futur es considerarà la monitorització de les característiques físiques de la xarxa per a poder millorar el disseny dels diferents aspectes que formen un sistema de VoD.El diseño arquitectónico y el despliegue de sistemas de Video bajo Demanda "Peer-to-Peer" que soportan funcionalidades VCR está captando el interés de un número creciente de grupos de investigación dentro de la comunidad científica; especialmente debido a las características intrínsecas de tales sistemas y a los beneficios que los peers podrían proporcionar en la reducción de la carga en el servidor. Este trabajo se enfoca en el análisis de rendimiento de un sistema P2PVoD considerando el comportamiento de usuarios obtenido de trazas reales, junto a otros patrones sintéticos. Los experimentos realizados muestran que es viable lograr un rendimiento cercano al caso más óptimo. El trabajo futuro considerará la monitorización de las características físicas de la red para poder mejorar el diseño de los diferentes aspectos que conforman un sistema de VoD

    Collocation Games and Their Application to Distributed Resource Management

    Full text link
    We introduce Collocation Games as the basis of a general framework for modeling, analyzing, and facilitating the interactions between the various stakeholders in distributed systems in general, and in cloud computing environments in particular. Cloud computing enables fixed-capacity (processing, communication, and storage) resources to be offered by infrastructure providers as commodities for sale at a fixed cost in an open marketplace to independent, rational parties (players) interested in setting up their own applications over the Internet. Virtualization technologies enable the partitioning of such fixed-capacity resources so as to allow each player to dynamically acquire appropriate fractions of the resources for unencumbered use. In such a paradigm, the resource management problem reduces to that of partitioning the entire set of applications (players) into subsets, each of which is assigned to fixed-capacity cloud resources. If the infrastructure and the various applications are under a single administrative domain, this partitioning reduces to an optimization problem whose objective is to minimize the overall deployment cost. In a marketplace, in which the infrastructure provider is interested in maximizing its own profit, and in which each player is interested in minimizing its own cost, it should be evident that a global optimization is precisely the wrong framework. Rather, in this paper we use a game-theoretic framework in which the assignment of players to fixed-capacity resources is the outcome of a strategic "Collocation Game". Although we show that determining the existence of an equilibrium for collocation games in general is NP-hard, we present a number of simplified, practically-motivated variants of the collocation game for which we establish convergence to a Nash Equilibrium, and for which we derive convergence and price of anarchy bounds. In addition to these analytical results, we present an experimental evaluation of implementations of some of these variants for cloud infrastructures consisting of a collection of multidimensional resources of homogeneous or heterogeneous capacities. Experimental results using trace-driven simulations and synthetically generated datasets corroborate our analytical results and also illustrate how collocation games offer a feasible distributed resource management alternative for autonomic/self-organizing systems, in which the adoption of a global optimization approach (centralized or distributed) would be neither practical nor justifiable.NSF (CCF-0820138, CSR-0720604, EFRI-0735974, CNS-0524477, CNS-052016, CCR-0635102); Universidad Pontificia Bolivariana; COLCIENCIAS–Instituto Colombiano para el Desarrollo de la Ciencia y la Tecnología "Francisco José de Caldas
    corecore