20 research outputs found
Experimenting on Architectures for High Performance Computing
An overview of HPC architectures, of the challenges of reproducible research, and of the Grid'5000 testbed.
Reproducible Research in Computer Science
The ability to reproduce experiments and results is a condition for a solid scientific method. Scientific communities in physics or bioinformatics that use computing resources intensively for simulations or data mining have initiated a movement towards higher-quality experimental methodology and the sharing of experimental processes and software. This movement is now reaching computer science. In this talk, I provide an overview of initiatives around reproducible research and of tools that already make it possible to improve one's daily practices. I finish with some long-term challenges.
A survey of general-purpose experiment management tools for distributed systems
In the field of large-scale distributed systems, experimentation is particularly difficult. The studied systems are complex, often nondeterministic and unreliable; software is plagued with bugs; and experiment workflows are unclear and hard to reproduce. These obstacles have led many independent researchers to design tools to control their experiments, boost productivity and improve the quality of scientific results. Despite much research in the domain of distributed systems experiment management, the current fragmentation of efforts calls for a general analysis. We therefore propose to build a framework to uncover the missing functionality of these tools, enable meaningful comparisons between them and derive recommendations for future improvements and research. The contribution of this paper is twofold. First, we provide an extensive list of features offered by general-purpose experiment management tools dedicated to distributed systems research on real platforms. We then use it to assess existing solutions and compare them, outlining possible future paths for improvement.
Towards Complete Tracking of Provenance in Experimental Distributed Systems Research
Running experiments on modern systems such as supercomputers, cloud infrastructures or P2P networks has become very complex, both technically and methodologically. It is difficult to rerun an experiment or understand its results, even with technical background in the technology and methods used. Storing the provenance of experimental data, i.e., storing information about how the results were produced, has proved to be a powerful tool for addressing similar problems in the computational natural sciences. In this paper, we (1) survey provenance collection in various domains of computer science, (2) introduce a new classification of provenance types, and (3) sketch the design of a provenance system inspired by this classification.
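To make the provenance idea concrete, here is a minimal sketch of a provenance record for one experimental step, assuming file-based inputs and outputs; all names are illustrative and do not come from the paper's actual design.

```python
# A minimal provenance-record sketch: hash inputs, run the step, hash outputs,
# and keep enough environment information to explain how results were produced.
import hashlib
import json
import platform
import subprocess
import time
from dataclasses import asdict, dataclass, field

def sha256_of(path: str) -> str:
    """Hash a file so reruns can be compared byte for byte."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

@dataclass
class ProvenanceRecord:
    command: list[str]                                      # exact command run
    inputs: dict[str, str] = field(default_factory=dict)    # path -> sha256
    outputs: dict[str, str] = field(default_factory=dict)   # path -> sha256
    environment: dict[str, str] = field(default_factory=dict)
    started_at: float = 0.0
    finished_at: float = 0.0

def run_with_provenance(command, input_paths, output_paths):
    """Run one step and return a JSON record of how its outputs were produced."""
    record = ProvenanceRecord(
        command=command,
        inputs={p: sha256_of(p) for p in input_paths},
        environment={"platform": platform.platform(),
                     "python": platform.python_version()},
        started_at=time.time(),
    )
    subprocess.run(command, check=True)
    record.finished_at = time.time()
    record.outputs = {p: sha256_of(p) for p in output_paths}
    return json.dumps(asdict(record), indent=2)
```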
A Generic Approach to Automating Experiments on Computer Networks
This thesis proposes a generic approach to automate network experiments for scenarios involving any networking technology on any type of network evaluation platform. The proposed approach is based on abstracting the experiment life cycle of evaluation platforms into generic steps from which a generic experiment model and experimentation primitives are derived. A generic experimentation architecture is proposed, composed of an experiment model, a programmable experiment interface and an orchestration algorithm that can be adapted to network simulators, emulators and testbeds alike. The feasibility of the approach is demonstrated through the implementation of a framework capable of automating experiments using any combination of these platforms. Three main aspects of the framework are evaluated: its extensibility to support any type of platform, its efficiency in orchestrating experiments and its flexibility to support diverse use cases, including education, platform management and experimentation with multiple platforms. The results show that the proposed approach can be used to efficiently automate experimentation on diverse platforms for a wide range of scenarios.
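The "generic steps" abstraction can be illustrated with a sketch like the following, in which one platform-independent life cycle is implemented by each simulator, emulator or testbed driver; the step names and the Platform class are assumptions for illustration, not the thesis's actual interface.

```python
# One abstract experiment life cycle that every platform driver implements,
# so the orchestration logic never depends on a specific platform.
from abc import ABC, abstractmethod

class Platform(ABC):
    """Driver for one evaluation platform (simulator, emulator, or testbed)."""

    @abstractmethod
    def deploy(self, topology): ...      # reserve resources, build the topology
    @abstractmethod
    def configure(self, parameters): ... # push software and settings
    @abstractmethod
    def start(self): ...                 # launch the experiment workload
    @abstractmethod
    def collect(self) -> dict: ...       # gather logs and measurements
    @abstractmethod
    def release(self): ...               # free reserved resources

def run_experiment(platform: Platform, topology, parameters) -> dict:
    """Platform-independent orchestration: the same steps, in the same order."""
    platform.deploy(topology)
    try:
        platform.configure(parameters)
        platform.start()
        return platform.collect()
    finally:
        platform.release()
```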
A HyperNet Architecture
Network virtualization is becoming a fundamental building block of future Internet architectures. By adding networking resources into the “cloud”, it is possible for users to rent virtual routers from the underlying network infrastructure, connect them with virtual channels to form a virtual network, and tailor the virtual network (e.g., load application-specific networking protocols, libraries and software stacks onto the virtual routers) to carry out a specific task. In addition, network virtualization technology allows such special-purpose virtual networks to co-exist on the same network infrastructure without interfering with each other.
Although the underlying network resources needed to support virtualized networks are rapidly becoming available, constructing a virtual network from the ground up and using the network is a challenging and labor-intensive task, one best left to experts.
To tackle this problem, we introduce the concept of a HyperNet: a pre-built, pre-configured network package that a user can easily deploy to create or access a virtual network that carries out a specific task (e.g., multicast video conferencing). HyperNets package together the network topology configuration, software, and network services needed to create and deploy a custom virtual network. Users download HyperNets from HyperNet repositories and then “run” them on virtualized network infrastructure, much like users download and run virtual appliances on a virtual machine. To support the HyperNet abstraction, we created a Network Hypervisor service that provides a set of APIs that can be called to create a virtual network with certain characteristics.
To evaluate the HyperNet architecture, we implemented several example HyperNets and ran them on our prototype implementation of the Network Hypervisor. Our experiments show that the Hypervisor API can be used to compose almost any special-purpose network, including networks capable of carrying out functions that the current Internet does not provide. Moreover, the design of our HyperNet architecture is highly extensible, enabling developers to write high-level libraries (using the Network Hypervisor APIs) to achieve complicated tasks.
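To give a flavour of what calling such a service could look like, here is a hypothetical sketch of a Network Hypervisor-style API building a small multicast HyperNet; every class and method name here is invented for illustration and does not come from the dissertation.

```python
# Hypothetical sketch of a hypervisor-style API: describe routers, channels,
# and per-router software, then ask the service to instantiate the network.
class NetworkHypervisor:
    def __init__(self):
        self.routers, self.channels = [], []

    def create_router(self, name, software=None):
        router = {"name": name, "software": software or []}
        self.routers.append(router)
        return router

    def create_channel(self, a, b, bandwidth_mbps=100):
        self.channels.append({"ends": (a["name"], b["name"]),
                              "bandwidth_mbps": bandwidth_mbps})

    def instantiate(self):
        # A real hypervisor would map this description onto virtualized
        # infrastructure; here it only reports what would be built.
        return f"{len(self.routers)} routers, {len(self.channels)} channels"

# Package-like usage: topology plus application-specific software in one place.
hv = NetworkHypervisor()
core = hv.create_router("core", software=["multicast-daemon"])
for i in range(3):
    leaf = hv.create_router(f"leaf{i}")
    hv.create_channel(core, leaf, bandwidth_mbps=50)
print(hv.instantiate())  # -> "4 routers, 3 channels"
```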
Precision and Repeatability of Experiments on PlanetLab
Advisor: Prof. Dr. Elias P. Duarte Jr. Co-advisor: Prof. Dr. Luis C. E. Bona. Master's dissertation, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática, defended in Curitiba on 02/09/2014.
The Internet was originally developed as an academic network more than four decades ago. Today, it has established itself as a global platform for human communication, allowing information exchange, social interaction, and e-commerce. Current Internet users and applications require high levels of performance, security, scalability and mobility, characteristics that were not predicted at the time of its creation. Large-scale testbeds like PlanetLab have been developed to allow the design and realistic evaluation of applications and architectures that meet these new demands. PlanetLab is a planetary-scale testbed, consisting of more than one thousand nodes spread around the globe, that offers its users a realistic environment for experiment execution. However, running experiments on PlanetLab can become a complex activity, especially because it involves configuring a large number of nodes and because the environment is highly unstable due to performance variations of both the nodes and their network connections. These instabilities affect the precision and repeatability of the results obtained. Several tools exist for experiment management and resource discovery on PlanetLab. In this work, we present an experimental evaluation of the impact of using subsets of nodes, selected with different strategies, on the precision and repeatability of the results obtained. Experiments using applications with different resource requirements were carried out and are reported. The results show that a selection strategy based on k-cores reduces the variation of the results by up to 27% and yields average execution times up to 26% faster than alternative strategies. Using subsets of nodes selected with this strategy can thus contribute to the precision, repeatability and reproducibility of experiments executed on PlanetLab. This work also presents the integration of the node selection strategy into the experiment management framework PlanetMon, intended to give PlanetLab users access to the node selection tool in a convenient and transparent way while they manage their experiments.
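The k-core selection idea can be sketched as follows, assuming the measured connectivity between nodes has already been reduced to a graph of "good" pairs; the library (networkx), the edge criterion and the threshold k are assumptions for illustration, not the dissertation's actual tooling.

```python
# Keep only nodes whose measured connectivity places them in a dense core:
# the k-core is the maximal subgraph in which every node has degree >= k,
# i.e., nodes that are well connected to other well-connected nodes.
import networkx as nx

def select_stable_nodes(nodes, good_pairs, k=3):
    """nodes: iterable of node names; good_pairs: (a, b) pairs whose measured
    connectivity (e.g., loss or latency) passed some quality threshold."""
    g = nx.Graph()
    g.add_nodes_from(nodes)
    g.add_edges_from(good_pairs)
    return set(nx.k_core(g, k).nodes)

nodes = ["a", "b", "c", "d", "e"]
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
print(select_stable_nodes(nodes, pairs, k=2))  # -> {'a', 'b', 'c'}
```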
Virtual Machine Image Management for Elastic Resource Usage in Grid Computing
Grid Computing has evolved from an academic concept to a powerful paradigm in the area of high performance computing (HPC). Over the last few years, powerful Grid computing solutions have been developed that allow the execution of computational tasks on distributed computing resources. Grid computing has recently attracted many commercial customers. To enable commercial customers to process sensitive data in the Grid, strong security mechanisms must be put in place to protect that data.
In contrast, the development of Cloud Computing, which entered the scene in 2006, was driven by industry, and it was designed with security in mind from the beginning. Virtualization technology is used to separate users, e.g., by placing each user of a system inside a virtual machine, which prevents them from accessing other users' data.
The use of virtualization in the context of Grid computing was examined early on and found to be a promising approach to countering the security threats that appeared with commercial customers.
One main part of the work presented in this thesis is the Image Creation Station (ICS), a component which allows users to administer their virtual execution environments (virtual machines) themselves and which is responsible for managing and distributing the virtual machines across the entire system.
In contrast to Cloud computing, which was designed so that even inexperienced users can easily execute their computational tasks in the Cloud, Grid computing is much more complex to use. The ICS makes the Grid easier to use by overcoming traditional limitations, such as the need to install required software on the compute nodes where users execute their computational tasks. This allows users to bring commercial software to the Grid for the first time, without local administrators having to install the software on computing nodes accessible to all users. Moreover, the administrative burden is shifted from the local Grid site's administrator to the users or to experienced software providers, which allows individually tailored virtual machines to be provided to each user. The ICS is not only responsible for enabling users to manage their virtual machines themselves; it also ensures that the virtual machines are available on every site that is part of the distributed Grid system.
A second aspect of the presented solution focuses on the elasticity of the system: free external resources are automatically acquired depending on the system's current workload. In contrast to existing systems, the presented approach allows the system's administrator to add or remove resource sets at runtime without restarting the entire system. Moreover, the presented solution allows users not only to use existing Grid resources but also to scale out to Cloud resources and use them on demand. By ensuring that unused resources are shut down as soon as possible, the computational cost of a given task is minimized. In addition, the presented solution allows each user to specify which resources may be used to execute a particular job. This is useful when a job processes sensitive data, e.g., data that is not allowed to leave the company. To obtain comparable functionality in today's systems, a user must submit her computational task to one particular resource set, losing the ability to schedule automatically when more than one set of resources could be used.
In addition, the proposed solution prioritizes each set of resources by taking different metrics into account (e.g., the level of trust or the computational costs) and tries to schedule a job to the resources with the highest priority first. Notably, the priority often mirrors the physical distance between the resources and the user: a locally available cluster usually has a higher priority, due to its high level of trust and computational costs that are usually lower than those of Cloud resources. This scheduling strategy therefore minimizes the cost of job execution while improving security at the same time, since data is not necessarily transferred to remote resources and the probability of attacks by malicious external users is reduced.
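A minimal sketch of this prioritized scheduling idea might look as follows; the weights and field names are invented for illustration and only show the shape of the decision, not the thesis's actual scheduler.

```python
# Score each resource set by trust and cost, then place the job on the
# highest-priority set the user allows and that has free capacity.
from dataclasses import dataclass

@dataclass
class ResourceSet:
    name: str
    trust: float          # 0..1: local cluster near 1, public cloud lower
    cost_per_hour: float
    free_slots: int

def priority(rs: ResourceSet) -> float:
    # Higher trust and lower cost raise the priority; a local cluster usually
    # wins on both, mirroring the "physical distance" observation above.
    return rs.trust - 0.1 * rs.cost_per_hour

def schedule(job_allowed: set[str], sets: list[ResourceSet]) -> str | None:
    candidates = [rs for rs in sets
                  if rs.name in job_allowed and rs.free_slots > 0]
    candidates.sort(key=priority, reverse=True)
    return candidates[0].name if candidates else None

sets = [ResourceSet("local-cluster", trust=0.9, cost_per_hour=0.5, free_slots=0),
        ResourceSet("grid-site-b", trust=0.7, cost_per_hour=1.0, free_slots=4),
        ResourceSet("public-cloud", trust=0.4, cost_per_hour=2.0, free_slots=64)]
print(schedule({"grid-site-b", "public-cloud"}, sets))  # -> "grid-site-b"
```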
Bringing both components together results in a system that adapts automatically to the current workload by using external (e.g., Cloud) resources together with existing locally available resources or Grid sites, and that provides individually tailored virtual execution environments to the system's users.
Leveraging business workflows in distributed systems research for the orchestration of reproducible and scalable experiments
While research on distributed systems is advancing rapidly, experiments in this field are often difficult to design, describe, conduct and reproduce. Overcoming these difficulties would further stimulate the field and add credibility to results in distributed systems research. The key factors responsible for this situation are technical (software bugs and hardware errors), methodological (incorrect practices) and social (reluctance to share work). In this paper, the existing approaches for the management of experiments on distributed systems are described, and a novel approach using business process management (BPM) is presented to address their shortcomings. The questions arising when such an approach is taken are then addressed. We show that it can be a better alternative to the traditional way of performing experiments, as it encourages better scientific practices and results in more valuable research and publications. Finally, a plan of our future work is outlined and other applications of this work are discussed.
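To make the experiment-as-business-process idea concrete, the sketch below declares an experiment as a dependency graph of steps executed with per-step retries, the kind of service a BPM engine provides off the shelf; the step names and the tiny engine are illustrative assumptions, not the paper's actual process model.

```python
# A toy workflow engine: run steps whose dependencies are satisfied, retrying
# each failed step a bounded number of times before failing the experiment.
def run_workflow(steps, retries=2):
    """steps: name -> (list of dependency names, zero-argument callable)."""
    done = set()
    while len(done) < len(steps):
        ready = [name for name, (deps, _) in steps.items()
                 if name not in done and all(d in done for d in deps)]
        if not ready:
            raise RuntimeError("cyclic or unsatisfiable dependencies")
        for name in ready:
            _, action = steps[name]
            for attempt in range(retries + 1):
                try:
                    action()
                    break  # step succeeded; move on
                except Exception:
                    if attempt == retries:
                        raise  # retries exhausted; fail the experiment
            done.add(name)

# A four-step experiment expressed as a process definition.
run_workflow({
    "reserve": ([], lambda: print("reserve nodes")),
    "deploy":  (["reserve"], lambda: print("deploy software environment")),
    "measure": (["deploy"], lambda: print("run measurements")),
    "archive": (["measure"], lambda: print("archive results and description")),
})
```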
Doctor of Philosophy
Network emulation has become an indispensable tool for the conduct of research in networking and distributed systems. It offers more realism than simulation and more control and repeatability than experimentation on a live network. However, emulation testbeds face a number of challenges, most prominently realism and scale. Because emulation allows the creation of arbitrary networks exhibiting a wide range of conditions, there is no guarantee that emulated topologies reflect real networks; the burden of selecting parameters to create a realistic environment is on the experimenter. While there are a number of techniques for measuring the end-to-end properties of real networks, directly importing such properties into an emulation has been a challenge. Similarly, while there exist numerous models for creating realistic network topologies, the lack of IP addresses on these generated topologies has been a barrier to using them in emulators.
Once an experimenter obtains a suitable topology, that topology must be mapped onto the physical resources of the testbed so that it can be instantiated. A number of restrictions make this an interesting problem: testbeds typically have heterogeneous hardware, scarce resources that must be conserved, and bottlenecks that must not be overused. User requests for particular types of nodes or links must also be met. In light of these constraints, the network testbed mapping problem is NP-hard. Though the complexity of the problem increases rapidly with the size of the experimenter's topology and the size of the physical network, the runtime of the mapper must not; long mapping times can hinder the usability of the testbed.
This dissertation makes three contributions towards improving realism and scale in emulation testbeds. First, it meets the need for realistic network conditions by creating Flexlab, a hybrid environment that couples an emulation testbed with a live-network testbed, inheriting strengths from each. Second, it attends to the need for realistic topologies by presenting a set of algorithms for automatically annotating generated topologies with realistic IP addresses. Third, it presents a mapper, assign, that is capable of assigning experimenters' requested topologies to testbeds' physical resources in a manner that scales well enough to handle large environments.
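Because the mapping problem is NP-hard, practical mappers rely on heuristic search (assign itself uses a more sophisticated randomized search); the greedy first-fit sketch below is far simpler and only illustrates the constraint shape, namely node types and per-node capacities, with all names invented for illustration.

```python
# Greedy first-fit mapping of a virtual topology onto physical resources:
# each virtual node needs a physical node of the right type with a free slot.
def map_topology(virtual_nodes, physical_nodes):
    """virtual_nodes: name -> required type;
    physical_nodes: name -> (type, slots).
    Returns virtual name -> physical name, or raises if placement fails."""
    free = {p: slots for p, (_, slots) in physical_nodes.items()}
    mapping = {}
    for vname, vtype in virtual_nodes.items():
        for pname, (ptype, _) in physical_nodes.items():
            if ptype == vtype and free[pname] > 0:
                mapping[vname] = pname
                free[pname] -= 1
                break
        else:
            raise RuntimeError(f"no free {vtype} node for {vname}")
    return mapping

virtual = {"client": "pc", "server": "pc", "router": "switch"}
physical = {"pc17": ("pc", 1), "pc23": ("pc", 2), "sw01": ("switch", 4)}
print(map_topology(virtual, physical))
# -> {'client': 'pc17', 'server': 'pc23', 'router': 'sw01'}
```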